Pre-processing in algo API. Can it be done?

Hello,

I'll be honest: I'm new to Quantopian and wanted to (shamelessly) use it to teach myself Python, with no real regard for trying to find a winning algo. I work in credit risk modelling, so I know next to nothing about equities. However, I got the bug and a single idea, so I thought I'd give it a go. But now I think I'm stuck.

Within Research I've written code to build a dataset of price history, which is then periodically updated (say every week or month; I haven't really decided yet) and then "mined" to find correlated, cointegrated, and multivariate-normally distributed groups of securities that appear mispriced. The idea is that their prices will converge. I was hoping to copy this approach to the Algo API, where there would be one big bit of pre-processing at the outset, new data appended each week/month (as the correlation, cointegration and multivariate normality test results will change over time), and mining for mispricing each day. Given the periodic appending and reprocessing, I was hoping it would update itself à la machine learning.
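To make the "append, then periodically re-screen" loop above concrete, here is a minimal, dependency-free sketch. It is not the notebook's code: the class name `RollingScreen`, its parameters (`window`, `refresh_every`, `min_corr`) and the use of plain Pearson correlation on daily returns are all assumptions made for illustration. The cointegration and normality tests from the post are left out.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


class RollingScreen:
    """Hypothetical helper: keeps a rolling price window and re-screens on a schedule."""

    def __init__(self, symbols, window=60, refresh_every=5, min_corr=0.9):
        self.symbols = list(symbols)
        self.history = deque(maxlen=window)  # one {symbol: close} dict per day
        self.refresh_every = refresh_every
        self.min_corr = min_corr
        self.days_seen = 0
        self.pairs = []  # result of the most recent screen

    def update(self, closes):
        """Append one day's closes; re-run the screen every `refresh_every` days."""
        self.history.append(dict(closes))
        self.days_seen += 1
        # Need at least 3 prices (2 returns) before a correlation is defined.
        if self.days_seen % self.refresh_every == 0 and len(self.history) >= 3:
            self.pairs = self._screen()
        return self.pairs

    def _returns(self, symbol):
        prices = [day[symbol] for day in self.history]
        return [b / a - 1.0 for a, b in zip(prices, prices[1:])]

    def _screen(self):
        rets = {s: self._returns(s) for s in self.symbols}
        return [
            (a, b)
            for a, b in combinations(self.symbols, 2)
            if pearson(rets[a], rets[b]) >= self.min_corr
        ]
```

The design point is that the expensive screen runs only on the refresh schedule, while the cheap append runs every day, so a daily mispricing check can reuse `pairs` between refreshes.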

However, from a look through the tutorials, it looks like my Research pipelines, which rely on start and end dates, won't work in the Algo API, so I'm a bit stuck. Have I missed something quite simple, or have I properly snookered myself (for those who don't like snooker, read: screwed :))?

I've attached my research code (note at the end long=short and short=long - it was late at night [I've got a newborn and a 1 year old who seldom sleep] and I forgot to reverse the labels). Sorry it isn't pretty but hopefully highlights how there is a lot of data mining that I hope to automatically refresh (so machine learning I think) that I want to copy to ALGO.

Any help would be greatly appreciated.

6 responses

Yeah.. to my knowledge, there's no "missing semi-colon" type of fix for porting the way you used start and end dates into the algorithm IDE, simply because calling a Pipeline works differently in each. In the algorithm IDE, I've found over time that a lot of what I've tried to calculate on my own is already done but tucked away in some library/dataset somewhere among the many available.

For all I know, there's a simple way to do this in the algo IDE, but as of yet I don't know of any. tl;dr: yes, I think you've snookered yourself.

Thanks Matt. Although it's not the best result, it is good to have my worry validated. Would this "cheat" (/annoying workaround) work? Could I use the history function in the algo IDE but list all the securities in my universe? All it needs is a starting dataset with one row per trading day and the closing price of each equity in its own column. From then on I'm in.
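For reference, the table shape being asked for (one row per trading day, one column per equity) can be sketched in plain Python as below. The function name `to_wide_table` and the `(date, symbol, close)` record format are assumptions for illustration, not anything taken from the platform or the notebook.

```python
from collections import defaultdict


def to_wide_table(records, symbols):
    """Pivot (date, symbol, close) records into one row per trading day.

    Returns (dates, rows), where rows[i][j] is the close of symbols[j] on
    dates[i], or None when that symbol has no price for that day.
    Dates are assumed to be ISO strings (YYYY-MM-DD), so sorting them as
    text also sorts them chronologically.
    """
    by_date = defaultdict(dict)
    for date, symbol, close in records:
        by_date[date][symbol] = close
    dates = sorted(by_date)
    rows = [[by_date[d].get(s) for s in symbols] for d in dates]
    return dates, rows
```

Keeping missing prices as `None` rather than dropping the row makes gaps visible, so the later correlation or cointegration step can decide how to handle them.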

Bit annoying having to copy (and format) the long list of tickers but there are worse things in life. I've got nothing else to do while my daughter is semi-asleep but not fully asleep at 3am.

Thanks

Haha.. yeah I can't sleep either. 3AM for me right now, but it's summer and I've got nothing to do until I start working again next week.

Sounds like that could work though... I've never messed around with the history dataset just because I'm lazy. I believe that quantopian.pipeline.data.Fundamentals.xyz_financial_ratio_asof_date works similarly, but I haven't really used that either.

As it seems you've also discovered, the history dataset only works when you call data on a specific security, which is why I stayed away from it. You could call a function that pulls all the tickers from the pipeline (which is supposed to be just a fancy pandas DataFrame, for which the docs are here) and then add whatever you calculate as a column in the pipeline. That being said, despite Quantopian constantly reaffirming that it's simply a pandas DataFrame, I can't find any obvious way of calling DataFrame methods on it, and quite honestly, the Quantopian docs are more or less obsolete and don't have a lot of information besides introductory explanations.

I'm sure this is still somehow possible though and kudos to you if you can find it. I tried a few times during the year, but being my junior year of HS, I simply didn't have the time to find it between getting a healthy amount of sleep and my excessively stressful schedule.

BUT, you might not have to do this. If I'm honest, I got lazy and just skimmed through your notebook before I responded initially. If you only need yesterday's close (I'm sure you already know this), it's already within data.USEquityPricing. Otherwise, it's a little more complicated than a simple import statement and reference.

The Quantopian employees are also decently active to my knowledge. You could try asking them about this, and I'm sure they'd be happy to help. I don't think that the guys that hang out on the forums have to do much moderating on the 15 posts that are made each week haha.

Edit: Completely forgot about this. You can also try using an SMA or another factor (quantopian.pipeline.factors) to do a similar thing, but I don't know if this would work because I'm not exactly sure how you need to use it. It takes a numerical input and a window length in days by default. In the IDE, it's easily added to a pipeline, and if you ever plan on entering the contest, you'll need to use the Optimize API, which works almost flawlessly with Pipeline data.
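As a rough illustration of what an SMA factor computes from "a numerical input and a window length", here is a plain-Python sketch. It is not the platform's implementation; the function name `simple_moving_average` and the choice to return `None` until a full window is available are assumptions.

```python
def simple_moving_average(values, window_length):
    """Trailing mean over `window_length` values; None until the window is full."""
    if window_length <= 0:
        raise ValueError("window_length must be positive")
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window_length:
            # Drop the value that just fell out of the trailing window.
            running_total -= values[i - window_length]
        if i >= window_length - 1:
            averages.append(running_total / window_length)
        else:
            averages.append(None)
    return averages
```

For example, with daily closes of 1, 2, 3, 4, 5 and a 3-day window, the first defined value is the mean of the first three closes.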

Very helpful Matt, I think I'll give it a bash and get in touch with the Quantopian people.

Good luck to you in your studies. Although I'm British, I remember that time of my life (the last couple of years at school, which we call 6th form) - don't work too hard, it's a great time (before career, marriage, kids, etc. come along ... which is great obviously, in case my wife ends up reading this post).

Hahaha... thank you. Glad I could help out
