The Snail that is Notebook

I really want to give Quantopian a go. I see great things apparently achieved by others. I admire the software and the coding. I have looked at Alphalens, Pyfolio and Zipline on Github. I have even downloaded some of it to play with in the past.

I really, really would like to get into this: to take it seriously, to help discover some alpha, to make some submissions to the contest. But Alphalens, and even Thomas Wiecki's cut-down and more easily understandable version of it, are simply unusable during US market hours.

I had more luck this morning in Europe but had to go out for the day.

What are other people doing? What are their experiences? Are you suffering the same frustrations? How are you getting around it?

Or are you mostly relying on the back tester where at least you can set a few back tests going and come back hours later to see what has happened?

Perhaps some of you are working off line? On your own data?

20 responses

Hi, good to read from you again. Yes, the notebooks are very slow, but waiting a few hours for Alphalens to finish is often still faster and easier than developing an algorithm to backtest.
I just let the notebooks run and relax with a book on finance in the meantime.

Hi @Zenothestoic, I've come to the same conclusion. Sometimes I have to wait hours to get my analysis done, which is really frustrating. If others have any suggestions on how to increase speed, please share them. Lastly, I (and I think others too) would be willing to pay a small fee if it would improve computational power.
Pieter-Jan

Here is a related topic from some time ago which raises similar issues: https://www.quantopian.com/posts/speed-please-2
Could someone on the Quantopian team please provide us with a clear vision of how they will tackle the lack of computational power in the (near) future?

Is there any way to know when a Notebook is still alive and may eventually finish versus no chance of ever completing?
(This part of the problem is on Jupyter, not Quantopian)

Hitting the memory limit is my biggest gripe with the research environment. I haven't tried Thomas' new NB yet though. Is it the cell calculating the period range of mean ICs that's taking a long time?

What I find most frustrating is training a factor over a long period (e.g. 10 years or longer), often with multiple NBs open researching multiple factors, only to find that the kernel restarted for all of them because I've hit the memory ceiling. It would be interesting to know what the memory ceiling actually is for each user. Kaggle, for example, has much better transparency in this regard.

Surely there must be a way to use average utilisation stats, user growth rate projections, etc, and data science to do better resource capacity planning? Since there’s 15+ years worth of data, shouldn’t one expect to be able to train and test a reasonably complex factor (or combination of factors) on up to 15+ years worth of data in a reasonable time without having to always pray the NB doesn’t reach the memory ceiling? Is this an unreasonable expectation?
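
In the meantime, as a rough way to at least see what the kernel process itself reports, something like the sketch below might help. This assumes psutil is importable in Research, which I haven't verified, and the total shown may reflect the host machine rather than any per-user ceiling.

# Rough sketch only: check what this notebook's kernel can see.
import os
import psutil

proc = psutil.Process(os.getpid())

rss_mib = proc.memory_info().rss / 2.0 ** 20           # resident memory used by this kernel
total_mib = psutil.virtual_memory().total / 2.0 ** 20  # memory visible to the kernel (may be the host total)

print("kernel using ~{:.0f} MiB of ~{:.0f} MiB visible".format(rss_mib, total_mib))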

I’d also be willing to pay for more memory and speed. I do find it quite difficult to work productively in the current research environment, and I’ve always thought it’s been due to my often poor and inefficient code, but maybe there’s something more to it since it appears I’m not the only one with these struggles?

It would be interesting to hear feedback from the Q team regarding this, and if there is any plan to address any of these issues.

@Zenothestoic: Just to make sure, your complaint pertains to it sometimes being quite slow, but not other times, right? So if it were always as fast as the fastest you've seen it run, would that be usable for you? As a lot of data needs to be loaded and churned through, some delay is to be expected and unavoidable.

Thomas
Let's just put it this way. If it takes YOU 35 minutes to run the notebook on a single factor, given the (presumably huge) processing power and RAM available to the Quantopian team, then that is a fair cop. Then that is standard and normal, and just down to the fact that you are running a great deal of data on 1,500 or so stocks.

If that is the answer then there is nothing that can be done. It's just a consequence of the sheer amount of data and the number of calculations. But in any event, leaving aside whether I can spare the time and effort to use the notebook, I much appreciated the thought and coding behind it.

Hi Thomas, do you have any update on whether Quantopian will improve computational power in the (near) future, or is this not a priority?
Pieter-Jan

@Thomas, please lobby for more memory in Research as well! :)

Dynamically increased memory (and CPU) whenever more is available would be nice. Maybe it's already implemented this way in your private cloud - I can't tell.

The lights are on but there's no one at home?

Hi Anthony,

You're right about the execution time of that notebook. It's slow, making it tough to iterate on ideas. I put together a notebook to analyze some of the functions in Thomas' notebook. My goal was to see if there were particular functions responsible for the slowness. Hopefully, you find the notebook informative. Spoiler: there's one function in particular that's quite slow, and another that will be slower between 3:30 AM and 8:30 AM ET, which lines up with when you said you were using Research.
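
For anyone who wants to try a similar check themselves, a minimal timing sketch along these lines can narrow down which call is eating the time. This is not the attached notebook; factor and pricing are placeholders for your own factor Series and get_pricing() output.

import time
import alphalens as al

start = time.time()
# Time the Alphalens preprocessing step on its own.
factor_data = al.utils.get_clean_factor_and_forward_returns(
    factor,    # placeholder: factor values indexed by (date, asset)
    pricing,   # placeholder: pricing DataFrame from get_pricing()
    periods=(1, 5, 10),
)
print("get_clean_factor_and_forward_returns took {:.1f}s".format(time.time() - start))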

Another thing to note is that it's not always obvious when your kernel crashes due to running out of memory. If your kernel runs out of memory, the notebook will appear as though it's still running (you'll see a [*] next to cells that haven't yet completed). In general, it's a good idea to keep an eye on the memory meter. If the kernel runs out of memory, you'll have to restart the notebook and run it from the beginning. For Thomas' notebook in particular, you'll want to make sure you've shut down any other running notebooks, as it's quite memory hungry and memory is shared across all of your notebooks.

I also want to add that we plan to continue using the backend that supports Fundamentals data in new integrations. That way, you can get a consistently fast interface to new data that we add to the platform. We'll also have to take a look and see if we can speed up get_clean_factor_and_forward_returns in Alphalens.

(Jamie's profiling notebook was attached here; the preview is unavailable.)

Thank you Jamie, very helpful!

get_clean_factor_and_forward_returns is indeed quite slow, and it's difficult to know how long the cell will take to finish. Maybe one way would be to implement the progress bar example from this post? That way, one would expect 15 '*'s for a period range of 1-15. It would also help in estimating whether the NB will run into memory issues when running that cell.
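
Purely as an illustration of what I mean (untested, with factor and pricing as placeholders for your own factor and pricing data), one could also call it once per period and print progress, at the cost of redoing some work:

import alphalens as al

period_results = {}
for p in range(1, 16):
    # One call per holding period, so the cell reports its own progress.
    period_results[p] = al.utils.get_clean_factor_and_forward_returns(
        factor, pricing, periods=(p,)
    )
    print("finished period {} of 15".format(p))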

No plan on adding more memory in Research, is there?

The notebook and algo environments are indeed slow, I usually watch a movie on the side while quantopianing...

They could figure out how to externalize computation to users. (allow downloading of data by paying more to data providers and less for infrastructure?)

Or update the pipeline API with a period parameter so that if you work on weekly data it won't run for every single day. There you go, a 5x speed and memory improvement.

Even in the algo environment if you rebalance weekly the pipeline runs factors for every day.

Jamie
Thanks for the answer, much appreciated!
Pieter-Jan

@Attila

I was under the impression that if you put:

algo.pipeline_output('pipeline')  

in a section which is only called once a week, the pipeline is only run once a week?
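
i.e. something along these lines (just a sketch of the pattern I mean; make_pipeline and the schedule details are placeholders):

import quantopian.algorithm as algo

def initialize(context):
    # Attach the pipeline once, then only read its output weekly.
    algo.attach_pipeline(make_pipeline(), 'pipeline')   # make_pipeline defined elsewhere
    algo.schedule_function(
        rebalance,
        algo.date_rules.week_start(),
        algo.time_rules.market_open(),
    )

def rebalance(context, data):
    # The pipeline output is only read here, once a week.
    context.output = algo.pipeline_output('pipeline')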

@Anthony: My suggestion for now would be to avoid running get_clean_factor_and_forward_returns in a loop. Much of the basic Alphalens usage only calls this function once, so maybe just hold off on some of the newer functions from Thomas' notebook.

@Quant Trader: Pipelines get run every day, regardless of the frequency with which they're called. That said, in most use cases, an algorithm wouldn't actually run much faster if the pipeline were only run weekly. The reason for this has to do with the way the pipeline engine is implemented. The use cases where it would be faster are those with very slow or complex computations. At some point, we will likely try to build different execution frequencies into the pipeline engine, but it's not on our short-term to-do list right now.

@Jamie,

While it's maybe not a loop by the strictest definition, I thought the whole point of Thomas' NB was to find the forward period at which IC and specific returns are maximized. In order to do that, one would need to try different periods (e.g. 1-15 days as in the example), and get_clean_factor_and_forward_returns would need to run for each of those periods in the range, right? For slower-moving factors (e.g. purely fundamental-based factors that get updated quarterly) you could use a 'step' in the range, as in the snippet below, but you still need to run it on multiple periods.
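
For example (illustrative only; factor and pricing are placeholders again):

import alphalens as al

# For a quarterly-updating fundamental factor, sample the period range with a step.
factor_data = al.utils.get_clean_factor_and_forward_returns(
    factor, pricing, periods=tuple(range(1, 16, 5))   # i.e. (1, 6, 11)
)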

I hope this is not the reason we're told to expect to spend 90% of our time in Research...

Anthony, Joakim, Zenothestoic, etc: we recently shipped a performance update to Alphalens' get_clean_factor_and_forward_returns method. In Jamie's notebook above, the call to get_clean_factor_and_forward_returns that used to take ~177 seconds now takes about 18 seconds.

There's still room for improvement but hopefully this will help for now.

Thanks for the feedback and please keep it coming!

Awesome, thanks @Jean, @Luca, and @Thomas!

Do these AL methods also use less memory now, and if so, do you have the improvement metrics available?

Joakim, I don't have any memory profiling information ready to share at the moment, unfortunately. If I can, I'll get something for you.