Large data creation efficiency optimization

Hi all, I'm building an algorithm that needs a large data set: the past month of minutely data, split into every possible 180-minute interval (minute 0 to minute 180, minute 1 to minute 181, minute 2 to minute 182, and so on). Currently, the algorithm runs a for loop that appends the following to a list:

A list of 180 sequential prices. The function uses the current price's position in the overall time series index to append that price to the interval list, then adds one to the position to move to the next price point, and fills the list by recursion (if the interval length != 180, it calls itself again, which grabs the next minute's price). Once the list holds 180 prices, it is appended to the list described in the first paragraph, and the process starts over from the next price point.

Is there a better/faster/more efficient way to run this algorithm? This code alone takes ~30 seconds to run, which is valuable time I'll need for other computations down the line, and the delay could affect the accuracy of the price my algorithm would like to trade at.

Also, on a side note, is the code being run on my computer's CPU, and would using a higher-core computer speed up computation? Or is Quantopian providing a cloud-like service?
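For reference, the window construction described in the question can be sketched without recursion by slicing the price list once per start index. This is only a minimal illustration in plain Python, not Quantopian-specific code: the function name `rolling_windows` and the sample prices are made up for the example, and it assumes the prices are already available as an ordinary list.

```python
def rolling_windows(prices, width=180):
    """Return every run of `width` consecutive prices, one per start index."""
    if width <= 0:
        raise ValueError("width must be positive")
    # The last valid start index is len(prices) - width. If there are fewer
    # than `width` prices, the range is empty and so is the result.
    return [prices[start:start + width]
            for start in range(len(prices) - width + 1)]


# Small illustration with a width of 3 instead of 180 (made-up prices).
sample = [10.0, 10.5, 11.0, 10.8, 10.9]
windows = rolling_windows(sample, width=3)
print(windows)
```

Each slice copies its 180 prices, so memory use grows with the number of windows. For a month of minutely data that is probably acceptable, but it is worth checking against the platform's limits.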

5 responses

I can answer the side note. The code isn't run on your PC. You can tell because the code runs even if you have no Python interpreter or packages installed locally, so it must be running on a Quantopian server. This can account for some of the lag you're experiencing. When live testing the algo, it should be faster, as it will process each day individually rather than cramming all the data in your backtest range into your algorithm at the same time. That said, for-looping through it does sound highly CPU-intensive. I would think you could call data.history(asset, 'close', 180, '1m') to get the last 180 minutes at the current time, and use it in the handle_data(context, data) method, which runs every minute. This suggestion is possibly wrong, so don't take my word for it. Someone else might come through and say whether I'm right or wrong, but if you ever need data in intervals, it's best to use data.history for it.

It might still be slow if the algorithm calls up the last 180 price values every minute. I'm trying to find a better solution. I tried replacing hist = data.history(asset, 'price', 180, '1m') with:

price = data.current(stock,'price')
if hist exists:
del hist[-1]
hist = data.history(stock,'price',180,'1m')

This throws a syntax error; unfortunately, I don't know the correct syntax to use. The idea is that, since handle_data is called every minute, the algorithm adds the current price value and deletes the oldest price value if the original 180-minute price record already exists.

You just forgot to indent the 2 lines under the if and the 1 line under the else by 4 spaces (or 1 tab).

The message board software took away the indents. I tested it with the indents included and it still doesn't work. I think it might also be necessary to have before_trading_start store the data in context.price[stock] and context.hist[stock] so each stock has a separate price record.
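One way the rolling-record idea could look in plain Python is sketched below, using a bounded `collections.deque` per stock. This is an assumption-laden illustration, not tested Quantopian code: `update_window`, `seed_prices`, and `size` are hypothetical names, the `windows` dict stands in for something like context.hist, and `seed_prices` stands in for a one-off history fetch of the minutes before the current one. Note that `del hist[-1]` would remove the newest entry, whereas the stated goal is to drop the oldest; a deque with `maxlen` does that automatically.

```python
from collections import deque


def update_window(windows, stock, price, seed_prices=(), size=180):
    """Append the newest price for `stock`, dropping the oldest once full.

    `windows` maps each stock to its own deque, so every stock keeps a
    separate price record. `seed_prices` is only used on the first call
    for a stock (hypothetical stand-in for a single history fetch).
    """
    if stock not in windows:
        # First call for this stock: seed the buffer once.
        windows[stock] = deque(seed_prices, maxlen=size)
    # With maxlen set, appending to a full deque discards the oldest item.
    windows[stock].append(price)
    return windows[stock]


# Small illustration with a window of 3 instead of 180 (made-up prices).
windows = {}
update_window(windows, "AAA", 3.0, seed_prices=[1.0, 2.0], size=3)
update_window(windows, "AAA", 4.0, size=3)
update_window(windows, "BBB", 9.0, size=3)
print(list(windows["AAA"]), list(windows["BBB"]))
```

In an algorithm, the call would presumably sit inside handle_data with the current price passed in each minute, so the full history is requested only once per stock rather than every minute.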

The idea is genius, I know exactly how I can implement it to make it work now. Huge help, thank you!

Update - the aforementioned algorithm now takes ~0.6 seconds, thank you again for the help :)