Minute Data in Pipeline Notebook

I have created a notebook that analyzes minute data alongside a pipeline. I have not come across any notebook doing this yet (please let me know if one exists; so far, everything I have seen uses dataframes built with Quantopian's prices function). I think this could be useful for others as well.

However, I am not able to reconstruct how the 1-minute returns are actually computed. As you can see below, I have created a close price column that contains the previous day's close, but the first 1-minute return of the day does not match the return from the previous day's close to the current day's open. I have added two cells at the bottom that demonstrate this. I am concerned that my 1-minute returns may not be aligned correctly; any help is appreciated.
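A hypothesis worth checking, not a confirmed description of how Quantopian builds its minute returns: if 1-minute returns are computed close-to-close over a continuous series of minute bars, the first return of a session runs from the previous session's last close to the first bar's close, not to its open, so it will not equal the overnight close-to-open gap. The sketch below uses made-up bars to show the difference; the (timestamp, open, close) bar layout is an assumption for illustration.

```python
# Hypothetical minute bars as (timestamp, open, close); all values are made up.
bars = [
    ("2018-01-02 15:59", 100.0, 100.5),
    ("2018-01-02 16:00", 100.5, 101.0),  # last bar of day 1
    ("2018-01-03 09:31", 102.0, 103.0),  # first bar of day 2
    ("2018-01-03 09:32", 103.0, 102.5),
]


def close_to_close_returns(bars):
    """Simple 1-bar returns from consecutive closes; the first bar has none."""
    closes = [close for _, _, close in bars]
    return [None] + [c1 / c0 - 1.0 for c0, c1 in zip(closes, closes[1:])]


returns = close_to_close_returns(bars)

prev_close = bars[1][2]   # day-1 last close
first_open = bars[2][1]   # day-2 first open
first_close = bars[2][2]  # day-2 first close

overnight_gap = first_open / prev_close - 1.0  # close -> open
first_bar_return = returns[2]                  # close -> close

print(f"close-to-open gap:         {overnight_gap:.4%}")
print(f"first 1-min return (c2c):  {first_bar_return:.4%}")
```

The close-to-close figure equals the overnight gap compounded with the first bar's own open-to-close move, (1 + gap) * (close / open) - 1. If the notebook's first return of the day matches that product, the returns are probably aligned correctly and are simply measured close-to-close.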

5 responses

Hi Niccola,

Do you mind sharing a bit more about what you are looking to do by aligning minute pricing data to a pipeline output? In general, Pipeline is a computation engine and there isn't currently a way to get minute pricing data "in" pipeline. You can align the outputs, but it's not clear to me what problem that solves that can't be done more easily with the Research API. If you are looking for a dataframe of minute pricing data, can you tell me more about what you get from this format that you don't get from prices? I'm wondering if there's a more straightforward way to get the data that you're looking for.

P.S. I'm having a bit of a hard time following the code in your notebook, and it's very tough to tell whether the alignment is correct just by reading it. Any chance you could add some comments to make it easier to follow?


The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Thanks, Jamie, for your help. I have two goals in mind:

The first is to create a simple dummy variable that equals 1 if a stock hits an upper boundary within N days, -1 if it hits a lower boundary, and 0 if it hits neither (similar to the idea explained here in the forum in response to Marcos López de Prado's idea). The goal is then to use an ML algorithm to predict this dummy variable, the idea being that it is easier to predict than the old-fashioned target of where the stock will be 5 days from now. I am not sure this can be done in the Algorithm API.
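The labeling rule described above can be sketched as a first-touch function, a minimal version of what is often called triple-barrier labeling. It assumes the path is a plain list of prices starting at the entry price and that both boundaries are given as fractional moves from entry; the function and parameter names are hypothetical.

```python
def barrier_label(prices, upper, lower, horizon):
    """First-touch label for a price path.

    prices:  sequence of prices; prices[0] is the entry price.
    upper:   upper boundary as a fraction, e.g. 0.05 for +5%.
    lower:   lower boundary as a positive fraction, e.g. 0.03 for -3%.
    horizon: number of observations after entry to examine (e.g. N days).

    Returns 1 if the upper boundary is reached first, -1 if the lower
    boundary is reached first, and 0 if neither is reached within horizon.
    """
    entry = prices[0]
    up_level = entry * (1.0 + upper)
    down_level = entry * (1.0 - lower)
    for price in prices[1:horizon + 1]:
        if price >= up_level:
            return 1
        if price <= down_level:
            return -1
    return 0


path = [100.0, 101.0, 99.0, 104.0, 106.0, 95.0]
print(barrier_label(path, upper=0.05, lower=0.03, horizon=5))  # 1
print(barrier_label(path, upper=0.05, lower=0.03, horizon=3))  # 0
print(barrier_label([100.0, 98.0, 96.5], upper=0.05, lower=0.03, horizon=2))  # -1
```

Feeding it minute prices rather than daily closes is what resolves which boundary came first when both are crossed on the same day, which is one reason to want minute data for this label.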

The second goal is to test a simple idea of selling stocks when they hit an upper or lower boundary, in order to find out where the boundary should be. I think this would take a long time in the Algorithm API and could be done much faster with a few loops in a notebook.
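A loop over candidate boundaries can be sketched as below. This is a simplified backtest under loud assumptions: a long position is entered at the first price of each path, exits fill exactly at the first observed price at or beyond a boundary (no slippage or costs), and a position that touches neither boundary is closed at the last price. The paths, boundary values and helper names are made up.

```python
from itertools import product


def exit_return(prices, upper, lower):
    """Return of a long position entered at prices[0] and closed at the first
    price at or beyond a boundary, else at the last price.
    Assumes fills at the observed price, with no slippage or costs."""
    entry = prices[0]
    for price in prices[1:]:
        r = price / entry - 1.0
        if r >= upper or r <= -lower:
            return r
    return prices[-1] / entry - 1.0


def grid_search(paths, uppers, lowers):
    """Mean exit return for every (upper, lower) pair, best pair first."""
    results = []
    for upper, lower in product(uppers, lowers):
        mean_ret = sum(exit_return(p, upper, lower) for p in paths) / len(paths)
        results.append((mean_ret, upper, lower))
    results.sort(reverse=True)
    return results


# Made-up price paths (e.g. one trading day of minute closes per path).
paths = [
    [100.0, 102.0, 104.0, 101.0],
    [100.0, 97.0, 95.0, 99.0],
    [100.0, 101.0, 103.0, 107.0],
]
for mean_ret, upper, lower in grid_search(paths, [0.025, 0.05], [0.025, 0.045]):
    print(f"upper={upper:.3f} lower={lower:.3f} mean={mean_ret:.4%}")
```

Picking the best pair from the same paths it was scored on tends to overfit; scoring the grid on one period and checking the winner on a later one is a safer use of the loop.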

Does this make sense or do you think there is an easier way to do this. I will edit the code to make it more readable.

I'm mostly wondering what pipeline is needed for if you are using prices to get the minute pricing data. Are you only using it to get the list of assets in a universe (like the QTU)? If so, I'd recommend just running an empty pipeline (no columns) to get the QTU assets and then pass each of those to prices to get a dataframe of minute pricing for each asset. You would then have to get the subset of those results according to the QTU constituents each day.
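The per-day subsetting step described above can be sketched without any Quantopian calls. The sketch assumes the empty pipeline's output has already been turned into a mapping from each trading date to the set of assets in the universe that day, and that the per-asset results from prices have been collected into a mapping keyed by (timestamp, asset). The ticker strings, prices and function name are placeholders, not real data or API.

```python
from datetime import date, datetime


def subset_to_universe(minute_prices, universe_by_day):
    """Keep only minute observations whose asset is in that day's universe.

    minute_prices:   dict mapping (timestamp, asset) -> price.
    universe_by_day: dict mapping trading date -> set of assets.
    Timestamps are assumed to fall on the trading date they belong to.
    """
    return {
        (ts, asset): price
        for (ts, asset), price in minute_prices.items()
        if asset in universe_by_day.get(ts.date(), set())
    }


# Hypothetical universe (as an empty pipeline's output might list it) and made-up prices.
universe_by_day = {
    date(2018, 1, 2): {"AAPL", "MSFT"},
    date(2018, 1, 3): {"AAPL"},
}
minute_prices = {
    (datetime(2018, 1, 2, 9, 31), "AAPL"): 170.0,
    (datetime(2018, 1, 2, 9, 31), "MSFT"): 86.0,
    (datetime(2018, 1, 3, 9, 31), "AAPL"): 171.0,
    (datetime(2018, 1, 3, 9, 31), "MSFT"): 87.0,
}
kept = subset_to_universe(minute_prices, universe_by_day)
print(sorted((ts.isoformat(), asset) for ts, asset in kept))
```

With pandas DataFrames the same filter amounts to a join against the (date, asset) index of the pipeline output; the dict version just makes the rule explicit.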

One issue with the technique you are using here is that prices returns pricing data adjusted as of the end_date of the query. If you only collect prices one month at a time, you will see jumps in pricing any time there is a split. Have you considered using returns instead of pricing data to avoid this issue? It might make things easier.
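The split problem can be reproduced with made-up numbers. Assume a 2-for-1 split takes effect at the start of the second month and each month is queried separately, so each chunk is adjusted only as of its own end date: the concatenated prices then show a fake drop of roughly 50% at the boundary. Computing returns inside each query window and treating the cross-window return separately avoids it. The function names are hypothetical.

```python
def chunk_returns(chunk):
    """Simple returns between consecutive prices within one query window."""
    return [b / a - 1.0 for a, b in zip(chunk, chunk[1:])]


def stitched_returns(chunks):
    """Returns computed inside each chunk only. The return across a chunk
    boundary is left as None, because the two chunks may be adjusted as of
    different end dates."""
    out = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            out.append(None)  # boundary return: needs a consistently adjusted pair
        out.extend(chunk_returns(chunk))
    return out


# Made-up example: a 2-for-1 split takes effect at the start of month 2.
# Month 1 was queried with an end date before the split, so it is unadjusted.
month1 = [100.0, 102.0, 101.0]
month2 = [50.0, 51.0]

naive = chunk_returns(month1 + month2)
print([round(r, 4) for r in naive])  # contains a spurious drop of about 50%
stitched = stitched_returns([month1, month2])
print([None if r is None else round(r, 4) for r in stitched])
```

With a one-day holding period, querying one day at a time keeps every intraday return inside a single, consistently adjusted window, so only the overnight return needs extra care.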

I need the pipeline to add all of my explanatory variables for the ML-algorithm (right now, the pipeline is empty just to keep things simple).

I see your point regarding the adjustment of prices. My holding period is usually only 1 day, so I thought that should not be a problem, but you are right, it might be just as easy to use returns at the minute level. I will change that accordingly.

As I said, my main goal is to build minute return data for one day and then use the pipeline to create a dataframe that I can run an ML algorithm on.
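Combining the two pieces amounts to joining pipeline factor values and minute-derived labels on (date, asset). The sketch assumes both have been collected into mappings keyed that way (the pipeline's own output is indexed by date and asset); the function name, factor names, tickers and values are made up.

```python
from datetime import date


def build_training_rows(factors, labels):
    """Inner-join pipeline-style factors with per-(date, asset) labels.

    factors: dict mapping (date, asset) -> {feature name: value}.
    labels:  dict mapping (date, asset) -> label in {-1, 0, 1}.
    Returns (feature_names, X, y); rows are ordered by (date, asset) and
    keys without a label are dropped.
    """
    keys = sorted(k for k in factors if k in labels)
    feature_names = sorted({name for k in keys for name in factors[k]})
    X = [[factors[k].get(name) for name in feature_names] for k in keys]
    y = [labels[k] for k in keys]
    return feature_names, X, y


# Hypothetical factor values and labels; "momentum" and "volatility" are placeholder names.
factors = {
    (date(2018, 1, 2), "AAPL"): {"momentum": 0.10, "volatility": 0.20},
    (date(2018, 1, 2), "MSFT"): {"momentum": -0.05, "volatility": 0.15},
    (date(2018, 1, 3), "AAPL"): {"momentum": 0.12, "volatility": 0.21},
}
labels = {
    (date(2018, 1, 2), "AAPL"): 1,
    (date(2018, 1, 2), "MSFT"): 0,
}
names, X, y = build_training_rows(factors, labels)
print(names, X, y)
```

The join itself cannot enforce the timing rule that matters most: the label for date D should use only minute prices from D onward, so that nothing in it is known when the pipeline values for D are computed.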

The second goal is to test optimal lower and upper boundaries for a stop-loss or limit strategy (this is doable in the Algorithm API, I guess; it just seems easier to run a loop inside the notebook).

I looked into this, and I do not think it is possible to pull returns directly without the prices, is it? I can obviously pull daily returns, but not minute returns, correct?
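Whether minute returns can be pulled directly is left open in this thread. If they are derived from minute prices instead, one cheap sanity check, sketched here with made-up numbers and hypothetical helper names, is that compounding a session's minute returns, with the first one measured from the previous close, should reproduce that session's close-to-close daily return.

```python
import math


def minute_returns(closes, prev_close=None):
    """1-minute close-to-close returns for one session.

    If prev_close (the prior session's last close) is given, the first return
    spans the overnight gap; otherwise the session's first minute has no return.
    """
    series = ([prev_close] if prev_close is not None else []) + list(closes)
    return [b / a - 1.0 for a, b in zip(series, series[1:])]


def compound(returns):
    """Total return from chaining simple returns (math.prod needs Python 3.8+)."""
    return math.prod(1.0 + r for r in returns) - 1.0


# Made-up minute closes for one session plus the previous session's close.
prev_close = 101.0
closes = [103.0, 102.5, 104.0]

rets = minute_returns(closes, prev_close)
daily = closes[-1] / prev_close - 1.0
print(abs(compound(rets) - daily) < 1e-12)  # True: the minute returns chain back to the daily move
```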