Alpha Vertex PreCog Dataset

We recently added a new premium dataset: Alpha Vertex's PreCog (formerly Logitbot) dataset. Alpha Vertex's PreCog model uses a variety of data sources to predict price movements through machine learning and artificial intelligence techniques. There are two versions of the PreCog dataset available. The historical data are free except for the two weeks prior to the current day; live data is available by subscription.

The datasets are available in both Research (interactive) and Pipeline.

Attached is an example market-neutral, long-short strategy written by Alpha Vertex that uses the PreCog Top 500 to select and weight a portfolio on a daily basis. The example uses forward predictions as well as past prediction accuracy to assign weights to a long-short portfolio. More specifically, the algo:

  1. Normalizes the projected log returns for each stock by the historical volatility of those returns.
  2. Selects stocks whose recent prediction quality is high from the Q1500US.
  3. Goes long stocks with a high normalized expected return and short stocks with a negative expected return. Positions are held for a minimum of 2 days and a maximum of 5 days.
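The steps above can be sketched as follows. This is an illustrative reconstruction, not Alpha Vertex's actual code: the 80th-percentile long cutoff and the 50/50 gross split are assumptions.

```python
import numpy as np
import pandas as pd

def precog_weights(predicted_log_returns, daily_returns_history):
    """Illustrative weighting in the spirit of the attached algo.

    predicted_log_returns : Series of projected 5-day log returns per stock.
    daily_returns_history : DataFrame of past daily returns (rows=days, cols=stocks).
    """
    # 1. Normalize each prediction by that stock's historical volatility.
    score = predicted_log_returns / daily_returns_history.std()

    # 2/3. Long the strongest normalized predictions; short negative predictions.
    longs = score[score > score.quantile(0.8)]            # assumed cutoff
    shorts = score[predicted_log_returns < 0].drop(longs.index, errors="ignore")

    weights = pd.Series(0.0, index=score.index)
    if len(longs):
        weights[longs.index] = 0.5 / len(longs)           # 50% gross long
    if len(shorts):
        weights[shorts.index] = -0.5 / len(shorts)        # 50% gross short
    return weights
```

The holding-period logic (minimum 2 days, maximum 5) would sit on top of this in the rebalance schedule.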

Here is Alpha Vertex’s official company and product description:

Alpha Vertex combines data science and machine learning technologies to deliver cognitive systems that provide advanced analytical capabilities to the investment community. Our analytical solutions monitor, link, classify, measure and analyze large volumes of information across global financial markets to identify emerging trends before they become obvious and model the impact of events on financial instruments.

PreCog is an analytical service built on top of the AV Knowledge Network that uses powerful machine learning models to forecast time series and returns at multiple horizons. PreCog leverages thousands of models working in unison to analyze high-dimensional datasets drawn from the AV Knowledge Network across hundreds of categories spanning price, company fundamentals, technical indicators, geopolitical, macroeconomic and news events.


The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

61 responses

Here is a similar algo with leverage more under control. This version adjusts weights of existing holdings based on new targets.

precog_top_500 is impressive, but precog_top_100 is not really worth it (bad performance since 2016).
It is also interesting to note that the factor works very well only in a long-short (dollar-neutral) portfolio; it doesn't work well as a pure returns predictor.
Also, the factor fails at predicting the very-high-return stocks. If you look at the performance of the bin containing predicted returns above 0.03, it's disappointing, but this can be mitigated by considering the full top quantile.
So this dataset seems really good for a Q-fund-style algorithm, but it is of little use to a retail investor (you are forced to be long-short and to invest in a very big basket of stocks).

We now have to wait 6-12 months to see the out-of-sample performance of this dataset. If it keeps performing like this, it's really good.

@Luca - Nice work! Thanks!

A couple of questions...
1. What is the difference between bins=[...], quantiles=None and bins=None, quantiles=6?
I think it's specifying non-equally spaced demarcation points...but what does it mean for the analysis?
2. What are the auto-correlation traces telling you...do you look at them for non-stationarity???...I'm not sure.
3. Looks like the 3-day prediction is better overall than the rest...do you come to the same conclusion?
4. Do you think the factor analysis you've done translates directly to the trading algo posted by @Jamie above,
or is there another trading algo that should come out of your analysis work?

Again, appreciate you sharing this!


1 -

quantiles classifies factor values into a certain number of categories with an equal number of elements in each. This produces bins with the same number of elements but with different value ranges. This is useful if the factor is used to rank the stocks and you are interested in comparing each stock relative to the others.

bins classifies factor values according to the values themselves. This produces groups with varying numbers of elements, but each spanning a specific range of values. This is useful if the factor values have intrinsic meaning (e.g. negative values mean negative predicted returns and positive values mean positive predicted returns).

precog_top_500 returns the predicted log return for each security over the next 5 days, so I wanted to check whether positive values produced positive returns and negative values produced negative returns. This is especially useful if you run a long-only portfolio. Imagine the factor returns 100 negative values and 400 positive values: if you use quantiles=2 you get 2 groups of 250 elements each, while with the bins option you can separate the groups according to the values, which is more informative in this hypothetical scenario.

Note: when I set long_short=False in my NB I then use quantiles=6, but it would have been a better idea to use the bins option instead.
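The distinction can be seen with plain pandas, whose `qcut` and `cut` mirror the quantiles/bins options (the factor values below are made up):

```python
import pandas as pd

# Hypothetical factor values: 2 negative predictions, 6 positive ones.
factor = pd.Series([-0.04, -0.01, 0.005, 0.01, 0.02, 0.03, 0.04, 0.05])

# quantiles-style: equal-count groups, regardless of sign.
by_quantile = pd.qcut(factor, q=2, labels=["low", "high"])

# bins-style: the boundaries carry meaning (here: the sign of the prediction).
by_bins = pd.cut(factor, bins=[-1.0, 0.0, 1.0], labels=["negative", "positive"])

print((by_quantile == "low").sum())   # 4: half the names, positives included
print((by_bins == "negative").sum())  # 2: only the truly negative predictions
```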

2 - Auto-correlation is useful for measuring the turnover of a factor. It is a way of evaluating how often the factor values change. If the auto-correlation between periods is high, the factor values don't change much, and vice versa.
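A minimal sketch of the idea, assuming daily factor values in a dates-by-assets DataFrame. This approximates what Alphalens reports as factor rank autocorrelation, not its exact implementation:

```python
import numpy as np
import pandas as pd

def factor_rank_autocorrelation(factor_panel):
    """Mean correlation of cross-sectional factor ranks between consecutive
    days -- a turnover proxy. factor_panel: rows=dates, cols=assets."""
    ranks = factor_panel.rank(axis=1)                  # rank assets each day
    return ranks.corrwith(ranks.shift(1), axis=1).mean()

rng = np.random.default_rng(1)
# A slowly drifting factor (random walk per asset): ranks barely change daily.
slow = pd.DataFrame(rng.normal(0, 0.01, (50, 20)).cumsum(axis=0))
# Fresh noise every day: yesterday's ranks say nothing about today's.
noisy = pd.DataFrame(rng.normal(size=(50, 20)))
```

A high value means low turnover (infrequent rebalancing suffices); a value near zero means the factor reshuffles daily.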

3 - It seems to me the 1-day fwd period prediction is the best (which translates to rebalancing every day). You can see it in the "Mean Return by Factor Quantile" plot and the "Factor Weighted Long/Short Portfolio Cumulative Return" plot. Why do you like the 3-day prediction better?

4 - I haven't looked at the algorithm closely enough to express an opinion, but I can tell you that a good algorithm gives you exactly the performance you see in the "Factor Weighted Long/Short Portfolio Cumulative Return" plot. I do that all the time and, apart from commission costs and slippage, I see the same results.
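For reference, the weights behind a factor-weighted long/short portfolio are typically the demeaned factor values scaled to unit gross exposure. A small sketch of my understanding of that convention (not Alphalens's exact code):

```python
import pandas as pd

def factor_weights(factor):
    """Factor-weighted long/short weights: demean the cross-section, then
    scale so absolute weights sum to 1 (dollar neutral, gross exposure 1)."""
    demeaned = factor - factor.mean()
    return demeaned / demeaned.abs().sum()
```

An algorithm that targets these weights each rebalance should track the Alphalens cumulative-return plot, minus costs.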


I am getting the same issue with an unrelated algo. I think something broke.

@Luca -

You might try adding this:


If I understand the whole Quantopian long-short framework, it is intended to de-risk the strategy, per the various defined risks (see https://www.quantopian.com/allocation ).

Also, I wonder whether, as a stand-alone strategy, it is worth anything to the Q fund. It could be highly correlated to algos they've already identified, and so would add nothing. The performance appears to be amazing, but maybe it is totally correlated, and not so attractive.

There is also the question of strategic intent ("We are looking for algorithms that are driven by underlying economic reasoning"). How does one articulate this when the factor is coming from something like Alpha Vertex PreCog? It would seem to be shrouded in mystery.

Kinda cool, but in the context of Quantopian and its fund, I'm trying to digest where this fits in. It seems like most of the work has already been done, so what are Q users gonna add? It certainly sets a high bar for anything a user might conjure up and submit, with beta = 0 and SR ~ 2-3.

@Grant, keep in mind that the Alpha Vertex PreCog dataset has just come out. As far as we know, it is possible they completely overfit their model to the sample data. Indeed, the factor performance gets worse in recent months. We really need to wait some time and see if the dataset keeps performing this well out of sample. Another surprising thing is that they didn't provide sample data for 2007-2010, which is suspicious.

If what we see now is comparable to out-of-sample performance (and performance during the financial crisis), this is certainly something valuable to the Q fund, but it would be interesting to hear from Q.

I didn't use the optimization API because I wanted to show we can reproduce what Alphalens plots, but it might be interesting to see what improvements are possible with that API.

Thanks for reporting that issue. Indeed, it was introduced when we updated research yesterday. We rolled back the change and we have a fix in the works - it should be out shortly.

We re-updated research and included the fix to the pipeline issue that came up last night. You should be all set to re-run the notebook now.

Sorry for the trouble.

@Luca, @Grant An abbreviated answer for the moment and will follow up with some more detail this afternoon / evening.

First, we very much appreciate the skeptical analysis -- one of the reasons we wanted to put this signal onto Quantopian is that we couldn't think of a more skeptical audience than our quantitative finance friends. We thrive on that skepticism and have a quant on our team whom we strongly encourage to throw cold water on any modeling that becomes so opaque that we would be unable to understand why a specific prediction was made.

Two brief answers. One on the backtest period: we'd provided a look-back that Quantopian felt was sufficient (disclaimer: we'd begun conversations a year ago, so my recollection here might be incorrect, and this pre-dated Jamie, so please don't shoot arrows his way). As an aside, the models lost less than the market during the end of '08 and recovered massively within 1-2 months. If there is sufficient demand from the community, I'm sure we would consider extending the backtest period, and I'd be happy to discuss it with our CEO.

For Luca (and obviously not speaking for Quantopian), on how our signal could fit into the Q fund: remember that our signal is not a strategy. We do believe we capture unique factors that are not highly correlated to the overall market, and the models choose from 250 potential factors for each stock (there is no "God" model used for all stocks; each one is bespoke to the particular instrument, and the machine is free to choose or discard any of the 250 factors).

How you interpret this signal, the weight you give it, and whether you use it to confirm a buy/sell signal your algo has generated, or as a primary signal that your traditional methodology confirms, is up to you; IMHO this puts it in the same general vein as high-information technical indicators.

While you could use our signals and no others, it would be professional malpractice for us to suggest that we'd produced a signal that requires no additional inputs. I believe the most interesting and much better performing portfolios will combine the signal we provide with the best of quantitative finance. We don't see ourselves in any way as a replacement, but just another piece of kit in the toolbox.

I can sympathize with your question re ("We are looking for algorithms that are driven by underlying economic reasoning") and that one I'll have to leave to Q to weigh in on. My intuition here is, again, to think of us as just one input into a traditional quantitative model.

We're more than happy to answer any questions (except ones that peek into the Rube Goldberg machine - sorry), and if any members are having difficulty seeing how best to factor an admittedly unique tool into their existing methodologies, we'll do our best to help frame it according to your particular approach. That would extend to waving you off if we think we'd detract from your strategy.

Feel free to ping us here or [email protected] , a mailbox we've set up for the community that will blast out to all of us in the office.

As far as we know, it is possible they completely overfit their model to the sample data.

I think that this can be explored by setting up the model using the earlier half of the data, for example, and then running the backtest up to the present (or some such thing). This would be one way to demonstrate that no over-fitting is going on.

Generally, I'm confused how one might use backtesting on Quantopian to evaluate a strategy using PreCog indicators. Say we take Luca's fine example above, and run it again in 6 months, and then load the results into pyfolio. How would one sniff out over-fitting, since the indicators over the out-of-sample 6 months could have been over-fit, too. I guess the only way to test the PreCog indicators would be to do Quantopian paper trading on a live feed (which would cost $600). Or am I thinking about this incorrectly?

@Grant, a suggestion on how you could test on out-of-sample data for free: the free historical dataset will always lag by two weeks, so, starting from launch, out-of-sample predictions have begun to accumulate, and each one becomes available for inspection two weeks later.

I can say that we are hyper-sensitive to both overfitting and lookahead bias and have spent what feels like man-months at this stage proving to ourselves that neither has worked its way into the models. In fact, we have discarded models where a whiff of overfitting occurred amongst the 30,000 models we produce (for the global equity market).

That being said, it is refreshing to hear from folks who know where to look for likely pitfalls in ML-based signals. Our personal OCD in avoiding overfitting has absolutely delayed many an improved model revision.

Thanks for the comment and hope this is helpful.


Hi Michael -

I'm sure you are sensitive to over-fitting and look-ahead bias, but it would be more convincing if you could provide a signal dataset that was built/trained solely on an earlier time period, so we could see how it performs out-of-sample. Even then, there is the risk of man-in-the-loop influences: you can rigorously preclude the machine from knowing the future, but the machine's operators typically know what happened.

Above, Luca makes the point:

We now have to wait 6-12 months to see the out-of-sample performance of this dataset. If it keeps performing like this, it's really good.

But I realized that this would not work. I could clone Luca's backtest, put it on the shelf for 6-12 months, re-run it, load it into Quantopian's pyfolio, and still not know what I'd need to know to determine whether the signals come from a biased model. This works if the algo is self-contained and based on basic data (e.g. derived from OHLCV bars, company fundamentals, etc.), but not if the algo is plugging into a black-box ML signal feed. In 6-12 months we would be right where we are today, still with no way of assessing the quality of the signal. I guess the trick is to make sure the historical data are not changing on a rolling basis. So, it is a matter of running a series of backtests out-of-sample (e.g. monthly), and then re-running all of them in 6-12 months to ensure the dataset has not been changed. This is a pain, though; it would be much more convenient to run Quantopian paper trading, followed by a backtest in 6-12 months. Or maybe there is a simpler way?
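One way to operationalize the "make sure the historical data are not changing" check is to fingerprint each download of the history and compare fingerprints later. A sketch, with made-up record fields:

```python
import hashlib
import json

def snapshot_digest(records):
    """Deterministic fingerprint of a point-in-time download of the signal
    history. records: iterable of dicts, e.g.
    {"date": "2017-03-06", "symbol": "AAPL", "predicted_log_return": 0.012}.
    Store the digest when you first pull the history; if recomputing it
    later on the 'same' history gives a different digest, the historical
    data was revised after the fact."""
    canonical = json.dumps(
        sorted(records, key=lambda r: (r["date"], r["symbol"])),
        sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()
```

This replaces re-running whole backtests with a cheap monthly comparison of stored digests.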

@Grant, my idea of testing on out-of-sample data is more in line with Michael's comments. Now that the Alpha Vertex PreCog dataset is available, Quantopian will store the new data coming from Alpha Vertex every day, so backtesting at any date later than now guarantees a look-ahead-bias-free test. Regarding overfitting, I believe that as long as the out-of-sample performance is in line with the results we see here, we are fine.

@Luca - Yes, so long as Quantopian creates the PreCog historical data going forward from a live feed (or effectively does so, by regularly comparing the live and historical data). But if they don't control or quality-check it, then there could be a problem. Jamie doesn't cover this important point above, but perhaps somewhere in the Q documentation it is stated that for their "curated" external data feeds, they handle this quality control detail. One could imagine that the PreCog data would be managed solely by Alpha Vertex, and so Q users would need to perform the check that historical data are static.

Sorry for the tangent, if this is a known quality control practice at Quantopian; I just came to the realization that for external data feeds, there is a risk in simply running a single backtest 6-12 months out, without a bit more information about the quality control of the data.

The problem is that PreCog is not really a traditional "factor". Factors have already been combined, and you're getting a list of stocks that are supposed to be long/short. This would be fine on its own, because the only way to test it is to paper trade it. But with a big price tag, someone is going to have to put up cash, put faith into this, and let it run for a couple of months. A high-cost, high-risk kind of deal. If it ends up not working out, you just paid a ton of money for data that didn't really pan out. On the flip side, if it's too cheap, then everyone will get it and the alpha will decay. Institutions or individuals with lots of cash to try this out would be the only ones to truly get the best out of the dataset, because they get to try it out over prolonged periods.

@Grant: I just posted an explanation of how Quantopian loads, processes, and surfaces partner datasets. I'm hoping the walkthrough helps you to understand why Luca's suggestion of running a backtest at a later date is a valid out-of-sample test. Since Alpha Vertex is a machine learning signal, we don't apply 'deltas' (see the link for definition). This means that the AV dataset is a build-up of data collected live tacked on to the original historical load. Please note that the historical load for Alpha Vertex was done on March 6, 2017.

@Jonathan: You can run a backtest at some point in the future to evaluate the out of sample performance of the dataset per the explanation linked above.

Thanks Jamie. I'll take a look. --Grant

Thanks for the trading algo version of your research on PreCog!
To respond to your responses:

  1. Ok, so the way to go for this app is bins...and the bins you have described decode as:
    Q1=[-100%, -3%],
    Q2=[-3%, -1%],
    Q3=[-1%, 0%],
    Q4=[0%, 1%],
    Q5=[1%, 3%],
    Q6=[3%, 100%]
    Ok, for long_short=False, I'll run it with the bins and see what I get.

  2. Autocorrelation: I understand what you are saying about turnover...makes sense...thanks.
    I've been trying to understand more about that, with respect to AR/ARMA models, and a paper
    which ascribes a major factor of daily price return autocorrelation to
    "partial price adjustment (PPA) (i.e., trade takes place at prices that do not fully reflect the information possessed by traders)". I don't understand that yet, and perhaps alphalens isn't even computing that.

  3. Ok, I see what you are saying about the 1-day fwd return being better...it is appreciably higher, numerically, for the precog500 run.
    I guess what confuses me is that the distribution plots next to the Q-Q plots, which show mean/variance for the various forward-day horizons, don't match up with the cumulative returns in terms of which is "better". Also, just a check: when alphalens talks about forward days, it is referring to the sampling period of the data...which in this case is 1, 3, 5, 10 days. So if one of these is better than the 1-period sampling, there must be some pattern in the alpha signal...right?

  4. Again, I like your trading-algo translation of the precog500 factor...looks good!
    I'm also unsure how it performs out-of-sample...but I guess that's all part of the "buy-low-sell-high" paradigm of predicting what tomorrow will bring!

Another thought as to validation...since precog500 is predicting the ups and downs of all the stocks all the time, then by re-creating a good sample of the S&P 500 historically, using only assets tracked by precog500 in the S&P 500 as a portfolio, you should be able to get a trading algo that beats the S&P 500.
Hmmm...is that what you are doing in your last tests with long_short=False?


Some things in the code to consider: normalization; sell before buy; take-profit and stop-loss. Ignore the chart; just think about whether any of the concepts in the code are worth trying with changes. This time leverage is not margin, but longs & shorts relative to the portfolio. Needs some goodness added back in.

This is not targeting returns; it is targeting ideas. It's rough and intended to be honed by anyone motivated.

better weights IMO

A previous algo actually had a heavy long bias because the short divisor was 20, although that wasn't its focus.
Anyway, here's some dynamic adjustment of long and short, adapted from my example here.

A desired beta can be set. The targets are achieved, with these returns:

   beta_target     Returns  
       .30          405%  
       .20          369%  
       .15          376%  
       .00          327%  
      -.15          211%  

@Jacob -

Regarding the beta ~ 0 (market-neutral) objective, my understanding is that it is largely driven by the institutional market Quantopian is targeting for its hedge fund. The market will pay a premium for isolated alpha, whereas beta is a kind of commodity investment. Why would the Q fund's customers want beta from Quantopian when they could just buy it themselves (e.g. SPY or some such thing)? Hence, many posted examples attempt to achieve beta ~ 0 by using a 50/50 long-short allocation (this will get you close, and then the new optimization API under development can beat the beta down to zero, subject to other constraints).
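To see why a 50/50 long-short book gets you close to beta ~ 0, here is a small simulation with synthetic returns (all numbers invented): each leg carries full market exposure, and the dollar-neutral combination cancels it, leaving the alpha spread.

```python
import numpy as np

def realized_beta(portfolio_returns, market_returns):
    """OLS beta of a return series against the market."""
    cov = np.cov(portfolio_returns, market_returns)
    return cov[0, 1] / cov[1, 1]

rng = np.random.default_rng(0)
mkt = rng.normal(0.0005, 0.01, 500)                 # market daily returns
long_leg = mkt + rng.normal(0.001, 0.005, 500)      # beta ~ 1, positive alpha
short_leg = mkt + rng.normal(-0.001, 0.005, 500)    # beta ~ 1, negative alpha
# 50/50 dollar-neutral: the two market exposures cancel, isolating the alpha.
neutral = 0.5 * long_leg - 0.5 * short_leg
```

Either leg alone has beta near 1; the combination's beta is near 0, which is what the optimizer then polishes subject to other constraints.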

One could increase performance by accepting a little more variance.

@Guy, now THATS a nice looking strategy!

now THATS a nice looking strategy!

What do you like best?

@Michael, my modifications were minor in nature. Most of the work was done by Blue. I changed only 5 parameters in that program, none of the code or logic.

I changed:

c.beta_target = 0.15 to allow more wiggle room (more variance)
c.beta_limit = 0.16 and not exceed it
context.long_ratio = 1.5 to favour the long side
context.threshold1 = 0.08 to let more stocks take a profit
context.threshold2 = -0.05 to take a stop loss earlier

The objective of these changes was to gradually increase the trading activity and profitability.

But still, those changes were done on Blue's program version.

@Blue @Guy The easy answer would be that you blew our strategy away, but in reality I'm just happy to see folks getting in and starting to make it their own.

This will likely sound odd, but I personally appreciate the creativity (that goes into reconciling an ML signal with traditional factors) as much as the raw performance, though I've certainly never been accused of traditional thinking. And as @Jonathan rightly pointed out earlier in the thread, the feed is not a traditional factor, making it a bit of an odd man out.

Don't know if you'd heard of the new-ish term "Quantamentals"? If not, I like this reading of it (mostly because of the healthy skepticism of the writer). So, welcome to Quantamentals, and may everyone go and make lots of (real or paper) money from your ingenuity and your skepticism (welcome, at least by us) of signals that require some new tricks to fully exploit.

As always, if we can help or provide any insights ping us here or on our special Quantopian email set up for you guys: [email protected].

Oh yeah, while I'm in a very rare PR mode, for anyone interested in Artificial Intelligence, we'll be at the NYU Future Labs AI Summit in a few weeks with some awesome speakers (and us!). And of course the obligatory coupon code of ffvc for $50 off. Happy trading folks!

I put in default commissions and slippage just to see how it would impact the returns of the algo, and was surprised to see this happen. Could anyone explain why?

How would you calculate the position sizing if you're using limit orders?

$114,584 in commissions; compare weekly, monthly.

@Rohit, that strategy, as is, is not designed to play small.

Just put in default commission settings and you will lose it all. Your chart has demonstrated that.

Paying a minimum $1.00 per trade, playing 2 to 3 shares at a time, and doing it hundreds of thousands of times is definitely not the way to go. As @Blue has expressed, the commissions will simply eat you up.

However, even with default commission settings, the strategy will be productive if the initial capital is above $1,000,000. It can then give a decent rate of return.

Nonetheless, the strategy does not scale so well, either up or down. It trades too much if you raise capital further, and will not beat the average if it exceeds $10,000,000. You could say it has a Goldilocks zone.

Notice that what can make this trading strategy productive has nothing to do with its code. It is all a matter of its initial stake, of how much you put at play. It cannot be too small; it cannot be too large. But one or two million could be just right.

The attached strategy will do over 800,000 transactions over the 7.14 years. Including commissions, it managed to generate a 20.12% CAGR. If you also include leveraging fees, which you should, the CAGR will drop to 17.72% based on a 3% leveraging charge.

Conclusion: the results are still better than market averages if you put in the capital.

However, since the strategy does not scale up too well, I would have to say: there have to be better ways, better strategies, better returns. Even if it behaved nicely.

I see what you mean. Thanks for the explanation. What about this algorithm makes it only suitable for larger amounts of capital, and what do you think can be done to make it function better outside of its "Goldilocks zone"?

@Rohit, the strategy's Achilles' heel is: context.max_long_sec = 150.

As coded, the strategy with $100,000 will start with a potential 225 positions, allocating $444 per trade. So you generate a lot of small positions and trade them frequently, to the point that after a couple of years it will trade in the vicinity of 1 to 3 shares at a time. Commissions simply eat your lunch! A strategy ending as a broker's dream come true: automated commission generation.

context.max_long_sec is fixed for the duration of the program. It is also why, with higher stakes, the strategy exits its Goldilocks zone. With higher stakes, it will allocate more to each position, making frictional costs less of a drawback percentage-wise. However, it will also trade more, which increases frictional costs to the point of reducing overall performance and, again, renders it less productive (as in, not beating its benchmark).

A solution: make context.max_long_sec adapt to the equity curve. Trade more only if you have the money and a reasonable position size, and only if there is a positive edge in there. If that requires higher stakes, the solution is evident.
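A sketch of that idea, with hypothetical names and thresholds (not from the original algo):

```python
def adaptive_max_positions(portfolio_value, min_position_dollars=2000,
                           gross_exposure=1.0, cap=150):
    """Position-count limit that grows with equity. Keeping each position
    above a dollar floor keeps a $1-minimum commission small relative to
    the trade, instead of churning 1-3 shares at a time."""
    affordable = int(portfolio_value * gross_exposure // min_position_dollars)
    return max(1, min(cap, affordable))
```

With $100,000 this caps the book at 50 positions of $2,000 each rather than 225 positions of $444; the fixed cap of 150 only binds once equity supports it.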

Notice also that my modifications slightly favored the long side. The market is not a 50/50 proposition as others have said.

Here are 3 simulations with only one change: commissions and slippage. This should help show why I consider including the impact of frictional costs in any stock trading simulation with volume. No code was changed. Leverage was used but stayed at the same level for all 3 tests.

The above chart shows ending gross profit starting with $2,000,000 as initial capital:
Simulation 23, with slippage and commissions: $10,030,000. Net CAGR: 22.65%.
Simulation 24, no slippage or commissions: $14,844,000. Net CAGR: 29.73%.
Simulation 25, with slippage but no commissions: $12,972,000. Net CAGR: 27.25%.

We can deduce the combined cost of slippage and commissions: $4,814,000 ($14,844,000 - $10,030,000), more than twice the initial stake. Commissions alone, given slippage, accounted for $2,942,000 ($12,972,000 - $10,030,000). And we should not ignore the leveraging costs related to each outcome (see above chart).

The gross CAGR, which started at 32.43% with no commissions or slippage, was reduced to 22.65% net of all expenses, including leveraging fees. Still, a long-term 22.65% CAGR is not that bad; it is better than average.

Notice that in all 3 simulations, some important portfolio metrics remained relatively stable, namely beta, volatility, and drawdown. I accepted some beta, having biased the strategy to the long side, while keeping volatility and drawdowns relatively low.

Doing simulations without considering all frictional costs can give quite a distorted image of the end results.

In this case, frictional costs, including leverage, would have reduced the final result by 42.16%.

This is not negligible. It is, nonetheless, the impact of just one cent commission per share with a little slippage and a lot of churning. Trades average out to a $10 profit per transaction.

If you can reduce frictional costs by getting better rates and better treatment, it does appear worthwhile to do so.

It might not be that wise to ignore that in real life you will have to pay these fees. Ignoring them, however, definitely makes your simulations look much better than they really are.

Now, to do better, I will have to read the code and see what it really does. Then work on methods to reduce the volatility and drawdowns a little, to make it a potential contender in a contest. That is, if I can, or if I find the motivation to do the work. However, I find it more important to generate more alpha than to reduce volatility, evidently within reasonable and acceptable limits. To each his or her own.

As a side note: has anyone noticed that the trade scheduling order alone will change end results? One should do the shorts first, then the longs, since there seems to be an overlap generating some account churning. The following two tests had frictional costs included. The difference is noticeable. There was no code change except for the change in trade priority. That is a 28.14% CAGR, net of all expenses, including leverage.
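The re-prioritization amounts to sequencing the rebalance so position-reducing orders go out before buys, freeing buying power first. A generic sketch (names are mine, not from the algo):

```python
def sequence_orders(current, target):
    """Return (symbol, dollar_delta) pairs with sells/shorts first, then
    buys -- the trade-priority change discussed above.

    current, target : dicts mapping symbol -> dollar position.
    """
    deltas = {s: target.get(s, 0.0) - current.get(s, 0.0)
              for s in set(current) | set(target)}
    # Position-reducing trades first (largest reductions first) ...
    sells = sorted((s for s in deltas if deltas[s] < 0), key=deltas.get)
    # ... then buys (largest additions first).
    buys = sorted((s for s in deltas if deltas[s] > 0),
                  key=deltas.get, reverse=True)
    return [(s, deltas[s]) for s in sells + buys]
```

Executing the reductions first means the buys are sized against cash already freed, rather than overlapping with positions about to be closed.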

I finally read the code, and I'm not so pleased with the outcome. But, still.

If, after having pushed on the pressure points of a trading strategy, you get some alpha, then the question becomes: where does it really come from?

You have isolated some alpha over past data; now you want to make sure it will be there in the future. The reason for the alpha had better be based on something visible, measurable, and tangible. Otherwise, on what basis would you call it alpha?

A strategy is based on some premises -- what we could call its functions, procedures, objectives, or whatever. Let's say it is a trading strategy, for lack of a better word.

The purpose, evidently, is to generate some alpha which has already been demonstrated (see my previous posts).

But, what if the reasons for the over performance, espoused by the strategy, were not really there?

For instance, the strategy has a code section for taking profits and stop losses when certain fixed thresholds are exceeded. It all appears reasonable. Except that, over the life of the portfolio (over one million transactions for test 46), no such trades were ever executed. Could we say: redundant code?

I removed the program's tempering scaler, designed to keep the beta close to a specified target. There is a lot of math around this one, but still, I made it inoperative. See the output of test 46: no significant impact.

Piece by piece, this program was being taken apart; its raison d'être was disappearing. What was left were the highest and lowest 225 momentum stocks (150 longs and 75 shorts). But even there, I removed the beta re-targeting, because after only the first two weeks of its 7.14-year journey, it did not change a thing for the rest of the trading interval (see test 46).

I even removed one of my modifications. My boosting scaler was supposed to increase profits. It had absolutely no impact at all (see test 47). No need to guess what will happen to it.

At the beginning, the strategy buys a single share of SPY. It puts $113.32 to work, which grows to $236.45 by the end of the simulation, a $123.13 profit. Anybody would find it insignificant and not even bother with it. It is part of the trade and beta regulation process. However, from another point of view, it could be considered a very expensive trade.

I stopped the strategy from buying SPY, resulting in test 48. It doubled the outcome, increasing the net CAGR to 33.81% with all expenses paid: commissions, slippage, and leveraging fees. A single line of code modified a pressure point the strategy used to regulate its beta targeting.

This is only an analysis of a trading strategy. It is what I found the code to do, nothing more.

Pyfolio crashes on test 48, unable to complete (too many transactions, over two million). I would have liked to see how it behaved.

Still, the impact of not buying one share of SPY is considerable. By dismissing a single $113.32 trade, the strategy raked in $7,460,919 more in profits, net of all expenses, pushing the initial $2M stake to a net grand total of $15,991,248.
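
Those two dollar figures are enough to back out the quoted rate. A quick sanity check on the test 48 numbers (the tiny difference from the quoted 33.81% is rounding in the dollar amounts):

```python
# $2M initial stake growing to $15,991,248 over the 7.14-year interval
# implies the stated net CAGR of roughly 33.8%.

def cagr(start, end, years):
    """Compound annual growth rate from start to end equity."""
    return (end / start) ** (1.0 / years) - 1.0

rate = cagr(2_000_000, 15_991_248, 7.14)
print(f"{rate:.2%}")  # ≈ 33.8%
```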

Test 48

Test 48 does show there was an impact on beta, but not so much on the volatility or drawdown. And when looking at the above chart, the equity curve is pretty stable nonetheless. There is some alpha in there just by favoring the long side of things.

Would you give up a single trade, the SPY trade, for the added profit, even if it slightly increased volatility and drawdown, while at the same time demolishing the foundation on which the strategy is built?

Regardless of everything, test 48 did achieve a net 33.81% CAGR including leveraging fees.

It should make the strategy more interesting. It did this, but not based on the strategy's initial premises.

I see your point with context.max_long_secs. What do you think would be the best way to put that 150 value into an equation? Would a simple AUM-to-max-long-secs ratio work, or should I use something a little more complicated like a regression? I also tried excluding SPY from my list of tradable assets but found that none of the performance metrics changed. Do you know why this is? Here is the algo that restricts SPY:

And this is the algo that includes SPY:

@Rohit, our strategies might have the same origin, but they are very different and behave very differently as well. In my code SPY has an impact. It might not be purchased, but it is still part of the equation.

Based on your code, whether you add or remove SPY should have no significant material impact, and your charts do show that.

As for context.max_long_sec = 150 in your code, it sets your initial position size to $22.2k, which is acceptable on a $5M portfolio. Each position will be 0.44% of your equity. At 225 positions, the strategy is diversified with some emphasis on longs; no single stock or trade can make that big an impact. It is like trading on market noise. Nonetheless, you want this bet size to grow with available capital resources and time.
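
One hypothetical way to let the position count adapt to capital, sketched purely as an illustration (the MIN_BET floor and the 2:1 long/short split are my assumptions, not something from the thread or the attached algos):

```python
# Cap the number of long positions so every bet stays above a minimum
# dollar size worth paying frictions on. MIN_BET is an assumed constant.

MIN_BET = 5_000        # assumed smallest bet worth paying commissions on

def max_long_secs(aum, ceiling=150, longs_per_short=2):
    """Largest long count (up to ceiling) keeping every bet >= MIN_BET."""
    total_affordable = int(aum // MIN_BET)   # positions the AUM can support
    longs = total_affordable * longs_per_short // (longs_per_short + 1)
    return max(1, min(ceiling, longs))

print(max_long_secs(5_000_000))  # large account: hits the 150 ceiling
print(max_long_secs(100_000))    # small account: far fewer names
```

The point of the floor is exactly the bet-size argument made elsewhere in the thread: below some dollar size, commissions and slippage eat the edge.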

I use context.max_long_sec to bias my strategy to the upside, as in 150/75. If I allocate half the resources of the longs to the shorts, I will be biased to the upside by 0.66 and thereby get a positive net leverage above 1.00 after scaling to the desired level. However, I want the strategy to be able to pay for the added expenses, including the leveraging.
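
The arithmetic behind those numbers, for anyone who wants to check it (the 0.66 bias is 150/225, i.e., two-thirds):

```python
# 150 longs + 75 shorts on a $5M portfolio: starting bet, weight, long bias.

aum = 5_000_000
n_long, n_short = 150, 75
n_positions = n_long + n_short          # 225 equal-weight positions

bet = aum / n_positions                 # ≈ $22,222, the quoted ~$22.2k
weight = bet / aum                      # 1/225 ≈ 0.44% of equity
long_bias = n_long / n_positions        # 150/225 = 2/3, the quoted ~0.66

print(round(bet), f"{weight:.2%}")
```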

I updated the formula for context.max_long_sec to be a bit more adaptable to various capital amounts. This is what I've come up with:

Here is the backtest similar to my previous post with the change:

And this is a version with just $100K in AUM:

I think this is a decent start, but I'm curious what you think would be a better way to increase the returns while keeping every other metric relatively constant.

@Guy, also, would you happen to know how I could analyze my strategy in Alphalens to see if there's a better way to organize and order my equities? Specifically, what would I do next in this notebook? Thanks for your help!

@Rohit, you have already demonstrated, as others before, that $100k is insufficient to trade 225 stocks with this strategy once you include commissions and slippage. The reason is simple: the starting bet size is too small, $444, which buys about 3 shares of AAPL. You will pay a dollar to get in and as much to get out of the trade. It is unrealistic to play the game in such a way, unless you really want to lose.
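
The friction math behind that warning, using the dollar-per-order figure from the post (real commission schedules vary; this is just the order of magnitude):

```python
# $100k spread over 225 positions, with roughly $1 to get in and $1 to get
# out: the round trip alone costs ~0.45% of each bet, before any slippage,
# so the edge per trade has to clear that bar just to break even.

aum, n_positions = 100_000, 225
bet = aum / n_positions              # ≈ $444 per position
round_trip_cost = 2 * 1.00           # $1 in + $1 out
drag = round_trip_cost / bet         # frictions as a fraction of the bet

print(round(bet), f"{drag:.2%}")
```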

If you want to play small, then, increase the bet size and reduce the number of stocks to something you might find more acceptable.

The amount of capital used in a trading strategy, whatever it is, is a limiting factor. It will dictate what you can do and what you should not.

We can increase this strategy's net long market exposure by accepting a higher beta, which in turn adds slightly more volatility and drawdown, with the sole objective of increasing its long-term CAGR.

The following chart illustrates this. The beta increases from 0.65 to 0.78, with volatility going from 0.13 to 0.15 and the drawdown from 14.5% to 16.8%.

These are minor changes, all considered, and a direct result of increasing the average leverage from 2.1 to 2.2.

Since Quantopian is considering leveraging something like the original version of this program as much as 6 times, I can consider this one below that mark, even if I changed the strategy's underlying trading philosophy. At least I know the strategy can support the leveraging, and pay for it!

Test 57

Test 57 increased the total output to $22,448,203.

When considering all expenses (commissions, slippage, and leveraging charges), the total would be reduced to a net $18,647,770.

It would still represent a 36.73% CAGR for the period, net of all expenses, adding 2.00 alpha points to the previous CAGR. Note that at that level, alpha points are hard to come by.

This puts the strategy in a different class than the original.

For a little more temporary pain, which occurred in two short spikes early in the run, you get an extra $2,656,522 in profits, distributed over the entire 7.14-year trading interval. The strategy does over 2 million transactions, so the average transaction gains a dollar or so in profit. It is a penny per share here, a few cents there, and it all adds up.

It becomes acceptable since all expenses are paid for. And, it is a machine that is really doing all the work. So, let it print those pennies.

The funny thing is that I have been increasing the alpha generation of this trading strategy one step at a time, not by enhancing its original properties, but by destroying them.

An easy way to generate more CAGR for this trading strategy is to simply increase the number of trade candidates. For instance, allowing 180 instead of 150 longs. This would reduce the initial bet size from 0.44% to 0.42%. Not what could be considered a major change.

Also, it is a decision taken before the program even starts. There is no change in program logic or code, yet the trade allocation would be different, and the number of trades taken would increase.

This increase in tradable candidates generated the following:

Test 58

Evidently, it trades more and has a higher market coloration (higher beta), with an increase in volatility, drawdown, and trading expenses. Nonetheless, bottom line, including all expenses (commissions, slippage, and leveraging fees), it resulted in a net 38.72% CAGR.

You changed one number, and it generated $2,031,496 more in profits.

Sure, to increase profits, you had to suffer a little more in drawdowns, but a net 38.72% CAGR does not run free in the streets. You technically have to pay for it in higher trading expenses (higher commissions, higher slippage, and higher leveraging fees) and, maybe, in mood swings, even if a machine is doing the work.

You could also opt to reduce those portfolio metrics using other means. But somehow you will have to pay for it, one way or another.

Lame question. If this data only goes back to Jan 1, 2010, then any algorithm that uses it will not function (or will do nothing) in backtests for periods prior to that. Wouldn't that be an automatic disqualification from the Q fund's point of view if an algo does nothing pre-January 2010? I would like to use this data, but status and clarity regarding the (periods of) backtests run for the Q fund would help in using it confidently.

If the last test (58) was too aggressive or generated too high a drawdown, as was said previously, other means could be taken to maintain profitability while reducing those portfolio metrics to more acceptable levels.

We are the architects of our trading strategies. We tell them what to do. But if you do not follow the underlying math of the game, or dismiss it as fancy formulas as some have argued, how can you mold your trading strategy to do what you want?

The mathematical formulas of this game are rather simplistic and restrictive as to what you can do to change the final outcome. But they are still the building blocks of any trading strategy.

Whatever is done in a stock trading strategy, all of its trading history can be summarized by its payoff matrix: Σ(H.*ΔP). To see how it is derived, look up: https://www.quantopian.com/posts/the-payoff-matrix.

This strategy's payoff matrix, with its 2 million+ entries, can be reduced to two numbers: Σ(H.*ΔP) = n∙APPT, where n is the number of trades and APPT is the net average profit per trade. It does not matter what the trading strategy is, what its composition is, or what it is intended to do: the output will be n∙APPT no matter what.
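
A toy version of the identity above: total profit is the element-wise sum Σ(H.*ΔP) over the payoff matrix, and dividing by the number of entries n gives APPT, so Σ(H.*ΔP) = n∙APPT holds by definition. The random matrices below stand in for a real trading history:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.integers(-100, 100, size=(50, 10)).astype(float)  # shares held (days x stocks)
dP = rng.normal(0.0, 1.0, size=(50, 10))                  # price changes per period

total_profit = float(np.sum(H * dP))   # Σ(H.*ΔP)
n = H.size                             # number of payoff-matrix entries
appt = total_profit / n                # average profit per entry

print(total_profit, n, appt)
```

The equality carries no predictive content by itself (APPT is defined as total profit over n); its value, as argued here, is that it narrows strategy design to two levers: trade more (raise n) or earn more per trade (raise APPT).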

For this trading strategy, simply putting more money to work increases the bet size's dollar amount. The 2 million+ transactions taken by this program will not escape those two numbers either: n and APPT. Every transaction is affected by this change in bet size, since it raises the net average profit per trade.

Yes, we could exchange upfront money for more subdued portfolio metrics. Increasing the initial stake can dampen some of the targeted metrics.

Putting in more capital increases the bet size, which proportionally increases the average profit per trade while allowing more transactions to be executed. As such, we would be increasing both n and APPT, with the evident result of an increase in the portfolio's payoff matrix.

If you say something like this, it should be because you can corroborate it with some kind of evidence.

The next chart illustrates the point. One million dollars was added to the initial stake. It lowered the desired portfolio metrics while maintaining its CAGR.

There were more expenses to be accounted for which slightly reduced the net CAGR to 37.82%.

Test 60

So, yes, we can exchange portfolio metric preferences for money. Meaning that you can pay to reduce those metrics, not the other way around, and still be rewarded for doing so.

Test 60 does show reduced drawdown, reduced volatility and reduced beta compared to test 58. It is the result of putting more money on the table. Doing so increased the Sharpe and Sortino ratios as if we were slightly more efficient in extracting profits.

This was not the result of a factor or parameter optimization. It was again a decision taken from outside the trading program itself. No change to the strategy's trading logic, code or parameters.

It is as if the added million had operated at a net 35.89% CAGR of its own. Much better than a bank account or index fund.

By the way, making 2 million+ transactions over a 7.14-year period is definitely playing on market noise in a sea of variance, nothing more, yet still catching part of the drift, which is what this strategy does. But then again, why should we care if the noise we hear is cha-ching?

A summary for the last few tests follows:

The observable overall increase in performance came first from removing the original strategy's attributes, then from molding the underlying mathematical formulas to do more trades with an increased net average profit per trade. That is it. There is no mystery. It is the result of a simple multiplication, n∙APPT, and a slight upward market drift.

Having portfolio-level controlling functions, we should know where to apply or relieve pressure in order to direct the general outcome of a portfolio's payoff matrix, and indirectly, its metrics:
F(t) = F(0) + g(t)∙Σ(H.*ΔP).
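
The controlling-function equation reads: final equity equals initial capital plus a participation scaler applied to the payoff sum. A minimal sketch, with made-up numbers for F(0), the payoff sum, and g:

```python
# F(t) = F(0) + g(t)·Σ(H.*ΔP): g scales overall market participation.
# All figures below are illustrative placeholders, not test results.

def final_equity(f0, payoff_sum, g):
    """Initial capital plus the scaled payoff: F(t) = F(0) + g·Σ(H.*ΔP)."""
    return f0 + g * payoff_sum

base = final_equity(2_000_000, 10_000_000, 1.0)     # no extra pressure
pushed = final_equity(2_000_000, 10_000_000, 1.5)   # 50% more participation
print(base, pushed)
```

In the strategy itself, g would not be a literal constant; it stands for whatever pressure is applied to bet size and trade count.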

For example, if test 60, with its -12.70% drawdown, is more tolerable than test 58 with its -19.40% (which, by the way, appeared as two short-lived drops at the beginning of the simulation and subsided to about -10% afterwards), then we might also accept a compromise like test 57, where the drawdown was -16.80%.

It is a compromise that could be acceptable only if we get something out of it, for instance, a higher net return.

You do not want to increase a potential drawdown with nothing to show for it, or even worse, have to pay for it.

If we intentionally apply pressure to re-increase the general market participation by increasing the number of trades and/or increasing the average net profit per transaction, we should expect higher performance. Otherwise, we kind of missed the boat or misunderstood the very nature of these controlling functions.

Evidently, this can become worthwhile if, and only if, when we tally the final results, we indeed made more profits. And this, after all additional trading expenses have been paid for.

The chart below reflects a simple change of intentions: a deliberate request for the trading procedures to do a little more of this and a little less of that. Two strategy constants were slightly incremented, giving the following results:

Test 70

Test 70 exhibits a net 45.33% CAGR after all expenses. We can observe that the equity line for test 70 is very similar to tests 60, 58, or 57. It is as if only the scale changed. A result of raising the alpha a few points at a time. Yet, every trade in these tests is different, either in size or timing.

To achieve this level of performance, it was required to accept higher trading expenses, meaning higher commissions, higher slippage and higher leveraging fees.

While we might want to minimize trading expenses, it is not the primary objective. What is, however, is the alpha generation after all expenses are paid.

There is a cost to the trading business. There are costs for the alpha generation. The more you want to raise it, the more it is going to cost in some way or other. The stock market does not have a philanthropic nature.

Test 57 generated, net of all expenses, $18,647,597 for a net 36.73% CAGR, while test 70, also with all expenses paid, made $28,827,861 with a net CAGR of 45.33%.

Test 70 even slightly increased profit extraction efficiency as conveyed by the Sharpe and Sortino ratios.

The drawdown did increase to -16.20%, about the same level as test 57. Not enough difference to statistically say they are different. Compared to test 57, volatility went from 0.15 to 0.16, which also can be considered about equal. The beta was slightly reduced from 0.78 to 0.74, meaning that we still managed to slightly reduce market correlation, but again, not significantly.

The question is: what can you exchange for a higher CAGR level?

I could push for more, but I have reached some of Quantopian's execution limits, since some of the tests do not complete. I will try to stay within those constraints. Nonetheless, the strategy can do more, should I want it to.

@Leo, when evaluating algorithms, we take into account the start date of any required datasets.


The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

@Josh, another question with regard to the contest. Do I need to purchase the paid subscription for the duration of the contest if I use this dataset in the algo that I end up submitting? Also, I want to make sure that the contest evaluation will proceed with the most recent data in the 6-month live trading period, even if I backtested the algo with only the free version/period of the dataset.

Today, the contest requires you to have subscribed to the data (if the data requires a subscription; some datasets, like PsychSignal, do not).

@Josh, I am trying to figure out a rule for the condition "if this dataset has no data, take follow-up action (i.e., close all positions, etc.)".

If, from your pipeline method, I remove the quality filters mask and ret_filter and return everything from normalized_returns, and in the get_long_and_shorts() method I use the fact that len(long_df) and len(short_df) are both zero, is it then fine to assume the dataset had no data?
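
A minimal sketch of the guard being proposed (not the actual algo code): once the quality filters are removed so the pipeline returns everything, two empty sides most likely mean the dataset itself had no data that day. The placeholder frames stand in for the get_long_and_shorts() output:

```python
import pandas as pd

def dataset_is_empty(long_df: pd.DataFrame, short_df: pd.DataFrame) -> bool:
    """True when neither side produced a candidate (likely a no-data day)."""
    return len(long_df) == 0 and len(short_df) == 0

# placeholder frames standing in for get_long_and_shorts() output
longs = pd.DataFrame(columns=["sid", "weight"])
shorts = pd.DataFrame(columns=["sid", "weight"])

if dataset_is_empty(longs, shorts):
    print("no data today: close all positions")
```

One caveat: with the filters still in place, an empty result could just mean an unusually strict-filter day, which is why removing the masks first, as described above, makes the test more reliable.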

Test 70 (see previous post) showed a relatively smooth equity curve with acceptable volatility and drawdown. Its impressive CAGR was achieved through its market participation rate.

In other words, I simply made it do more of what it was already doing. So, no big deal, just set the program to increase the number of transactions.

As was also said in a previous post, the entire trading history of a strategy can be summarized in two numbers: Σ(H.*ΔP) = n∙APPT. If APPT tends to a limit for a particular trading strategy, then n is the other lever to pull in order to increase profits. And if you can raise APPT just a bit, it will be noticeable due to the large n. Very simple math.

Due to the large number of transactions (well in excess of 2.5 million), the net average profit per trade (APPT) tends to a limit. Adding a few hundred or a few thousand trades will not move the needle much either way.

Since these added trades are spread out over the timeline, they will have the same statistical characteristics as the existing trade population. A way of saying you will be adding more of the same and the APPT will continue to tend closer to its limiting average.

Therefore, should you want to push for more, then you should be ready to accept the additional costs involved in doing more business, just as in any other kind of business.

Because I am dealing with the whole payoff matrix from start to finish as one big block of data, I can “guide” its outcome in the direction I want. As if giving a slight initial nudge that will reverberate as a crescendo throughout the payoff matrix over the trading interval. The given nudge is just a slight alpha nudge to its cruise control. After which, the trading strategy is left on its auto-pilot.

In test 72, I again raised the bar, putting on more pressure by requesting even more trades (increasing n) and a more positive net long participation, with the following results:

Test 72

This raised the CAGR to 49.64% net of all expenses.

When you look at the portfolio metrics, the changes could be considered minimal. Drawdown went from -16.2% in test 70 to -18.6% in test 72. For accepting a momentary 2.4% more in drawdown, the strategy made $6,688,837 more in net profit, adding 4.31 alpha points to its CAGR compared to test 70.

Evidently, it endured higher fees, some $2,125,163 more. It was the price to pay to add $6,688,837 to the total net profit, with all expenses paid.

To push even further, it is enough to request even more of the same, nudge the alpha a bit more. This generated the following chart:

Test 73

The net CAGR came in at 53.22%, all expenses paid, including leveraging fees.

Compared to test 72, it added $6,523,215 in net profits; compared to test 70, the added profit is $13,212,053. All this for accepting to trade more, with higher trading expenses as a consequence and about the same portfolio metrics.

Test 73 started with $3 million and ended 7.14 years later with $42,039,913 and a remarkably smooth equity curve. I should repeat that: a net 53.22% CAGR, all expenses paid. Remarkable, even if I have to repeat myself.

I am starting to see signs of strain; this thing has a limit. I might be close to it or have exceeded it already. That is why I do the simulations: to find out where those limits are and then select a comfort zone. Without knowing a trading strategy's limits, how could we ever say how far it could go?

When we look closer at charts 73, 72, 70, 60, 58, and 57, the first year looks about the same for each one, mainly flat. We can hardly distinguish their initial trends when looking at the big picture. Yet, it is on those foundations that everything else is being built since all we see is incremental alpha from one chart to the next culminating in chart 73.

By my estimation: anyone can do even better. It is just another way of doing things.

I ended my last post saying: anyone can do even better.

After all, if you want more, you have to do more.

And, in this case, it only required doing more of the same.

The real question might be: do you, or don't you, want more?

Answering yes appears reasonable as long as the system remains profitable. Everything needed is already there: Σ(H.*ΔP) = n∙APPT, as if pointing to n. We should be able to accept a little more early discomfort if there is a higher reward in the end.

The more you want to request from this trading strategy, the more it will cost in commissions, slippage, and leveraging fees.

The strategy's alpha gets more and more expensive to extract.

It can also be more lucrative, meaning more profitable. There might be no free lunch in this trading business, but, you can pay for your own.

This new test would not terminate; it was too big, exceeding the limits of Quantopian's testing environment.

As an alternative, I broke the testing interval into 3 parts, ran them successively, and tabulated the results:

Table Test 78:

The ending CAGR was 52.84% net of all trading expenses (see table's Total line).
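
When a backtest is too big to finish in one run, the segments can be stitched back together: multiply the per-segment growth factors, then annualize over the full interval. The three factors below are made-up placeholders, not the actual test 78 segment results:

```python
# Compound consecutive segment equity multipliers into one annualized rate.

def stitched_cagr(segment_growths, years):
    """Compound segment growth factors into a single annualized rate."""
    total = 1.0
    for g in segment_growths:
        total *= g                      # equity multiplier across segments
    return total ** (1.0 / years) - 1.0

rate = stitched_cagr([3.1, 2.9, 2.4], 7.14)   # hypothetical 3-run split
print(f"{rate:.2%}")
```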

The strategy definitely made more money: $103,256,190 net, more than double test 73. This, after all expenses were paid, including $37,305,054 in leveraging fees.

To reach these levels, I increased the initial capital to $5 million, a decision made from outside the strategy, and primed it to trade a bit more. I calculated that it made over 10 million transactions (n). That is a payoff matrix of over 43,000 rows. It is a lot of trading activity, and it is also why the program could not complete in a single run: it was necessary to break it down into three, and finally four, parts just to get the information I wanted.

The strategy is showing signs of strain; the CAGR has stopped rising.

Test 73 had a net 53.22% CAGR. I estimate I have passed the top of the strategy's quadratic return function, and from there the CAGR could gradually decline should I add more pressure. However, one could still add pressure as long as the strategy stayed above a certain limit, say above a 30% to 45% CAGR.

Reaching higher CAGR levels becomes harder and harder due to the increasing drag from the costs of doing business. But that is not a new concept; it is the same as in any other kind of business.

Notwithstanding, money-wise, you made much more, in fact, 2.4 times more than test 73. It traded a lot more too, but then, a machine is doing the work. Regardless, net of all expenses, a 52.84% CAGR remains remarkable.

You could always tone it down, if desired. You already know it would work since it is where you came from.

Simply go back to the same program settings as tests 73, 72, 70, 60, 58, 57, 48, 24, or 23. Overall, it would reduce some of the portfolio metrics and also reduce profit extraction. It is a matter of choice: you can exchange your sensitivity to certain portfolio metrics for money.

My previous post's conclusion remains. It was possible to do even better simply by adding more initial capital and continue doing more of the same by increasing n. It is also feasible to do even more by enhancing or adding other strategy properties and better controlling functions to alleviate some of the program's shortcomings. However, I am not prepared to do that at this time. At least you have a program structure that showed it was possible and feasible to do more, much more.

I consider the stock selection process in this trading strategy not as biased as some might think.

The strategy uses the Q1500US stock selection, a list of stocks considered liquid and tradable. It includes some delisted, merged, and bankrupt companies while they were eligible, meaning while they met the selection criteria. As such, I find it reasonable and acceptable. It is like saying: let's deal with the upper quintile of listed companies. How could I object to that?

I consider that list as stocks that could be included in any portfolio due to their tradable nature while they are in the list. And, while not all inclusive, the list is large enough to meet most portfolio needs. It is also an easy solution, somebody else did the work for me.

Any subgroup of this is the same as picking, for some reason or other, a sample from the existing population. There are gazillions of possibilities as to the content of a particular selection. You cannot test them all; you have to make a choice, since you want to trade some of them.

From this list, up to 500 stocks that also appear on the PreCog list can be taken. What is left is a list of potential trade candidates from which, at any one time, the program can pick up to 240 (180 longs and 60 shorts, based on my latest version of this program).

The compilation of the list is somewhat unimportant. What matters is that those stocks were tradable candidates and could be included in any portfolio, knowing that a trade with volume could be executed at whatever date and time it was issued.

So, yes, this list will be composed of some of the best names and most valued companies trading a sufficient daily volume to enable getting in and out of trades. To declare a profit or a loss, a trade needs to close. You need someone on the other side to accept your trade.

There are many reasonable grounds for excluding a lot of stocks from your selection, contrary to the suggestion by some that all stocks should be included, which would be nonsensical.

The problem is not to include all stocks in the “possible” stock universe.

It is to generate a profit from the selection you make. Whatever it is.

I would not be surprised if most of the selected stocks were also part of the S&P500 for instance. Nonetheless.

At the top of this thread, doubts as to the relevance of the PreCog list have been expressed. I had the same concerns, as stated here: https://www.quantopian.com/posts/using-alternative-data-researching-and-implementing-a-market-neutral-strategy And those same concerns were also raised by @Pravin: https://www.quantopian.com/posts/alpha-vertex-precog-test

I am not surprised by @Pravin's results. I reached about the same conclusion.

There is not that much difference between using the PreCog list and, say, another list built from a 252-day momentum sort. Any trading decision based on a price sorted on its last 252-day return history is really late to the party. It might as well be considered equivalent to simply trading on market noise.

If your trade selection process is highly subject to market noise, your trade results will be also. So, I agree with @Pravin's results. And I accept other people's concern about the validity of the predictive powers of the PreCog list. It has not been demonstrated, at least not to my satisfaction.

But, for me, it is a secondary issue. What I am looking for is a tradable stock universe that will respond to my trade mechanics for buy and sell orders when requested. And, that list is as good as any in that respect.

I can guesstimate that most of the S&P 500 stocks are also included in the Q1500US list. All I will be dealing with is a sampling method. Which should raise the question: is it a better sampling method than others? I do not know, and I have no way of knowing, since it would require going through the gazillions of other possible scenarios.

The stock selection might be no better than something else, and just for that, surprisingly, it becomes acceptable.

Either way, I am using this selection method as the excuse to pick stocks in order to get in and out of trades, even if it is on market noise. It is once in a trade that I need to manage the inventory. And it is from the outside that I need to figure out how to do it.

A trading strategy needs to generate a profit: Σ(H.*ΔP) > 0. H, the tradable stock inventory, is limited by the portfolio's available capital. Every element of H has the form h(i,j) = q(i,j)∙p(i,j), with i = 1 to n, and where j in this case can index up to 240 stocks. You have a position in stock j if h(i,j) ≠ 0: a long if h(i,j) > 0, and a short if h(i,j) < 0. A hold has the form h(i,j) = h(i-1,j), meaning the quantity in inventory did not change.
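
The inventory bookkeeping above can be put in code: the sign of h(i,j) tells you long or short, zero means no position, and an unchanged quantity from the previous row is a hold. A tiny toy matrix; a real H would have thousands of rows:

```python
import numpy as np

q = np.array([[10, -5, 0],       # day i-1: quantities held per stock j
              [10,  3, 0]])      # day i
p = np.array([[100., 20., 7.],
              [101., 21., 7.]])
H = q * p                         # h(i,j) = q(i,j)·p(i,j)

def state(i, j):
    """Classify the inventory entry at row i, stock j."""
    if q[i, j] == 0:
        return "flat"             # no position: h(i,j) == 0
    if i > 0 and q[i, j] == q[i - 1, j]:
        return "hold"             # quantity unchanged from previous row
    return "long" if H[i, j] > 0 else "short"

print([state(1, j) for j in range(3)])
```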

This strategy takes a reduced sample from a tradable sample of a selected sample of the existing US stock universe. It is almost like saying: make a choice, you need to make one. It might not be the best, but it is a choice.

I consider it sufficient, a reasonable excuse to pick up to 240 stocks out of some 8,300 while they are alive, presumably with 180 prospering while 60 are not.

If we took the 500 stocks from the S&P 500 that are in the Q1500US list, I think we might get something quite similar. How many of the S&P 500 are in the Q1500US? I put it at a high percentage. I see this particular stock selection process as not much different from picking 240 stocks from the Q1500US while they are in the S&P 500 and while they are in a self-defined uptrend or downtrend.

It is a way of saying that the PreCog selection is not the driving force; the strategy, H, is. What really counts is the strategy's trading mechanics. Can it extract profits from the selection it made? Can it profitably manage its ongoing stock inventory? Those are the questions, and they cannot be answered unless we run simulations over past market data.

We have to go beyond opinions and wishes, and show where the rubber meets the road. As said before: we are the architects of our trading strategies, we are the builders of our own payoff matrices.

That is what was demonstrated in my backtests. A somewhat controllable payoff matrix, the result of slicing and dicing price series drowning in a noisy sea of variance with little that we could call an edge.

What I want to see in a stock “trading” strategy is a large number of potential trades from which to choose, and a positive net average profit per trade (its edge). In equation form: Σ(H.*ΔP) = n∙APPT. That is not a guess, or an opinion; it bears an equal sign.

Trading Strategy Analysis.

What interested me in this trading strategy was not the stock selection process. It was the trading strategy's structure. I saw something I could reshape and mold to meet my objectives.

Gradually, as seen by the successive simulations, I gave the strategy a new direction by adding the things I wanted to see and partly control. Each step doing more and more. Each step gaining more and more.

What I consider the cornerstone of this program is a lowly variable. It controlled everything of significance this trading strategy had to offer.

The constant max_long_sec represents the maximum number of stocks one can hold long at any one time. It is determined by choice. It is not calculated by the program's logic, nor regulated by any other variable. It is simply preset in code.

I viewed it more as the programmer's afterthought or prerequisite. As if: “maybe I should limit this and take half for longs from the top of the list of the projected rising stocks”. All the program's effort seemed concentrated elsewhere, with no trace or mention of the importance or significance of this constant. And this, even though it has its say in just about everything.
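To illustrate the idea (the name max_long_sec comes from the original program; everything else in this sketch, including the short-side counterpart, is my own hypothetical reconstruction), a preset cap like this slices a ranked candidate list from the top and bottom:

```python
# Preset by choice, not computed by the program's logic.
max_long_sec = 180    # maximum number of simultaneous long positions
max_short_sec = 60    # hypothetical short-side counterpart

def select_positions(ranked_symbols):
    """Split a list ranked best-expected-return-first into longs and shorts."""
    longs = ranked_symbols[:max_long_sec]     # top of the projected risers
    shorts = ranked_symbols[-max_short_sec:]  # bottom of the list
    return longs, shorts

candidates = [f"STOCK_{i:03d}" for i in range(500)]  # stand-in universe
longs, shorts = select_positions(candidates)
```

Everything downstream (position sizing, turnover, the payoff matrix's column count) inherits its scale from these two preset numbers, which is why such a lowly constant can have its say in just about everything.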

I disabled most of the original functions in this trading strategy, some without replacements, others altering the very nature of the program. I modified its trading mechanics, and it resulted in quite a different program from the original version.

For example, stop loss and profit target procedures were eliminated since they were redundant code. Beta scaling and targeting functions were rendered inoperative. In fact, little of the original strategy was kept. Even the benchmark's one-share directional reference position (SPY) was not purchased.

The max_long_sec constant permeates the whole infrastructure of this trading strategy, forcing it to do things maybe not intended by the original programmer. Or just maybe he did not see what I saw in it.

As a side note: my books, Stock Trading Strategy Mechanics (if you want more, you have to do more) and Building Your Stock Portfolio, both covered the subject of trade mechanics, and showed that the trading strategy itself is at the center of the payoff matrix: Σ(H.*ΔP). It is what you do (H) with ΔP that will have an impact. Your trading logic applied to what is available will determine the outcome. This has been extensively covered on my website.

A payoff matrix is trade agnostic. It does not care where the profit or loss comes from. Neither does it care about sentimentalities, trade psychology, theories or moon phase. It only keeps track of the final score.

All the successive tests I presented showed incremental alpha by modifying the trade mechanics of the program, by modifying g(t) in: F(t) = F(0) + (1 + g(t))∙Σ(H.*ΔP).

Test 60 showed remarkable performance. Relatively low drawdowns and volatility with a 37.82% CAGR net of all trading expenses: commissions, slippage, and leveraging fees.

Test 70 did even better with a 45.33% CAGR net of all expenses.

Test 72 raised the bar with a 49.64% CAGR.

And, test 73 raised the bar even higher with a 53.22% CAGR, net of all expenses.

In all four cases, I increased performance by increasing the number of trades (n). And this increased the alpha generation, which is what, in the end, really matters. We can also see that the incremental CAGR is decreasing. It is getting harder to produce more.

Test 70's payoff matrix Σ(H.*ΔP) has a minimum size of 5,580 rows by 240 columns, a total of 1,339,200 data elements. However, due to Quantopian's 2.5%-of-volume rule, the size of this payoff matrix is more like a minimum of 10,400 rows by 240 columns, or at least 2,496,000 data entries. The matrices at play (the price matrix, holding matrix, price difference matrix, as well as the decision matrix) are of the same size, each with as many data elements.

All the payoff matrix data elements can be summarized in two numbers: Σ(H.*ΔP) = n∙APPT, where APPT is the net average profit per trade. It does not matter what the trading strategy is or how it is composed, the output will be n∙APPT, expressed in dollars.

If I can write: F(t) = F(0) + g(t)∙Σ(H.*ΔP), then I can also write: F(t) = F(0) + g(t)∙n∙APPT. Meaning that by controlling the number of trades, and the average net profit per trade (APPT), I can have some function to scale up, or down, the output of the trading strategy.

All I need is to find g(t) and give it a positive stance. Just g(t) > 1 would appear sufficient. Or, expressed differently, but saying the same thing: F(t) = F(0) + (1+g(t))∙n∙APPT with g(t) > 0. We can make g(t) market correlated.
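A quick numeric sketch of that scaling (the capital, trade count, and g value below are made-up illustration inputs, not results from the tests; only the APPT figure echoes test 60's estimate):

```python
# F(t) = F(0) + (1 + g(t)) * n * APPT, with g(t) > 0 as the scaling lever.
F0 = 10_000_000    # hypothetical initial capital
n = 100_000        # hypothetical number of trades over the period
appt = 13.95       # net average profit per trade (test 60's estimate)
g = 0.50           # hypothetical positive push on the trading activity

F_base = F0 + n * appt               # without the boost (g = 0)
F_boosted = F0 + (1 + g) * n * appt  # same payoff, scaled by (1 + g)
```

More trades and a better APPT both feed the same n∙APPT product, which is why controlling either one controls the output.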

My calculations for test 60 estimated the APPT at about $ 13.95 on an average $ 8,889 bet (a 0.1569% profit, or 0.001569). To put this in perspective, it is less than an average 8-cent move on a $50 stock ($0.0785). Meanwhile, the market as a whole offered on average $ 13.68 (approx. 0.1539%) just for participating in the game, also less than 8 cents ($0.0770) for a comparable move.

This would imply that the advantage of using the PreCog dataset might be about 0.0030%, or $0.27 per average transaction. You get $0.27 more profit than the market average on an $ 8,889 bet.
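The arithmetic behind those figures is straightforward to check:

```python
appt_strategy = 13.95   # test 60's estimated net average profit per trade
appt_market = 13.68     # the market's average offer for a comparable bet
avg_bet = 8_889.0       # average dollar amount per trade

edge_dollars = appt_strategy - appt_market   # the $0.27 advantage per trade
edge_percent = 100 * edge_dollars / avg_bet  # as a percentage of the bet
```

That edge works out to roughly three thousandths of one percent per transaction, which is the whole point of the statistical-significance argument that follows.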

This is not a great incentive for or against the use of this dataset since I could not differentiate it from the market's average offer by any statistically significant measure.

This confirms why the initial version of this program could not obtain high alpha readings from the PreCog dataset since it tended to be close to the market's average performance level.

Tests showed you had this swarm of price variance hovering above and below the zero-change line. This swarm was ordered by momentum (252 days), then divided into three parts: the top 180, the bottom 60, and all the rest standing in between. As if the middle was a buffer, a no-trade zone. A 5-day PreCog prediction could throw a selected stock into any of the three zones on any day, with a lot of churning as a side effect.

I consider the dataset used as not that different from what the market had to offer. At least, I cannot reject the null hypothesis that they are about the same. You were simply trading on market noise.

Test 73 does show impressive results. A net CAGR of 53.22%, all expenses paid, including leveraging fees. But as expressed before, one could still nudge its alpha higher simply by increasing g(t): F(t) = F(0) + (1 + g(t)↑)∙Σ(H.*ΔP) as was done in prior tests (see test 75).

This was also demonstrated in test 78.

Here are the important numbers in graphic form:

CAGR vs Leveraging Costs:

Now that I know where the strategy limits are and how far it could go, I can choose the level at which I feel most comfortable knowing that I could do more if I wanted to.

Also, knowing what controls the trading strategy, I could give this control to outside program intervention as if having the trading program on a joystick, or some slider control of some kind. That would be fun.

Note: with all the observations provided, anyone could rebuild this trading strategy. Rebuilding it yourself makes you understand what makes it tick.

Some can ignore the math of the game, but it won't make it go away.

Post-Strategy Portfolio Analysis.

I started analyzing the impact a trading strategy, like test 78, could have on a portfolio of trading strategies. This game needs to be played for years, even decades. One has to plan for the long term.

Some assumptions are made. But, they are generalizations and can be modified. What will be presented are simple calculations.

The market over a 20-year period can offer a 10% CAGR, on average, with reinvested dividends. It is the secular market average, but with no guarantees. Nonetheless, with this, we can make an estimate, an approximation of what Quantopian's fund value might be in 20 years' time: 250M∙(1+0.10)^20 = 1.68B. Its fund would have grown to about $ 1.68 billion. Interesting, but just average. And it took 20 years...

However, Quantopian with its approach is adding some alpha to the mix. Therefore, I raised their estimated CAGR to 15%. That is a 50% increase attributed entirely to their alpha generation. It is also better than the average fund manager out there. Under this scenario, the Quantopian fund could reach $ 4.09 billion in 20 years, their approach adding about $ 2.4 billion to their fund. I would say: well worth it. Almost 10 times their original stake. And $ 2.4 billion is still considered money.

If they could reach a 20% net CAGR, then after 20 years their account might be at $ 9.58 billion. Much better, their efforts paid off, returning $ 7.90 billion above market average. So their quest for higher alpha was rewarded. BTW, to put this in perspective, Mr. Buffett has a near 20% CAGR. So, I see it as easily achievable using all the tools at Quantopian's disposal.
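These three scenarios are straight compounding estimates and can be reproduced in a few lines:

```python
# Project a $250M fund over 20 years at the three CAGR scenarios above.
seed = 250e6
years = 20

for cagr in (0.10, 0.15, 0.20):
    terminal = seed * (1 + cagr) ** years
    print(f"{cagr:.0%} CAGR -> ${terminal / 1e9:.2f}B")
```

The loop reproduces the $1.68B, $4.09B, and $9.58B figures quoted above.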

Now, let's add the impact of a single high alpha trading strategy. Say we allocate 20% to a high alpha strategy and the rest (80%) is left for the set of 10 strategies under management.

From tests 78, 75, 73 and 72 we could average out the high alpha strategy at a 50% CAGR net of all trading expenses. I have methods that would help sustain this CAGR level. Note that these tests were all done with an overcharge on commissions and slippage, as well as on its leveraging fees.

The following table looks at the various trading intervals based on an averaged 15% return on Quantopian's set of strategies with equal allocations, and the impact of an added high alpha strategy receiving a 20% allocation:

High Alpha 20% Allocation on Total Portfolio

Over a 20-year period, this aggregated fund would have added 23.54% in alpha points above Quantopian's expected return, generating $165 billion more in profits due to the 20% participation of a single high powered trading strategy.

It is a totally different ball game!

I have shown at least 4 trading strategies found on Quantopian that I have cloned, modified, and where I have raised their CAGR levels to the 50% range or above. From this trading strategy alone, I have generated tests 78, 75, 73 and 72 at that level or better. These are not exceptions. It is not like saying it cannot be done. A demonstration of this has already been made in prior posts.

Increase Quantopian's CAGR to 20% for its 20-year investment period and raise the participation of a single high alpha strategy to 30%. Here is what would result:

30% High Alpha Impact on Total Portfolio

Increasing the high alpha participation had quite an impact on the overall picture. It alone added $ 83 billion to the fund over those 20 years. The numbers do speak for themselves.

Increasing the high alpha allocation from 20% to 30% is an administrative move; a management allocation decision taken in a boardroom. It is also independent of whatever the other trading strategies in the group did. They averaged out to a 20% net CAGR.

Some will say this is unrealistic. Yes, I could have agreed to that years ago. But I already have at least four trading strategies that could do it, and I have some more in reserve that can do better.

So, I find the above two charts not only reasonable but also feasible. And, if I can do it, then others can too.

If we look at a portfolio metric like drawdown, we expect each strategy to contribute its share. From test 78 and Quantopian's expected -0.10 drawdown, we get for the 20% allocation: 0.20*-0.195 + 0.80*-0.10 = -0.119. Adding a high alpha strategy increased the drawdown from an expected 10% to 11.9%. I think anybody can survive the added pressure.

The same calculation applies to volatility. From a Quantopian expected 0.10 for volatility, we would get, also based on test 78: 0.20*0.19 + 0.80*0.10 = 0.118. This would raise Quantopian's expected volatility from 10% to 11.8%. Again, not a major strain.
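Both blends are the same allocation-weighted average, using test 78's risk numbers against Quantopian's expected baseline:

```python
# Allocation-weighted blend of risk metrics (test 78 + Quantopian baseline).
w_high, w_rest = 0.20, 0.80

drawdown = w_high * -0.195 + w_rest * -0.10   # blended max drawdown
volatility = w_high * 0.19 + w_rest * 0.10    # blended volatility
```

The blend lands at -11.9% drawdown and 11.8% volatility, the figures quoted above.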

The impact of a high alpha strategy on a portfolio of strategies can be considerable, as presented above. In fact, it can overwhelm its peers. Here is a look at ten similar strategies to which a high alpha strategy is added. There is no loss of generality in making STRAT-1 to 10 equal; they will average out to the same number. Using Quantopian's expected 20% average return for strategies 1 to 10, with the high alpha strategy getting a 30% allocation for its 20-year term, would result in:

30% High Alpha Allocation Fund

The high alpha strategy clearly dominated all others, to such an extent that its ending weight is 0.9738 compared to the average strategy weight of 0.0026. 97.38% of the fund is now concentrated in one high performing strategy.

It made the fund what it was or what it could be. As simple as that. The program version for test 78 is just an averaging machine.

Because you provided time to a high alpha strategy, it simply came to dominate the entire portfolio to such an extent that in the end you could almost throw away all other strategies, and it might go unnoticed.

What we would see in time is STRAT-1 to 10 having gradually decreasing weights from their initial 0.07 down to 0.0026. While the high alpha strategy would have its weight rise from 0.30 to 0.9738. In its early years, the high alpha strategy would have its higher volatility and drawdown dampened by the other strategies and still build on its inner strengths, its alpha generation capabilities.
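This weight drift is pure compounding and can be reproduced from the stated CAGRs alone (50% for the high alpha strategy, 20% for each of the ten others):

```python
years = 20

# Growth of each sleeve from its initial allocation.
high_alpha = 0.30 * 1.50 ** years    # 30% allocation at a 50% CAGR
per_strategy = 0.07 * 1.20 ** years  # 7% allocation each at a 20% CAGR

total = high_alpha + 10 * per_strategy
w_high = high_alpha / total          # ending weight of the high alpha sleeve
w_each = per_strategy / total        # ending weight of each other strategy
portfolio_cagr = total ** (1 / years) - 1
```

The arithmetic reproduces the 0.9738 and 0.0026 ending weights, and an aggregate portfolio CAGR of about 41%.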

It becomes quite understandable to seek such strategies when you look at the impact they could have in time. A couple of those in a portfolio and it would change its long term outlook considerably.

I see the quest for the high alpha strategy as warranted and more than worth it. From the expected 20% net CAGR 10-strategy portfolio, the added high alpha strategy raised the total portfolio CAGR to 41% almost all on its own. More than doubling the initial portfolio's expected 20% CAGR.

So, how much is such a stock trading strategy worth? Based on the above charts and what has been presented so far: a lot.

How much is 1 extra alpha point worth at test 78 levels? The answer is easy. You change one number in the last chart:

1% Added Alpha

That 1% added alpha could generate $ 35.4 billion more on its own! So, yes, strive for it.

Bottom line, it is just a program, a piece of software whose mission is to average everything out, and on average take its cut out of millions of transactions.

I chronicled most of what I presented in this forum in my new book: From Zero-Beta to Alpha Generation, Reshaping a Stock Trading Strategy.

In it, I explain the different steps taken in the development process to make the initial trading strategy go from its zero-beta design to a much more productive alpha generator. To the point that in a portfolio of strategies, it could become the elephant in the room, as my previous posts showed.

It was not by recognizing the predictive powers of the PreCog dataset that I arrived at these results, but simply by accepting the Q1500US as a reasonable stock selection, since that is what predominated in the way I transformed the trading strategy.

It is easy to see if a trading strategy has merit. And, it is a simple math problem too.

Your strategy has to have an impact on the portfolio's final results. In equation form: F(t) = F(0) + (1+g(t))∙Σ(H.*ΔP). And therefore you have to concentrate your action on Σ(H.*ΔP), the strategy's payoff matrix.

Organize your trading methods to achieve: (1+g(t))∙Σ(H.*ΔP) > Σ(H(a).*ΔP) > Σ(H(m).*ΔP). Meaning that what you bring to this problem (your skills, investment functions, and procedures) must be greater than the other guy's average input, which in turn needs to be better than what the market had to offer almost for free (low-cost index funds).

If your strategy fails at any of these levels, you have nothing.

If it fails against your peers, they win.

If it fails against market averages, your peers win again. And, technically, no one is interested in your trading strategy. The reason is rather simple too: why in the world would anyone deliberately choose to underperform the averages?

The book also shows that not seeking the limits of our trading strategies can make us greatly underestimate what they could really do.

Also, From Zero-Beta to Alpha Generation, Reshaping a Stock Trading Strategy, can serve as a blueprint in structuring your own trading strategies to make them high alpha performers. It reiterates what I often say: if you want more, you have to do more.

Some might think that the presented trading strategy is unrealistic. That we can not reach those kinds of numbers. When all it did was follow the math of the game.

What I see is a simple piece of software. A program instructed to trade in a particular manner.

Since it is just a program, it could be re-engineered by anyone with the means to carry it out. The successive simulations showed it could have been done with relative ease using the Q1500US dataset.

Understandably, if you do not have the means (capital), the last iteration of this strategy is totally out of reach. In that state it literally is not designed for small accounts.

But, that alone, is not a valid argument against the strategy itself. It might only say that you do not have the capital or inclination to carry it out. Your time might be better spent finding more capital, for instance.

In trying to scale the strategy down to a smaller account size, one could reverse some of the steps that scaled it up. My first step would be to reduce the number of trade candidates to a level more in line with the smaller initial capital.

The notion of an atlas strategy dates way back. I see a multitude of them. Sure, there will be one at the top, but, be assured, this one has no pretense of being it. A lot of strategies could do better.

However, at some point, we do have to make a choice and pick at least one. So, take your best shot (highest alpha), or maybe choose the strategy you like best for whatever reason. This, until you can find a better one.

The search for an atlas strategy should be unrelenting. There are whole families of these strategies out there. Plus many variations on these same themes. Your best trading strategies should already have reached that status.

Your fund's equation is: F(t) = (1+L)∙F(0)∙(1 + r_m + α - fc% - lc% - d%)^t, where your alpha should not only be positive, it should exceed frictional and leveraging costs, and this over the long term. Leveraging costs apply only if L > 0, meaning the program's gross leverage exceeds 1.00.
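As a numeric illustration of that equation (every rate below is a made-up input for the example, not a test result):

```python
# F(t) = (1 + L) * F(0) * (1 + r_m + alpha - fc - lc - d) ** t
F0 = 1_000_000   # initial capital (hypothetical)
L = 0.50         # leverage above 1.00, so leveraging fees apply
r_m = 0.10       # secular market average return
alpha = 0.08     # the skill number: must exceed the costs below
fc = 0.01        # frictional costs (commissions, slippage)
lc = 0.02        # leveraging costs (charged only because L > 0)
d = 0.01         # other drags on performance
t = 20

F = (1 + L) * F0 * (1 + r_m + alpha - fc - lc - d) ** t
# With alpha (0.08) comfortably above total costs (0.04), the net rate
# compounds at 14% and the leveraged fund ends around 20x its stake.
```

Drop alpha below the combined cost drag and the same formula shows the fund underperforming the market; that is the whole point of making alpha the main objective.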

Looking at it from the above equation's perspective, there is one number that should be your main objective. That is alpha. It depends on the skills you bring and it puts in one number the outcome of your trading procedures. It translates to: can you do better than your peers?

This alpha is compounded. Its real power is not at the start of the game; it is at the other end of the time spectrum, where you will find that a few extra alpha points could have made a huge difference. But, by then, you will have reached your destination, and the market does not offer a rerun button. Therefore, plan for those extra alpha points from the start. The “I should have” in this game does not carry any rewards.

I have no way of saying that what was presented is better than others. All I can say is: it is pretty good, I liked it. It shows a lot of promise, and merits more investigation. Especially, since there are still some weaknesses to take care of. Meaning we could still do better.

After analyzing the limiting factors in the original version of this program, I opted to transform it by first eliminating most of its tracking and trading procedures, and then adding what better satisfied my view of its payoff matrix. Observation showed what the trading strategy did; adding new procedures made it do more, and at a much larger scale.

Some of the principles used in that trading strategy can also apply to a lot of strategy designs. Just looking at how a trading strategy is intended to trade over time can compel us to redesign the thing differently.

It is not by doing the same thing as everybody else that you will get different results.

It is by looking at the trading problem differently. Stuff that others do not even consider or try for some reason or other.

Notwithstanding, there is a need for a background trading philosophy which is to be supported by a methodology that is anchored in math.

If you do not have a long term vision of where you are going, where do you think you will end up?

For those wondering why I am not publishing the code, the reason is very simple. Look at test 78 and tell me why I should give it away.

Regardless, with everything that was provided, anyone could rebuild something similar or better. The benefit: it would now be their own code. A design they would understand well enough to maybe give them the confidence needed to apply it.

Hope it helps someone.

Hi Guy,
I like it and I will be thinking a lot more about your ideas.

I think there is also a wonderful secret of success contained in your post of 15 July 2017 in which you say:
"I simply made it do more of what it was already doing".

Thanks for your inspiration to my own work.
Cheers, best wishes, Tony

Unfortunately it no longer works, and due to the trading rules, a naive reversal of the long/short signal doesn't get you +3 SR.