Using Alternative Data: Researching and Implementing a Market Neutral Strategy

This notebook is a new version of an earlier post, How to Get an Allocation: Writing an Algorithm for the Quantopian Investment Management Team. The original version used the Sentdex Sentiment Analysis dataset. This version uses the Alpha Vertex PreCog 500 dataset for its trading signal and uses the Optimize API to place orders.


The following notebook and backtest walk through the research and implementation of a market-neutral strategy that trades a large, dynamically selected universe of stocks. Of course, not all market-neutral strategies will get an allocation, but this notebook demonstrates the development of an algorithm that meets many of the criteria used by our investment management team during the allocation process.

The project uses the following tools:

  1. Blaze, to study and interact with partner data.
  2. Pipeline, to dynamically select stocks within the Q1500US universe to trade each day.
  3. Alphalens, to analyze the predictive ability of an alpha factor.
  4. Optimize, to move the portfolio from one state to another.
  5. The Interactive Development Environment (IDE), to backtest the strategy.
  6. Pyfolio, to analyze the performance and risk of the backtest.

This specific example also uses the Alpha Vertex PreCog 500, EventVestor Earnings Calendar, and EventVestor Mergers & Acquisitions datasets.
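
To give a flavor of the Pipeline step, here is a minimal sketch in the spirit of the notebook. The PreCog import path and column name are written from memory, so treat them as assumptions and check the data documentation:

```python
# Minimal sketch of the pipeline construction step (not the full notebook).
# The PreCog import path and column name are assumptions from memory.
from quantopian.pipeline import Pipeline
from quantopian.pipeline.filters import Q1500US
from quantopian.pipeline.data.alpha_vertex import precog_top_500 as precog
from quantopian.research import run_pipeline

def make_pipeline():
    universe = Q1500US()

    # Alpha Vertex's 5-day log return prediction, restricted to the Q1500US.
    predicted_return = precog.predicted_five_day_log_return.latest

    return Pipeline(
        columns={'predicted_return': predicted_return},
        screen=universe & predicted_return.notnull(),
    )

# Run the pipeline over a date range in Research to inspect the factor values.
result = run_pipeline(make_pipeline(), '2016-01-04', '2016-12-30')
print(result.head())
```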

The end result of this project was a long-short strategy with the following attributes preferred by the Quantopian Investment Management Team:
- Large, dynamic universe.
- Equal long/short exposure.
- Only trades stocks in the Q1500US.
- Low position concentration.
- Beta close to 0.
- Low volatility.
- High daily turnover.
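
In the algorithm, most of these attributes are enforced with Optimize constraints rather than hand-rolled position logic. A minimal sketch of that mapping follows; the numeric bounds are illustrative, and the import path assumes the finalized (non-experimental) module name:

```python
# Illustrative mapping from the attributes above to Optimize constraints.
# Numeric bounds are placeholders, not the exact values used in the backtest.
import quantopian.optimize as opt

constraints = [
    # Equal long/short exposure (dollar-neutral book).
    opt.DollarNeutral(),

    # Keep gross exposure (|longs| + |shorts|) at or below 1x capital.
    opt.MaxGrossExposure(1.0),

    # Low position concentration: cap each name at +/- 1% of the portfolio.
    opt.PositionConcentration.with_equal_bounds(min=-0.01, max=0.01),
]

# Beta close to 0 can also be enforced, e.g. with an opt.FactorExposure
# constraint on estimated betas, if you compute betas in the pipeline.
```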

Clone the notebook, investigate other datasets, and make your own market neutral strategy with the above criteria.
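
If you swap in another dataset, the Alphalens step looks roughly like this. It is a sketch meant to run in Research, assuming the make_pipeline() from the earlier sketch; get_pricing is a Research built-in:

```python
# Sketch of the Alphalens analysis step in Research, assuming make_pipeline()
# from the earlier sketch. get_pricing is a Research built-in.
import alphalens
from quantopian.research import run_pipeline

# Compute the factor over a sample period.
result = run_pipeline(make_pipeline(), '2016-01-04', '2016-12-30')

# Pull pricing for every asset that appears in the factor, extending past the
# end of the factor window so forward returns can be computed.
assets = result.index.levels[1].unique()
prices = get_pricing(assets, start_date='2016-01-04', end_date='2017-02-28',
                     fields='open_price')

# Align the factor with forward returns and bucket it into quintiles.
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    factor=result['predicted_return'],
    prices=prices,
    quantiles=5,
    periods=(1, 5, 10),
)

# Full tear sheet: quantile returns, information coefficient, turnover, etc.
alphalens.tears.create_full_tear_sheet(factor_data)
```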

Note: The pipelines in this notebook were run using premium data for some of the EventVestor datasets. With the free versions of the data, you can run the backtest up to about 2 years ago.

(Notebook attached; preview unavailable.)
Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

18 responses

And here is the corresponding algorithm + backtest that uses the Optimize API. Note that the algo uses the MaximizeAlpha objective instead of TargetWeights because we might want our target weights to change depending on how they fit in with the constraints.

The algorithm uses the default slippage model as well as a commission model that charges $0.001/share. This is approximately what a large institutional hedge fund would pay.
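
The relevant pieces of the algorithm look roughly like the sketch below. It uses the non-experimental import paths, which may differ slightly from the experimental module discussed later in this thread; set_commission, commission, schedule_function, date_rules and time_rules are IDE built-ins, and make_pipeline() is assumed to return a pipeline with a 'predicted_return' column:

```python
# Sketch of the rebalance and commission setup described above.
import quantopian.optimize as opt
from quantopian.algorithm import attach_pipeline, pipeline_output, order_optimal_portfolio

def initialize(context):
    # Default slippage model is left in place; commissions are $0.001/share.
    set_commission(commission.PerShare(cost=0.001, min_trade_cost=0))
    # make_pipeline() is assumed to return a 'predicted_return' column.
    attach_pipeline(make_pipeline(), 'precog_pipeline')
    schedule_function(rebalance, date_rules.every_day(), time_rules.market_open())

def rebalance(context, data):
    alpha = pipeline_output('precog_pipeline')['predicted_return']

    # MaximizeAlpha lets the optimizer trade off the raw signal against the
    # constraints, rather than forcing exact weights the way TargetWeights does.
    objective = opt.MaximizeAlpha(alpha)

    constraints = [
        opt.DollarNeutral(),
        opt.MaxGrossExposure(1.0),
        opt.PositionConcentration.with_equal_bounds(min=-0.01, max=0.01),
    ]

    order_optimal_portfolio(objective=objective, constraints=constraints)
```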

Note: This algorithm was run using premium data for some of the EventVestor datasets. You can run the backtest up to 2 years ago with the free data.

(Attached backtest: ID 5947e4e609c7d969f9c2a62a. Preview unavailable.)

Here is a version that can be run using the free versions of the EventVestor Data.

(Attached backtest: ID 5947e5362322a46e46841c0f. Preview unavailable.)

Very cool. A few questions

1) I see you're looking at your measure of prediction quality above a certain threshold. How do you know this is not overfitting the threshold to the data to make the results look better? In general, would it not be better for PreCog to give us a confidence interval on their prediction, rather than try to establish this ourselves?

2) Why do you zscore, rank then quantile? I think you can just apply quantile straight away?

3) You are both dollar neutral and market (beta) neutral. Why do both? It seems constraining to have two competing definitions of risk.

4) Why do you filter out stocks near earnings announcements and acquisition targets? Doesn't PreCog bake this into their ML predictions?

Perhaps the answer to some of these is that this is an education piece, so you have the kitchen sink in there to demonstrate what Pipeline can do.

Not sure if Jamie mentions it here, but per his comment here and detailed explanation here, the Precog data (derived signal, really) are out-of-sample only after March 6, 2017. My sense is that we need a solid 6 months minimum, per the standard Q practice for the contest and Q fund, to know if there is any validity to the Precog data (but maybe I'm being too conservative). That said, it doesn't hurt to start using the data in backtests and paper trading.

Hi Jamie,

I see that the Optimize API is still marked as experimental here. So, can we use the API for the contest algorithms and for consideration for allocation or not?

Thanks,
Ishwar.

Hi Jamie,

I have an algo architectural question: I'm wondering if your alpha factor here can be written in a standard way so that it can be combined with other factors. Can your factor be expressed so that it could be plunked into a multi-factor algo (see this one, for example)? I'm trying to follow the guidance from y'all regarding incorporating multiple factors, and there are some good examples:

https://www.quantopian.com/posts/machine-learning-on-quantopian-part-3-building-an-algorithm
https://www.quantopian.com/posts/quantcon-nyc-2017-advanced-workshop
https://www.quantopian.com/lectures#Example:-Long-Short-Equity-Algorithm

So, I'm thinking that as you put out examples of single factors, such as you share above, you should express them so that they can be combined per the multi-factor algos you are also providing as examples (in fact, if you write the canonical multi-factor algo template as universal, you can just use it every time you provide a single factor example).

The other thing, which you may realize, is that we have no way of knowing what data sets are dominant in the Alpha Vertex signal feed. If their signal is predominantly derived from price/volume and fundamental data, then if I understand the guidance here, you would not want it. Have you discussed this with them? Could they provide a signal that does not use price/volume and fundamental data (at least directly)?

@Dan:

1) In general, I didn't spend much time thinking about the strategy itself. Instead, my focus was on showing off the tools that you can use to build and test an idea. The strategy is just a modified version of the strategy that AV put together on this thread. I lowered the threshold from 65% to 50% to increase the number of stocks that were being considered for trade. I think your concern about overfitting is a good one, and you should feel free to reach out to them. They mentioned that you can reach them at [email protected] on the other thread.

2) D'oh! When I started working on the new version of this notebook, I had included both the Sentdex and Alpha Vertex datasets. I was planning to try combining them, so I took the zscore of the factors that I created for both. I decided I didn't want to go down the alpha combination path in this post, so I removed the Sentdex dataset. I had planned to do some sort of mean rank, which is why I had taken the rank (which actually makes the prior call to zscore redundant). In the context of a single factor, you're right that I only needed to call .quantiles().

3) The short answer here is that sometimes dollar neutrality and market neutrality are the same, but not always. Both are considered sources of model risk, and we have a budget for both separately. Looking over the allocations page, we don't do a good enough job distinguishing between the two. I've notified our research team that we should improve the language to clear things up.

4) Your note about the education piece is right on the money. I'm simply trying to show off some of the datasets and tools to give a sense of what a workflow might look like. There certainly might be reason to not exclude stocks around earnings announcements. That being said, you might want to see how the precog data performs when predicting price movements around these events just to be sure!
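
For reference, the event screens mentioned in 4) can be expressed as pipeline filters along these lines. This is a sketch; the EventVestor import paths and term names are from memory, so treat them as assumptions:

```python
# Sketch of the earnings / M&A screens discussed in 4).
# EventVestor import paths and term names are from memory and may differ.
from quantopian.pipeline.filters import Q1500US
from quantopian.pipeline.factors.eventvestor import (
    BusinessDaysUntilNextEarnings,
    BusinessDaysSincePreviousEarnings,
)
from quantopian.pipeline.filters.eventvestor import IsAnnouncedAcqTarget

# Skip names within a couple of business days of an earnings announcement.
not_near_earnings = (
    (BusinessDaysUntilNextEarnings() > 2)
    & (BusinessDaysSincePreviousEarnings() > 2)
)

# Skip announced acquisition targets.
not_acq_target = ~IsAnnouncedAcqTarget()

# Combine the screens with the base universe.
tradeable = Q1500US() & not_near_earnings & not_acq_target
```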

@Ishwar: The Optimize API is technically still experimental, but we will be finalizing the API and removing the experimental label soon. Algos using Optimize are certainly eligible for allocations (that's why it's included in this post), and you can use it in the contest. I checked with our engineers, and we're expecting to make some changes and deprecate some of the current names, so you may have to make a few minor changes when Optimize makes it out of experimental (or tolerate deprecation warnings). However, the underlying behavior is expected to stay the same.

@Grant: See my answer to Burrito Dan above. The zscore and ranking functions are good ways to normalize - though using them together was a mistake on my part as it is redundant to call zscore before rank. In general, you can pick a normalization technique that you prefer and then try to combine as you wish.
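
Concretely, for a single pipeline factor the three calls collapse to one. A sketch, where predicted_return stands for the PreCog factor:

```python
# Sketch of the normalization point above; predicted_return stands for a
# pipeline Factor such as the PreCog 5-day prediction.

# What the notebook did (the zscore is redundant before rank):
quintiles_verbose = predicted_return.zscore().rank().quantiles(5)

# Equivalent bucketing for a single factor:
quintiles = predicted_return.quantiles(5)

# rank()/zscore() earn their keep when combining factors, e.g. a mean rank:
# combined = (factor_a.rank() + factor_b.rank()) / 2
```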

Your other question was already answered by Dan Dunn. We are interested in any algorithm that meets the allocation criteria.

Hi Jamie, Thanks for the clarification.

If we have to make a few changes when Optimize is finalized and then re-deploy the algorithm, then the out-of-sample history for it until then will be lost, right? Or is it acceptable to keep the algorithm running with deprecation warnings in the contest without making these adjustments later?

@Ishwar great question. You do not have to redeploy your algorithm in the contest to use the optimization API. Instead, you should use this API going forward, as you're designing and creating new strategies.

In general, if an algorithm is stopped (because it was disqualified in the contest or something else) it is still eligible for consideration for an allocation. The 6 month clock is still running in the background. If we discover a strategy for an allocation and it's not on the optimization API, we will work with the author to make the update. To get a head start, go ahead and use the API (and the QUS universe) in your algos.


@Jamie McCorriston

Please could you provide me with another example of this?

And here is the corresponding algorithm + backtest that uses the Optimize API. Note that the algo uses the MaximizeAlpha objective instead of TargetWeights because we might want our target weights to change depending on how they fit in with the constraints.
The algorithm uses the default slippage model as well as a commission model that charges $0.001/share. This is approximately what a large institutional hedge fund would pay.
Note: This algorithm was run using premium data for some of the EventVestor datasets. You can run the backtest up to 2 years ago with the free data.

An interesting strategy. I took Jamie's program above and pushed it to a 4.00x leverage with the following results:

I left the commission settings at 0.001. I think it is low, but Jamie says it's OK. I made some minor adjustments to some of the parameters that I consider pressure points in order to contain the variance, since this strategy trades a lot. I estimated that about $1M would be paid in leveraging fees. This would reduce the realized gross from 53.05% CAGR to about 45.11%, on the premise that leveraging fees would be at 3.00%.
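
As a rough back-of-the-envelope check on that fee estimate (ignoring compounding and the time-varying amount actually borrowed):

```python
# Back-of-the-envelope fee drag at 4.00x leverage, ignoring compounding and
# the fact that the borrowed amount varies over the backtest.
leverage = 4.00
borrow_rate = 0.03                          # assumed 3.00% on borrowed capital
fee_drag = (leverage - 1.0) * borrow_rate   # = 0.09, i.e. roughly 9 points per year

gross_cagr = 0.5305                         # realized gross CAGR from the test
net_cagr_estimate = gross_cagr - fee_drag   # ~0.44, same ballpark as the ~45% quoted above
print(fee_drag, net_cagr_estimate)
```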

Since this kind of strategy is intended to possibly be leveraged up to 6.00x, this test tends to corroborate that possibility, at least up to a leverage factor of 4.00x. The result stayed within the 0.10 volatility range, with a low drawdown (-3.7%) and a -0.08 beta.

I was not able to pass this backtest to Pyfolio or Alphalens, so no tear sheets. The program stops before reaching the end, probably because there is not enough memory to do the job. It does over 2M transactions over the period.

My question is: where can I find all I can do with the optimizer? What are the inputs and outputs of the box?

Hi Guy,

The help documentation has a lot of information on the different objectives and constraints in optimize.

As for the algorithm, I tried to draw attention to some of the properties like the high position count, high turnover, market neutrality, dollar neutrality, and sector neutrality. These are features that we look for when evaluating algorithms for an allocation. For the strategy itself, as Grant mentioned, you should consider waiting for out-of-sample numbers to make sure the strategy is not overfit, since the input is a machine learning signal.
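
On the "inputs and outputs of the box" point: the objective takes something like a Series of alphas indexed by asset, and the optimizer returns target portfolio weights that satisfy the constraints. You can call it directly in Research to inspect the output, roughly like the sketch below; calculate_optimal_portfolio is from memory, so check the Optimize documentation for the exact name and signature:

```python
# Rough sketch of calling the optimizer directly in Research to see its
# inputs and outputs. The function name is from memory; check the docs.
import pandas as pd
import quantopian.optimize as opt

# Input: one alpha value per asset (hypothetical values for two symbols).
alpha = pd.Series({symbols('AAPL'): 0.8, symbols('MSFT'): -0.5})

objective = opt.MaximizeAlpha(alpha)
constraints = [
    opt.MaxGrossExposure(1.0),
    opt.DollarNeutral(),
    opt.PositionConcentration.with_equal_bounds(min=-0.6, max=0.6),
]

# Output: a Series of target weights per asset satisfying the constraints.
weights = opt.calculate_optimal_portfolio(objective, constraints)
print(weights)
```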

Hi Jamie, thanks for responding. I have the same considerations. I have read the help documentation. Still, I have the impression of looking at a black box with some doubts on the data series themselves. If they were over-fitted in some way, I would not even know!

I found the strategy interesting since it could support leveraging even in a zero-beta, low volatility environment. I had always looked at those strategies as not producing enough. And yet, if the data is correct, one could push and extract some interesting CAGR levels with relatively low risk. Any strategy generating over 2 million transactions can be considered to adhere to the law of large numbers.

My next step would be to test over much longer trading intervals, add more constraints, make the strategy trade a lot less, and, as you said, wait for some out-of-sample data.

For now, it forces me to consider such strategies as deserving more of my attention.

In case you missed it, Scott Sanderson's post that originally announced the Optimize API might be of interest to you. The explanation there might be what you're looking for.

The following chart closely resembles the one presented in an earlier post. It is part of my after-test forensic trade analysis.

I would like to draw your attention to the bottom panel, the trace of the number of long positions taken. If we exclude the spikes, which should not be there even if there are reasons for them, we can still notice the cyclic nature of the inventory.

It should not be like that! The market is too erratic to show such generally well-behaved cycles. There is time-correlated data here.

It makes me doubt the integrity of the dataset used. Is it rigged, over-fitted, or doctored in some way? Or is the optimizer showing some as-yet-unnoticed problems?

This alpha search has changed to: what are the economic or strategic reasons for this behavior?

I do not find that chart appealing. I do not like what it is saying. Either way, there is a need for a reasonable explanation. Otherwise... I will have to conclude on the negative side.

If you extract some alpha, which technically was not there in the past, on what basis would you expect that it will be there in the future?

Is the cyclicality related to quarterly earnings announcements? They are not uniformly spread as I recall.

Dan, that's a very likely cause. The number of earnings announcements per day definitely oscillates and peaks once per quarter.

@Dan, @Jamie, yes, it is what I suspected at first.

A projection related to earnings results. And, possibly, an optimizer of sorts at play, which could be a bad sign since it implies dataset manipulation or over-fitting. Both would almost invalidate the simulation.

However, just based on the cyclicality of these projections, it might show that they were not that good after all, since after their seasonal peak, the number of positions declined.

For the number of positions to go down, they first had to fail to make the Q1500US positive momentum list, not to mention the PreCog list. Yet the equity line kept going up whether you increased or decreased the number of positions.

What I now consider is that the projections made were more highly correlated to the market in general. You had more upward price projections simply because you had more stocks going up, until more and more of your projections failed.

Nonetheless, I think the strategy is trading on market noise. The PreCog list did not show that it was much different from what the market had to offer. And as such, surprisingly, that made it acceptable.