NO Price Data At All!

Hi Quantopians, especially Delaney,
[I come in peace and I hope you are friendly]

Although I’m still “new around here”, if you will allow me to be so bold, I would like to start my very first thread here on the Quantopian Forum.
It is in the form of a challenge to anyone who wants to play, and I hope it will be fun and educational for everyone. You can do your own scoring for this one. There is no prize, at least certainly not from me anyway. In this particular challenge you can even cheat if you want to, although you will only be cheating yourself, because the aim here is to help everyone win something very special in terms of your own knowledge.

I created this for two reasons. Firstly because Delaney at Quantopian has been strongly urging us to look at “alternative data” and use inputs other than just price. Secondly, I know there are some people who don’t believe in “fundamentals”. They put forward various reasons about why they think “fundamentals are BS”, but I think they are wrong, and I would like to demonstrate that to you now.

Please take a look at the results below, from an algo that I wrote. Those results look rather un-spectacular, don’t they? Yes, I think they are rather un-spectacular too ….. except for one small thing……. The algo used NO PRICE DATA AT ALL!!

The only inputs are from the Morningstar Fundamentals data freely available to all of us, and excluding any ratios that involve the stock prices in any way, for example PE ratio, Price to Book ratio, Earnings yield, etc.

To anyone who thinks that Buffett & Munger’s consistent performance over decades is just a “statistical anomaly” (i.e. lucky ...yeah, sure, just like the idea that, given enough monkeys with typewriters, one of the monkeys will surely write the entire works of Shakespeare), I would say OK, continue to believe whatever you want but, IMHO, fundamentals really DO work and I believe Buffett & Munger are excellent proof of that. So is the algo output shown here….. Unless of course the only reason that it works is because the entire period from September 2009 to September 2017 is just a big bull market, and everyone knows that absolutely ANY fool whatsoever can make tons of money very easily in a bull market, right? ;-))

So here is the challenge for you:
Design an algo to beat the results shown, over the 8 year period from 1st September 2009 to 1st September 2017 (as an equal basis of comparison for everyone), using ONLY Morningstar Fundamentals data, EXCLUDING the price-related ratios. All other constraints are exactly the same as per the real Quantopian Open Contest, including "competition transaction costs" etc, and especially leverage <= 1.

You can score yourself however you want to really, but my personal “scoring system” for this little exercise is as follows. (Please note: this is not intended to have any particular relationship to the way in which Quantopian might calculate the Quantopian Open Contest scores and, as far as I know, it doesn’t, but it is still useful, at least to me.)

Uncle Tony’s Score = 100 * Sharpe * (Returns% / 8 years) * (1 + 10 * (alpha - abs(beta))) / (1 + abs(Drawdown%))

On that basis, the example shown would have
Uncle Tony’s Score = 100 * 0.93 * (24.35 / 8) * (1 + 10 * (0.03 - abs(-0.04))) / (1 + abs(5.70)) = 54.2
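
For anyone who wants to score their own backtests, here is a minimal sketch of the formula as it is written above. The function and parameter names are made up for illustration, and percentages are assumed to be entered in percent units (24.35, not 0.2435).

```python
def uncle_tony_score(sharpe, total_return_pct, years, alpha, beta, max_drawdown_pct):
    """Score as written in the post; returns and drawdown are in percent units."""
    annual_return_pct = total_return_pct / years
    # Reward alpha, penalize market exposure in either direction.
    risk_adjustment = 1 + 10 * (alpha - abs(beta))
    return 100 * sharpe * annual_return_pct * risk_adjustment / (1 + abs(max_drawdown_pct))


# Inputs quoted in the post for the example backtest.
example = uncle_tony_score(
    sharpe=0.93, total_return_pct=24.35, years=8,
    alpha=0.03, beta=-0.04, max_drawdown_pct=5.70,
)
print(round(example, 2))
```

With the rounded inputs quoted above, this reading of the formula prints about 38.0 rather than 54.2, so the quoted figure presumably comes from unrounded inputs or a slightly different form of the formula. Treat the function as one reading of the written formula, not as the exact calculation behind the quoted score.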

Can you improve on that? If so, please share with us how you did it.

Remember, NO price data or ratios that involve stock price data in any way. Do your own scoring. This is designed as a learning experience. Have fun!!!

After playing a few times, I hope you will be asking yourself why you or anyone else would ever even consider throwing away a whole lot of perfectly good alpha by NOT using available fundamentals data.

Delaney, just imagine what we could do if we actually added PRICE data as well!! ;-))

Cheers, best wishes, Tony M.

[Attached backtest. Backtest ID: 5a00712816a8b244fa9d97fb]
25 responses

This is a great example of starting to move towards more real, fundamental, economic hypotheses by using fundamental data about a company to make your predictions. In general pricing data can certainly be helpful, but in many cases it should really be thought of as the outcome variable that we are trying to predict.

I'm very interested to see what other directions people can take this algorithm. Obviously it's also totally possible to overfit models that use fundamental data, but as long as you're coming to it with a valid line of reasoning for why the model should work, that's a big plus.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

I wanted to encourage people to also look at these factors in a research setting, as you get a ton more visibility plus faster turnaround time on analysis. I cloned the factor analysis lecture as a baseline and swapped in Tony's factors. Looks like the first produces a lot of discrete values which currently doesn't work well with our alphalens library, but the second one works just fine and gives a whole readout. You can see that at least over a shorter predictive window of 1, 5, or 10 days, the factor is pretty inconsistent. This makes sense as fundamental factors look at real properties of a company that may take months to result in actual price actions.

Some next steps for curious folks:

1. Which factor is actually contributing to performance?
2. Run it on a longer time horizon, maybe look at 20, 40, 60 day windows.
3. Are the factors covarying a lot?
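
As a starting point for question 3, here is a small sketch of a Spearman rank correlation between two factors over the same set of stocks. It uses only the standard library, and the factor values are invented purely for illustration. A value near +1 or -1 would suggest the two factors carry much the same ranking information.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical factor values for six stocks on one date.
piotroski_like = [7, 3, 5, 9, 1, 5]
altman_like = [3.1, 1.2, 2.5, 4.0, 0.4, 2.0]
rho = spearman(piotroski_like, altman_like)
print(round(rho, 3))
```

In practice you would compute this per date across the whole universe and look at how the correlation behaves over time, rather than relying on a single cross-section.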

the first produces a lot of discrete values which currently doesn't
work well with our alphalens library

It might be tedious to write, but Alphalens supports discrete values: just use the 'bins' option with custom intervals so that the discrete values fall in those bins. Here's an example of Alphalens on the sentdex factor, which has only discrete values.
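
To illustrate the idea of custom intervals, here is a small standard-library sketch that places bin edges between the discrete values so that every value lands cleanly in one bin. It does not call Alphalens; the edges, tickers and scores are hypothetical, and the helper name is made up.

```python
from bisect import bisect_right


def assign_bin(value, edges):
    """Return the 1-based bin for value, using half-open bins [edges[i], edges[i+1]).

    Returns None when value falls outside the outermost edges.
    """
    idx = bisect_right(edges, value)
    if idx == 0 or idx == len(edges):
        return None
    return idx


# Edges sit between the integer scores, so no score lies on a boundary.
edges = [-0.5, 2.5, 5.5, 9.5]  # bins: 0-2, 3-5, 6-9
scores = {"AAA": 0, "BBB": 3, "CCC": 5, "DDD": 8, "EEE": 9}
binned = {ticker: assign_bin(score, edges) for ticker, score in scores.items()}
print(binned)
```

Putting the edges at half-integers is the design choice that matters here: with quantile-based bucketing, many stocks sharing the same integer score make the quantile boundaries ambiguous, whereas fixed intervals do not have that problem.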

Delaney, hi! & thanks.
I hope a lot of other people will pick up on this too.

The inputs I used were nothing special really, and there was definitely no attempt whatsoever at over-fitting or even any sort of fitting at all!
The basis of the algo is very simple, no secrets, and you can find it all in lots of places in the literature, in text books, and even on Wikipedia or Investopedia.

The "Altman Z score" part was published way back in 1968, so it's not exactly new. It was designed by a Professor of Finance at New York University to predict the probability of a company going bankrupt within the next 2 years. If a company is financially sound then its Altman Z score should be high. If a company is financially shaky then its score will be low. Companies with negative Altman Z scores should generally make good candidates for shorting ....at least until they get de-listed and disappear! The Altman Z score was never meant as a short term indicator. I use it in my own personal, very small-time, long-only trading account to screen out stocks that have too much risk of bankruptcy for my liking.

The other component of this algo is the well-known Piotroski F-score, which you can also find just about anywhere (text books, Wikipedia etc). It was designed by Professor Joseph Piotroski (you can check him on LinkedIn if you want), as a measure of a company's general financial strength, rather than specifically its risk of bankruptcy. The only minor problem with Piotroski's 9-point score is that each item gets scored +1 or 0 (actually I didn't even do that, I just took the sign +/-1), so it is a discrete, integer scale.

I made no attempt to optimize or even modify anything at all. I wanted to use both Piotroski-9point and Altman-Z because I think they complement each other nicely. I decided to add them together as factors, with some weight given to each. Piotroski is widely applicable, so I gave it a significantly higher weight than AltmanZ which is really only intended to estimate risk of bankruptcy, rather than likely financial performance. And that's all. Very easy really.
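
A minimal sketch of what "add them together as factors, with some weight given to each" could look like. The post does not give the actual weights, so the 0.7 / 0.3 split below is hypothetical, and standardizing each factor to a z-score before adding is an assumption of this sketch (so that the weights act on comparable scales), not something the post states.

```python
from statistics import mean, pstdev


def zscore(factor):
    """Cross-sectional z-score of a {ticker: value} mapping."""
    values = list(factor.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {ticker: 0.0 for ticker in factor}
    return {ticker: (value - mu) / sigma for ticker, value in factor.items()}


def combine(piotroski, altman, w_piotroski=0.7, w_altman=0.3):
    """Weighted sum of the two standardized factors (weights are hypothetical)."""
    zp, za = zscore(piotroski), zscore(altman)
    common = zp.keys() & za.keys()  # only stocks that have both factors
    return {t: w_piotroski * zp[t] + w_altman * za[t] for t in common}


# Invented values for three stocks.
piotroski = {"AAA": 9, "BBB": 5, "CCC": 1}
altman = {"AAA": 3.0, "BBB": 1.0, "CCC": 2.0}
combined = combine(piotroski, altman)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Here BBB has the weakest Altman value but still ranks above CCC, because the Piotroski term carries the larger weight.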

The weaknesses of the algo as presented here are the following:

a) It is obviously not a practical stand-alone system as it is, but that is quite deliberate. I wanted to completely separate out fundamentals from absolutely ANYTHING else at all that you might want to put into a "real" algo.

b) Piotroski 9-point is a discrete, integer scale. Obviously you need to convert it to a continuous scale which is more suited to our application, although there are some interesting & different ways you might do that and plenty of scope for experimentation. I didn't bother with that here, because I just wanted to show that, without any modification at all, there is lots of stuff out there that is very accessible and can so easily be used.

c) The output from the algo is quite "lumpy" in the time-domain sense. That's because the fundamental data only gets updated at the reporting periods of each company. You have to live with that. Don't try to interpolate in any way, because that would introduce a look-ahead bias.

d) I haven't tested it, but I expect that the algo will probably work better if you allow it to use a larger number of companies than I did. There is probably some optimum number that gives the best balance between rewards & transaction costs with too many tiny little trades. I don't know. Try it!
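
On point (c), a small sketch of why the data should be carried forward rather than interpolated. The day indices and values are invented, and the helper name is made up for illustration.

```python
def forward_fill(reports, days):
    """For each day, return the most recently reported value (None before the first report).

    reports maps a day index to the value that became available on that day.
    """
    filled = []
    last = None
    for day in days:
        if day in reports:
            last = reports[day]
        filled.append(last)
    return filled


# A value reported on day 0 and an updated value reported on day 3.
quarterly = {0: 1.0, 3: 4.0}
filled = forward_fill(quarterly, range(6))
print(filled)  # [1.0, 1.0, 1.0, 4.0, 4.0, 4.0]
```

Linear interpolation would give 2.0 on day 1 and 3.0 on day 2, which quietly uses the day-3 report before it exists. Forward-filling keeps the series "lumpy" but uses only information that was available on each day.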

There is lots more fundamental data in Morningstar, just waiting to be experimented with in real algos, as compared to this little demo one.
I very deliberately didn't do anything more to improve it, because I wanted to put a challenge out there that I knew could be beaten!

Please, I invite anyone who is interested to try to do better than I have in this example. It shouldn't be too hard really, and basically it's "free alpha", out there and just waiting to be harvested!

Best wishes, Tony

Hi Luca,
I think being able to use discrete values is very useful in general, even if not actually essential in this algo. Thanks for your post.

To everyone,
If you have cloned the algo as I wrote it, you will see in line 96 my comment: # Yes, sure I know this is not the real Piotroski score!
I wrote this with reference to the rather long line 95 above it. In the real Piotroski score, he gives +1 for each item that passes the required criterion, and zero otherwise. What I have done is to just use sign(...), which gives +1 or -1 for each item rather than 1 or 0, so the number that comes out at the end will not be identical to Piotroski's actual numbers, but the ranking of the numbers is identical. If you want to get the actual Piotroski score numbers, then you need to wrap each of the 9 individual terms in something of the form MAX(0, sign(...)).

That difference from Piotroski's actual method was deliberate on my part, and it makes no difference to the functioning of the algo because the ranking for the individual stocks will be the same.
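
The claim that the ranking is unchanged can be checked with a short sketch. The nine term values per stock below are invented; each stands for one signed Piotroski criterion, where a positive value means the criterion passes. If k terms pass and none is exactly zero, the sign-based total is k - (9 - k) = 2k - 9, which increases with k, so the ordering of stocks is the same under both scorings.

```python
def sign(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_score(terms):
    """Scoring used in the posted algo: each term contributes +1 or -1."""
    return sum(sign(t) for t in terms)


def binary_score(terms):
    """Piotroski-style scoring: each term contributes 1 if it passes, else 0."""
    return sum(max(0, sign(t)) for t in terms)


# Hypothetical signed criteria for three stocks (nine terms each).
terms_by_stock = {
    "AAA": [0.05, 0.10, 0.01, 0.02, -0.03, 0.20, 1.0, 0.04, -0.01],
    "BBB": [-0.02, 0.03, -0.01, -0.04, 0.02, -0.10, 1.0, -0.03, 0.01],
    "CCC": [0.01] * 9,
}
sign_scores = {k: sign_score(v) for k, v in terms_by_stock.items()}
binary_scores = {k: binary_score(v) for k, v in terms_by_stock.items()}
rank_by_sign = sorted(sign_scores, key=sign_scores.get, reverse=True)
rank_by_binary = sorted(binary_scores, key=binary_scores.get, reverse=True)
print(sign_scores, binary_scores)
```

One caveat: if a term is exactly zero, sign(...) contributes 0 rather than -1, so the 2k - 9 relation no longer holds exactly for that stock and the two rankings could differ in such edge cases.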

There are however some small typo / copy & paste errors in the out[:] = ........ part of the Piotroski9 custom factor on line 95 of the code as I wrote it.
In Piotroski's 9 point scheme, he subdivides the 9 terms into three parts, namely: Profitability (4 items), Leverage & Liquidity (3 items), and Operating Efficiency (2 items), and each of these items is correctly defined in the COMMENTS on lines 80-91. If you want to get correct Piotroski results, you will need to make some small corrections to code line 95 to ensure that it is actually consistent with comment lines 80-91.

Some suggestions:

a) Pull the Piotroski & Altman formulas apart and examine each term individually (9 terms in Piotroski and 5 in Altman) as possibly useful factors.

b) Some of these individual terms are well-known ratios from the Balance Sheet, Income Statement/P&L or Cash Flow Statement (e.g. ROA = Net Income / Total Assets) but are already reported exactly as required in Morningstar (e.g. roa).

c) All the terms in the Altman formula are ratios of 2 different financial statement items and in all cases except Altman item D, the denominator is Total Assets so as to normalize each of the terms and make them dimensionless ratios before adding them together as Altman did.

d) For anyone who actually wants to treat this as a competitive exercise and really wants to use NO price data at all, you will have to leave out Altman's term D = Market Value of Equity / Total Liabilities, and also avoid using anything else from Morningstar that implicitly contains price, for example any type of yield, PE, PEG and so on, and also Market Cap and Enterprise Value. Obviously those are all useful things to look at too, but they do contain price data, just by the way they are defined.
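
Suggestions (c) and (d) can be sketched as follows. The coefficients are the commonly cited ones from Altman's 1968 formula (1.2, 1.4, 3.3, 0.6, 1.0), not values taken from the posted algo, and the function name, parameter names and input numbers are hypothetical. Leaving the market-value arguments unset gives the price-free variant.

```python
def altman_z(working_capital, retained_earnings, ebit, sales, total_assets,
             market_value_equity=None, total_liabilities=None):
    """Altman Z-score; the market-value term (item D) is skipped if its inputs are omitted."""
    a = working_capital / total_assets
    b = retained_earnings / total_assets
    c = ebit / total_assets
    e = sales / total_assets
    z = 1.2 * a + 1.4 * b + 3.3 * c + 1.0 * e
    if market_value_equity is not None and total_liabilities is not None:
        # Item D is the only term that is not divided by total assets,
        # and the only one that contains price information.
        z += 0.6 * market_value_equity / total_liabilities
    return z


# Invented balance sheet and income statement numbers.
price_free = altman_z(working_capital=20, retained_earnings=30, ebit=10,
                      sales=100, total_assets=100)
full = altman_z(working_capital=20, retained_earnings=30, ebit=10,
                sales=100, total_assets=100,
                market_value_equity=150, total_liabilities=50)
print(round(price_free, 2), round(full, 2))
```

Dropping item D shifts the whole scale, so the conventional cutoffs for the full score should not be applied to the price-free variant; it is better treated as a cross-sectional ranking signal.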

I expect that if enough people at Quantopian play around with the Morningstar data, both on a stand-alone item-by-item basis, and also using some sensible combinations of the different items, we will probably come up with some good innovative alphas. Although the academic literature is full of studies by people looking over and over at the same old well-known factors (like Price-to-Book Value, etc) there is a lot of scope for innovative thinking and new ideas, as long as they are based on an understanding of the meanings & relationships between financial statement items. We want to make sure we don't come up with apparent but nonsensical correlations. I'm not sure if Delaney already told the story, but there was one infamous study in which the researchers found that, at least over their original test period, out of a very large number of different possible factors, the factor with the highest correlation to S&P returns was butter production in Bangladesh! [hmmmm, how interesting! .... now why would that be ?] ;-))

One thing to keep in mind is that for allocations, Quantopian is looking for strategies that trade frequently enough to develop a good statistical confidence. Purely fundamental-based strategies tend to have long predictive horizons on the factors. As such, any trading more frequently than every 1-3 months is just paying unnecessary costs. Their infrequent nature makes them hard to evaluate as you'd have to wait years to develop enough sample points. Instead, good approaches involve using fundamental data alongside other sources. You can sort and bucket by fundamental values, or use it as part of a larger overall model. One example might be finding that only certain types of companies were affected by sentiment, and using fundamental data to select for those. Then using the actual sentiment and pricing data to decide if they're currently under or overpriced.

Hi Delaney -

I haven't had much time lately to dink around on Quantopian, but in the back of my mind, I'm wondering how to approach the kind of multi-dimensional problem you describe:

good approaches involve using fundamental data alongside other sources

We have data coming out the wazoo:

https://www.quantopian.com/help#overview-datasources
https://www.quantopian.com/help/fundamentals
https://www.quantopian.com/data
Q500/Q1500 universes
Fetcher
Time of day/week/month
Etc.

The number of dimensions is huge. It seems like a problem for a computer, versus an individual formulating hypotheses and testing them one-by-one (using the research platform and Alphalens, for example). The problem needs to be reduced down to salient dimensions to be tractable. For example, one could consider attempting to see if fundamentals could be used to improve one or more of the 101 Alphas (https://www.quantopian.com/posts/alpha-compiler), across the Q1500US. Any idea how to do this on the research platform? We have the fundamentals, the alphas, and the universe, so what next?

Hi Grant, you are of course correct that, with lots of data, the number of dimensions is large and so part of the problem is simply dimensionality reduction. There are however two more-or-less diametrically opposed schools of thought about how to attack the problem. These are generally called the "Data First" approach and the "Ideas First" approach. Each has some advantages and some disadvantages. The former approach, Data First, is basically the Data-Mining type approach, which it appears that you are implicitly advocating as you write: "The number of dimensions is huge. It seems like a problem for a computer ......" The advantage of this approach is that it can sometimes uncover subtle relationships that are hard to spot manually. The disadvantage is that it can also uncover apparent relationships that are actually not there at all (for example the famous "S&P500 vs Butter Production in Bangladesh" phenomenon), or alternatively relationships that are real but which have no underlying or enduring basis and so get arbitraged away & disappear very quickly as people find them. The other approach, "Ideas First", starts with examining ideas & concepts that actually make sense from some deeper perspective and which are therefore far more likely to endure and produce robust trading systems / algos.

References, for your amusement:

"Butter in Bangladesh Predicts the Stock Market" https://www.fool.com/investing/general/2007/09/20/butter-in-bangladesh-predicts-the-stock-market.aspx

"The Bangladeshi butter-production theory of asset prices" http://business.time.com/2009/04/16/the-bangladeshi-butter-production-theory-of-asset-prices/

"More BS for 'Butter in Bangladesh' Fans" https://www.forbes.com/sites/davidleinweber/2012/12/31/more-bs-for-butter-in-bangladesh-fans/#6944115f451f

"Nerds on Wall Street / Stupid Data Miner Tricks" http://nerdsonwallstreet.typepad.com/my_weblog/2007/04/stupid_data_min.html

Now, coming back to reality, in particular all of the fundamental data available in Morningstar are from one of three standard financial statements: the Balance Sheet, the Income Statement (P&L) and the Cash Flow Statement, as well as some other miscellaneous bits of data like company address, etc. All the data from the 3 financial statements fit together in a coherent way (or at least they should, and if they don't then maybe that means the company is "cooking its books", and that can be valuable info to uncover too). The point is that, at least as far as the "fundamentals" data are concerned, it makes sense to think carefully about what each of the numbers actually MEANS and how they are derived, and how they relate to a company's operations. They aren't just "signals" in some abstract sense. So I would suggest that a good starting point for "dimensionality reduction" with regard to Morningstar-type fundamentals data is to develop a good understanding of the actual meanings of the numbers and of the fundamentals of corporate accounting and financial statements, rather than to just number crunch & see what comes out.

@Tony Morland when I attempt to run your sample algorithm, I receive an error on line 108 claiming that Fundamentals has no attribute "net_income". Yet, for some reason, the algorithm does execute occasionally after multiple attempts. Do you know why this may be happening or is there something wrong on Q's side?

Hi Mustafa .... yes, I see the same strange error msg as you do. It wasn't there a few days ago, so I think it must be a problem on the Q side. I will follow up with Ernesto at Q help/support. Hope it gets fixed soon. Sorry for any inconvenience. Best regards, Tony

@Tony Morland no it’s cool, just want to make sure it gets fixed on Q’s side. Thank you for the Altman z score as well, I was wondering how I could translate it into python. Your code really helped a good amount!

@Tony, Sorry about that error. This weekend, we shipped a change that disambiguates some field names which we learned are being used to represent multiple data points by our fundamental data provider. The net_income field is one of these. In this case, we get a net_income from two different reports: income_statement and cash_flow_statement. Sometimes, the data points differ depending on which report they come from. The two versions of net_income can be referenced with net_income_income_statement and net_income_cash_flow_statement, respectively.

I apologize for the confusion.
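
For anyone updating a cloned copy, the renaming can be handled with a small lookup helper along these lines. This is only a sketch: the helper name is made up, and the stand-in objects below take the place of the platform's fundamentals dataset, which is an assumption here. Only the two disambiguated field names come from the explanation above.

```python
from types import SimpleNamespace


def get_net_income(fundamentals, source="income_statement"):
    """Look up net income under the disambiguated name, falling back to the old name."""
    for name in (f"net_income_{source}", "net_income"):
        if hasattr(fundamentals, name):
            return getattr(fundamentals, name)
    raise AttributeError(f"no net income field found for source {source!r}")


# Stand-ins for a dataset after and before the field-name change.
new_style = SimpleNamespace(net_income_income_statement=120.0,
                            net_income_cash_flow_statement=118.0)
old_style = SimpleNamespace(net_income=120.0)

print(get_net_income(new_style))
print(get_net_income(new_style, source="cash_flow_statement"))
print(get_net_income(old_style))
```

Since the two reports can disagree, it is worth choosing one source deliberately and using it consistently across every term that needs net income.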

Thanks Jamie :-)

Mustafa & others: Algo is now revised & re-posted

Also, FYI, here's some info about the more modern successor to the Altman Z-score
https://alphaarchitect.com/2011/07/23/stop-using-altman-z-score/

@ Tony -

Regarding the "data first" versus "ideas first" approaches, it may be more synergistic. For example, my understanding is that for chess, the best approach is to pair powerful computers with expert players. I've yet to understand, at a basic level, what the 160,000 Quantopian mostly non-expert users will do (and my understanding is that Quantopian aspires to 1 M users). I'm one of the non-experts, and so some dimensionality reduction would help. Circling back to Delaney's suggestion:

One example might be finding that only certain types of companies were affected by sentiment, and using fundamental data to select for those. Then using the actual sentiment and pricing data to decide if they're currently under or overpriced.

So, where should one start? It seems the right algorithm could provide some clues, and then specific cause-effect hypotheses could be formulated and tested.

@Grant
Hi. You raise a number of interesting issues, ALL of which I think are worth picking up on, and some of which may just lead a long way..........

• Chess:
Firstly I love metaphors & analogies in general. They are great ways to look at things as "picture" or "story" or "... well, it's sort of like, but not quite, so what if we just tried .....", and thereby gain additional fresh insights into problems from unexpected angles. Personally, I think Chess is a GREAT metaphor for trading the markets, especially if you play against a chess computer, as I do. I almost never win, not only because I'm not a very good player, but also because every time I do (occasionally) win, I raise the level one more notch. The computer never gets tired, never misses anything, never makes "silly" mistakes in the ways that I do. I have learned a lot about trading while playing chess like this. I'm not sure I could actually verbalize some of those learnings, but that doesn't make them less real and I know my trading improves the more I play.

• Data First vs. Ideas First:
There's plenty of room for both and yes, they can reinforce each other. My only concern with "data first" is the risk of loss of robustness. I know some people say: "as long as it works, that's enough", but I don't quite buy it. Maybe it's not essential to understand why something works, but I think it helps, especially when things start to go wrong. It's also one of the problems with Neural Networks, some kinds of ML, "black boxes" in general, and abstracting things too far from the context in which they belong.

• Context:
I think context is always important to consider in trading, and I made some comments on that in a post about the need for long-term data. I think it also ties in with what Delaney has said about sentiment and trying to figure out when it works and when it doesn't (just like some TA "indicators").

• Quantopian users:
" .... I've yet to understand, at a basic level, what the 160,000 Quantopian mostly non-expert users will do "
No, I have no idea what they will do either ...but we will see.
Although you write "I'm one of the non-experts", I think that you are probably just being modest. I'm sure you didn't write all those 623 algos without some great programming skills, and you are right about how "some dimensionality reduction would help". The question is how to go about it. My suggestion is that at least some level of relevant background knowledge always helps. Anyone who knows nothing whatsoever about accounting or corporate financial statements, or what all those names in Morningstar mean, or how the underlying companies can "cheat" & "massage" some (but not all) of those numbers to try to make things look better than they really are, is at a disadvantage. I'm not necessarily suggesting reading lots & lots of boring accounting books, but at least some background knowledge will help a lot. I will post a short "reading list" for anyone interested.

On the topic of sentiment, I tried playing around with it a bit and so far the results are very disappointing. Please see my post entitled: "Alternative Data: The Good, The Bad and the Useless". Maybe you can help set me on the right track .... perhaps I'm just not on it because of some silly mistake on my part.

• Cause & Effect Hypotheses:
With regard to (Morningstar) Fundamental data, there's no shortage of such things. Would a "reading list" help?

Cheers, best wishes.

@Tony Morland I would greatly appreciate a reading list; the accounting course I’m in currently drives me mad.

@Grant, @Delaney, please see posts on "Alternative data: Good / Bad / Useless". Now I like where this is going .......
........ finger pointing at the moon (for @Karl ). :-)

@Karl, could you expand a little? I don't follow what you would like to do.

With the release of the new risk model, we can see what's producing these returns much better. I made a notebook that demonstrates that this algorithm has high exposure to momentum, size, value, and volatility. Not surprising for a fundamental value based strategy, but still interesting imho.

Many thanks @Delaney. I look forward to playing around with it and seeking further improvement .... . still without using any price data ;-)

@Mustafa, & others
I will put together a good reading list for you, but my suggested starting point is:
a) "The Little Book of Valuation" by Aswath Damodaran (Wiley, 2011). Small book but big on content. Inexpensive. Very easy reading.
b) Any other books by Damodaran e.g. "Damodaran on Valuation", "The Dark Side of Valuation", etc.
c) His website -- lots of excellent free material there. http://pages.stern.nyu.edu/~adamodar/

@Karl,
Re Damodaran: You're welcome :-)

Re ERP: I have some other good info on this which I will share, but I just need to go and get it, so maybe tomorrow.

Re your dialog with @Luca: I'm not sure exactly where you are going with this, but I start to see how you might take it a little further: construct a "synthetic" or "shadow" portfolio of stocks .... not the ones you are actually holding, but the ones you might potentially be holding, see how they perform, and then on the next day actually take the best-of-the-best, both long & short (assuming you are working a balanced equity LongShort strategy) from the synthetic/shadow portfolio and put them into or use them to adjust the real portfolio, which then becomes a sort-of "best possible portfolio" with minimum (i.e. 1 day) lag. I don't have the python skill to code it, but I think it's a neat idea.

@Karl, i like it ;-) cheers!

@Tony thanks for the list; gonna get on it asap

@Grant, I appreciate your input to this discussion, thanks.
@Delaney, FYI.

I think part of beginning the solution to the: "...where to even start?" problem with applying Fundamentals data is to have some appropriate visualization tools.

Although Q is developing good tools for Risk evaluation, there is not as yet much apparent interest in tools for examining the INPUT data & inter-relationships of Fundamentals data in detail. I think this would be very worthwhile, and I have made what I think is a useful start. Please see the thread entitled: "Fundamentals - python ..... help .....".

The problem is that I am hampered by my very limited, beginner-level python skills, so what I have done is clunky, clumsy and not very flexible or user-friendly. I hope that someone else can pick up on this. Would you or any of your python-savvy friends & colleagues be interested in taking this on further than I can? Cheers, TonyM.