How to Be a Successful Quant

Here at Quantopian, your friendly neighborhood crowd-sourced asset manager, we have evaluated millions upon millions of strategies. We have also engaged hundreds of thousands of community members, like you, and carefully studied how they do their work. Recently, some of our quants got funded with up to \$50 million. How did they do it?

Recently, we’ve been talking about the connection between these two sets of observations. What work habits help community members create successful strategies? Here are three habits of successful quant authors that you can adopt in your own work:

• Learn by Doing
• Focus on Alpha by Building Models
• Let the Contest be your Guide

## Learning by Doing

Building a pure-alpha strategy is one of the most difficult intellectual challenges in existence. It can be daunting to think about going from a cold start to evaluating a fully formed strategy. Luckily, Quantopian’s platform provides a workflow that takes you through every phase: from working with new data, all the way through evaluating a complete strategy.

Learning our platform will help you break down the creation of a strategy into manageable steps. Depending on how you prefer to learn, there are several ways to get up to speed. If you prefer to dive into a hands-on experience, you can start with Getting Started, which will walk you through the steps. If you would like a more formal introduction to the financial and statistical concepts, you could start with our lecture series. If you like to imagine the big picture before doing either of those learning activities, check out our short video or blog post on the Quant Workflow to get an aerial view of all the steps. If you want an even deeper dive, check out this longer webinar on going from an idea to a full strategy.

## Focus on Alpha by Building Models

The hardest and most valuable work is identifying alpha: an exploitable market inefficiency. Inefficiencies are hard to find and also difficult to verify. Our philosophy at Quantopian is to do our best to cover all the non-alpha production work, so our community can focus on the hardest (but also most fun) part of the problem.

One of the standout habits of quants who successfully find alpha is research work. Nearly all of their working time is spent in our research environment. The research environment is based on Jupyter Notebooks, which provide an interactive coding environment. That means you can type a little code, run it, and see the output. That is a short creative loop, which will tend to help you focus. That short loop environment is phenomenal for creating the type of strategy we work with here: cross-sectional equity strategies. These strategies find small market inefficiencies in large numbers of stocks. The general idea is to come up with a hypothesis, use that to define a model, and then test your hypothesis by evaluating the model. The hypothesis might be:

Companies recently experiencing extremely high price/equity ratios are likely to revert to the sector mean price/equity ratio within 7 days.

From this you’d construct a model that takes historical and current price/equity ratios into account and makes a forecast about the 7-day return of each stock. This per-stock forecast, or score, is commonly referred to as an alpha factor. The alpha factor describes the market inefficiency you’re exploring in two steps:

• First, by scoring every stock in your universe in a consistent way. This scoring is the alpha factor definition.
• Second, by ranking every stock in your universe using that score. Ideally, you’ll find a linear relationship between your score and the returns of all the stocks in your universe (higher score, higher returns / lower score, lower returns). The strength of the relationship is the expected efficacy of your alpha factor.
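As a rough illustration of these two steps, here is a minimal sketch that scores a universe, ranks it, and measures how well the ranking lines up with forward returns. It uses plain pandas and NumPy rather than Pipeline or Alphalens. The function name `rank_information_coefficient`, the tickers, and the synthetic data are all assumptions made up for this example, and the rank correlation shown is just one common way to summarize the strength of the relationship.

```python
import numpy as np
import pandas as pd


def rank_information_coefficient(scores: pd.Series, forward_returns: pd.Series) -> float:
    """Spearman rank correlation between factor scores and forward returns.

    Computed as the Pearson correlation of the two rank vectors.
    """
    # Align the two series on their shared index and drop missing values.
    aligned = pd.concat([scores, forward_returns], axis=1, keys=["score", "ret"]).dropna()
    ranks = aligned.rank()
    return float(ranks["score"].corr(ranks["ret"]))


# Synthetic universe (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
tickers = [f"STOCK_{i}" for i in range(200)]

# Step 1: score every stock in the universe in a consistent way.
scores = pd.Series(rng.normal(size=len(tickers)), index=tickers)

# Simulated forward returns: a small dependence on the score plus noise.
noise = pd.Series(rng.normal(scale=0.02, size=len(tickers)), index=tickers)
forward_returns = 0.01 * scores + noise

# Step 2: rank the universe by score and check the relationship with returns.
ic = rank_information_coefficient(scores, forward_returns)
print(f"Rank correlation between score and forward return: {ic:.3f}")
```

A value near zero suggests the score carries little information about returns, while a value closer to 1 (or -1) suggests a stronger monotonic relationship.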

Quantopian provides two specialized programming libraries (Python modules) that help you with each step. Pipeline is an application programming interface (API) for defining alpha factors based on your model. You can learn about it in the Getting Started Tutorial, try the Pipeline Tutorial, or read the Pipeline programming documentation.

Once you have defined your alpha factor in Pipeline, you’ll want to evaluate it. That’s where the Alphalens library comes into the picture. Alphalens combines all the alpha factor evaluation studies that we have developed at Quantopian into a single package. Alphalens is also a free-standing open source library (created and maintained by the Quantopian team). Among professional quants in the industry, Alphalens is our most popular open source library.

You should expect to spend around 90% of your effort on developing your models and alpha factors. Alpha factors are the simplest and most idealized version of your investment idea, and testing them comes down to testing a hypothesis as you would in any scientific field. Everything else you layer on top of your alpha factor (with the exception of mixing factors together) to create a full-blown strategy (constructing the portfolio, trading into positions, limiting risk and exposure) will tend to decrease the efficacy of your alpha factors. So, the quickest way to rule something out is to explore the alpha factor. If it doesn’t work in isolation, you won’t fix it by adding the headwinds that come with building an algorithm around it.

You should be absolutely paranoid about overfitting. We have determined many times over that overfitting is the single biggest mistake made by quants on our platform. At a minimum, limit the date range of the data you test with. Quantopian has more than 10 years of history available for you to do research and testing. Limit your in-sample portion (the section of data you use while developing your model) to about half of the available history. Save the rest of the data, the out-of-sample portion, to validate your model when you think you have something working, and to validate the full strategy.
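One simple way to enforce that discipline is to fix the split before you start research. The sketch below assumes a hypothetical ten-year window of business days and a helper named `split_in_and_out_of_sample`; neither is part of the Quantopian API, and the dates are placeholders for whatever history you actually have.

```python
import pandas as pd


def split_in_and_out_of_sample(index: pd.DatetimeIndex, in_sample_fraction: float = 0.5):
    """Split a date index chronologically into in-sample and out-of-sample parts."""
    if not 0.0 < in_sample_fraction < 1.0:
        raise ValueError("in_sample_fraction must be strictly between 0 and 1")
    ordered = index.sort_values()
    cut = int(len(ordered) * in_sample_fraction)
    # Earlier dates are used for development, later dates are held back.
    return ordered[:cut], ordered[cut:]


# Hypothetical research window (placeholder dates, for illustration only).
dates = pd.bdate_range("2008-01-01", "2017-12-29")
in_sample, out_of_sample = split_in_and_out_of_sample(dates)

print("Develop on: ", in_sample.min().date(), "to", in_sample.max().date())
print("Validate on:", out_of_sample.min().date(), "to", out_of_sample.max().date())
```

The point of splitting chronologically, rather than at random, is that the held-back period stays completely unseen until you are ready to validate.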

## Let the Contest be your Guide

Speaking of creating a full strategy, the best authors use the contest criteria and the closely aligned full backtest screen to build from alpha factor to full-blown strategy. When you run a full backtest on the platform, the user interface will highlight the criteria you have yet to meet. There are even links to the relevant tutorials, documentation, and examples to help you address any of the structural issues. Once you have done the hard creative work of devising a model and alpha factor (or several), we’ve made addressing the portfolio construction comparatively straightforward. Here is a template into which you can copy-paste your alpha factor(s) before backtesting them and submitting them to the contest.

The contest is framed as an optimization problem; there is a combination of constraints and a utility function you need to optimize. The constraints are our entry requirements, all of which can be checked in-sample with a backtest and our full backtest output. The contest scoring is the utility function you need to optimize. The scoring function rewards consistent performance over live data.

We based both the entry requirements and the scoring function on everything we learned from the first year and a half managing our fund. If you can find an alpha factor with predictive power and avoid overfitting, you’ll do well in our contest, and give your strategy the best chance to get funded.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

12 responses

How to be a Successful Quant using fundamentals data?
Actually, it's not possible, because there are quite a few problems when using such data. For example, take a look at these recent posts:
https://www.quantopian.com/posts/fundamentals-dot-total-liabilities-dot-latest-always-nan-before-2010
https://www.quantopian.com/posts/potential-bug-in-the-fundamentals-and-morningstar-apis-receiving-nan
https://www.quantopian.com/posts/fundamentals-updating-daily-vs-monthly-or-quarterly

@Q-Team, please give us some feedback.

Hi Fawce -

I'd definitely recommend that folks have a look at the blog post, since it provides the overall architecture. By the way, can you get Jonathan Larkin back? He was making nice contributions.

One thing that has been murky to me is the best approach to defining an alpha factor. As a forecast, it suggests that it should be formulated to predict relative returns directly (stock X will go up by 1% in 5 days). But then as a score, perhaps we are talking about something that, when put through some unknown function, will produce a forecast (e.g. alpha_forecast = F(alpha), where alpha and alpha_forecast are vectors). I gather we aren't actually trying to get alpha_forecast by finding F, but rather convert to alpha_rank, where we rank the alpha factor, with serial integers, the lowest to highest. Then, we'd like to have alpha_forecast = a*alpha_rank, where a is a constant of proportionality. In other words, ranking should turn the unknown function, F, into a known linear one, and if it doesn't, then the alpha factor is flawed, and needs to be revised to make it a monotonic predictor of returns across the entire QTU, for the holding period to be used for trading. The alpha factor could be multiple factors combined, to achieve this aim.

Assuming one can find an alpha factor that works, then alpha_rank is put straightaway into the Optimize API objective function:

class MaximizeAlpha(alphas)


Objective that maximizes weights.dot(alphas) for an alpha vector.
Ideally, alphas should contain coefficients such that alphas[asset] is proportional to the expected return of asset for the time horizon over which the target portfolio will be held.
In the special case that alphas is an estimate of expected returns for each asset, this objective simply maximizes the expected return of the total portfolio.
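To make the quantity in that description concrete, here is a minimal NumPy sketch that ranks a made-up score vector with serial integers and evaluates `weights.dot(alphas)` for one candidate portfolio. This is not the Optimize API and it does not solve any optimization problem: the helper names `rank_alpha` and `naive_weights`, the example scores, and the choice of dollar-neutral weights proportional to the centered ranks are all assumptions for illustration.

```python
import numpy as np


def rank_alpha(scores: np.ndarray) -> np.ndarray:
    """Replace raw scores with serial integer ranks 0..n-1, lowest to highest."""
    ranks = np.empty(len(scores), dtype=float)
    ranks[np.argsort(scores, kind="stable")] = np.arange(len(scores))
    return ranks


def naive_weights(alphas: np.ndarray) -> np.ndarray:
    """Dollar-neutral weights proportional to centered alphas, gross exposure 1."""
    centered = alphas - alphas.mean()
    gross = np.abs(centered).sum()
    if gross == 0:
        return np.zeros_like(centered)
    return centered / gross


# Hypothetical raw factor scores for five stocks.
scores = np.array([0.3, -1.2, 2.5, 0.0, 0.9])

alphas = rank_alpha(scores)       # ranks: [2, 0, 4, 1, 3]
weights = naive_weights(alphas)   # long the high ranks, short the low ranks
objective = weights.dot(alphas)   # the quantity the quoted objective refers to

print("alpha ranks:", alphas)
print("weights:    ", np.round(weights, 4))
print("weights.dot(alphas):", round(float(objective), 4))
```

A real optimizer would search over weights subject to its constraints; this sketch only shows what the objective value looks like for one hand-built portfolio.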

My sense is that doing this with a single factor may not work so well, since it implies finding a broad inefficiency across the entire QTU. To get to 1000 stocks, one needs 5-10 decent factors, perhaps?

I'm wondering if this linearization by ranking, although convenient for analysis, might be throwing away some information? Certainly, the actual relationship between the alpha factor and the forecast is tossed out the window, since we are just looking at monotonicity. Maybe there are some standard industry references on various approaches to formulating alpha factors, "massaging" them into shape, and then doing optimization on them? It is not obvious that the Q recipe is the only and best one.

@Constantino,

I have commented on this here. I believe this is a serious problem that poses some risks for Q. I recommended that they do some data integrity tests, filtering, and standardization of Morningstar fundamental data, much like they did with QTU as described here. In this way, we as authors will have more confidence in using fundamental data, knowing it was properly processed and screened by Q.

@James
Completely agree! Quantopian should give this issue very high priority. "Data hygiene" is essential: if the starting data are wrong, everything else is also wrong (garbage in -> garbage out).

I hope the Q-Team will provide us with some feedback soon (there was also the alternative solution of using FactSet as the data source for fundamentals).

@Constantino,

Even if FactSet is a cleaner provider, they also do not have control over corporate filings, their timing and frequency, let alone accuracy. The solution, I believe, is for Q to do what they did in the standardization of QTU. For their own sanity check and confidence, they should do data integrity checks, weed out stocks with insignificant or insufficient data, standardize or indicate frequencies and timing, etc. under the QTU universe. This way, when we the authors run and score fundamental factors across QTU, we can be confident that these data have been processed and screened by Q to pass their significance and consistency thresholds.

The first step in my mind would be to have a public-facing Quantopian bug/error database, so that one could search it for data and software problems. I've suggested this in the past, but it was never implemented.

It'll be interesting once Q goes to the Quantopian Enterprise model to see how it integrates with the Quantopian Community. I'd think that part of the package would be to let Enterprise users know when there are bugs/errors, so they don't waste precious time. Presumably, since there will be parity, the information will be shared with the Community.

Hi, great article, I enjoyed every bit of it. Do I need to be very good in Python to be able to succeed on this platform? If so, what advice do you have for me?

Hi Frank -

Do i need to be very good in python to be able to succeed on this platform?

I'd say you just need to be a Python hack. You don't need to be a whiz. There are some wizards who work for Quantopian and some Quantopian users who will sometimes help on the forum, but you need to get to the point where you can pose succinct questions beyond the newbie level.

if so what advice do you have for me.

• Python basics. Any intro book/material will suffice. I read the free Think Python, and it'll get you started.
• NumPy - If you are familiar with MATLAB, then see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html. Otherwise, you'll need to find some introductory material, and become accustomed to doing array programming (which is not just for elegance, but usually will speed things up dramatically when processing large data sets).
• Pandas - Assuming you've gotten a sense for Python, and know a little about NumPy, I'd just get the book Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney (Amazon link: http://a.co/68yHcKW). If you skim through the entire book, and at the same time, start playing around on Quantopian, you'll be on your way.
• Explore Python libraries - SciPy, CVXPY, scikit-learn, etc. Just don't get too excited about fancy machine learning, since Quantopian is behind the times in this area, although some stuff can be done (Fawce...what's the plan? I'm thinking the Quantopian Enterprise users will want a lot more modules and computational horsepower).
• Use Google and Stack Overflow. Python/NumPy/Pandas are widely used. By poking around, you'll find that many of your questions have been answered, with examples. By the way, using the built-in Quantopian forum search is pretty hopeless, in my opinion. Just use Google to search the site.
• Post your entire code to the Quantopian forum, when asking for help (by sharing your notebook/algo). It is unlikely, just starting out, that you will have found any secret-sauce alpha, and the Quantopian community has always been accepting of all levels of programming sophistication.

Regarding Google, these are a few things that help improve results:
1. Use site:quantopian.com in the search field, that limits results just to Quantopian pages.
2. Use quotes around strings that have more than one word for "all of these words in this order". Say, you want to see examples for creating your own pandas dataframe: https://www.google.com/search?q="pd.dataframe"+site:quantopian.com (Google takes the period as if it were a space so the quotes are useful there, it treats all non-alphabet characters as if they are spaces). Or, if you want to make certain that each page contains a backtest, https://www.google.com/search?q="def+initialize"+"pd.dataframe"+site:quantopian.com because all backtests contain def initialize.
3. A dash (minus sign) in front of a search term says it must NOT appear on the page, and sometimes helps a lot for narrowing things down.
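If you find yourself building these searches often, the three tips can be combined in a few lines of Python. This is only a sketch using the standard library: the helper name `site_search_url` and the example search terms are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def site_search_url(phrases, site="quantopian.com", exclude=()):
    """Build a Google search URL from quoted phrases, excluded terms, and a site filter."""
    parts = [f'"{phrase}"' for phrase in phrases]   # tip 2: exact phrases in quotes
    parts += [f"-{term}" for term in exclude]       # tip 3: terms that must NOT appear
    parts.append(f"site:{site}")                    # tip 1: limit results to one site
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})


url = site_search_url(["def initialize", "pd.dataframe"], exclude=["tutorial"])
print(url)

# Decode the query string again to see the search text Google receives.
query = parse_qs(urlparse(url).query)["q"][0]
print(query)
```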

It's a vast world here yet some basics go a long way. A few other things that might be helpful ...
In learning pandas, the debugger was incredibly useful, worth the time to become comfortable with it: https://www.google.com/search?q=debugger+site:quantopian.com. Don't think of it for bugs really, it is a learning tool and increases the speed of progress.

Then, if I do say so myself, these tools are geared for increasing efficiency, giving visibility into what is happening, avoiding pitfalls, and putting you in the driver's seat, and I'd recommend some of my coding style you can see there. For example, use 'c' in place of context for better readability. When I first saw that from a Q rep I rebelled in my head, yet I tried it and would never go back: 1/7th the amount of typing. Same with using s for stock (or security). Also, use extra spaces in Python for vertical alignment, easier readability, and faster editing; it'll give you an edge in the race. Aside from those, dig into cloning some algorithms, make a few changes, and compare what happens, using just a short date range at first for a quick run.

There is certainly a lot of information to process...It looks like it will take a lot of time to become a successful quant no matter how you do things.

As a relative beginner to Python and quantitative finance, I had misconceptions about how 'fast' I would be able to get a good algorithm going. Since joining Quantopian, my expectations have gone from a 3-month period over the summer to longer than a year, especially because I am not doing this full time but rather on the side. From a programming point of view, it certainly seems easier to implement an algorithm than it is to understand what makes an alpha factor a GOOD factor. I feel that, after about 3 months of learning the basics, I have sufficient programming knowledge to attempt to carry out research in the notebook and run backtests in the algorithm.

I am happy to see the continued output of all of Quantopian's educational data. I have found it to be very useful in developing a vague understanding of the concept being talked about. Certainly there is still plenty of content that has been created that is still firmly out of my grasp of reasoning because I do not have the fundamentals of an undergraduate education in finance to bolster my efforts.

@Grant, @Blue Seahawk, @John, your names are all familiar to me as being active community members within Quantopian's forums. It appears that you all have a firm understanding and I look forward to reading your material (I have difficulties understanding most of what I read though hahaha)

@Grant I have just realized that even the most common ML frameworks are not available here. From what you wrote, do you think it's still possible here to code some simple ML algorithms in NumPy, for example, and let them work with the provided data? I was thinking of coding a simple neural network, but I am worried about the computational power of the platform.