Feedback Requested: Improvements to Quantopian's risk and performance calculations

Summary:
Quantopian has created empyrical, an open-source library that calculates risk and performance metrics. empyrical will soon be used by other libraries on Quantopian, like zipline and pyfolio. The use of empyrical will change the calculations used by the Quantopian backtester.

When this change rolls out, the risk and performance metrics (like Sharpe ratio) for new backtests will not match those from previously generated backtests.

Before we update the backtester, we're requesting community members review our calculation methods in empyrical. We'd love to hear your feedback.

Detailed Description
We've been grappling for some time with a set of problems related to how we calculate and display risk and performance metrics in our products - metrics such as Sharpe Ratio, Max Drawdown and others. As many in the community have helpfully pointed out, we have been inaccurate in some of our calculation methods, especially within zipline (which is used in the Quantopian backtester). Furthermore we were inconsistent in our calculation methods across zipline and pyfolio. As a result the backtester frequently displayed a different value for metrics (like Sharpe Ratio) as compared to pyfolio.

In order to solve these accuracy and inconsistency problems, we've created a unified library for use by zipline and pyfolio. This open-source library is empyrical. empyrical will be deployed to the Quantopian site in the coming days within pyfolio and the Quantopian backtester.

This rollout brings benefits such as consistent and more accurate calculations across Quantopian. Furthermore, because the library is open source, you (the Quantopian community) can examine the methods used for calculating the metrics.

This move to empyrical brings changes to our metrics calculations. Calculations made prior to the use of empyrical in the backtester will differ from new calculations. Specifically, the following metrics are affected:

• Max Drawdown
• Sharpe Ratio
• Sortino Ratio
• Downside Risk
• Information Ratio
• Alpha

There are numerous reasons the values have changed; I won't outline each of them here. Generally, our testing has shown that most impacted metrics, like Sharpe Ratio, will be lower with empyrical, though this will not be universally true.
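
Two of the listed metrics, Downside Risk and the Sortino Ratio, have compact definitions worth spelling out. The following is a rough, unofficial sketch of the empyrical-style formulas (annualizing daily figures by 252 trading days; check the empyrical repo for the canonical versions):

```python
import numpy as np

def downside_risk(returns, required_return=0.0, ann_factor=252):
    """Annualized downside deviation: root-mean-square of the amounts
    by which returns fall below the required return."""
    downside = np.minimum(returns - required_return, 0.0)
    return np.sqrt(np.mean(downside ** 2)) * np.sqrt(ann_factor)

def sortino_ratio(returns, required_return=0.0, ann_factor=252):
    """Annualized mean excess return over the annualized downside deviation."""
    dr = downside_risk(returns, required_return, ann_factor)
    if dr == 0:
        return np.nan  # no downside observations
    return np.mean(returns - required_return) * ann_factor / dr

# One year of synthetic daily returns, just to exercise the functions
rng = np.random.default_rng(0)
daily = rng.normal(0.0005, 0.01, 252)
print(downside_risk(daily), sortino_ratio(daily))
```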

Our intention here is to communicate some key items to the community:

1. Our methodology for calculating risk and performance metrics in the backtester will be changing in the near term so previous backtester calculations will not match future calculations. You will see the changes in the coming days both in the backtester and the contest leaderboard. The timeline of the rollout will be, in part, influenced by the feedback that we get.
2. The methodology for pyfolio calculations remains the same -- it will simply use empyrical.
3. We're keen for the community to investigate and provide us any feedback on our calculation methodologies. The platform has benefited in the past from your insight and feedback and we hope that empyrical will be another opportunity for you to contribute.

If you have questions on the changes, especially the changes in our methodology, feel free to ask them here -- or check them out for yourself in the empyrical repo.

Happy coding,
Josh

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

It looks like the new library takes risk-free rates as a given. Will you be passing in the rolling daily OIS rates, or something like that?

Good point Simon.

Furthermore, OIS rates would likely provide a better benchmark for market-neutral strategies. Currently there aren't good benchmark options within the backtester IDE except for short-duration interest-rate ETFs like BIL and MINT, whose history only extends back to 2007.

Yeah, last time I checked it was using the price return of treasury ETFs, which is of course completely wrong; as rates drop, those ETFs jump, so a price-return of the ETF might be 10% for a year, but the risk-free rate was never 10%.

Can we have rolling metrics, like annualized gains?

+1 for addressing a dynamic risk free rate.

@ Simon, Thanks for noticing this. I spent a few minutes looking for OIS on CME, but could not find it. Do you have a link for a quote? Is the market heavily traded?

Edit: But thinking about it twice...Is it the right idea to peg it to the Fed Funds market while Interest on Excess Reserves remains greater than Fed Funds. Thoughts?

@ Q - One thing that would be nice, if a dynamic risk-free rate is added to empyrical, would be for that same data source to be accessible within our algos. For example, if OIS is chosen as the benchmark, then it would be great if OIS (or OSS?) were added to the list of futures contracts that will be released for futures. Overnight Libor would probably be the quickest engineering solution, as it is already available from the Quandl data.

@ Q - The beta calculation function could cause issues in the future if a risk-free rate other than 0 is implemented. The line of code below, where the covariance is calculated, subtracts the risk-free rate. This goes unnoticed for now, since risk-free is set to 0 and so does not impact the calculation, but it is probably best to remove it now.

existing:

def beta(returns, factor_returns, risk_free=0.0):
    """Calculates beta.

    Parameters
    ----------
    returns : pd.Series
        Daily returns of the strategy, noncumulative.
        - See full explanation in :func:`~empyrical.stats.cum_returns`.
    factor_returns : pd.Series
        Daily noncumulative returns of the factor to which beta is
        computed. Usually a benchmark such as the market.
        - This is in the same style as returns.
    risk_free : int, float, optional
        Constant risk-free return throughout the period. For example, the
        interest rate on a three-month US treasury bill.

    Returns
    -------
    float
        Beta.
    """
    if len(returns) < 2:
        return np.nan

    covar = np.cov(returns.dropna() - risk_free,
                   factor_returns.dropna(), ddof=0)[0][1]

    return covar / np.var(factor_returns)



fixed:

def beta(returns, factor_returns, risk_free=0.0):
    """Calculates beta.

    Parameters
    ----------
    returns : pd.Series
        Daily returns of the strategy, noncumulative.
        - See full explanation in :func:`~empyrical.stats.cum_returns`.
    factor_returns : pd.Series
        Daily noncumulative returns of the factor to which beta is
        computed. Usually a benchmark such as the market.
        - This is in the same style as returns.
    risk_free : int, float, optional
        Constant risk-free return throughout the period. For example, the
        interest rate on a three-month US treasury bill.

    Returns
    -------
    float
        Beta.
    """
    if len(returns) < 2:
        return np.nan

    covar = np.cov(returns.dropna(),
                   factor_returns.dropna(), ddof=0)[0][1]

    return covar / np.var(factor_returns)
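
A quick numerical aside on the report above (my own illustration, not from the empyrical repo): for a *constant* risk_free, the subtraction is actually mathematically harmless, because covariance is invariant to subtracting a constant from either argument. It only becomes a real bug if risk_free is ever made a time series, which is exactly why removing it now is cheap insurance. A sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
returns = rng.normal(0.001, 0.01, 100)
factor = rng.normal(0.0005, 0.01, 100)

# Covariance is invariant to subtracting a constant from one argument,
# so with a scalar risk_free the extra subtraction changes nothing:
c0 = np.cov(returns, factor, ddof=0)[0][1]
c1 = np.cov(returns - 0.03, factor, ddof=0)[0][1]
print(np.isclose(c0, c1))  # True

# With a time-varying series that co-moves with the benchmark, the
# subtraction does change the result, hence the proposed fix:
rf_series = 0.5 * factor  # deliberately correlated, to make the effect obvious
c2 = np.cov(returns - rf_series, factor, ddof=0)[0][1]
print(np.isclose(c0, c2))  # False
```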



Thanks for the feedback. While empyrical allows you to pass in any risk-free rate time series, our initial implementation on Q will set it to 0 (i.e. no risk-free contribution). The reason is that in the backtester you currently do not earn interest on your cash (and leverage is free). Fixing risk-free to 0 will also make the performance numbers from the IDE identical to those produced by pyfolio. Risk-free also acts as a constant offset for every algorithm (over the same time period), so as long as they all use the same risk-free rate, they are comparable.

Eventually, we will fix the issues in the backtester and probably also let you specify your own (dynamic) risk-free rate. I want to stress though, that the fixes here are much more fundamental than any contribution from risk-free rates.

Hi Thomas,

Another disappointment.

Thomas Wiecki
Nov 17, 2015
Thanks everyone.
Here's what we'll do:
* Fix the zipline implementation to use risk-adjusted returns in the denominator.
* Default zipline to use 1M T-Bills instead of 10Y.
* Have pyfolio use risk-adjusted returns (using 1M T-Bills) when used in research.

Taking into consideration that the risk-free rate is not constant, this simplification of the calculation does not respect the spirit of what the Sharpe ratio is trying to capture, namely the leverage-independent quality of a strategy.
The fact that the backtester currently does not charge interest for leverage and borrowing costs may not be a good reason to change the formula of the Sharpe ratio.

It is not constant over time, but it is constant across strategies tested over the same time period. I should add that we don't prohibit you from using your own risk-free rate. empyrical will be available on research, thus you can pass in your own risk-free rate to compute the Sharpe Ratio in the manner you see fit.
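
For instance, if you wanted a Sharpe Ratio against your own daily risk-free series in research, the standard formula is easy to compute by hand. This is a sketch only; empyrical's own sharpe_ratio may differ in details such as annualization or degrees of freedom:

```python
import numpy as np
import pandas as pd

def sharpe_ratio(returns, risk_free=0.0, ann_factor=252):
    """Annualized Sharpe: mean daily excess return over its standard
    deviation, scaled by sqrt(periods per year). `risk_free` may be a
    scalar or a per-day series aligned with `returns`."""
    excess = returns - risk_free
    sd = excess.std(ddof=1)
    if sd == 0:
        return np.nan
    return excess.mean() / sd * np.sqrt(ann_factor)

idx = pd.date_range("2015-01-01", periods=252, freq="B")
rets = pd.Series(np.random.default_rng(1).normal(0.0008, 0.01, 252), index=idx)

print(sharpe_ratio(rets))                        # risk-free fixed at 0
print(sharpe_ratio(rets, risk_free=0.02 / 252))  # flat 2% annual rate
```

A dynamic rate works the same way: pass a pd.Series of daily risk-free returns on the same index instead of the scalar.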

Thomas
I'll tell you what I would find really helpful, but I suspect it may be too unsophisticated for you lot.

Win / loss statistics:
% of trades that are winners / losers
% of winning / losing months

R squared of the equity curve
Downside deviation
Longest drawdown
Average max drawdown
Earned interest
Margin interest
Total commissions
Total slippage
Total dividends (although your time series will not enable this I think)
Maybe some Monte Carlo confidence level statistics
Percent profit factor
Average win %
Average Loss %

Oh, and if you allow shorting (or indeed insist upon it), then I think it ought to be accounted for properly in backtesting: an assumed borrowing rate, handing back of dividends, and so forth.

@Anthony: Thanks for the suggestions, I agree on all of these. In fact, most of those are already implemented in pyfolio as part of the round-trip analysis; see https://www.quantopian.com/posts/round-trip-trade-analysis for a preview. This version of pyfolio is not on research yet but will be as soon as we update some dependencies (mainly pandas). We have also done more analysis on slippage and commissions which, now that you mention it, I realize should make it into pyfolio.

And yes, borrowing rate when going short etc is critical too.

Tomas,

If I use my own risk-free rate, it will be the Vladimir Ratio.
If you use your own risk-free rate, it will be the Tomas Ratio, and they may differ.
What we need is the industry standard: the Sharpe Ratio (reward-to-variability ratio), named after William Forsyth Sharpe, winner of the 1990 Nobel Memorial Prize.

Bootstrap analysis: https://github.com/quantopian/pyfolio/pull/261
R²-measure: https://github.com/quantopian/empyrical/blob/master/empyrical/stats.py#L671
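
For readers wondering what that R²/stability measure computes: it is, roughly, the R-squared of a linear fit to the cumulative log returns, so a steadily compounding strategy scores near 1. A sketch of that idea (not a copy of the linked implementation, which may differ in details):

```python
import numpy as np

def stability_of_timeseries(returns):
    """R-squared of a linear fit to the cumulative log returns,
    i.e. how straight the log equity curve is."""
    cum_log = np.log1p(np.asarray(returns)).cumsum()
    t = np.arange(len(cum_log))
    corr = np.corrcoef(t, cum_log)[0, 1]
    return corr ** 2

# A perfectly steady compounding strategy is exactly linear in log space
steady = np.full(252, 0.001)
print(stability_of_timeseries(steady))  # ~1.0
```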

Nice Thomas! Thanks
A

@Thomas - as suggested by both Valdimir and James (and numerous others in the past), we really need CAGR to be added to the stats people see when looking at a shared backtest.

The current process of having to clone an algo, then run a backtest, then analyse the CAGR is really not good enough. We should be able to look at a shared backtest and just see the CAGR (just like we can see the sharpe ratio, max drawdown etc.).

This is not just an improvement to the backtester but an important improvement to the forums.

Adding another vote for CAGR. Returns mean nothing unless they are normalized.

When should we expect this to go live? I'd like to see CAGR as well.
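
For anyone who wants CAGR before it lands in the backtester, the formula is simple to compute yourself from a backtest's total return and length. This is my own sketch; empyrical's implementation may count time differently (calendar days vs. trading days):

```python
def cagr(total_return, n_days, trading_days=252):
    """Compound annual growth rate from a backtest's total return
    over n_days trading days."""
    years = n_days / trading_days
    return (1.0 + total_return) ** (1.0 / years) - 1.0

# Using figures quoted later in this thread: 33.8% total return over
# roughly 26 months (~21 trading days per month)
n = 26 * 21
print(round(cagr(0.338, n), 4))  # ~0.144, i.e. about 14.4% annualized
```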

Tomas,

If I use my own risk-free rate, it will be the Vladimir Ratio.
If you use your own risk-free rate, it will be the Tomas Ratio, and they may differ.
What we need is the industry standard: the Sharpe Ratio (reward-to-variability ratio), named after William Forsyth Sharpe, winner of the 1990 Nobel Memorial Prize.

Vladimir, you can also pass in the William-Forsyth-Sharpe-approved risk-free rate ;). To my knowledge, that does not exist, though, and this thread already highlights how many different opinions there are on what risk-free rate should be used. There are other complexities too, like what to use if your strategy is fully hedged long/short. As I said, we might change this going forward, and you have the option to compute the Sharpe Ratio as you see fit, but for now we have to make do with greatly improved and more accurate performance metrics that do not take risk-free rates into account.

Can you share a clonable Notebook?

Tomas,
"you have the option to compute the Sharpe Ratio as you see fit"
Isn't that the same as saying that, for Tomas, 1 inch equals 25 mm; for Vladimir, 25.4 mm; and for somebody else, 26 mm?
The Sharpe Ratio is a metric and should not differ from the standard.

If the name of your risk metric has a formal definition that is used in academic research, you must use that definition. If you want to modify the Sharpe Ratio, you call it something else — perhaps "Q Sharpe Ratio" ;)

Even worse is allowing people to calculate it however they want. Then you can't even compare apples to apples on your own platform.

In your specification, all of your industry standard risk metrics should follow the accepted definitions. Further, for each one you should cite the papers where the calculations come from.

I glossed over the code. Many of these are isolated calculations. You should have one tearsheet function which calculates all the measures and charts, like in Alphan.

@Suminda: Pyfolio (http://quantopian.github.io/pyfolio/) has that functionality and is available on research.

If there is a consensus risk-free rate to use when calculating the Sharpe Ratio I think we should use that. For the Quantopian platform it would be easy enough to implement a metric like the Safety First Ratio which allows the developer the option to vary the minimum acceptable return.

We need a way to clearly model the costs of shorting. Given the profit incentive to build hedged strategies I think it's important these costs are explicitly considered during algorithm development.

A "draw down curve" is something that needs to become standard in a lot of the strategy sharing posts in this forum.

So far all I've seen plotted are the equity curves and sometimes a leverage ratio plotted beneath them. However, equity curves are misleading over long periods of time.

The draw down curve will show the % loss from the last high point on the equity curve.
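
For anyone who wants to plot this today from a backtest's equity series, the curve is essentially a one-liner with pandas (my own sketch, not a Quantopian API):

```python
import numpy as np
import pandas as pd

def drawdown_curve(equity):
    """Percent loss from the running high of the equity curve."""
    running_max = equity.cummax()
    return equity / running_max - 1.0

equity = pd.Series([100.0, 110.0, 99.0, 105.0, 120.0, 90.0])
print(drawdown_curve(equity).round(3).tolist())
# [0.0, 0.0, -0.1, -0.045, 0.0, -0.25]
```

The minimum of this series is the familiar max drawdown; plotting the whole series shows how deep and how long each drawdown was.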

Is there a metric for drawdown that throws out peaks? Like a moving-average drawdown? As an example: your algo pops 100%, then loses 25% the next day due to a single security. Its gains are 50%, but the drawdown is now 25%, which doesn't accurately reflect the performance of the algorithm. If the algo eases into positions (for example, a momentum algo), then it wouldn't ever experience that 25% drawdown without also experiencing that 100% gain.

James,

If your algo has those values in your backtest then typically you don't want to throw them out for analysis. In reality you don't get to trade on moving averages.

When you perform a complete analysis of your strategy, it is possible that you might want to supplement your standard metrics with a breakdown by instrument or group. However, if you're going to try to claim that your strategy won't suffer such outliers going forward, you need to build a quantitative or qualitative argument for why that won't happen.

I think you are misunderstanding. I'm giving an extreme example of why straight drawdown from peak to valley isn't necessarily an accurate representation of true drawdown as it pertains to risk.

James,

But it is an accurate representation. Under the same set of conditions your algorithm will trade the exact same way. Therefore your outcome will potentially be the same.

Metrics are supposed to be very raw, comparable, and informative representations of how a strategy might perform in the future.

What you're trying to do is create a metric that is more favorable to the results of your strategy. What you should be doing is creating better logic for your strategy.

If I'm 50% richer than I was 10 days ago, then why call any sort of drop from the theoretical maximum profit a drawdown? Drawdown implies risk against your initial investment. I might buy the security when it is cheap, then hold it when it's clearly overbought and on a downtrend. If I were to start my algo after the point in time where my algo is signaled to buy the security, then it would never experience the drawdown in question.

James,

Draw down is the amount of money (or percentage) that you are down from your most recent peak. That's all. Plotted over time, the draw down curve gives a good idea of expected behavior of a strategy/algorithm.

It's fine if you want to show results from your algorithm that start at a different date than one that would generate a draw down. When investors perform due diligence they will inevitably ask for alternative scenarios and figure out that you were curve fitting. Such is the nature of quant.

If and when you are actually trading with investor's money, what happens when you draw down 10-20%? That's guaranteed to happen from time to time for even the best non-high-frequency strategies. You will answer 2n phone calls where n is the number of clients you have. When you're just starting out then you are at the whim of your clients, often providing daily liquidity if they want to withdraw their money at the end of any trading day. That's why having these solid backtests and metrics is helpful.

No one is going to give you money based on your moving average. And no one is going to keep their money with you for that reason either.

An ulcer index would be great. It gives both the duration and severity of drawdown.
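
For reference, the Ulcer Index is the root-mean-square of the percent drawdown series, so it penalizes drawdowns that linger rather than just the single deepest one. A sketch (not part of empyrical at the time of writing, as far as I know):

```python
import numpy as np
import pandas as pd

def ulcer_index(equity):
    """Root-mean-square of the percent drawdown series: reflects both
    the depth and the duration of drawdowns (Peter Martin's Ulcer Index)."""
    dd_pct = (equity / equity.cummax() - 1.0) * 100.0
    return float(np.sqrt(np.mean(dd_pct ** 2)))

# Two curves with the same 10% max drawdown but different recovery speed:
quick = pd.Series([100, 90, 100, 100, 100], dtype=float)
slow = pd.Series([100, 90, 90, 90, 100], dtype=float)
print(ulcer_index(quick) < ulcer_index(slow))  # True: lingering drawdown hurts
```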

Hi all,

Thanks so much for all the thoughtful feedback.

With respect to the debate on Sharpe Ratio, we understood going into this discussion that there would be a diversity of opinions on our chosen method. We think there is enough flexibility built into the library for the different methods to be accommodated, including our preferred approach for the purpose of the Quantopian backtester. I know there will be people who will disagree with us. This is a situation where we have to make a decision that makes at least some people unhappy.

The feedback on CAGR has been heard as well. We are in the midst of adding the calculation to empyrical.

With that in mind, we will be merging the changes to the backtester in the next 1-2 days.

Thanks
Josh

One minor request; in the live trading dashboard, could you please add Equity? Long/short/cash is pretty useless, and I just wind up getting out my HP every time I want to see where the accounts are relative to initial funding. This is particularly important since every time they get stopped (Q2 ahem) I lose all my trading history. Equity vs initial funding is the only way to calculate CAGR since inception despite restarts, aside from IB reports.

Josh and Thomas

Thanks...excited for the merge to occur and thanks again for this update. The risk free rate explanation makes a lot of sense from the perspective of Q.

One thing to think about in the future...I noticed a post from Adam the other day regarding modelling taxes in the backtest:

https://www.quantopian.com/posts/modeling-taxes-into-back-testing

It would be a good idea to build in a feature that calculates after-tax free cash flow from returns. Essentially, a function in empyrical that defines the capital gains tax rate and the ordinary income tax rate. 20% for long-term capital gains and 39.6% for ordinary income would be good default rates.

Then for every position liquidated over the course of the trading year, the system would calculate the duration of that trade and calculate the tax rate on that position. From there the effective tax rate of the account can be calculated for the year, and ultimately for the entire backtest. When I have some free time (after the merge) I will fork empyrical and try to see if I can put something together.
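
To make the idea concrete before any fork exists, here is a deliberately simplified sketch. All the names here are hypothetical (nothing like this exists in empyrical), and the tax treatment is a caricature: real US tax rules involve loss offsets, wash sales, and more.

```python
from datetime import date

# Hypothetical default rates, per the suggestion above
LONG_TERM_RATE = 0.20    # long-term capital gains
SHORT_TERM_RATE = 0.396  # short-term gains taxed as ordinary income

def tax_on_trade(open_date, close_date, gain):
    """Apply the long-term rate if the position was held over a year.
    Losses are ignored here for simplicity (no loss harvesting)."""
    if gain <= 0:
        return 0.0
    held_days = (close_date - open_date).days
    rate = LONG_TERM_RATE if held_days > 365 else SHORT_TERM_RATE
    return gain * rate

trades = [
    (date(2014, 1, 2), date(2015, 6, 1), 1000.0),  # long-term gain
    (date(2015, 3, 2), date(2015, 4, 1), 500.0),   # short-term gain
    (date(2015, 3, 2), date(2015, 4, 1), -200.0),  # loss, ignored
]
total_tax = sum(tax_on_trade(o, c, g) for o, c, g in trades)
print(total_tax)  # ~398.0 (200.0 long-term + 198.0 short-term)
```

From there, the effective tax rate for the year would just be total_tax divided by net realized gains.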

Hi folks,

Quick update -- this change is in the process of propagating out to our servers and you'll see the new behavior shortly (if not already).

Thanks
Josh

My first test shows an incorrect calculation of the Sharpe and Sortino Ratios in the Backtester.
Pyfolio metrics:

Backtest Months: 26
Backtest
annual_return 0.14
annual_volatility 0.09
sharpe_ratio 1.54
calmar_ratio 1.95
stability 0.93
max_drawdown -0.07
omega_ratio 1.30
sortino_ratio 2.31
skewness -0.07
kurtosis 2.39
information_ratio 0.03
alpha 0.14
beta 0.05

Backtester Metrics:

Total Returns 33.8%
Benchmark Returns 16%
Alpha 0.13
Beta 0.06
Sharpe 0.37
Sortino 0.55
Information Ratio 0.02
Volatility 0.09
Max Drawdown -7.4%

so which employee at Quantopian cannot do math?

Toan,

That is not very polite. Maybe a suggestion on how to fix the issue would be more appropriate?

Frank, I'm just frustrated. But for an angry person, I wonder what's the correct balance between speaking your mind and being polite. Hm.

I think Thumper provides some great insight...

Are the Pyfolio metrics that you've reported from the Research environment?

Lotana,

The Pyfolio metrics from Research are right.
The Backtester metrics from the Backtester are wrong.
The algo is the same.

We're investigating this; from the screenshot it's hard to analyze the underlying problem. We've sent you a private email asking if you could link us to the backtest driving those performance calculations. Thanks for your help.

I think, in essence, what Vladimir is trying to point out is how it's possible to reach a Sharpe ratio of ~0.3 now when it was ~1.5 before the change (with unchanged 14%-ish annual return and 9%-ish annual volatility). Assuming the standard formula for the Sharpe Ratio, that would imply a risk-free interest rate of around 11%.
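
Spelling out that back-of-envelope check (my arithmetic, using the figures quoted above):

```python
# Invert the standard formula: sharpe = (annual_return - rf) / annual_vol
annual_return = 0.14
annual_vol = 0.09
reported_sharpe = 0.37   # backtester, after the change
expected_sharpe = 1.54   # pyfolio, same algo

implied_rf = annual_return - reported_sharpe * annual_vol
print(round(implied_rf, 3))  # ~0.107, i.e. an implausible ~11% risk-free rate
```

Since the stated policy fixes the risk-free rate at 0, no choice of rate explains the gap, which points to a bug rather than a methodology difference.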

@josh when you said "Quick update -- this change is in the process of propagating out to our servers and you'll see the new behavior shortly (if not already)"

Is CAGR now in the backtest? I'm not seeing it.

While empyrical allows you to pass in any risk-free rate time series, our initial implementation on Q will set it to 0 (i.e. no risk-free contribution). The reason is that in the backtester you currently do not earn interest on your cash (and leverage is free). Fixing risk-free to 0 will also make the performance numbers from the IDE identical to those produced by pyfolio. Risk-free also acts as a constant offset for every algorithm (over the same time period), so as long as they all use the same risk-free rate, they are comparable.

This essentially means that a positive P/L implies a positive Sharpe. But I don't see that happening. In the meantime, can we at least trust the pyfolio metrics? Please advise.

@Adam, the CAGR calculation (to my knowledge) was added to empyrical but not yet to the backtester.

@Pravin, @Vlad, et al: thanks to the help of this thread, we think we've identified a bug in the new calculation of the Sharpe Ratio in the backtester relative to its use in pyfolio. We are working on a fix right now.

Regards,
Josh

Oh, and logarithmic charts.

I would second this, IMHO the chart should definitely be logarithmic by default.

I think so

Alpha, Beta and Sharpe would all be the same here if calculated based on the amount invested, since it is the same in all. Wouldn't that eliminate a lot of difficulties? Downward these are heavy margin to unused capital.

@Tomas,

Don't you think that the top rankings of unproductive algos in contests 22, 24, 34, 35 are a result of the simplified (without risk-free return) calculation of the Sharpe and Sortino Ratios?

Quantopian Open contest 22 results:

The first to finish was Wenkai Zhang, with an amazing 0.86% return. (Quietly disqualified.)

Wenkai Zhang: score_contest_rank 1, score_contest 86.30