This post was originally featured on the Quantopian Blog and authored by Sepideh Sadeghi and Dr. Thomas Wiecki.

**Foreword by Thomas**

This blog post is the result of a very successful research project by Sepideh Sadeghi, a PhD student at Tufts who did an internship at Quantopian over the summer 2015. Follow her on twitter here.

**Introduction**

When evaluating trading algorithms we generally have access to backtest results over a couple of years and a limited amount of paper or real money traded data. The biggest with evaluating a strategy based on the backtest is that it might be overfit to look good only on past data but will fail on unseen data. In this blog, we will take a stab at addressing this problem using Bayesian estimation and prediction of possible future returns we expect to see based on the backtest results. At Quantopian we are building a crowd-source hedge fund and face this problem on a daily basis.

Here, we will briefly introduce two Bayesian models that can be used for predicting future daily returns. These models take the time series of past daily returns of an algorithm as input and simulate possible future daily returns as output. We will talk about the variations of models that can be used for prediction and how they compare to each other in another blog, but here we will mostly talk about how to use the predictions of such models to extract useful information about the algorithms.

**How do we get the model inputs?**

At Quantopian we have built a world-class backtester that allows everyone with basic Python skills to write a trading algorithm and test it on historical data. The resulting daily returns generated by the backtest will be used to train the model predicting the future daily returns.

**What can be learned from the predictive models?**

Lets not forget that computational modeling always comes with some risks such as estimation uncertainty, model misspecifications and implementation limitations and errors. According to such risk factors, model predictions are not always perfect and 100% reliable. However, model predictions still can be used to extract useful information about algorithms, even if the predictions are not perfect.

For example, comparing the actual performance of a trading algorithm on unseen market data with the predictions generated by our model can inform us whether the algorithm is behaving as expected based on its backtest or whether it is overfit to only work well on past data. Such algorithms may have the best backtest results but they may not necessarily have the best performance in live trading. An example of such an algorithm can be seen in the picture below. As you can see, the live trading results of the algorithm are completely out of our prediction area, and the algorithm is performing worse than our predictions. These predictions are generated by fitting a linear line through the cumulative backtest returns. We then assume that this linear trend continuous going forward. As we have more uncertainty about events further in the future, the linear cone is widening assuming returns are normally distributed with a variance estimated from the backtest data. This is certainly not the best way to generate predictions as it has a couple of strong assumptions like normality of returns and that we can confidently estimate the variance accurately based on limited backtest data. Below we show that we can improve these cone-shaped predictions using Bayesian models to predict the future returns.

On the other hand, there are algorithms that perform equally well on data from the past and on live trading data. An example of that can be seen in the picture below.

And finally, we can find differences between the algorithm behavior in the past and in live trading period that are due to changes in the market and not due to the characteristics of the algorithm itself. For example the picture below illustrates an algorithm, which is doing pretty well until sometime in 2008, but all of a sudden it crashes as the market crashes.

**Why Bayesian models?**

In the Bayesian approach we do not get a single estimate for our model parameters as we would with maximum likelihood estimation. Instead, we get a complete posterior distribution for each model parameter, which quantifies how likely different values are for that model parameter. For example, with few data points our estimation uncertainty will be high reflected by a wide posterior distribution. As we gather more data, our uncertainty about the model parameters will decrease and we will get an increasingly narrower posterior distribution. There are many more benefits to the Bayesian approach, such as the ability to incorporate prior knowledge that are outside the scope of this blog post.

Now that we have answered the problem of why predicting future returns and why using Bayesian models for this purpose, lets briefly look at two Bayesian models that can be used for prediction. These models make different assumptions about how daily returns are distributed.

**Normal model**

We call the first model the normal model. This model assumes that daily returns are sampled from a normal distribution whose mean and standard deviation are accordingly sampled from a normal distribution and a halfcauchy distribution. The statistical description of the normal model and its implementation in PyMC3 are illustrated below.

This is the statistical model:

```
mu ~ Normal(0, 0.01)
sigma ~ HalfCauch(1)
returns ~ Normal(mu, sigma)
```

And this is the code used to implement this model in PyMC3:

```
with pm.Model():
mu = pm.Normal('mean returns', mu=0, sd=.01, testval=data.mean())
sigma = pm.HalfCauchy('volatility', beta=1, testval=data.std())
returns = pm.Normal('returns', mu=mu, sd=sigma, observed=data)
# Fit the model
start = pm.find_MAP()
step = pm.NUTS(scaling=start)
trace = pm.sample(samples, step, start=start)
```

**T-model**

We call the second model the T-model. This model is very much similar to the first model except that it assumes that daily returns are sampled from a Student-T distribution. The T distribution is very much like a normal distribution but it has heavier tails, which makes it a better distribution to capture data points that are far away from the center of data distribution. It is well known that daily returns are in fact not normally distributed as they have heavy tails.

This is the statistical description of the model:

```
mu ~ Normal(0, 0.01)
sigma ~ HalfCauchy(1)
nu ~ Exp(0.1)
returns ~ T(nu+2, mu, sigma)
```

And this is the code used to implement this model in PyMC3:

```
with pm.Model():
mu = pm.Normal('mean returns', mu=0, sd=.01)
sigma = pm.HalfCauchy('volatility', beta=1)
nu = pm.Exponential('nu_minus_two', 1. / 10.)
returns = pm.T('returns', nu=nu + 2, mu=mu, sd=sigma,
observed=data)
# Fit model to data
start = pm.find_MAP(fmin=sp.optimize.fmin_powell)
step = pm.NUTS(scaling=start)
trace = pm.sample(samples, step, start=start)
```

**Prediction Cone: Visualization of predictions for live trading period**

Here, we describe the steps of creating predictions from our Bayesian model. These predictions can be visualized with a cone-shaped area of cumulative returns that we expect to see from the model. Assume that we are working with the normal model fit to past daily returns of a trading algorithm. The result of this fitting this model in PyMC3 is are the posterior distributions for the model parameters mu (mean) and sigma (variance) – fig a.

Now we take one sample from the mu posterior distribution and one sample from the sigma posterior distribution with which we can build a normal distribution. This gives us one possible normal distribution that has a reasonable fit to the daily returns data. - fig b.

To generate predicted returns, we take random samples from that normal distribution (the inferred underlying distribution) as can be seen in fig c.

Having the predicted daily returns we can compute the predicted time series of cumulative returns, which is shown in fig d. Note that we have only one predicted path of possible future live trading results because we only had one prediction for each day. We can get more lines of predictions by building more than one inferred distribution on top of actual data and repeating the same steps for each inferred distribution. So we take n samples from the mu posterior and n samples from the sigma posterior. For each posterior sample, we can build n inferred distributions. From each inferred distribution we can again generate future returns and a possible cumulative returns path (fig e). We can summarize the possible cumulative returns we generated by computing the 5%, 25%, 75% and 95% percentile scores for each day and instead plotting those. This leaves us with 4 lines marking the 5, 25, 75 and 95 percentile scores. We highlight the interval between 5 and 95 percentiles in light blue and the interval between 25 and 75 percentiles in dark blue to represent our increased credible interval. This gives us the cone illustrated in fig f. Intuitively, if we observe cumulative returns from an algorithm that are very different from the backtest, we would expect it walk outside of our credible region. In general, this procedure of generating data from the posterior is called a posterior predictive check.

**Overfitting and Bayesian consistency score**

Now that we have talked about the Bayesian cone and how it has been generated, you may ask what these Bayesian cones can be used for. Just to give a demonstration of what can be learned from Bayesian cones, look at the cones illustrated below. The cone on the right shows an algorithm whose live trading results are pretty much within our prediction area and to be more accurate even in high confidence interval of our prediction area. This basically means that the algorithm is performing in line with our predictions. On the other hand, the cone on the left

shows an algorithm whose live trading results are pretty much outside of our prediction area, which would prompt us to take a closer look as to why the algorithm is behaving according to specifications and potentially turn it off if it is used for real-money live trading. This underperformance in the live trading might be due to the algorithm being overfit to the past market data or other reasons that should be investigated by the person who is deploying the algorithm or selects whether to invest using this algorithm.

Lets take a look at the prediction cones generated using the simple linear model we described in the beginning of the blog. It is interesting to see that there is nothing worrisome about the algorithm on the left, while we know that the algorithm illustrated on the right is overfit and the fact that the Bayesian cone gets at that but the linear cone does not, is reinforcing.

One of the ways by which the Baysian cone can be useful is detecting the overfit algorithms with good backtest results. In order to be able to numerically measure by how much a strategy is overfit, we have developed Bayesian consistency score. This score is a numerical measure to report the level of consistency between the model predictions and the actual live trading results.

For this, we compute the average percentile score of the paper-trading returns to the predictions and normalize to yield a value between 100 (perfect fit) and 0 (completely outside of cone). See below for an example where we get a high consistency score for an algorithm (the right cone) which stays in the high confidence interval of the Bayesian prediction area (between the 5 to 95 percentiles) and a low value for an algorithm (the left cone) which is mostly out of predicted area.

**Accounting for estimation uncertainty**

Estimation uncertainty is one of the risk factors, which becomes relevant with modeling and it is reflected on the width of the prediction cone. The more uncertain our predictions, the wider the cone would be. There are two ways by which we may get uncertain predictions from our model: 1) little data, 2) high volatility in the daily returns. First, lets look at how the linear cone deals with uncertainty due to limited amounts of data. For this, we create two cones from cumulative returns of the same trading algorithm. The first only has the 10 most recent in-sample days of trading data, while the second one is fit with the full 300 days of in-sample trading data.

Note how the width of the cone is actually wider in the case where we have more data. That's because the linear cone does not take uncertainty into account. Now let's look at how the Bayesian cone looks like:

As you can see, the top plot has a much wider cone reflecting the fact that we can't really predict what will happen based on the very limited amount of data we have.

Not accounting for uncertainty is only one of the downsides of the linear cone, the other ones are the normality and linearity assumptions it makes. There is no good reason to believe that the slope of the regression line corresponding to the live trading results should be the same as the slope of the regression line corresponding to the backtest results and normality around such line can be problematic when we have big jumps or high volatility in our data.

**Summary**

Having reliable predictive models that not only provide us with predictions but also with model uncertainty in those predictions allows us to have a better evaluation of different risk factors associated with deploying trading algorithms. Notice the word “reliable” in my previous sentence, which is to refer to the risk of “estimation uncertainty”, a risk factor that becomes relevant with modeling and ideally we would like to minimize it. There are other systematic and unsystematic risk factors as is illustrated in the figure below. Our Bayesian models can account for volatility risk, tail risk as well as estimation uncertainty.

Furthermore, we can use the predicted cumulative returns to derive a Bayesian Value at Risk (VaR) measure. For example the figure below

shows the distribution of predicted cumulative returns over the next five days (taking uncertainty and tail risk into account). The line

indicates that there is a 5% chance of losing 10% or more of our assets over the next 5 days.

To use these models on your data, check out pyfolio, which is also available on the Quantopian research platform .