Trading algo: how to interpret results

Hi guys,

The "getting started with futures" lesson has a complete implementation of a pair trading algo. I'm attaching the algo here for convenience.

Let me give you some context in case you don't remember this specific example.

The idea is that some commodities' prices should be related. In this case they picked crude oil (CL) and gasoline (XB). If they are indeed related, then we expect the spread between them to mean-revert: if the short moving average of the spread is larger than its long moving average, we should short the spread, and vice versa.

To implement this idea, they define the variables short_ma = 5 and long_ma = 65 and then compute a zscore comparing the short moving average of the spread with its long moving average. When the zscore is above 1.0 (or below -1.0), that's a signal to short (or go long) the spread between the CL and XB futures. They also define exit signals, which close the position when the zscore crosses back over 0.0.
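Stripped of the Quantopian API, the signal logic described above can be sketched in plain NumPy. This is an illustrative re-implementation, not the lesson's code: the function names spread_zscore and next_position are hypothetical, the inputs are assumed to be plain arrays of daily returns for the two legs, and np.polyfit stands in for scipy.stats.linregress (both give the same OLS slope).

```python
import numpy as np


def spread_zscore(x_returns, y_returns, short_ma):
    """Z-score of the recent mean of the hedged spread y - slope * x."""
    x = np.asarray(x_returns, dtype=float)
    y = np.asarray(y_returns, dtype=float)
    # Hedge ratio: OLS slope of y regressed on x
    slope = np.polyfit(x, y, 1)[0]
    spreads = y - slope * x
    # Mean of the last short_ma spreads vs. the mean over the whole window
    recent = np.mean(spreads[-short_ma:])
    return (recent - np.mean(spreads)) / np.std(spreads, ddof=1)


def next_position(position, zscore, entry=1.0, exit_level=0.0):
    """Return +1 (long the spread), -1 (short the spread) or 0 (flat)."""
    if position == -1 and zscore < exit_level:
        return 0  # close the short once the zscore is back below 0
    if position == 1 and zscore > exit_level:
        return 0  # close the long once the zscore is back above 0
    if zscore < -entry and position != 1:
        return 1  # spread unusually low: go long the spread
    if zscore > entry and position != -1:
        return -1  # spread unusually high: go short the spread
    return position  # otherwise keep whatever we hold
```

Each day you would call something like next_position(current, spread_zscore(xb_returns, cl_returns, 5)), where x is the XB leg and y the CL leg, mirroring the regression in the posted algo.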

Now let's discuss the results and how to interpret them.

Let's pick a time period to discuss specific numbers: from 2016-01-01 to 2017-09-29.

• If you run the algo on that time period as is, you get a total return of 34.27% and a Sharpe ratio of 2.16.

Let's play a bit with the entry signal.

• If you set short_ma = 2 you get a total return of -7.71% and a Sharpe ratio of -0.47.

• If you set short_ma = 3 you get a total return of 6.02% and a Sharpe ratio of 0.46.

• If you set short_ma = 4 you get a total return of 20.5% and a Sharpe ratio of 1.45.

• If you set short_ma = 6 you get a total return of 7.44% and a Sharpe ratio of 0.56.

• If you set short_ma = 7 you get a total return of 16.92% and a Sharpe ratio of 1.20.

• If you set short_ma = 8 you get a total return of 2.52% and a Sharpe ratio of 0.22.

Now, I understand that there are other things we can play with besides short_ma. We could change long_ma, we could use a different signal than the zscore, we could change the exit signal, etc. There are many different detailed ways to implement a specific idea.

But how to optimise isn't the point of this post. What I'd like to discuss is: how much optimisation is too much optimisation? Here, for example, the idea is sound, but the results seem far too sensitive to tiny changes: moving short_ma from 5 to 6 drops the Sharpe ratio from 2.16 to 0.56.
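One common rule of thumb (an assumption here, not something from the lesson) is to judge a parameter value by how well its neighbours do, preferring a broad plateau of good results over a single spike. A minimal sketch using the Sharpe ratios listed above; neighborhood_score is a hypothetical helper:

```python
# Sharpe ratios reported above for 2016-01-01 to 2017-09-29, keyed by short_ma
sharpe_by_short_ma = {2: -0.47, 3: 0.46, 4: 1.45, 5: 2.16, 6: 0.56, 7: 1.20, 8: 0.22}


def neighborhood_score(scores, param, radius=1):
    """Average score of param and its neighbours within +/- radius."""
    window = [scores[p] for p in range(param - radius, param + radius + 1) if p in scores]
    return sum(window) / len(window)


for short_ma in sorted(sharpe_by_short_ma):
    print(short_ma, sharpe_by_short_ma[short_ma],
          round(neighborhood_score(sharpe_by_short_ma, short_ma), 2))
```

For short_ma = 5 the raw Sharpe ratio is 2.16, but the average over 4, 5 and 6 is only 1.39, and the immediate neighbours range from 0.56 to 1.45: a spike rather than a plateau.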

This matters if you want to go and find other viable pairs. I've tried dozens of other highly correlated pairs and couldn't get good results with short_ma = 5. Is it that for some pairs the idea doesn't work at all (even if the pair is correlated), or could short_ma = 6 or 7 yield better results? And if it does, how can you tell whether a good result for a specific free parameter is a fluke? If you try enough signals with enough assets, a good-looking result will turn up sooner or later.
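That last point can be checked with a quick simulation. The sketch below assumes strategies whose daily returns are pure noise (zero mean, i.i.d. normal, so the true Sharpe ratio is zero) over about 440 trading days, roughly the length of the 2016-01-01 to 2017-09-29 window, and reports the best in-sample annualized Sharpe ratio among n_trials such strategies. The helper names are hypothetical.

```python
import numpy as np


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio of daily returns (risk-free rate taken as zero)."""
    r = np.asarray(daily_returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)


def best_sharpe_of_noise(n_trials, n_days=440, seed=0):
    """Best in-sample Sharpe among n_trials strategies with no true edge."""
    rng = np.random.default_rng(seed)
    # Each row is one "strategy": i.i.d. daily returns with zero mean
    returns = rng.normal(0.0, 0.01, size=(n_trials, n_days))
    return max(annualized_sharpe(r) for r in returns)


for n in (1, 10, 100, 1000):
    print(n, round(best_sharpe_of_noise(n), 2))
```

The best Sharpe ratio tends to grow with the number of trials even though none of the simulated strategies has any edge, so a result found after searching over many pairs and parameter values has to clear a correspondingly higher bar.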

import numpy as np
import scipy as sp
import scipy.stats  # makes sp.stats available
from quantopian.algorithm import order_optimal_portfolio
import quantopian.optimize as opt

# continuous_future, schedule_function, date_rules, time_rules and record
# are built-ins of the Quantopian algorithm environment (no import needed).

def initialize(context):

    # Get continuous futures for Light Sweet Crude Oil...
    context.crude_oil = continuous_future('CL', roll='calendar')
    # ... and RBOB Gasoline
    context.gasoline = continuous_future('XB', roll='calendar')

    # Long and short moving average window lengths
    context.long_ma = 65
    context.short_ma = 5

    # True if we currently hold a long position on the spread
    context.currently_long_the_spread = False
    # True if we currently hold a short position on the spread
    context.currently_short_the_spread = False

    # Rebalance pairs every day, 30 minutes after market open
    schedule_function(func=rebalance_pairs,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_open(minutes=30))

    # Record Crude Oil and Gasoline Futures prices everyday
    schedule_function(record_price,
                      date_rules.every_day(),
                      time_rules.market_open())

def rebalance_pairs(context, data):

    # Calculate how far away the current spread is from its equilibrium
    zscore = calc_spread_zscore(context, data)

    # Get target weights to rebalance portfolio
    target_weights = get_target_weights(context, data, zscore)

    if target_weights:
        # If we have target weights, rebalance portfolio
        order_optimal_portfolio(
            opt.TargetWeights(target_weights),
            constraints=[]
        )

def calc_spread_zscore(context, data):

    # Get pricing data for our pair of continuous futures
    prices = data.history([context.crude_oil,
                           context.gasoline],
                          'price',
                          context.long_ma,
                          '1d')

    cl_price = prices[context.crude_oil]
    xb_price = prices[context.gasoline]

    # Calculate returns for each continuous future
    cl_returns = cl_price.pct_change()[1:]
    xb_returns = xb_price.pct_change()[1:]

    # Hedge ratio: OLS slope of CL returns regressed on XB returns
    regression = sp.stats.linregress(
        xb_returns[-context.long_ma:],
        cl_returns[-context.long_ma:],
    )
    spreads = cl_returns - (regression.slope * xb_returns)

    # Calculate zscore of current spread
    # Note: spreads[-context.short_ma] (no colon) picks a single day's spread,
    # not the mean of the last short_ma days; this is the bug discussed in the
    # replies, fixed by slicing with spreads[-context.short_ma:]
    zscore = (np.mean(spreads[-context.short_ma]) - np.mean(spreads)) / np.std(spreads, ddof=1)

    return zscore

def get_target_weights(context, data, zscore):

    # Get current contracts for both continuous futures
    cl_contract, xb_contract = data.current(
        [context.crude_oil, context.gasoline],
        'contract'
    )

    # Initialize target weights
    target_weights = {}

    if context.currently_short_the_spread and zscore < 0.0:
        # Update target weights to exit position
        target_weights[cl_contract] = 0
        target_weights[xb_contract] = 0

        context.currently_long_the_spread = False
        context.currently_short_the_spread = False

    elif context.currently_long_the_spread and zscore > 0.0:
        # Update target weights to exit position
        target_weights[cl_contract] = 0
        target_weights[xb_contract] = 0

        context.currently_long_the_spread = False
        context.currently_short_the_spread = False

    elif zscore < -1.0 and (not context.currently_long_the_spread):
        # Update target weights to long the spread
        target_weights[cl_contract] = 0.5
        target_weights[xb_contract] = -0.5

        context.currently_long_the_spread = True
        context.currently_short_the_spread = False

    elif zscore > 1.0 and (not context.currently_short_the_spread):
        # Update target weights to short the spread
        target_weights[cl_contract] = -0.5
        target_weights[xb_contract] = 0.5

        context.currently_long_the_spread = False
        context.currently_short_the_spread = True

    return target_weights

def record_price(context, data):

    # Get current price of primary crude oil and gasoline contracts.
    crude_oil_price = data.current(context.crude_oil, 'price')
    gasoline_price = data.current(context.gasoline, 'price')

    # Adjust price of gasoline (42x) so that both futures have same scale:
    # XB is quoted in USD per gallon, CL in USD per barrel, 1 barrel = 42 gallons.
    # record() only plots these values; it does not affect trading.
    record(Crude_Oil=crude_oil_price, Gasoline=gasoline_price*42)


We have this problem in machine learning as well. Here is how we deal with it. You have 3 sets of data:

1. Data you fit/train your model on, called the training set in ML,
2. Data you use to test and optimize your model's free parameters (also called hyperparameters), called the validation set in ML,
3. Data you evaluate your final model on, called the test set in ML.

If the parameters perform well on both the validation set and the test set, they are good; if they perform well only on the validation set, you have overfitted the validation set.
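For a backtest, the three sets should be consecutive blocks of time rather than a random shuffle, so that nothing from the future leaks into the choice of parameters. A minimal sketch, assuming daily observations in time order; the 60/20/20 split, the helper names and the score callback are all illustrative assumptions:

```python
import numpy as np


def chronological_split(n_obs, train_frac=0.6, valid_frac=0.2):
    """Split observation indices 0..n_obs-1 into train/validation/test, in time order."""
    n_train = int(n_obs * train_frac)
    n_valid = int(n_obs * valid_frac)
    idx = np.arange(n_obs)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def select_then_test(candidates, score, n_obs):
    """Pick the candidate with the best validation score, then score it once on test.

    score(param, fit_idx, eval_idx) is a hypothetical user-supplied function that
    fits on fit_idx and returns a performance number (e.g. a Sharpe ratio) on eval_idx.
    """
    train, valid, test = chronological_split(n_obs)
    best = max(candidates, key=lambda p: score(p, train, valid))
    return best, score(best, train, valid), score(best, train, test)
```

In this thread's terms, the candidates would be the short_ma values, and the test block is looked at only once, after short_ma has been chosen on the validation block.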

Hey @Illjia, thanks for coming back about this. You're right, of course. However, in this particular case there was a bigger issue: that notebook had a critical bug. Have a look here.

Sharpe of 3 using RBOB Gasoline and Natural Gas futures

import numpy as np
import scipy as sp
import scipy.stats  # makes sp.stats available
from quantopian.algorithm import order_optimal_portfolio
import quantopian.optimize as opt

# continuous_future, schedule_function, date_rules, time_rules and record
# are built-ins of the Quantopian algorithm environment (no import needed).

def initialize(context):

    # Get continuous futures for Nat Gas...
    # (the name context.crude_oil is kept from the CL/XB template; it holds NG here)
    context.crude_oil = continuous_future('NG', roll='calendar')
    # ... and RBOB Gasoline
    context.gasoline = continuous_future('XB', roll='calendar')

    # Long and short moving average window lengths
    context.long_ma = 60
    context.short_ma = 5

    # True if we currently hold a long position on the spread
    context.currently_long_the_spread = False
    # True if we currently hold a short position on the spread
    context.currently_short_the_spread = False

    # Rebalance pairs every day, 30 minutes after market open
    schedule_function(func=rebalance_pairs,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_open(minutes=30))

    # Record Crude Oil and Gasoline Futures prices everyday
    schedule_function(record_price,
                      date_rules.every_day(),
                      time_rules.market_open())

def rebalance_pairs(context, data):

    # Calculate how far away the current spread is from its equilibrium
    zscore = calc_spread_zscore(context, data)

    # Get target weights to rebalance portfolio
    target_weights = get_target_weights(context, data, zscore)

    if target_weights:
        # If we have target weights, rebalance portfolio
        order_optimal_portfolio(
            opt.TargetWeights(target_weights),
            constraints=[]
        )

def calc_spread_zscore(context, data):

    # Get pricing data for our pair of continuous futures
    prices = data.history([context.crude_oil,
                           context.gasoline],
                          'price',
                          context.long_ma,
                          '1d')

    cl_price = prices[context.crude_oil]
    xb_price = prices[context.gasoline]

    # Calculate returns for each continuous future
    cl_returns = cl_price.pct_change()[1:]
    xb_returns = xb_price.pct_change()[1:]

    # Hedge ratio: OLS slope of NG returns regressed on XB returns
    regression = sp.stats.linregress(
        xb_returns[-context.long_ma:],
        cl_returns[-context.long_ma:],
    )
    spreads = cl_returns - (regression.slope * xb_returns)

    # Calculate zscore of current spread
    # Note: spreads[-context.short_ma] (no colon) picks a single day's spread,
    # not the mean of the last short_ma days; this is the bug discussed in the
    # replies, fixed by slicing with spreads[-context.short_ma:]
    zscore = (np.mean(spreads[-context.short_ma]) - np.mean(spreads)) / np.std(spreads, ddof=1)

    return zscore

def get_target_weights(context, data, zscore):

    # Get current contracts for both continuous futures
    cl_contract, xb_contract = data.current(
        [context.crude_oil, context.gasoline],
        'contract'
    )

    # Initialize target weights
    target_weights = {}

    if context.currently_short_the_spread and zscore < 0.0:
        # Update target weights to exit position
        target_weights[cl_contract] = 0
        target_weights[xb_contract] = 0

        context.currently_long_the_spread = False
        context.currently_short_the_spread = False

    elif context.currently_long_the_spread and zscore > 0.0:
        # Update target weights to exit position
        target_weights[cl_contract] = 0
        target_weights[xb_contract] = 0

        context.currently_long_the_spread = False
        context.currently_short_the_spread = False

    elif zscore < -1.0 and (not context.currently_long_the_spread):
        # Update target weights to long the spread
        target_weights[cl_contract] = 0.5
        target_weights[xb_contract] = -0.5

        context.currently_long_the_spread = True
        context.currently_short_the_spread = False

    elif zscore > 1.0 and (not context.currently_short_the_spread):
        # Update target weights to short the spread
        target_weights[cl_contract] = -0.5
        target_weights[xb_contract] = 0.5

        context.currently_long_the_spread = False
        context.currently_short_the_spread = True

    return target_weights

def record_price(context, data):

    # Get current price of primary crude oil and gasoline contracts.
    crude_oil_price = data.current(context.crude_oil, 'price')
    gasoline_price = data.current(context.gasoline, 'price')

    # Adjust price of gasoline (42x) so that both futures have same scale.
    # The 42x (gallons per barrel) factor was written for CL in USD per barrel;
    # NG is quoted in USD per MMBtu, so this plot is not on a common scale here.
    record(Crude_Oil=crude_oil_price, Gasoline=gasoline_price*42)


Hey Frank, two observations.

• First, you have a bug in line 71 (the zscore calculation). It should be: zscore = (np.mean(spreads[-context.short_ma:]) - np.mean(spreads)) / np.std(spreads, ddof=1)

• Second, if you set the start date to 2014-01-01, the algorithm has a Sharpe ratio of 0.45, so clearly you just got good results over a lucky period of time. I wouldn't trade that going forward.
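The fix in the first bullet comes down to a single colon. Assuming the original line read spreads[-context.short_ma] (without the colon), np.mean was applied to a single element, so the "short moving average" was really just the spread from short_ma days ago. The toy spreads below are made-up numbers for illustration:

```python
import numpy as np

spreads = np.array([0.4, -0.2, 0.1, 0.3, -0.5, 0.2, 0.0, 0.6])
short_ma = 5

# Without the colon: one element, the spread from short_ma days ago
single_day = np.mean(spreads[-short_ma])
# With the colon: the mean of the last short_ma spreads
recent_mean = np.mean(spreads[-short_ma:])

for k in (5, 6):
    print(k, spreads[-k], np.mean(spreads[-k:]))
```

Changing short_ma from 5 to 6 swaps one arbitrary day for another in the unsliced version (0.3 becomes 0.1 here), while the sliced mean barely moves (0.12 becomes about 0.117). That is consistent with the erratic sensitivity to short_ma reported at the top of the thread.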

This is probably only a minor point in the grand scheme of things, but it might help lead someone on to other, bigger ideas.

As Joao writes: "The idea is that some commodities' prices should be related. In this case they picked crude oil (CL) and gasoline (XB)".

In this case the relationship is obvious, as one commodity is simply a refined product obtained from the other. The same goes for all the other petroleum refinery products, and similarly for Soybeans, Bean Meal and Bean Oil. However, the relationship between CL & NatGas is not so straightforward. Even though they are both hydrocarbons and are often (but not always) produced together, there is a major difference in how they are transported and sold.

Because it is a liquid and relatively easy to transport, oil can be sold on the spot or forward market anywhere in the world at any time, and there is generally a very short delay between production & sale. Gas, on the other hand, is difficult and expensive to transport because of its low density, and therefore its low value per unit volume compared to oil.

Gas is sold in one of two ways. Most of it is sold locally, via pipeline from the production site (wells) to the place of demand (cities). Usually these are in the same country and so, unlike oil, US Nat Gas is NOT an international commodity but strictly a local one, in the sense of "local" to the USA. This is also true of nat gas in most other countries, with the exception of parts of Europe, where a significant proportion of the gas used in some European countries comes by pipeline from Russia. So the price of pipeline gas in general is very much a "local" price only, and the significant differences in Nat Gas prices around the world generally do NOT provide arbitrage opportunities.

The other way that gas is sold is as Liquefied Natural Gas (LNG). This is relatively easy to transport compared to gas in the gas phase, but it requires special ships, not conventional tankers. Unlike oil, the infrastructure required for handling LNG is large & expensive, so even though LNG can be transported internationally, it is usually sold on the basis of very long-term contracts rather than on the spot market.

So the result is that, within the energy futures group, Nat Gas is very much the odd one out: it is a "local, mostly US only" commodity, whereas the others are truly international commodities.

Now, with the exception of NatGas, there is another link between all the physically deliverable commodities: their prices are denominated in US dollars. To those of you who live in the USA, this might seem like a "huh? of course, so what?" type of comment, but to anyone outside the US there is a very obvious link between all physically deliverable commodities, and that link is the exchange rate of the USD vs their own local currency. This leads to some interesting (and possibly unexpected) relationships between the prices of commodities that are apparently completely unrelated, for example wheat and silver.
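The exchange-rate link can be illustrated with a small simulation. In log terms, a price converted into local currency is the USD price times the FX rate, so its log return is the USD log return plus the FX log return, and every USD-priced commodity picks up the same FX term. The sketch below uses simulated, independent "wheat" and "silver" USD returns (labels only, not real data) and assumed volatilities:

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 2000

# Simulated daily log returns in USD, independent by construction
wheat_usd = rng.normal(0.0, 0.015, n_days)
silver_usd = rng.normal(0.0, 0.015, n_days)
# Simulated daily log return of the USD against some local currency
usd_fx = rng.normal(0.0, 0.010, n_days)

# Converting to local currency adds the same FX return to both series
wheat_local = wheat_usd + usd_fx
silver_local = silver_usd + usd_fx

corr_usd = np.corrcoef(wheat_usd, silver_usd)[0, 1]
corr_local = np.corrcoef(wheat_local, silver_local)[0, 1]
print(round(corr_usd, 3), round(corr_local, 3))
```

With these volatilities the theoretical local-currency correlation is 0.010² / (0.015² + 0.010²) ≈ 0.31, even though the USD returns are uncorrelated.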

I hope this bit of oilfield insight helps to explain a few things, the main one being the link between ALL commodities that are traded internationally at prices denominated in USD.

Good luck, happy trading, best wishes from Tony
(former project manager & Petroleum Reservoir Engineer, now retired).

"Usually these are in the same country and so, unlike oil, US Nat Gas is NOT an international commodity but strictly a local one, in the sense of 'local' to the USA."

Amazing insight Tony, thank you.

Regarding the petrodollar, you're right, but it's changing. Russia has decided to denominate its oil in yuan in sales to China (link), and Iran is accepting rupees in sales to India (link).

Hi Joao,

When I use your code I get a syntax error. I am not sure why this is happening. Could you post the code as a backtest, please? I am not a programmer, but I have an intraday strategy I want to test, and I need minute data for it. Can we compute the spread of the continuous futures from the product prices rather than from the MAs? We would then add RSI and Bollinger Bands (BB) to trade the price spread range intraday over a set of correlated assets.

Frank
btw, I have worked with many traders that have traded interproduct energy futures. Alpha can be found!

I've just replaced your line 71 with my line. I'm attaching the backtest.

import numpy as np
import scipy as sp
import scipy.stats  # makes sp.stats available
from quantopian.algorithm import order_optimal_portfolio
import quantopian.optimize as opt

# continuous_future, schedule_function, date_rules, time_rules and record
# are built-ins of the Quantopian algorithm environment (no import needed).

def initialize(context):

    # Get continuous futures for Light Sweet Crude Oil...
    context.crude_oil = continuous_future('CL', roll='calendar')
    # ... and RBOB Gasoline
    context.gasoline = continuous_future('XB', roll='calendar')

    # Long and short moving average window lengths
    context.long_ma = 65
    context.short_ma = 5

    # True if we currently hold a long position on the spread
    context.currently_long_the_spread = False
    # True if we currently hold a short position on the spread
    context.currently_short_the_spread = False

    # Rebalance pairs every day, 30 minutes after market open
    schedule_function(func=rebalance_pairs,
                      date_rule=date_rules.every_day(),
                      time_rule=time_rules.market_open(minutes=30))

    # Record Crude Oil and Gasoline Futures prices everyday
    schedule_function(record_price,
                      date_rules.every_day(),
                      time_rules.market_open())

def rebalance_pairs(context, data):

    # Calculate how far away the current spread is from its equilibrium
    zscore = calc_spread_zscore(context, data)

    # Get target weights to rebalance portfolio
    target_weights = get_target_weights(context, data, zscore)

    if target_weights:
        # If we have target weights, rebalance portfolio
        order_optimal_portfolio(
            opt.TargetWeights(target_weights),
            constraints=[]
        )

def calc_spread_zscore(context, data):

    # Get pricing data for our pair of continuous futures
    prices = data.history([context.crude_oil,
                           context.gasoline],
                          'price',
                          context.long_ma,
                          '1d')

    cl_price = prices[context.crude_oil]
    xb_price = prices[context.gasoline]

    # Calculate returns for each continuous future
    cl_returns = cl_price.pct_change()[1:]
    xb_returns = xb_price.pct_change()[1:]

    # Hedge ratio: OLS slope of CL returns regressed on XB returns
    regression = sp.stats.linregress(
        xb_returns[-context.long_ma:],
        cl_returns[-context.long_ma:],
    )
    spreads = cl_returns - (regression.slope * xb_returns)

    # Calculate zscore of current spread: mean of the last short_ma spreads
    # minus the mean over the whole window, in units of the window's std
    zscore = (np.mean(spreads[-context.short_ma:]) - np.mean(spreads)) / np.std(spreads, ddof=1)

    return zscore

def get_target_weights(context, data, zscore):

    # Get current contracts for both continuous futures
    cl_contract, xb_contract = data.current(
        [context.crude_oil, context.gasoline],
        'contract'
    )

    # Initialize target weights
    target_weights = {}

    if context.currently_short_the_spread and zscore < 0.0:
        # Update target weights to exit position
        target_weights[cl_contract] = 0
        target_weights[xb_contract] = 0

        context.currently_long_the_spread = False
        context.currently_short_the_spread = False

    elif context.currently_long_the_spread and zscore > 0.0:
        # Update target weights to exit position
        target_weights[cl_contract] = 0
        target_weights[xb_contract] = 0

        context.currently_long_the_spread = False
        context.currently_short_the_spread = False

    elif zscore < -1.0 and (not context.currently_long_the_spread):
        # Update target weights to long the spread
        target_weights[cl_contract] = 0.5
        target_weights[xb_contract] = -0.5

        context.currently_long_the_spread = True
        context.currently_short_the_spread = False

    elif zscore > 1.0 and (not context.currently_short_the_spread):
        # Update target weights to short the spread
        target_weights[cl_contract] = -0.5
        target_weights[xb_contract] = 0.5

        context.currently_long_the_spread = False
        context.currently_short_the_spread = True

    return target_weights

def record_price(context, data):

    # Get current price of primary crude oil and gasoline contracts.
    crude_oil_price = data.current(context.crude_oil, 'price')
    gasoline_price = data.current(context.gasoline, 'price')

    # Adjust price of gasoline (42x) so that both futures have same scale:
    # XB is quoted in USD per gallon, CL in USD per barrel, 1 barrel = 42 gallons.
    # record() only plots these values; it does not affect trading.
    record(Crude_Oil=crude_oil_price, Gasoline=gasoline_price*42)


Hi Joao,

"Regarding the petrodollar, you're right, but it's changing. Russia has decided to denominate its oil in yuan in sales to China (link), and Iran is accepting rupees in sales to India (link)."

Yes, correct, and in fact these are not the only examples of the beginning of moves away from purely USD-denominated commodity prices. Some other examples are iron ore and, I think, also copper(?). A movement towards truly international (rather than just USD-centric) commodity pricing would break the current link between all commodity prices (with the exception of NatGas), but that link will probably remain for a long time and continue to provide exploitable opportunities for trading algos based on unexpected relationships between the futures of apparently unrelated commodities.

Hi Frank
"btw, I have worked with many traders that have traded interproduct energy futures. Alpha can be found!"

Yes, for sure. The so-called crack spread and variants of it are functions of the profit margin earned by petroleum refineries. I have never looked at it in much detail, but I would expect to find some interesting relationships between inter-product energy futures and the profit data of some integrated energy companies that have refining as a significant part of their operations. I think this sort of relationship may provide two different sources of alpha:

1) For anyone mainly interested in Futures strategies: Use of corporate Fundamentals data (Morningstar) of the producer companies as long-term background input factors for the relevant futures, and

2) For anyone mainly interested in Equities strategies: Use of commodity futures data as short-term input factors for any companies for whom the relevant commodities are either a cost or a profit center. (In fact, I wanted to do this years ago, but until now I never had the nice combination of platform and helpful community of people that we have here at Quantopian.)
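For reference, a crack spread puts the product prices into the same per-barrel units as crude before subtracting, using the same 42 gallons-per-barrel factor that appears in record_price in the algos above. A minimal sketch of two common variants, assuming gasoline and heating oil are quoted in USD per gallon and crude in USD per barrel; the function names and prices are illustrative, and both numbers are only a gross proxy for refining margins (other products and refinery costs are ignored):

```python
GALLONS_PER_BARREL = 42  # 1 US oil barrel = 42 US gallons


def crack_spread_1_1(gasoline_usd_per_gallon, crude_usd_per_barrel):
    """Simple 1:1 gasoline crack spread, in USD per barrel."""
    return gasoline_usd_per_gallon * GALLONS_PER_BARREL - crude_usd_per_barrel


def crack_spread_3_2_1(gasoline_usd_per_gallon, heating_oil_usd_per_gallon,
                       crude_usd_per_barrel):
    """3:2:1 crack spread (3 bbl crude -> 2 bbl gasoline + 1 bbl distillate), per bbl of crude."""
    products = (2 * gasoline_usd_per_gallon + heating_oil_usd_per_gallon) * GALLONS_PER_BARREL
    return (products - 3 * crude_usd_per_barrel) / 3


# Illustrative prices only, not market data
print(crack_spread_1_1(1.60, 50.0))
print(crack_spread_3_2_1(1.60, 1.70, 50.0))
```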

Could I suggest to the Q staff that it is worth kicking off a new thread (if one doesn't exist already) specifically on combining futures data PLUS equities data, and any pitfalls that may be associated with it.

Cheers, best wishes, Tony

Hi Tony,

I recently wrote a case study on Easyjet plc. There seem to be arbitrage opportunities in the new energy futures contracts, especially Low Sulphur Gasoil (LSGO) futures (see page 14): https://www.theice.com/publicdocs/futures/Jet_Fuel_Hedging_and_Trading_at_ICE.pdf

I can't find Brent crude in the drop-down list. I would like to test the famous WTI-Brent spread with the continuous_future function.

Great idea, Tony, on combining futures and equities data.

Regards,
Frank

Hi Joao,

Am I right in thinking this stays constant throughout the algorithm's run? Surely this should be rolling to keep the hedging ratio fixed?

# Adjust price of gasoline (42x) so that both futures have same scale.
record(Crude_Oil=crude_oil_price, Gasoline=gasoline_price*42)

Thanks,

Frank,

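Those prices are updated at each time step, as you can see below (but this wasn't the point of this thread). The 42 itself is just a unit conversion (XB is quoted per gallon and CL per barrel, and a barrel is 42 gallons), and record() only affects the plot. The hedge ratio used for trading is regression.slope, which is recomputed every day over the long_ma window.

# Get current price of primary crude oil and gasoline contracts.
crude_oil_price = data.current(context.crude_oil, 'price')
gasoline_price = data.current(context.gasoline, 'price')
# Adjust price of gasoline (42x) so that both futures have same scale.
record(Crude_Oil=crude_oil_price, Gasoline=gasoline_price*42)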


Hi Frank,

"...case study on Easyjet plc": Yes, I think all the airlines should make interesting "case studies" for algo development. My guess would be that the lower-cost airlines probably have the smallest profit margins and are therefore likely to be the most sensitive to fuel price changes.

Another aspect of this is fuel substitution as energy prices change. Hydrocarbons generally have four main uses: 1) as feedstock for plastics & chemicals, 2) as fuel for energy generation for industry, 3) for domestic use (heating & cooking), and 4) for transport.

Items 2) and, to a lesser extent, 3) are susceptible to change, for example substitution of gas for oil, and also coal (especially as technologies improve to clean up the environmental aspects of burning coal). Item 4), transport, is the interesting one, especially air transport. Trains can run on electricity supplied via overhead wires, and electricity can be generated in lots of different ways (oil, gas, coal, nuclear, solar, wind, tidal power, hydroelectric, etc.). Cars & trucks will probably become increasingly electric in future as battery & other electricity storage technologies improve. But as for air transport, the one and only aviation fuel is derived from oil and will probably stay that way for a long time. I mean, the idea of a coal-burning plane certainly never took off, even if anyone was silly enough to think of it, and as for nuclear-powered or battery-powered planes, well, I don't really think I would want to fly in them, would you? ;-))

"WTI-Brent spread": yes, worth looking at, I would imagine, although I'm not quite sure how this would work exactly. Brent comes from the North Sea and is therefore most relevant to Europe. WTI (West Texas Intermediate)... well, it's obvious where that one comes from, and so it's most relevant to North America.

Good luck & best wishes, Tony.

So we are not able to access Brent prices through Quantopian?