In-sample vs. out-of-sample testing

Hello Guys!

I'm interested in how you approach in-sample and out-of-sample testing.

For the 2016-2019 (36 month) period, I generated 3,000 strategies using the Random generation method.
I did not use an out-of-sample period; the 3 years mentioned above
served entirely as in-sample.
Over this three-year period, all 3,000 strategies have an expectancy of more than $5
and a stability greater than 0.90.

Then I retested all 3,000 strategies in "retest" mode
on periods outside the in-sample window, sized at 10, 20, 30, 50, 100, 200 and 400% of the in-sample period:

10% OOS: 3.6 month: 2015.11.01 - 2016.02.15
20% OOS: 7.2 month: 2015.07.02 - 2016.02.15
30% OOS: 10.8 month: 2015.04.01 - 2016.02.15
50% OOS: ...
100% OOS: ...
200% OOS: ...
400% OOS: 144 month: 2004.02.15 - 2016.02.15
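
The window lengths in this list follow from simple arithmetic: each OOS length is the stated percentage of the 36-month in-sample period. A minimal sketch of that calculation, assuming the percentages and the 36-month figure given above (the function name `oos_length_months` is only illustrative, and the start dates are not computed here):

```python
IN_SAMPLE_MONTHS = 36  # in-sample length stated in the post
OOS_PERCENTAGES = [10, 20, 30, 50, 100, 200, 400]  # percentages listed in the post


def oos_length_months(in_sample_months: float, oos_percent: float) -> float:
    """Length of the out-of-sample window, as a percentage of the in-sample length."""
    return in_sample_months * oos_percent / 100


if __name__ == "__main__":
    for pct in OOS_PERCENTAGES:
        months = oos_length_months(IN_SAMPLE_MONTHS, pct)
        print(f"{pct}% OOS: {months:g} months")
```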

I noticed that as the size of the out-of-sample period increased,
fewer and fewer of the 3,000 strategies
kept an expectancy of more than $5 and a stability greater than 0.90.

Am I seeing this correctly?
Why is that?
How should I set the in-sample and out-of-sample periods?
Should in-sample be the latest data and out-of-sample the older data,
or vice versa?
What ratio should I use for IS and OOS?


1 response

I don't know that there are any hard-and-fast rules for in-sample (IS) vs out-of-sample (OOS). Some people use 50% IS and 50% OOS. Others use 80% IS and 20% OOS. Personally I try to save at least 10-20% of the data for OOS, sometimes more.
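
To make the ratios concrete, here is a rough sketch of holding back a fraction of a date-ordered series as OOS. The function `split_is_oos` and its parameters are hypothetical names for illustration, not part of any platform's API; it assumes the data is already sorted from oldest to newest.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_is_oos(
    bars: Sequence[T], oos_fraction: float, oos_first: bool = False
) -> Tuple[Sequence[T], Sequence[T]]:
    """Split date-ordered data into (in_sample, out_of_sample).

    oos_fraction: share of all data held back as OOS, e.g. 0.2 for 80/20.
    oos_first: if True, the oldest data is OOS; otherwise the newest is.
    """
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be between 0 and 1")
    n_oos = round(len(bars) * oos_fraction)
    if oos_first:
        return bars[n_oos:], bars[:n_oos]
    cut = len(bars) - n_oos
    return bars[:cut], bars[cut:]


if __name__ == "__main__":
    months = list(range(1, 11))  # ten placeholder periods, oldest first
    in_sample, out_of_sample = split_is_oos(months, 0.2)
    print(in_sample, out_of_sample)
```

Setting `oos_first=True` gives the arrangement from the question, where the older data is held out and the latest data is in-sample.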

Should IS data be the latest and OOS data older? Not necessarily, I think, but there's nothing wrong with that either. You still have to deal with non-stationarity though, so what 'worked' most recently may not have worked in the far past and may not work in the future. For this reason, I like to use Thomas' NB that can be found on this post. I really should use it more often.

Lastly, you might want to look at the 'law of large numbers' and survivorship bias. Roughly half of 3,000 people 'predicting' a coin flip (about 1,500) will guess correctly. About 750 will guess correctly two times in a row. About 375 will get it right three times in a row. And about 3 people out of the 3,000 might guess correctly 10 times in a row. Does that make those 3 people any better at predicting coin flips? Or were they just randomly lucky?
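
The coin-flip numbers can be checked with a few lines of Python. This is only an illustrative sketch: it assumes fair coins and independent guessers, and the function names are made up for the example. The expected count after k flips is 3000 / 2^k, and the simulation shows how many pure-luck 'winners' a single random run produces.

```python
import random


def expected_survivors(n_guessers: int, n_flips: int) -> float:
    """Expected number of guessers who call n_flips fair coin flips correctly in a row."""
    return n_guessers / 2 ** n_flips


def simulate_survivors(n_guessers: int, n_flips: int, seed: int = 0) -> int:
    """Count guessers who, by luck alone, get every one of n_flips calls right."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(n_guessers):
        # Each call is right with probability 0.5; no guesser has any skill.
        if all(rng.random() < 0.5 for _ in range(n_flips)):
            survivors += 1
    return survivors


if __name__ == "__main__":
    for k in (1, 2, 3, 10):
        print(k, expected_survivors(3000, k), simulate_survivors(3000, k))
```

The same reasoning applies to 3,000 randomly generated strategies: some will pass an in-sample filter by chance, and a longer OOS period gives chance fewer places to hide.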