Increasing turnover with mostly fundamental factors

Good day everyone, I am a long-time lurker but very infrequent poster on the platform. Now I am reaching out to our brave community because I am in need of some advice.

I have attached a backtest result from one of my few algos that actually worked somewhat OK. It's doing nothing fancy really, just cobbling together a lot of factors and weighting them with a random forest every now and again.
However, since most factors are of the fundamental kind, they are really quite slow-moving, especially during periods where faster factors tend to perform badly (that's my belief, but I have not put that much thought into it).

In general this is a problem, since I cannot compete with this algo if I cannot force it to trade a bit more.
Obvious solutions might be:

1. Fundamental factors are for losers and should be minimized, since all they do is basically overfit -> ditch a few of these factors; they are probably very correlated anyhow, and the only reason they work in my little test is data snooping.
2. Add some really fast-moving factors, like a higher-high and lower-low counter? -> never really found these to work in terms of information ratios.
3. Force the algo to trade a bit more with the cunning use of some trick -> would be neat, but I cannot really figure out such a trick.
4. Give up and maybe take up another hobby -> fly fishing, perhaps?

What do you guys think? Again, my strategy is really not doing anything spectacular, just taking mostly known factors and cobbling them together.

Thanks everyone !

7 responses

I hear what you say and have had the same thoughts although I firmly believe fundamental factors are far more reliable than technical, price based indicators. What is faintly amusing is that one should be forced to consider how to increase trading when most rational individuals using fundamental factors for investment would seek to trade as little as possible. HFT is a different kettle of fish of course.

I also note with wry amusement that few publish their code these days. Once upon a time things were different here, but presumably now people fancy their chances of becoming the next Jim Simons, and co-operative efforts have gone out of the window.

Mere wry observations. No criticism intended.

Hi Magnus,

Here are some things I can think of that might help (some you've already mentioned).

• Include factors with 'price' or 'market cap' in them, or anything else that updates daily, e.g. some price ratio or 'yield'.
• Increase position concentration (hold fewer positions).
• Use tighter constraints in the Optimize API; this should make it trade more frequently.
• Rebalancing more frequently might possibly help as well.

Daily average turnover is showing as 7.3, which should be high enough? Is it dipping below 5.0 for too long and too often?

Also, I'm not sure I agree with your comment below. Why do you say this? I'd say the opposite is actually true, and that you're more likely to overfit (and possibly more prone to data snooping?) using OHLCV price/volume data, no?

Fundamental factors are for losers and should be minimized, since all they do is basically overfit -> Ditch a few of these factors, they are probably very correlated anyhow and the only reason why they work in my little test is data snooping.

Hello Joakim,

These are very good pointers indeed!

Let me further explain my thoughts on overfitting fundamental factors:

Since the dawn of time (or at least the dawn of 2006-2007), certain sectors have outperformed the S&P quite nicely.
I imagine that if I took a simple S&P Technology vs. S&P Consumer Staples spread, I would have a really decent strategy.

That being said, you are almost certainly correct that I could probably overfit a lot more using technical factors.

So here's what I'll do (in list format, as everyone knows lists are the way to go):

1. Include a few more mixed factors, i.e. fundamentals relative to current price levels, since most of my factors are quarterly-updated right now; mixing them with market cap should maybe, perhaps, increase performance. -> I'll get back to you on this one.

2. Concentration comes naturally, as I am using the StockTwits and broker-ratings datasets; since not all securities are covered there, many will fall out.
3. Playing with the optimizer: as I have unscrupulously "stolen", "borrowed", or whatever you want to call it, this code from different tutorials, there might be more to do here.
4. Uncorrelated factors are nice! My thought here is: is there a good way to check the intra-correlation of factors? I could perhaps just do the following: enter all factors in a notebook and compute the rank correlation across all factors.
5. I am rebalancing daily now, but only training my little RandomForest every now and again (the code to do this is, by the way, readily available from Quantopian's Thomas).
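The rank-correlation check mentioned above can be sketched in a notebook with pandas' Spearman correlation. The factor names and values here are purely illustrative toy data, not real dataset fields:

```python
import pandas as pd

# Toy cross-section: one column per factor, one row per stock on a given date.
factors = pd.DataFrame({
    "pb_ratio": [0.5, 1.2, 3.1, 0.9, 2.2],
    "pe_ratio": [8.0, 15.0, 40.0, 11.0, 25.0],   # deliberately a close cousin of P/B here
    "momentum": [0.02, -0.01, 0.05, 0.00, -0.03],
})

# Spearman = Pearson correlation of the ranks; robust to monotone rescaling.
rank_corr = factors.corr(method="spearman")
print(rank_corr.round(2))
```

Highly correlated pairs (values near 1.0) are candidates for pruning, since they carry largely redundant information.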

I'll implement these points and get back to y'all.

Thanks a lot for the tips.

(This is a bit of rambling on my part; I haven't thought it through super rigorously.) Note: I agree with the above poster that fewer and fewer people are posting their entire code. However, that should only be natural, as this is in fact a type of prisoner's dilemma (correct me if I am wrong!): if everyone shared everything, the community would probably be a lot better off. However, if you share your precious factor and your competitor doesn't, you will probably be worse off! So the equilibrium is that few people share their insights, at least if they believe their insights are any good.

Thanks for all the help, I'll get back to you with some updates.

Hi @Magnus (the lurker ;-)),

Fundamental factors are definitely NOT "... for losers and should be minimized, since all they do is basically overfit".
Hang in there with those fundamental factors!!

I agree with @Joakim's comments, although with one specific exception, as follows.
Conceptually, of course, the idea of "uncorrelated factors" or orthogonal basis vectors as inputs certainly makes sense intuitively, BUT in practice it doesn't necessarily work out that way here. Yes, you are correct that many of the pieces of fundamental data are "... probably very correlated anyhow", but personally I have found that sometimes throwing out something that is OBVIOUSLY correlated with some other input, or something that appears to be only trivially important, causes the result to fall apart completely. Why?

Well, I think it's because we are dealing with a very, very non-linear system, and the relationship between the output (Result = cumulative profit, or Sharpe ratio, or minimum drawdown, or whatever mixture you might be using as your own personal objective function(s)) and some set of fundamental or other inputs A, B, C is often not of a simple linear form like Result = a1*A + b1*B, but something much trickier like: IF C is NOT present THEN Result = a1*A + b1*B, ELSE IF C is present THEN Result = a2*A + b2*B + c123*C^2/B, etc., as a hypothetical example.
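The hypothetical conditional relationship above can be written out as a tiny function, just to make the interaction effect concrete. All coefficients and the C^2/B interaction term are purely illustrative, as in the text:

```python
# Hypothetical non-linear response: the contribution of inputs A and B
# changes entirely depending on whether input C is present.
def result(A, B, C=None,
           a1=0.5, b1=0.3,              # coefficients when C is absent
           a2=0.2, b2=0.6, c123=0.1):   # coefficients when C is present
    if C is None:
        return a1 * A + b1 * B
    # Presence of C not only adds a term, it rewires A's and B's weights.
    return a2 * A + b2 * B + c123 * C**2 / B
```

The point: dropping C doesn't just remove one additive term; it changes the effective weights on A and B, which is why removing a "redundant" input can make the whole result fall apart.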

I have been trying to explore this problem in a structured way (as well as via a "trial-and-error, mostly error" approach) for a long time, but still have not come up with a conclusively good methodology. See the separate thread on "Fundamentals & Experimental Design", which I have been hoping someone with expertise in that area might respond to...

@Tony,

Indeed, that's been my experience as well (should increase turnover though regardless ;)).

Does the question then become: In the context of the Q Contest, how does one limit the universe, for each factor, to only stocks where that factor is predictable?

I think I've tried something like below previously, but with no success:

combined_factor = (
)



Not sure if using factor.rank() instead would make a difference?
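For what it's worth, here is a hypothetical pandas sketch of the idea being discussed: compute each factor's z-score only over the sub-universe where that factor is believed to be predictive, and zero it elsewhere. The factor names and "predictability" masks are made up for illustration, not real dataset fields:

```python
import numpy as np
import pandas as pd

def zscore(s):
    # z-score computed over non-NaN entries only
    return (s - s.mean()) / s.std()

stocks = ["AAA", "BBB", "CCC", "DDD"]
value = pd.Series([0.8, 0.2, 0.5, 0.9], index=stocks)
momentum = pd.Series([0.1, -0.2, 0.3, 0.0], index=stocks)

# Hypothetical per-factor "predictability" masks: where we trust each factor.
value_ok = pd.Series([True, True, False, True], index=stocks)
mom_ok = pd.Series([True, False, True, True], index=stocks)

# Score each factor only within its trusted sub-universe, then combine;
# fillna(0) keeps an untrusted factor from knocking a stock out entirely.
combined = (zscore(value.where(value_ok)).fillna(0.0)
            + zscore(momentum.where(mom_ok)).fillna(0.0))
print(combined)
```

Whether `.rank()` instead of a z-score changes the picture mainly depends on how heavy-tailed the raw factor values are; ranks throw away magnitude information but are immune to outliers.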

One thing that I believe probably contributes to the problem is missing or NaN entries in some data fields for some stocks at some times. Often I have observed that including a factor as one of the terms in the "combined factor", and then inactivating that term later by multiplying by zero, does NOT produce the same results as omitting the factor.

For example, I naively assumed that:
combined_factor = (weight1 * factor1.zscore() + weight2 * factor2.zscore() + 0.0 * factor3.zscore())
"should" be the same as: combined_factor = (weight1 * factor1.zscore() + weight2 * factor2.zscore())
because "mathematically" the equation: Result = weight1*factor1 + weight2*factor2
is completely equivalent to: Result = weight1*factor1 + weight2*factor2 + 0.0*factor3

But in fact they turn out NOT to be the same, because each factor involves taking elements of a data set, and if a data item is NaN then calculating 0*NaN is not the same as omitting the relevant item.
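This NaN behavior is easy to demonstrate with a toy example (illustrative values only): multiplying a factor by zero does NOT neutralize it when that factor contains NaNs, because 0 * NaN is still NaN, so the whole combined score for that stock is lost.

```python
import numpy as np
import pandas as pd

f1 = pd.Series([0.3, -0.1, 0.2], index=["AAA", "BBB", "CCC"])
f3 = pd.Series([0.5, np.nan, -0.4], index=["AAA", "BBB", "CCC"])

with_zero_term = 1.0 * f1 + 0.0 * f3   # BBB's score becomes NaN
without_term = 1.0 * f1                # BBB keeps its score

print(with_zero_term)
print(without_term)
```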

So, before even getting to the issue that you, @Joakim, raise of factors being predictable, I sometimes get caught up with the problem of whether or not the factor even EXISTS at some times for some stocks. Every time I have tried cleaning up this problem by explicitly setting all NaNs to zero, I have run into the problem that my code doesn't run properly. This is probably just because of limitations in my Python skills, so if you can help me by showing me the required modification to your code snippet above to "zero the NaNs" correctly, I would be most appreciative. Please...
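A minimal pandas sketch of "zeroing the NaNs" before combining is below. (If memory serves, zipline/Pipeline factors also expose a `fillna()` method that serves the same purpose, but treat that as an assumption; the sketch here is plain pandas with toy data.)

```python
import numpy as np
import pandas as pd

def zscore(s):
    # mean/std skip NaNs by default, so only the NaN entry itself is missing
    return (s - s.mean()) / s.std()

f1 = pd.Series([10.0, 12.0, np.nan, 9.0])
f2 = pd.Series([1.0, np.nan, 3.0, 2.0])

# Zero out missing z-scores AFTER standardizing, so a NaN in one factor
# doesn't wipe out a stock's entire combined score.
combined = zscore(f1).fillna(0.0) + zscore(f2).fillna(0.0)
print(combined)
```

Note that filling with 0 after z-scoring treats a missing value as "average", which is a modeling choice in itself; filling the raw values before standardizing would give a different (and usually worse-behaved) answer.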

Now, coming to the issue of factor PREDICTABILITY, let's consider using our general "combined_factor" and calling the result of it the "solution landscape". Just as with technical-analysis inputs, I observe that with fundamental inputs the solution landscape is sometimes very smooth (good), and we can find high plateaux and gently sloping hills in multiple dimensions where the tops are meaningful extrema, whereas sometimes I infer that parts of the solution-landscape surface are very jagged and ill-behaved indeed. That's not only a problem mathematically for optimizing, but also a problem in terms of robustness of the (financial) solution. Of course we would like to have smooth solution-landscape surfaces.

One way of helping with this is to be very careful with ratios. A common but actually very nasty fundamental ratio is PE. Of course the stock price P is always positive > 0, but EPS can be positive, negative, zero, or missing. And so, if earnings decline, go negative, then rise again, we will have 2 singularities in PE, as well as whatever might happen with the NaNs for missing EPS data. Even if we avoid those singularities, we still get wild extreme values of PE at low values of EPS, so any solution landscape obtained from using PE as an input is potentially unstable, ill-behaved, or at least somewhat unpredictable.

The solution in this case is to use Earnings Yield rather than PE as input. The information content is essentially the same, but the stability or predictability characteristics are very different. Of course this is well-known in the case of PE ratio, but something similar happens in many other fundamental ratios as well. So, in answer to your question about factors & predictability, the first step is to take a lot of care of which ratios we use. Sometimes the solution is just to invert the ratio, or sometimes to choose a near-equivalent that is better-behaved.
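The PE-vs-earnings-yield point is easy to see numerically. With a fixed price and EPS sweeping through zero, P/E blows up near the singularity while E/P passes smoothly through zero (values are illustrative):

```python
import numpy as np

price = 20.0
# EPS declining through zero and going negative
eps = np.array([2.0, 0.5, 0.01, -0.01, -0.5, -2.0])

pe = price / eps   # huge positive then negative spikes as EPS crosses zero
ey = eps / price   # earnings yield: bounded, smooth, same ordering info

print("P/E:", pe)
print("E/P:", ey)
```

The same inversion trick applies to any ratio whose denominator can be small, negative, or missing (book value, sales per share, etc.): put the potentially ill-behaved quantity in the numerator.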

Personally i always use zscore() rather than rank(), mainly just because i started that way, but also for two other reasons: