Using PCA in Quantitative Finance

Principal component analysis (PCA) lets you determine whether a small number of underlying components of your data can explain most of the variation across all observed data points. More specifically, PCA is a common dimensionality reduction technique used in statistics and machine learning to analyze high-dimensional datasets. Principal components quantify the variability of the data, yielding low-dimensional projections that retain the bulk of the information contained in the original dataset. PCA is used in many scientific disciplines and applies to a wide variety of problems. In this lecture, we examine the use of PCA for image processing and for constructing statistical risk factors for a portfolio of securities.
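As a rough illustration, here is a minimal NumPy sketch of PCA on a matrix of daily returns. It assumes rows are days and columns are securities. The data are synthetic (two hypothetical latent factors plus noise), and the function name `pca` is our own, not the lecture's code.

```python
import numpy as np

def pca(returns, n_components):
    """PCA of a (T x N) returns matrix via the SVD of the demeaned data."""
    X = returns - returns.mean(axis=0)            # center each security's returns
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # fraction of total variance per component
    loadings = Vt[:n_components]                  # (k x N) principal directions
    factor_returns = X @ loadings.T               # (T x k) statistical factor returns
    return loadings, factor_returns, explained[:n_components]

# Synthetic data: 250 days x 20 securities driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(0.0, 0.01, size=(250, 2))
exposures = rng.normal(1.0, 0.5, size=(2, 20))
returns = latent @ exposures + rng.normal(0.0, 0.002, size=(250, 20))

loadings, factor_returns, explained = pca(returns, n_components=2)
print(explained.round(3))  # the two components should carry most of the variance
```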

The Principal Component Analysis lecture's landing page is here:
https://www.quantopian.com/lectures/principal-component-analysis

As always, our lectures are all available at:
https://www.quantopian.com/lectures

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.


Hi Maxwell -

I recently discovered that for certain types of problems, one should not use ordinary least squares, but rather some form of orthogonal regression to avoid a bias called regression dilution (attenuation). I'm wondering whether, when calculating beta, for example, some form of orthogonal least squares should be used to avoid the bias.

For example, perhaps using the first principal component of returns (e.g., of XYZ returns versus SPY returns) to compute beta would be more appropriate, since it would avoid the regression dilution bias?


@Grant, OLS certainly has its limitations. I guess that an analogue in finance to the measurement errors that cause regression dilution would be incorrect pricing or other data from data providers.

It sounds like you're talking about Principal Component Regression, which is a pretty cool technique that has other useful characteristics. I wouldn't necessarily use something like this to compute beta-to-SPY, however, as it would technically not be beta. For computing risk factor exposures in general, PCA can absolutely help us out. Creating statistical risk factors after you have accounted for known risk factors helps you to further cover your bases.
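One way to read "creating statistical risk factors after you have accounted for known risk factors" is this: regress each stock on the known factor (here just the market), then run PCA on the residuals. The sketch below assumes a single market factor and synthetic data with one hidden "sector" factor. The function name and all numbers are hypothetical.

```python
import numpy as np

def residual_pca_factors(stock_returns, market_returns, n_components):
    """Strip out each stock's market exposure by OLS, then run PCA on the residuals."""
    X = np.column_stack([np.ones_like(market_returns), market_returns])  # intercept + market
    coefs, *_ = np.linalg.lstsq(X, stock_returns, rcond=None)            # (2 x N)
    residuals = stock_returns - X @ coefs
    residuals = residuals - residuals.mean(axis=0)
    _, s, Vt = np.linalg.svd(residuals, full_matrices=False)
    explained = (s**2 / np.sum(s**2))[:n_components]
    return coefs[1], Vt[:n_components], explained   # betas, statistical loadings, variance share

# Synthetic example: 500 days, 10 stocks; stocks 0-4 share a hidden sector factor.
rng = np.random.default_rng(42)
market = rng.normal(0.0, 0.01, 500)
true_betas = rng.uniform(0.5, 1.5, 10)
sector = rng.normal(0.0, 0.008, 500)
sector_exposure = np.r_[np.ones(5), np.zeros(5)]
stocks = (np.outer(market, true_betas) + np.outer(sector, sector_exposure)
          + rng.normal(0.0, 0.003, (500, 10)))

betas, loadings, explained = residual_pca_factors(stocks, market, n_components=1)
print(betas.round(2), explained.round(2))
```

Here the first residual component should load mainly on stocks 0-4, recovering the sector that the market regression alone does not capture.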

@ Maxwell -

I’ll have to put together a notebook to illustrate. I could be off base, but the standard beta calculation is the slope of a plot of the daily returns of XYZ vs. SPY. If one treats this as a scatter plot of two independently measured quantities, then the first principal component is the vector pointing in the direction of greatest variation. It can be used as a model relating changes in XYZ to changes in SPY. It would seem to be a valid model of beta that does not suffer from the regression dilution pitfall characteristic of OLS.
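Grant's suggestion can be checked on simulated data. The sketch below assumes SPY returns are observed with noise (the errors-in-variables setting behind regression dilution). It also assumes that the noise variances in the two series are equal. That is the case where the first-principal-component (total least squares) slope is consistent; with unequal noise it is biased too. As Maxwell notes, this slope is not cov/var, so strictly speaking it is not beta. The function names and all numbers are hypothetical.

```python
import numpy as np

def ols_beta(x, y):
    """Conventional beta: cov(x, y) / var(x)."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]

def first_pc_slope(x, y):
    """Slope of the first principal component of the (x, y) scatter (total least squares)."""
    _, eigvecs = np.linalg.eigh(np.cov(x, y))   # eigenvalues come back in ascending order
    v = eigvecs[:, -1]                           # direction of greatest variance
    return v[1] / v[0]

rng = np.random.default_rng(1)
n = 5000
true_spy = rng.normal(0.0, 0.01, n)
xyz = 1.2 * true_spy + rng.normal(0.0, 0.005, n)       # hypothetical true slope of 1.2
observed_spy = true_spy + rng.normal(0.0, 0.005, n)    # SPY measured with error

b_ols = ols_beta(observed_spy, xyz)
b_pc = first_pc_slope(observed_spy, xyz)
print(round(b_ols, 3), round(b_pc, 3))   # roughly 0.96 (attenuated) vs. roughly 1.2
```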

I have been trying PCA-based models on Quantopian for over 2 years. The fundamental problem is that if you fit PCA today and again tomorrow on the same set of stocks, the factors vary a lot, and so do the PCR betas.

Has anyone found a solution to beta and factor stability? For now, I run the model once every X days and store it (but that has issues with pipeline data changing often). Another alternative is functional PCA, but I am not ready with it yet.
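One way to quantify the instability Pravin describes is to fit PCA on two overlapping windows and compare the loadings. The sketch below uses hypothetical function names and synthetic data with three factors of similar strength. It reports two measures. The first is the absolute cosine between matching components, which ignores sign flips. The second is the set of cosines of the principal angles between the spanned subspaces. When eigenvalues are close, individual components can rotate or swap order between fits even while the subspace barely moves. Comparing subspaces, or aligning factors before reusing betas, may therefore be a fairer stability test.

```python
import numpy as np

def top_loadings(returns, k):
    """Top-k PCA loadings (k x N, orthonormal rows) of a (T x N) returns window."""
    X = returns - returns.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]

def factor_stability(returns, window, k, step=1):
    """Compare loadings fit on [0, window) with loadings fit `step` days later."""
    A = top_loadings(returns[:window], k)
    B = top_loadings(returns[step:window + step], k)
    per_component = np.abs(np.sum(A * B, axis=1))          # |cos| between matching components
    subspace = np.linalg.svd(A @ B.T, compute_uv=False)    # cosines of principal angles
    return per_component, subspace

# Synthetic example: 30 stocks, 3 factors with equal volatility.
rng = np.random.default_rng(7)
factors = rng.normal(0.0, 0.01, size=(300, 3))
exposures = rng.normal(0.0, 1.0, size=(3, 30))
returns = factors @ exposures + rng.normal(0.0, 0.002, size=(300, 30))

per_component, subspace = factor_stability(returns, window=250, k=3, step=20)
print(per_component.round(3), subspace.round(3))
```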

My intuitive 2 cents...

Regarding beta stability, it has been observed that the Optimize API is not very effective in controlling beta to SPY. For example, we've shown on the forum that for a trailing beta computed with SimpleBeta (which uses OLS, not PCA), and a beta constraint of +/-0.05, the resulting beta after a 2-year backtest can be considerably higher: beta ~ 0.3 in some cases. This suggests that forecasting beta to SPY with a simple trailing-window regression is not so easy (assuming that there isn't a bug in the Quantopian implementation).

I guess it is not surprising that multidimensional regressions using PCA would show a similar problem with stability, since the underlying assumption is that there is no temporal component. The assumption is that the trailing window of data (e.g. 1-year) was statistically stable over the window, and will maintain its statistical characteristics going forward.

In the context of market efficiency and public information, it seems reasonable to think that it would be hard to find stable and profitable market attributes; any naive approach will fail, since all market participants have access to simple tools, and have already applied them to the widely available information, eliminating any advantage (if there ever was one in the first place).

The solution, perhaps, is to use dynamic models that are explicitly formulated to forecast time series that are not very statistically stable. Usually, dynamic models of complex phenomena (e.g. weather, fluid flow, etc.) are much more computationally challenging and expensive (and in science/engineering, one has the benefit of some set of equations based on the underlying physics). In other words, trying to gain an edge with widely available information will require something more than run-of-the-mill algorithms.

@Grant -

I like your 2 cents, especially this statement:

In the context of market efficiency and public information, it seems reasonable to think that it would be hard to find stable and profitable market attributes; any naive approach will fail, since all market participants have access to simple tools, and have already applied them to the widely available information, eliminating any advantage (if there ever was one in the first place).

Also, I often find myself asking: is there any alpha left at all in the publicly available data? Isn't it possible that the only viable alpha is in data that is not publicly available (or in very expensive data)? Or, given the rise of automated trading systems, isn't it possible that alpha is becoming more and more transient, so that trading strategies have to keep increasing their trading frequency to capture it? You believe the alpha is actually in the data and that we only need to build more sophisticated and dynamic methods to detect it. That's reasonable, and eventually I need to find and read some papers on this topic.

Nota bene: I don't have a clue. It would be interesting to hear from Maxwell. Since this is a thread on PCA-based tools, are there case studies where someone has made money using them? Or maybe you talked with Jonathan Larkin or someone else who has traded for a living, and could provide some insight. No doubt PCA finds applicability, but I'm guessing it's not like "PCA...kaboom...manna from heaven!"

And here's my penny sense....

In search of beta stability through forecasting individual stock beta to SPY by regression, PCA, or other techniques, a more efficient approach is to forecast SPY movements alone and then drill down to the individual stocks. We already know the general relationship between the stock universe and the market (SPY), so if you can find a way to forecast SPY accurately, controlling individual stock beta should follow. This is how we do it with our market timing models: we try to predict with some degree of accuracy what SPY will do in the near future and, based on that, construct the portfolio from individual stocks, since we already know the historical relationship between individual stocks and SPY.

The caveat to all this is that in trying to achieve total beta neutrality, you arrive at risk-free-rate returns, which means zero alpha. The only reason you get some alpha, in the context of beta, is the wiggle room of +/-0.3. Hope this helps explain the phenomenon.

For more stable PCA, check out Robust PCA. The paper is pretty good and the method is better for handling outliers and strange perturbations.

I had to regrettably come to the conclusion that there was nothing there that could be of any use in trading. None of it could be applied successfully to any kind of worthwhile trading strategy.

It was interesting for image processing, but nonetheless, not of any practical use in trading. Anyone venturing on that route should be prepared to waste a lot of time which will end with absolutely nothing to show for it.

So, I wish well to anyone who dares try. They have my admiration for pursuing such an adventurous path.

@Karl, here is the revised notebook. This version is more accurate.

There are several prospective improvements:

1. Replace PCA with SparsePCA
2. Replace LinearRegression with regularized regression methods.
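The two suggested improvements can be sketched with scikit-learn, which provides `PCA`, `SparsePCA`, `LinearRegression`, and `Ridge`. This is not Pravin's notebook. The function name, the z-scoring step, and the `alpha` values are assumptions chosen for illustration. Standardizing first matters because the L1 penalty in `SparsePCA` is scale-dependent: with raw returns of around 1%, a given `alpha` can shrink every loading to zero.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA
from sklearn.linear_model import LinearRegression, Ridge

def factor_exposures(returns, n_factors=3, sparse=False, ridge_alpha=None):
    """Fit statistical factors on (T x N) returns, then regress every stock on them."""
    # z-score each stock so the SparsePCA penalty acts on comparable scales
    Z = (returns - returns.mean(axis=0)) / returns.std(axis=0)
    if sparse:
        decomposer = SparsePCA(n_components=n_factors, alpha=1.0, random_state=0)
    else:
        decomposer = PCA(n_components=n_factors)
    factors = decomposer.fit_transform(Z)                  # (T x k) factor time series
    if ridge_alpha is None:
        model = LinearRegression()
    else:
        model = Ridge(alpha=ridge_alpha)                   # shrinks exposures toward zero
    model.fit(factors, returns)                            # one regression per stock
    return model.coef_, decomposer.components_             # (N x k) exposures, (k x N) loadings

rng = np.random.default_rng(3)
returns = rng.normal(0.0, 0.01, size=(250, 3)) @ rng.normal(1.0, 0.5, size=(3, 20))
returns += rng.normal(0.0, 0.005, size=(250, 20))

dense_betas, dense_loadings = factor_exposures(returns)
sparse_betas, sparse_loadings = factor_exposures(returns, sparse=True, ridge_alpha=1.0)
print((sparse_loadings == 0).mean())   # share of loadings that SparsePCA set exactly to zero
```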

Hi Pravin -

I'm an intellectually lazy technical layman (actually, I make a living at this stuff, but statistics, data science, and finance are not my strong suit). What is the executive summary of your work above, in simple terms? The paper you provided is a hefty 47 pages, and your notebook, while I'm sure it is technically valid, has a lot more code than explanatory prose. I'm not a Python expert and the techniques are unfamiliar. What are you showing, and why would one think it is valid?

Best regards,

Grant

@ Maxwell -

I'd be interested in your take on the risk model and its relationship to dimensionality reduction. If one looks at the risk model factors, it seems that they are rather arbitrary, and may not explain the actual sources of risk. We have a long list of industry sectors, which are no doubt highly correlated, and might not explain anything. The style risk factors, I gather, are as old as the hills, and probably explain very little, as well.

I'm still getting up the learning curve on this dimensionality reduction stuff, but I'm wondering if it could be applied to the Q risk model, to understand if it makes sense, or if it is just adding noise?

@Grant Here is a summary of both papers:

1. The first paper identifies the systematic risk factors that drive the market (PCA factors) and regresses the returns of each security against these factors to find a hedge portfolio. The cumulative sum of the residuals should be mean reverting (we test for this and select only the securities that pass). The assumption is that if the factors explain the returns and the regression is stable, then the portfolio consisting of the security and its factor hedge is mean reverting.

2. The second paper contains a technique to improve mean reversion.

Hope that helps.
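A simplified sketch of the procedure in point 1 of the summary above follows. It computes PCA factor returns, regresses each stock on them, and fits an AR(1) model to each cumulative residual to judge mean reversion. The function names, the AR(1) "s-score" simplification, and the synthetic data are assumptions. The paper's actual procedure (window lengths, drift handling, trading thresholds) will differ in detail.

```python
import numpy as np

def ar1_s_score(residuals):
    """Fit X[t+1] = a + b*X[t] + e to the cumulative residual X; return (b, s-score)."""
    X = np.cumsum(residuals)
    A = np.column_stack([np.ones(len(X) - 1), X[:-1]])
    (a, b), *_ = np.linalg.lstsq(A, X[1:], rcond=None)
    if not 0.0 < b < 1.0:
        return b, np.nan                                 # no mean reversion detected
    e = X[1:] - (a + b * X[:-1])
    m = a / (1.0 - b)                                    # equilibrium level of X
    sigma_eq = np.sqrt(e.var() / (1.0 - b**2))           # stationary standard deviation
    return b, (X[-1] - m) / sigma_eq                     # distance of X from equilibrium

def pca_residual_signals(returns, k):
    """Regress each stock on k PCA factor returns, then score its residual for mean reversion."""
    R = returns - returns.mean(axis=0)
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    factor_returns = R @ Vt[:k].T                        # (T x k) factor returns
    F = np.column_stack([np.ones(len(R)), factor_returns])
    coefs, *_ = np.linalg.lstsq(F, returns, rcond=None)
    # With an intercept, residuals sum to ~0, so X[-1] is ~0 and the score is driven by m.
    residuals = returns - F @ coefs
    return [ar1_s_score(residuals[:, j]) for j in range(returns.shape[1])]

# Check the AR(1) step on a synthetic mean-reverting series with b = 0.9.
rng = np.random.default_rng(5)
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.9 * x[t - 1] + rng.normal()
b_hat, s_hat = ar1_s_score(np.diff(x, prepend=0.0))
print(round(b_hat, 3), round(s_hat, 2))

signals = pca_residual_signals(rng.normal(0.0, 0.01, size=(60, 8)), k=2)
```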

Thanks Pravin - I should probably roll up my sleeves and try to understand those papers.

@Grant, Pravin,

The approach that Pravin uses in his example is somewhat similar to what I described in my post above. It is one form of statistical arbitrage, using PCA as a dimension-reduction procedure to determine x principal components that explain what drives the market (SPY, in this case) over some lookback period of historical returns. It then drills down to individual stocks by regressing against these factors; Pravin explains the rest of the procedure. Stat Arb originated from pairs trading; in this example, the pairing is between SPY and individual stocks. There are many other Stat Arb approaches, from simple distance measures to more complex ones like cointegration or copulas.

It seems that for this PCA stuff to work, the factors need to be stable. As I think we're seeing for even a simple beta-to-SPY, it is not so easy. One suggestion would be to use beta-to-SPY as a simple test case, to sort out how to forecast the beta of a stock N days forward. It would seem that if this is hard, then forecasting more subtle relationships would be even harder, but maybe I'm thinking about this incorrectly. In other words, we know a priori that SPY is a dominant factor, so why not start with it, before getting fancy?
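Grant's proposed test case is easy to set up. The sketch below pairs a trailing-window OLS beta with the beta realized over the next `horizon` days. On hypothetical data with a constant true beta of 1.0, the realized 21-day beta still scatters far more than the trailing estimate. That is one plausible reason a tight beta constraint is hard to hit out of sample. Window lengths, function names, and data are assumptions, not Quantopian's SimpleBeta settings.

```python
import numpy as np

def ols_beta(stock, market):
    c = np.cov(market, stock)
    return c[0, 1] / c[0, 0]

def trailing_vs_forward_beta(stock, market, lookback=252, horizon=21):
    """Rows of (trailing beta at t, beta realized over the next `horizon` days)."""
    pairs = []
    for t in range(lookback, len(stock) - horizon + 1, horizon):
        past = ols_beta(stock[t - lookback:t], market[t - lookback:t])
        future = ols_beta(stock[t:t + horizon], market[t:t + horizon])
        pairs.append((past, future))
    return np.array(pairs)

rng = np.random.default_rng(11)
spy = rng.normal(0.0, 0.01, 2000)
stock = 1.0 * spy + rng.normal(0.0, 0.01, 2000)          # hypothetical constant beta of 1.0

pairs = trailing_vs_forward_beta(stock, spy)
print(pairs.std(axis=0).round(3))                         # spread of trailing vs. forward betas
print(np.abs(pairs[:, 1] - pairs[:, 0]).mean().round(3))  # mean absolute forecast error
```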

Here's a relevant reference for low-dimensional PCA (2x2 or 3x3 covariance matrix):

https://arxiv.org/abs/1306.6291

A Method for Fast Diagonalization of a 2x2 or 3x3 Real Symmetric Matrix
M.J. Kronenburg
(Submitted on 26 Jun 2013 (v1), last revised 16 Feb 2015 (this version, v4))

A method is presented for fast diagonalization of a 2x2 or 3x3 real symmetric matrix, that is determination of its eigenvalues and eigenvectors. The Euler angles of the eigenvectors are computed. A small computer algebra program is used to compute some of the identities, and a C++ program for testing the formulas has been uploaded to arXiv.
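For the 2x2 case, the eigen-decomposition has a simple closed form. The sketch below uses the standard textbook formula, not necessarily the Euler-angle method of the paper. Applied to the 2x2 covariance matrix of (SPY, XYZ) returns, `tan(theta)` is the slope of the first principal component discussed earlier in the thread. The function name is hypothetical, and the sketch checks itself against `numpy.linalg.eigvalsh`.

```python
import numpy as np

def eig_sym_2x2(a, b, c):
    """Closed-form eigen-decomposition of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = np.hypot(0.5 * (a - c), b)
    theta = 0.5 * np.arctan2(2.0 * b, a - c)          # angle of the major eigenvector
    eigenvalues = np.array([mean + radius, mean - radius])
    eigenvectors = np.array([[np.cos(theta), -np.sin(theta)],
                             [np.sin(theta),  np.cos(theta)]])
    return eigenvalues, eigenvectors, theta           # columns of eigenvectors match eigenvalues

M = np.array([[2.0, 0.8], [0.8, 1.0]])
vals, vecs, theta = eig_sym_2x2(M[0, 0], M[0, 1], M[1, 1])
print(vals, np.linalg.eigvalsh(M))     # same values; eigvalsh lists them in ascending order
```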

One crude way I resolve factor instability is to fit the factors and the regression only once every X days and reuse the models for the next X days.

A more rigorous approach is here: https://arxiv.org/pdf/1001.2363.pdf
Python code here: https://github.com/dfm/pcp

The L matrix (the low-rank part) holds your PCs.
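For readers who want to see what Principal Component Pursuit does without installing anything, here is a minimal NumPy sketch of the usual alternating-directions (augmented Lagrangian) iteration. It uses the commonly cited defaults `lam = 1/sqrt(max(n1, n2))` and `mu = n1*n2 / (4*sum|M|)`. It is an independent sketch, not the code in the dfm/pcp repository. The test matrix is synthetic: rank 2 plus 5% large, sparse corruptions. Adding dense noise to M (not done here) is exactly the situation questioned in the replies that follow.

```python
import numpy as np

def shrink(X, tau):
    """Elementwise soft-thresholding: proximal operator of the L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Principal Component Pursuit: split M into a low-rank L plus a sparse S."""
    n1, n2 = M.shape
    lam = 1.0 / np.sqrt(max(n1, n2)) if lam is None else lam
    mu = n1 * n2 / (4.0 * np.abs(M).sum()) if mu is None else mu
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                                   # Lagrange multiplier
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y += mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M):
            break
    return L, S

rng = np.random.default_rng(0)
L0 = rng.normal(size=(80, 2)) @ rng.normal(size=(2, 60))      # rank-2 "signal"
mask = rng.random((80, 60)) < 0.05                            # 5% corrupted entries
S0 = np.where(mask, rng.choice([-5.0, 5.0], size=(80, 60)), 0.0)
L, S = pcp(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))            # relative recovery error
```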

@Pravin, the article you refer to is a summary of the one @Maxwell cited. It does not lead anywhere either.

Can you detect the sparse outliers in an nxm matrix? I would say: easily, even just by looking at it.
However, say you have a 1,000x500 matrix of stock price variations, ΔP, and you want to find its sparse outliers. You could request something like |ΔP − mean| > ½σ, or |ΔP − mean| > 2σ, your choice. The higher you set the threshold, the sparser the S_0 matrix will become. Note that ΔP would have 500,000 data points. Even if you put 5,000 data points above 3σ, they would still be easy to detect. But that is not the problem. Well, not the one worth something, anyway.

The question is: will it improve anything to know that ΔP – S_0 = L_0? Or will you not be looking at about the same almost randomly generated price matrix? And going forward, all the information you have accumulated over that ΔP matrix will not help you at the right edge of the chart, where you have to make your trading decisions on those 500 stocks.

Stock prices do not have sparse, low-density noise dancing randomly over a stable, high-level background signal. On the contrary, whatever signal you might have is buried in high-density noise, to such an extent that the noise Z_0 itself is much greater than L_0.

If the authors had shown PCP with high-density noise, their picture would have looked like plain static.

This makes PCP inapplicable to extracting useful predictive information from moving stock prices. Other methods would be required.

My two cents.
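The thresholding thought experiment in the post above is easy to reproduce. The sketch below uses a hypothetical 1,000 x 500 matrix of standard-normal price changes; real price changes are fat-tailed, so the counts would differ. It also assumes a per-column mean and σ, since the post does not say which mean and σ to use. S_0 is built from entries beyond k·σ, and raising k makes S_0 sparser. Note that the remainder ΔP – S_0 here is simply what is left over, not a low-rank matrix.

```python
import numpy as np

def threshold_split(dP, k):
    """S_0 keeps entries more than k sigma from their column mean; the rest is dP - S_0."""
    mean = dP.mean(axis=0)
    sigma = dP.std(axis=0)
    mask = np.abs(dP - mean) > k * sigma
    S0 = np.where(mask, dP, 0.0)
    return S0, dP - S0, mask.mean()          # sparse part, remainder, share of entries flagged

rng = np.random.default_rng(2)
dP = rng.normal(size=(1000, 500))            # 500,000 hypothetical price changes

for k in (0.5, 2.0, 3.0):
    S0, rest, share = threshold_split(dP, k)
    print(k, int(share * dP.size))           # number of entries placed in S_0
```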

@Guy, thanks. It looks like what you are saying holds true. I found this paper that discusses this in the financial risk domain. I have yet to follow everything it says, but it could be worth the attempt: http://cdar.berkeley.edu/wp-content/uploads/2016/09/risk_seminar_slides_041216.pdf

@Pravin, any trading strategy has as its payoff matrix Σ(H∙ΔP). Therefore, an ongoing portfolio's assets can be expressed as A(t) = A(0) + Σ(H∙ΔP) – ΣExp. The part of interest is Σ(H∙ΔP), which characterizes the trading strategy. And it does reduce the portfolio management problem to an inventory management problem under uncertainty.

A market-neutral strategy calls for a 50/50 long/short scenario:
A(t) = A(0) + 0.50∙(Σ(H∙ΔP) – ΣExp) + 0.50∙(Σ(-H∙-ΔP) – ΣExp)

where -H is the short inventory, whose profits come from declining prices, -ΔP. Evidently, if ΔP is positive on your short trades, you are losing money, just as you are if ΔP is negative on your longs.

Such a portfolio should end up with: A(t) = A(0) + Σ(H∙ΔP) – ΣExp.
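The payoff expression can be made concrete with a toy book. The sketch below uses hypothetical numbers for two periods and two stocks. H holds share counts, with negative entries for the short inventory, and ΔP holds per-period price changes. The strategy's contribution is the elementwise product summed over periods and assets, minus expenses. The short position earns money when its price falls, since (-H)(-ΔP) > 0. The function name and all values are illustrative assumptions.

```python
import numpy as np

def strategy_payoff(H, dP, expenses):
    """Sum of H * dP over all periods and assets, minus total expenses."""
    return np.sum(H * dP) - np.sum(expenses)

# Rows are periods, columns are assets: long 100 shares of A, short 100 shares of B.
H = np.array([[100.0, -100.0],
              [100.0, -100.0]])
dP = np.array([[0.50, -0.30],      # period 1: A up, B down (both legs profit)
               [-0.20, 0.40]])     # period 2: A down, B up (both legs lose)
expenses = np.array([1.0, 1.0])    # hypothetical trading costs per period

long_leg = np.sum(np.where(H > 0, H * dP, 0.0))
short_leg = np.sum(np.where(H < 0, H * dP, 0.0))
A0 = 10_000.0
At = A0 + strategy_payoff(H, dP, expenses)
print(long_leg, short_leg, At)     # 30.0 -10.0 10018.0
```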

This is great. It does say that you can have your lunch and eat it too. But, that is on the premise that you make as much on your longs as you do on your shorts.

You could be market-neutral, have low beta, low volatility and low drawdowns, and still maintain your expected long-term average market CAGR. There would be, evidently, trading expenses, but then nothing is perfect. There is always a price to pay.

However, that is not what I see in all those market-neutral strategies. They usually barely beat the risk-free rate, if they do at all. Usually, performance is less than the average market return, which could have been had simply by buying index funds.

Since the outcome of a trading strategy depends on the trading methods used, one would have to conclude that the methods themselves are at fault.

For instance, if the trading methods used are not that good at shorting, then we should not expect the short side to do its part. And if the strategy is not that good on the long side either, then we should not expect much from such a market-neutral portfolio.

As for the slide presentation you cited, their methods limit the size of the square matrix used since they need a matrix inverse. Also, they need really sparse matrices where anomalies would be really really sparse, and where background noise would be minimal. None of which the market can provide. And, even from where they are, they are trying to ascertain the size of the forest with their nose on a single tree.

It is like the imaging system. You might not need much math to differentiate photos. You take a snapshot with no people, and one with people. Then isolate the people with a simple subtraction... But, still, it won't tell me where they are going for lunch.

As a footnote: where were those factors today? What did they predict?

BTW, I found the other paper mentioned, on statistical arbitrage by Avellaneda, more interesting and quite well written. There is more that can be extracted from it.

Hi,
I have used PCA a good bit since 2001 or so to manage interest rate derivative portfolios. I found it most useful for quickly hedging the portfolio going into risk events (numbers, elections, etc.). I set it up so I could quickly hedge the 1st, 2nd, or 3rd factor, or any combination; e.g., hedging the first 3 factors would normally cover 97% of my expected risk. You could set up your hedging instruments at any part of the curve depending on the portfolio, e.g., 2y, 10y, 30y. I set it up so I could look at factor weightings over different periods of time (e.g., 3 months, 1y, 2y), in high-vol or low-vol environments, or in situations that I felt were close to the current environment. You can also apply it to rich/cheap analysis on the curve, butterflies, etc., but I didn't find much return from that. I'm sure all of the above could be applied to stocks too. But again, I found it most useful as a hedging tool for complicated portfolios when I was in a hurry!
Regards
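Here is a minimal sketch of the hedging workflow described in the last post. It assumes synthetic daily yield changes (in bp) at four hypothetical tenors, built from level, slope, and curvature shocks. It also assumes hedge instruments with unit DV01 at the 2y, 10y, and 30y points. The code finds hedge amounts that zero the portfolio's exposure to the first three PCs; exposure to the remaining component is left unhedged. All names and numbers are illustrative, not the poster's actual setup.

```python
import numpy as np

def curve_pcs(curve_changes):
    """Eigen-decomposition of the covariance of yield changes (T x n_tenors), largest first."""
    vals, vecs = np.linalg.eigh(np.cov(curve_changes, rowvar=False))
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def pc_hedge(portfolio_dv01, hedge_dv01, pcs):
    """Hedge amounts that zero the portfolio's exposure to each column of `pcs`.

    portfolio_dv01: (n_tenors,) P&L per 1bp move at each tenor
    hedge_dv01:     (n_tenors, k) DV01 profile of one unit of each hedge instrument
    pcs:            (n_tenors, k) factor loadings; k hedges for k factors
    """
    exposure = pcs.T @ portfolio_dv01
    hedge_exposure = pcs.T @ hedge_dv01
    return np.linalg.solve(hedge_exposure, -exposure)

# Synthetic yield changes at 2y, 5y, 10y, 30y from level, slope, and curvature shocks.
rng = np.random.default_rng(4)
T = 1000
level = np.outer(rng.normal(0, 5.0, T), [1.0, 1.0, 1.0, 1.0])
slope = np.outer(rng.normal(0, 2.0, T), [-1.0, -0.3, 0.3, 1.0])
curve = np.outer(rng.normal(0, 1.0, T), [-0.5, 0.5, 0.5, -0.5])
changes = level + slope + curve + rng.normal(0, 0.3, (T, 4))

vals, vecs = curve_pcs(changes)
print((vals[:3].sum() / vals.sum()).round(4))    # share of variance in the first 3 factors

portfolio = np.array([1000.0, -2500.0, 4000.0, 1500.0])     # hypothetical DV01 by tenor
hedges = np.eye(4)[:, [0, 2, 3]]                            # unit DV01 at 2y, 10y, 30y
w = pc_hedge(portfolio, hedges, vecs[:, :3])
print(w.round(1), (vecs[:, :3].T @ (portfolio + hedges @ w)).round(6))
```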