sklearn.decomposition.PCA: Every 2nd results sign flipped


I ran the PCA method from sklearn both on Quantopian and on my local PC using Python 2.7. The resulting factors have the sign flipped for every second value. Can you please shed light on why this might be happening? I'm assuming Quantopian is still using Python 2.7.

import numpy
from sklearn.decomposition import PCA

statTS = numpy.array.......  # stationary time series
factors = PCA(0.8, whiten=False).fit_transform(statTS)

Quantopian results: factors
array([[-0.17741941, -0.03666197, -0.02030477, 0.07525749, -0.05066121, -0.03248473],.......

Local PC results: factors
array([[0.17741941, -0.03666197, 0.02030477, 0.07525749, 0.05066121, -0.03248473],.......


5 responses

Q may provide a definitive answer, but for now my experience as a user may be helpful to you. The direction of a PCA eigenvector is implementation-specific and carries no meaning: if v is an eigenvector, so is -v, with the same eigenvalue. If two different implementations find the same eigenvector but in opposite directions, the eigenvalues will still match, and the projected values (the factors you printed) will have opposite signs. From that, I would gather that your local installation has a different code version than Quantopian's, and the relevant difference is likely, but not necessarily, in sklearn itself. The absolute values you found are an exact match, which is the expected result.
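
To make the sign ambiguity concrete, here is a minimal sketch using plain NumPy rather than sklearn. It assumes PCA computed via SVD on synthetic data, and the helper name `fix_component_signs` and its convention (make the largest-magnitude loading of each component positive) are illustrative choices, not something sklearn or Quantopian is known to apply. It shows that negating any subset of components gives an equally valid decomposition, and that applying one fixed convention on both sides makes the results comparable.

```python
import numpy as np

def fix_component_signs(components, scores):
    """Flip each component so its largest-magnitude loading is positive.

    `components` has shape (n_components, n_features), laid out like
    sklearn's PCA.components_; `scores` has shape (n_samples, n_components),
    laid out like the output of fit_transform. The convention itself is an
    arbitrary, illustrative choice.
    """
    # Column index of the largest-magnitude loading in each component (row).
    idx = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[np.arange(components.shape[0]), idx])
    signs[signs == 0] = 1.0  # guard against an all-zero component
    return components * signs[:, None], scores * signs[None, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))  # synthetic data
Xc = X - X.mean(axis=0)

# PCA via SVD: rows of Vt are the components, U * S are the scores.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_a, components_a = U * S, Vt

# Simulate a second implementation that negates some of the components.
flip = np.array([-1.0, 1.0, -1.0, 1.0, -1.0, 1.0])
components_b = components_a * flip[:, None]
scores_b = scores_a * flip[None, :]

# Both versions reconstruct the same centered data.
assert np.allclose(scores_a @ components_a, scores_b @ components_b)

# After applying the same sign convention, the two versions agree.
ca, sa = fix_component_signs(components_a, scores_a)
cb, sb = fix_component_signs(components_b, scores_b)
assert np.allclose(ca, cb) and np.allclose(sa, sb)
```

With a fitted sklearn model, the same helper could be applied to `pca.components_` and the transformed scores from each environment before comparing them.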

Doug, thanks for the input. I have seen the signs of all eigenvectors flipped between two different implementations. Here, however, only the odd-numbered factors are flipped; the signs of the even-numbered ones still match. Why would that be the case?
If I want to run a regression analysis using these factors (R version: lm(y ~ PC)), I get totally different results depending on which implementation of PCA I use. What can be done in this case?

I just re-ran PCA on a subset of the data I used before. This time I got 4 factors, and the signs of the last 2 factors are flipped, rather than only the odd ones as before. What is causing the (AFAIK) same implementation of PCA on Quantopian and on my PC to behave differently?

I'd suggest studying eigenvectors, eigenvalues, and basis vectors to better understand what PCA does. What you have listed above appears to be the first entry of each column eigenvector.

Yes, I only posted the first entry, but the flipped signs repeat across the whole array. I understand the concept of PCA, but I'm not sure how to use it in regression (lm(y ~ PC1) vs. lm(y ~ PC2)) if some signs get flipped, as that will give a very different regression result.
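
On the regression concern, a small sketch may help. It assumes ordinary least squares with an intercept (the NumPy analogue of R's lm), and it uses random stand-in data rather than the actual factors from this thread; the `ols` helper is hypothetical. Under those assumptions, negating a regressor only negates its coefficient, while the intercept and the fitted values stay the same.

```python
import numpy as np

rng = np.random.default_rng(1)
pcs = rng.normal(size=(100, 4))  # stand-in for PCA factors, not real data
y = pcs @ np.array([0.5, -1.0, 0.25, 2.0]) + rng.normal(scale=0.1, size=100)

# Simulate the second environment: the last two factors come back negated.
flip = np.array([1.0, 1.0, -1.0, -1.0])
pcs_flipped = pcs * flip

def ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, fitted values)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, A @ beta

beta_a, fitted_a = ols(pcs, y)
beta_b, fitted_b = ols(pcs_flipped, y)

# The intercept is unchanged, the slopes differ only by the flipped signs,
# and the fitted values are identical.
assert np.allclose(beta_a[0], beta_b[0])
assert np.allclose(beta_a[1:], beta_b[1:] * flip)
assert np.allclose(fitted_a, fitted_b)
```

If a regression on the two sets of factors differs in more than the coefficient signs, the sign flips alone would not explain it, and it may be worth checking that the inputs and the number of retained components match.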