What is the best way to deal with NaN values in price data?

I've spent quite a bit of time researching this on the forum but have yet to find a clear-cut answer. The solutions seem to break down as follows:

  1. Remove any offending rows - this removes 'good data' for other stocks on that day
  2. Remove any offending stocks - getting rid of an entire stock seems drastic
  3. Mark as zero - this would screw up any factors calculated on that window
  4. Forward fill - might not be representative
  5. Refill NaN values with the mean, e.g. .apply(lambda x: x.fillna(x.mean()),axis=0) - best I've seen...?
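
For comparison, here is a minimal pandas sketch of the five options above on a toy price table. The tickers AAA and BBB and all the prices are made up for illustration; rows are days and columns are stocks.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: AAA has one missing price, BBB is complete.
prices = pd.DataFrame(
    {
        "AAA": [10.0, 10.4, np.nan, 11.1],
        "BBB": [20.0, 20.2, 20.4, 20.6],
    },
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# 1. Remove offending rows: drops the whole day, including BBB's good price.
drop_rows = prices.dropna(axis=0)

# 2. Remove offending stocks: drops AAA entirely.
drop_stocks = prices.dropna(axis=1)

# 3. Mark as zero: creates an artificial crash and rebound in AAA.
zero_fill = prices.fillna(0.0)

# 4. Forward fill: carries the last observed price into the gap.
forward_fill = prices.ffill()

# 5. Fill with the column mean (the one-liner from option 5).
mean_fill = prices.apply(lambda x: x.fillna(x.mean()), axis=0)

print(forward_fill)
print(mean_fill)
```

One thing to keep in mind with option 5: the column mean is taken over the whole window, so the filled value also depends on prices that come after the gap.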

Is there a Quantopian recommended way to deal with NaN values in price data?

4 responses

You fill in gaps with simulated values derived from a random number generator. I'm not sure how this is done, but I've heard this is a solution many professionals use.

Thanks Jason. Will throw it into the mix :-)

Use a proxy; that's the best solution. If you use random data, even if fitted to the stock's moments, this would not work. Imagine you want to compute a correlation, for example: using simulated values would result in an enormous error.
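
One possible reading of the proxy idea, as a rough sketch only: carry the stock's last known price forward using the returns of a related series (for example a sector ETF or an index) over the gap. The function name fill_with_proxy and the numbers below are hypothetical, and this is not necessarily the exact method meant above.

```python
import numpy as np
import pandas as pd

def fill_with_proxy(price: pd.Series, proxy: pd.Series) -> pd.Series:
    """Fill NaN gaps in `price` by compounding the proxy's returns
    from the last known price. Leading NaNs are left untouched."""
    # Simple daily returns of the proxy (assumed to have no gaps itself).
    proxy_ret = proxy / proxy.shift(1) - 1.0
    filled = price.copy()
    for i in range(1, len(filled)):
        prev = filled.iloc[i - 1]
        if np.isnan(filled.iloc[i]) and not np.isnan(prev):
            filled.iloc[i] = prev * (1.0 + proxy_ret.iloc[i])
    return filled

# Hypothetical data: the stock is missing two days, the proxy is complete.
idx = pd.date_range("2020-01-01", periods=4, freq="D")
stock = pd.Series([100.0, np.nan, np.nan, 103.0], index=idx)
proxy = pd.Series([50.0, 51.0, 52.02, 52.5], index=idx)

filled = fill_with_proxy(stock, proxy)
print(filled)
```

The filled days move with the proxy, so co-movement with related series is kept in the gap, which independent random draws would not do. Note that the price can jump at the end of the gap when the real quote resumes.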

Mathieu M, could you explain more about it, giving an example? Thanks.

Another solution: in my mind, we could use a regression to fill in the NaN values... to be checked.
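
A minimal sketch of what the regression idea could look like, assuming we regress the stock's returns on a predictor's returns (say, a market index) over the days where both are observed, and use the fitted line on the missing days. The name regression_fill and the data are hypothetical.

```python
import numpy as np
import pandas as pd

def regression_fill(target: pd.Series, predictor: pd.Series) -> pd.Series:
    """Fill NaNs in `target` with values predicted from `predictor`
    by an ordinary least-squares line fitted on the observed days."""
    observed = target.notna() & predictor.notna()
    slope, intercept = np.polyfit(
        predictor[observed].to_numpy(), target[observed].to_numpy(), deg=1
    )
    fitted = intercept + slope * predictor
    # fillna aligns on the index, so only the missing days are replaced.
    return target.fillna(fitted)

# Hypothetical daily returns; the stock return on day 3 is missing.
idx = pd.date_range("2020-01-01", periods=5, freq="D")
market_ret = pd.Series([0.01, -0.02, 0.03, 0.00, 0.02], index=idx)
stock_ret = pd.Series([0.021, -0.039, np.nan, 0.001, 0.041], index=idx)

filled_ret = regression_fill(stock_ret, market_ret)
print(filled_ret)
```

The fit here uses every observed day, including those after the gap, so in a backtest the fitting window would probably need to be restricted to past data.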