Estimation of Asset Distribution

Hi everyone,

We came up with the idea of estimating the distribution of a given set of assets. It currently works in MATLAB and Python, and I wanted to implement the code here in Quantopian to combine our model with different factors. The IDE algorithm works, but it is not stable because of the time limit in handle_data (it works for 14 assets).

Now I would like to move the code to the pipeline, because there I have no time restrictions.

1) Filters are applied to get the 2% of assets with the best liquidity
2) Calculate the log returns for a given window, where NaNs are dropped or set to zero. (Regular returns would work as well, but I would prefer log returns.)
3) Start with our estimation, where the log returns are the input parameter.

The problem is that I don't know how to handle the isnan filter or how to apply it. There is a schedule attached.
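
Steps 1 and 2 above can be sketched outside the pipeline in plain NumPy. This is only a rough illustration: the function names, the `top_fraction` and `nan_policy` parameters, and the use of average dollar volume as the liquidity measure are assumptions, not part of the original code.

```python
import numpy as np

def top_liquidity_mask(avg_dollar_volume, top_fraction=0.02):
    """Boolean mask for the most liquid `top_fraction` of assets (hypothetical helper)."""
    adv = np.asarray(avg_dollar_volume, dtype=float)
    # nanquantile ignores assets that have no liquidity data
    cutoff = np.nanquantile(adv, 1.0 - top_fraction)
    # a NaN compares as False, so assets without data are excluded
    return adv >= cutoff

def window_log_returns(prices, nan_policy="zero"):
    """Log returns over a (days x assets) price window (hypothetical helper)."""
    log_ret = np.diff(np.log(np.asarray(prices, dtype=float)), axis=0)
    if nan_policy == "zero":
        # replace NaN returns with 0.0
        return np.nan_to_num(log_ret)
    if nan_policy == "drop":
        # drop every asset (column) with at least one NaN return
        return log_ret[:, ~np.isnan(log_ret).any(axis=0)]
    raise ValueError("nan_policy must be 'zero' or 'drop'")

# made-up numbers, only to show the shapes involved
adv = np.array([5e6, 1e5, np.nan, 9e7, 2e4])
prices = np.array([[100.0, 50.0],
                   [101.0, np.nan],
                   [102.0, 51.0]])
mask = top_liquidity_mask(adv, top_fraction=0.4)
zeroed = window_log_returns(prices, nan_policy="zero")
dropped = window_log_returns(prices, nan_policy="drop")
```

With the "drop" policy the number of columns changes, so the result no longer lines up one-to-one with the asset list; the "zero" policy keeps the shape intact.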

4 responses

Take a look at the attached notebook. I made some changes.

If you want the log returns with NaNs set to zero, here is a simple fast implementation.

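import numpy as np
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing

class Log_Returns(CustomFactor):
    # Default input is the close price; a window of 2 days gives one daily return
    inputs = [USEquityPricing.close]
    window_length = 2

    def compute(self, today, asset_ids, out, close_prices):
        # Daily log return per asset; NaN results are replaced with 0
        out[:] = np.nan_to_num(np.diff(np.log(close_prices), axis=0))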
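
To see what that expression does, here is a small standalone check on a made-up 2-day window (the prices are hypothetical, and no Quantopian imports are needed):

```python
import numpy as np

# hypothetical 2-day window of close prices for three assets (rows = days)
close_prices = np.array([[100.0, 50.0, np.nan],
                         [101.0, np.nan, 20.0]])

# same expression as in compute(): log, difference along the day axis, NaN -> 0
returns = np.nan_to_num(np.diff(np.log(close_prices), axis=0))

# one row of returns, one value per asset
print(returns.shape)
print(returns[0])
```

Only the first asset has both prices, so only it gets a real log return. The other two come out as 0, which is indistinguishable from a genuinely flat day.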



Hi Dan,

Thank you very much for your help. The estimation of the distribution parameters works, but I still have some questions.

• What happens if there is a NaN value in the close prices (e.g. [100, NaN, 101])? The logs for days 1 and 3 exist, but for day 2 there is a NaN. When the function takes the differences, then to my understanding the resulting log returns will be [NaN, NaN]. So I will lose one data point, correct? Is there a filter that replaces the NaN value with the previous close price?

• H is a dispersion matrix of shape (asset x asset). Is it possible to change out[:] to (asset x asset)?

• The GIGparam contains lambda, chi and psi and is a 1x3 vector. These estimates are global parameters for the given set of assets. I think that the easiest way is to create 3 columns with the same value in every row, right?

• A way to reduce the calculation time is to use the estimated parameters from t-1 as input parameters in t. The pipeline is called daily, so how do I access the latest estimates?

Cheers Manuel
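
The "3 columns with the same value" idea from the third bullet can be sketched in NumPy as follows. The helper name and the parameter values are hypothetical, and how the columns would be wired into pipeline outputs is not shown.

```python
import numpy as np

def broadcast_global_params(params, n_assets):
    """Repeat a length-k vector of global parameters for every asset -> (n_assets, k)."""
    params = np.asarray(params, dtype=float)
    return np.tile(params, (n_assets, 1))

# hypothetical GIG parameters (lambda, chi, psi) and a universe of 4 assets
gig_params = [-0.5, 1.0, 2.0]
columns = broadcast_global_params(gig_params, n_assets=4)

# each column holds one parameter, repeated for every asset
lam_col, chi_col, psi_col = columns.T
```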

@Manuel

1. Pandas has a convenient and powerful method to forward fill (and backfill if desired) called 'fillna' (see http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.fillna.html). The following will get you the log returns with all the close prices forward filled.
import numpy as np
import pandas as pd

class Log_Returns(CustomFactor):
    inputs = [USEquityPricing.close]
    window_length = 2

    def compute(self, today, asset_ids, out, close_prices):
        # Forward fill the prices along the day axis (rows are days, columns are assets).
        # This returns a pandas DataFrame. However, NumPy handles DataFrames rather
        # seamlessly, so the last statement can remain unchanged.
        filled_close_prices = pd.DataFrame(close_prices).fillna(method='ffill', axis=0)
        out[:] = np.diff(np.log(filled_close_prices), axis=0)
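
As a standalone illustration of forward filling on the [100, NaN, 101] example, here is a small sketch with made-up prices. It assumes rows are days and columns are assets, and it uses `DataFrame.ffill`, which is shorthand for `fillna(method='ffill')`.

```python
import numpy as np
import pandas as pd

# hypothetical 3-day window for two assets (rows = days, columns = assets)
prices = pd.DataFrame({"A": [100.0, np.nan, 101.0],
                       "B": [10.0, 11.0, np.nan]})

# without filling, one missing price wipes out both adjacent returns
raw_returns = np.diff(np.log(prices.to_numpy()), axis=0)

# forward fill along the day axis, then take log returns
filled = prices.ffill(axis=0)
filled_returns = np.diff(np.log(filled.to_numpy()), axis=0)

print(raw_returns)
print(filled_returns)
```

For asset A the raw returns are [NaN, NaN], while the forward-filled returns are [0, log(101/100)], so the price move is attributed to the day the next valid price appears. A NaN on the first day of the window has nothing to fill from and stays NaN.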


2. The 'out[:]' object always needs to be a 1-dimensional array with a single value for each asset. It cannot return a 2-dimensional array as you asked. You can, however, add any logic inside of a custom factor. Maybe do the calculations on the 2-dimensional array inside of the custom factor, then return a single value for each asset?

3. Not sure what you mean.

4. If I understand your question correctly, you want to use a previous day's pipeline calculation to feed into the current day's pipeline calculations? This cannot be done in the current pipeline implementation. A pipeline cannot 'store' or access previous calculations and then use them in subsequent iterations. There are several reasons for this, but one issue is that, in the background, pipeline calculations are generally performed as 'vector' functions across multiple days (for speed). All the data must be in place BEFORE the calculations are performed (i.e. the data cannot change). Therefore one cannot calculate a value and then use it for the next calculation.
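
The suggestion to do the 2-dimensional calculation inside the custom factor and return one value per asset could look roughly like this in plain NumPy. The sample covariance is only a stand-in for the dispersion matrix H (the actual estimator is not shown in this thread), and the helper name and the choice of summaries are assumptions.

```python
import numpy as np

def per_asset_summary(log_returns):
    """Reduce an (assets x assets) matrix to one number per asset (hypothetical helper).

    `log_returns` has shape (days, assets) and needs at least two assets.
    """
    log_returns = np.asarray(log_returns, dtype=float)
    # stand-in for the dispersion matrix: sample covariance, shape (assets, assets)
    H = np.cov(log_returns, rowvar=False)
    # summary 1: the diagonal, i.e. each asset's own variance
    variance = np.diag(H)
    # summary 2: average covariance of each asset with all other assets
    mean_cov = (H.sum(axis=1) - variance) / (H.shape[0] - 1)
    return variance, mean_cov

# random data, only to show the shapes
rng = np.random.default_rng(0)
sample_returns = rng.normal(scale=0.01, size=(30, 4))
variance, mean_cov = per_asset_summary(sample_returns)
```

Either summary is a 1-dimensional array with one entry per asset, which is the shape 'out[:]' expects. The full matrix itself is lost in the reduction.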

@Dan,

Thank you again for your help.

1. I am new to Python and have to get used to the tools. Indeed, pandas seems to be very powerful.

2. That's a pity, because handle_data can't handle large problems and I need to store the dispersion matrix. Has no one ever used an autocorrelation matrix?