help w/ universe sort algorithm

All,

I'm working on an algorithm to sort a set of securities (selected using set_universe), by comparing their price (or volume, etc.) histories. The basic outline is:

1. Get a trailing window of prices for all securities.
2. Code the prices with z-scores (see Ref. 2 in algorithm).
3. Convert the coded prices into text strings.
4. Rank the securities (sids) based on a similarity comparison of the text strings.
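Steps 2 and 3 could be sketched roughly as follows. This is only an illustration: the binning (clip the z-scores to [-2, 2] and map them onto the digits 0 to 8) is an assumption and not necessarily the coding from Ref. 2, and `code_prices` and `to_strings` are hypothetical helper names.

```python
import numpy as np

def code_prices(prices, n_bins=9):
    """Z-score each column, then map the scores to integer symbols 0..n_bins-1."""
    prices = np.asarray(prices, dtype=float)
    std = prices.std(axis=0)
    std[std == 0] = 1.0  # guard against flat price columns (assumption)
    # Column-wise z-score: one column per security, one row per bar.
    z = (prices - prices.mean(axis=0)) / std
    # Hypothetical binning: clip to [-2, 2] and scale onto 0..n_bins-1.
    scaled = (np.clip(z, -2.0, 2.0) + 2.0) / 4.0 * (n_bins - 1)
    return np.rint(scaled).astype(int)

def to_strings(coded):
    """Join each column of single-digit codes into one text string."""
    return ["".join(str(v) for v in column) for column in coded.T]

# Toy window: 3 bars (rows) for 2 securities (columns).
window = np.array([[10.0, 20.0],
                   [11.0, 19.0],
                   [12.0, 18.0]])
strings = to_strings(code_prices(window))
print(strings)
```

With nine bins or fewer, every code is a single character, so all strings in a window have the same length.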

One application would be to identify outliers that are not following the overall market trend.

I'm looking for advice on step 4 above. The ranking will be based on a text-compression comparison function (NCD, per Ref. 1 in the algorithm). Perhaps a Python expert can tell me the best way to do this. Note that I need to know the new ordering relative to the original, by sid (i.e. rank the sids based on a similarity criterion).
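One possible shape for step 4, as a sketch under stated assumptions: NCD is computed with zlib as the compressor, using the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), and each sid is scored by its mean NCD to every other sid. That scoring rule is an assumption, not something taken from the algorithm; `csize`, `ncd` and `rank_by_ncd` are hypothetical names, and the coded strings below are made up.

```python
import zlib

def csize(text):
    """Compressed size in bytes of an ASCII string."""
    return len(zlib.compress(text.encode("ascii"), 9))

def ncd(x, y):
    """Normalized compression distance between two strings."""
    cx, cy = csize(x), csize(y)
    return (csize(x + y) - min(cx, cy)) / float(max(cx, cy))

def rank_by_ncd(sids, strings):
    """Return (sid, score) pairs, lowest mean NCD to the others first."""
    scores = []
    for i, s in enumerate(strings):
        others = [ncd(s, t) for j, t in enumerate(strings) if j != i]
        scores.append(sum(others) / len(others))
    # Sort positions rather than values, so each sid stays tied to its score.
    order = sorted(range(len(sids)), key=lambda i: scores[i])
    return [(sids[i], scores[i]) for i in order]

# Made-up coded strings for three sids.
sids = [23497, 30666, 4218]
strings = ["0123456789" * 20, "0123456789" * 20, "9081726354" * 20]
ranked = rank_by_ncd(sids, strings)
for sid, score in ranked:
    print(sid, round(score, 3))
```

Sorting the index positions is what preserves the mapping back to the original ordering: the last entries in `ranked` are the candidates for outliers.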

Generally, I suspect that the algorithm could be sped up with better coding...any ideas?

Also, sometimes I get the error:

There was a runtime error.
ValueError: cannot convert float NaN to integer
USER ALGORITHM:47, in handle_data
X[j] = X[j] + str(int(coded_d[i,j]))


Any idea if this is due to set_universe changing the list of sids, or if the batch transform filling function is not working (as I understand it, it should clean up all of the NaNs due to missing trades)?

Grant

[Attached backtest, ID 516928d605a08406521e1103: no results shown; the run ended with a runtime error.]

Hello Grant,

The NaN issue can be reproduced by setting:

bottom = 12.0
range = 0.1


and running the backtest from 01/01/12 to 03/31/12. The NaNs originate with the

get_data(data, context.stocks)

command. The data looks like this:

2012-03-06 PRINT context.stocks:
2012-03-06 PRINT
2012-03-06 PRINT [23497, 30666, 4218, 35998, 36929]
2012-03-06 PRINT d:
2012-03-06 PRINT
2012-03-06 PRINT[[ 11.15 6.65 8.35 20.13 nan]
[ 11.65 6.7 8.5 20.93 nan]
[ 12. 6.68 8.7499 21.02 nan] ...


The NaNs then get propagated through z_d and coded_d.
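A small standalone illustration of that propagation, using the first and last columns of the printed matrix above. The z-score formula here is the generic one and is an assumption about how z_d is computed in the algorithm.

```python
import numpy as np

# First and last columns of the matrix printed above.
d = np.array([[11.15, np.nan],
              [11.65, np.nan],
              [12.00, np.nan]])

# The mean and std of a column containing NaN are NaN,
# so the whole z-scored column comes out as NaN.
z_d = (d - d.mean(axis=0)) / d.std(axis=0)

try:
    int(z_d[0, 1])
    message = None
except ValueError as err:
    message = str(err)

print(z_d)
print(message)
```

The caught message is the same ValueError reported for line 47 of the algorithm.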

Regards,

Peter

Thanks Peter,

Looks like I need to check for NaNs. Gonna have to poke around a bit at other code using set_universe to see if anybody else has coded in a NaN check...should be straightforward.

Grant

I've made some headway on this algorithm (see attached), but still have problems:

1. What is the best way to check for NaNs in the matrix returned by the batch transform?
2. The program runs fine for range = 0.1 (a handful of securities), but when I change to range = 0.2, it seems to hang. Any idea what's going on? Am I running out of memory?
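For question 1, one option is `np.isnan`, which works element-wise on the matrix. A minimal sketch, assuming the matrix has one column per sid in the same order as context.stocks; `nan_report` is a hypothetical helper, and the data are the rows printed earlier in the thread.

```python
import numpy as np

def nan_report(d, sids):
    """Map each sid that has NaNs to the number of NaN entries in its column."""
    counts = np.isnan(d).sum(axis=0)
    return {sid: int(n) for sid, n in zip(sids, counts) if n > 0}

sids = [23497, 30666, 4218, 35998, 36929]
d = np.array([[11.15, 6.65, 8.35,   20.13, np.nan],
              [11.65, 6.70, 8.50,   20.93, np.nan],
              [12.00, 6.68, 8.7499, 21.02, np.nan]])

has_nan = bool(np.isnan(d).any())  # quick yes/no check for the whole matrix
report = nan_report(d, sids)
print(has_nan, report)
```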

Grant

[Attached backtest, ID 516df4efec6343065f7de6f0: no results shown; the run ended with a runtime error.]

Hello Grant,

I thought this might remove the NaNs:

@batch_transform(refresh_period=R_P, window_length=W_L)  # set globals R_P & W_L above
def get_data(datapanel, sids):
    # Forward-fill gaps, then back-fill any NaNs left at the start of the window
    datapanel['price'] = datapanel['price'].fillna(method='ffill').fillna(method='backfill')
    return datapanel['price'].as_matrix(sids)


But even so I can still get an error with:

set_universe(universe.DollarVolumeUniverse(90,90.3))


I've not had the algo hang with 0.2 or 0.3 as the range.

Regards,

Peter

Thanks Peter,

I'll have to continue digging into the NaN problem. When my batch transform returns NaNs, I'm not sure that filling is the right approach, since the NaNs may mean that the security is not actually tradable at the time of the backtest. The best approach may be to exclude any security with NaN data (just ignore columns with NaNs in the matrix returned by the batch transform).
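Excluding those columns might look something like this. It is a sketch, assuming the columns are in the same order as the sid list; `drop_nan_columns` is a hypothetical helper, not part of the batch transform API.

```python
import numpy as np

def drop_nan_columns(d, sids):
    """Drop every column containing a NaN, and the matching sid with it."""
    d = np.asarray(d, dtype=float)
    keep = ~np.isnan(d).any(axis=0)  # True for columns with no NaN at all
    kept_sids = [sid for sid, k in zip(sids, keep) if k]
    return d[:, keep], kept_sids

sids = [23497, 30666, 4218, 35998, 36929]
d = np.array([[11.15, 6.65, 8.35,   20.13, np.nan],
              [11.65, 6.70, 8.50,   20.93, np.nan],
              [12.00, 6.68, 8.7499, 21.02, np.nan]])

clean_d, clean_sids = drop_nan_columns(d, sids)
print(clean_d.shape, clean_sids)
```

Filtering the sid list with the same mask keeps each remaining column tied to its sid, which matters for the ranking step.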

Regarding the algorithm hanging, I think that the issue is with:

all_sid_X = list((itertools.permutations(unranked_data)))


As the length of unranked_data grows, the number of permutations becomes unmanageable, since it is equal to N-factorial, where N is the length of the list to be permuted. I should have realized that this scaling would be problematic...duh! So much for the brute-force approach...
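To put numbers on that scaling, here is a short comparison of N! against the number of unordered pairs, N(N-1)/2. Comparing pairs with `itertools.combinations` is a swapped-in alternative, not what the algorithm currently does, and it only helps if the ranking can be derived from pairwise distances, which is an assumption.

```python
import itertools
import math

def pair_count(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2

# Number of permutations versus number of pairs as the universe grows.
for n in (5, 10, 15):
    print(n, math.factorial(n), pair_count(n))

# Iterating over pairs instead of permutations.
sids = [23497, 30666, 4218, 35998, 36929]
pairs = list(itertools.combinations(sids, 2))
print(len(pairs))
```

Even at 15 securities the full list of permutations has over a trillion entries, while the pairwise comparison needs 105.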

Grant