@Dan. Good call noticing that `(a * b).sum()` is just `a.dot(b)`. I ended up simplifying the `vectorized_beta` implementation in the post a bit in the interest of clarity, though I actually hadn't thought about just using `.dot`! In the real implementation in zipline, we're not using `dot` because there's no nan-aware version of it in numpy (at least not one that I'm aware of).
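For illustration (my sketch here, not zipline's code): the closest you can get to a nan-aware dot in plain numpy is to materialize the product and `nansum` it, which gives up exactly the allocation-free BLAS call that makes `.dot` attractive in the first place:

```python
import numpy as np

def nan_dot(a, b):
    # a: shape (T,), b: shape (T, N). Treats nan products as zero,
    # i.e. skips missing observations. Unlike a.dot(b), this
    # materializes the full (T, N) product array as a temporary.
    return np.nansum(a[:, np.newaxis] * b, axis=0)

a = np.array([1.0, 2.0, 3.0])
b = np.array([[1.0, np.nan],
              [1.0, 1.0],
              [1.0, 1.0]])
print(nan_dot(a, b))  # [6. 5.]
```

Note that silently skipping nans also changes the meaning of the sum (per-column observation counts differ), which is part of why a correct nan-aware beta needs more than just a nan-aware dot.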
For what it's worth, I think this is the fastest pure vectorized beta implementation I can muster without resorting to Cython or C:
def fastest_vectorized_beta_I_can_muster(spy, assets):
    # Allocate one array of length assets.shape[1] and fill it initially
    # with the column-means of assets. We'll re-use this buffer several
    # times in the course of this function.
    buf = assets.mean(axis=0)
    # Subtract the means from each asset in-place.
    # Note: This mutates the input in place, so don't do this if the caller
    # expects to use `assets` again!
    np.subtract(assets, buf, out=assets)
    # Overwrite the output of the "covariance" dot product into `buf`.
    spy.dot(assets, out=buf)
    # Overwrite the output of the division into `buf` again.
    np.divide(buf, spy.dot(spy), out=buf)
    # buf now holds our expected output.
    return buf
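As a quick sanity check (my addition, not from the post): the buffered version matches `vectorized_beta` as long as `spy` is demeaned first. The numerator `spy.dot(asset_residuals)` is unaffected by spy's mean, since each column of residuals sums to zero, but `spy.dot(spy)` only equals the sum of squared spy residuals when spy's mean is zero:

```python
import numpy as np

rng = np.random.RandomState(0)
spy = rng.randn(504)
spy -= spy.mean()  # demean up front so spy.dot(spy) is the variance sum
assets = rng.randn(504, 1000)

def vectorized_beta(spy2d, assets):
    # Version from the notebook.
    asset_residuals = assets - assets.mean(axis=0)
    spy_residuals = spy2d - spy2d.mean()
    covariances = (asset_residuals * spy_residuals).sum(axis=0)
    spy_variance = (spy_residuals ** 2).sum()
    return covariances / spy_variance

def fastest_vectorized_beta_I_can_muster(spy, assets):
    buf = assets.mean(axis=0)
    np.subtract(assets, buf, out=assets)  # mutates assets!
    spy.dot(assets, out=buf)
    np.divide(buf, spy.dot(spy), out=buf)
    return buf

expected = vectorized_beta(spy[:, np.newaxis], assets)
result = fastest_vectorized_beta_I_can_muster(spy, assets.copy())
print(np.allclose(result, expected))  # True
```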
On my laptop for a 504-day lookback with 1000 assets, this is about 3 times faster than the `vectorized_beta` function in the notebook:
In : spy.shape
Out: (504,)
In : assets.shape
Out: (504, 1000)
In : spy2d = spy[:, np.newaxis]; spy2d.shape
Out: (504, 1)
In : def vectorized_beta(spy, assets): # This is the version from the notebook.
...: asset_residuals = assets - assets.mean(axis=0)
...: spy_residuals = spy - spy.mean()
...: covariances = (asset_residuals * spy_residuals).sum(axis=0)
...: spy_variance = (spy_residuals ** 2).sum()
...: return covariances / spy_variance
In : %timeit -n500 vectorized_beta(spy2d, assets)
500 loops, best of 3: 2.81 ms per loop
In : def fastest_vectorized_beta_I_can_muster(spy, assets):
...: buf = assets.mean(axis=0)
...: np.subtract(assets, buf, out=assets)
...: spy.dot(assets, out=buf)
...: np.divide(buf, spy.dot(spy), out=buf)
...: return buf
In : %timeit -n500 fastest_vectorized_beta_I_can_muster(spy, assets)
500 loops, best of 3: 936 µs per loop
I'm not sure whether `fastest_vectorized_beta_I_can_muster` gets most of its speedup from avoiding the cost of extra allocations, or from better cache locality. I suspect it's a mix of both.
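One way to probe the allocation question (my sketch, not from the post) is to time the buffered version against a mathematically identical variant where every operation allocates a fresh output, so the arithmetic and access pattern stay the same and only the allocations differ:

```python
import timeit
import numpy as np

rng = np.random.RandomState(0)
spy = rng.randn(504)
assets = rng.randn(504, 1000)

def with_allocations(spy, assets):
    # Same arithmetic, but each op allocates a fresh output array.
    demeaned = assets - assets.mean(axis=0)
    return spy.dot(demeaned) / spy.dot(spy)

def buffered(spy, assets, scratch):
    # Writes the residuals into a pre-allocated `scratch` array instead
    # of mutating `assets`, so only the small (1000,) buffer is allocated.
    buf = assets.mean(axis=0)
    np.subtract(assets, buf, out=scratch)
    spy.dot(scratch, out=buf)
    np.divide(buf, spy.dot(spy), out=buf)
    return buf

scratch = np.empty_like(assets)
t_alloc = timeit.timeit(lambda: with_allocations(spy, assets), number=200)
t_buf = timeit.timeit(lambda: buffered(spy, assets, scratch), number=200)
print(t_alloc, t_buf)
```

Whatever gap remains after pre-allocating everything would then be attributable to effects other than allocation, such as cache behavior.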
One thing I'd like to do in a future update would be to add a fast path to `SimpleBeta` that uses something like the above if you pass `allowed_missing_percentage=0`. The current implementation pays a steeper performance cost than I'd like in exchange for handling missing data robustly. We can probably claw some of that back by moving the implementation to Cython (which would let us remove at least one large allocation), but ultimately handling nans correctly requires a branch per array element, which is a real cost at this level of abstraction.
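Purely as a sketch of what that fast-path dispatch could look like (hypothetical names throughout; `SimpleBeta`'s real internals differ, and `nan_aware_beta` here is a deliberately naive per-column fallback):

```python
import numpy as np

def beta_fast_path(spy, assets):
    # Buffered, branch-free path for nan-free inputs. Mutates `assets`,
    # so the dispatcher passes in a copy.
    buf = assets.mean(axis=0)
    np.subtract(assets, buf, out=assets)
    s = spy - spy.mean()
    s.dot(assets, out=buf)
    np.divide(buf, s.dot(s), out=buf)
    return buf

def nan_aware_beta(spy, assets):
    # Naive fallback: per-column masked regression. Slow, but correct
    # in the presence of nans.
    out = np.empty(assets.shape[1])
    for i in range(assets.shape[1]):
        col = assets[:, i]
        mask = ~np.isnan(col)
        s = spy[mask] - spy[mask].mean()
        a = col[mask] - col[mask].mean()
        out[i] = s.dot(a) / s.dot(s)
    return out

def compute_beta(spy, assets, allowed_missing_percentage):
    # Hypothetical dispatch: only the zero-tolerance, nan-free case
    # takes the branch-free fast path.
    if allowed_missing_percentage == 0 and not np.isnan(assets).any():
        return beta_fast_path(spy, assets.copy())
    return nan_aware_beta(spy, assets)
```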