1st attempt: finding co-fluctuating stocks

This algo should, in theory, find stocks that tend to fluctuate with each other. It's based on an example from the sklearn website.

My program sometimes seems to hang for no apparent reason - perhaps someone can help?

With a couple of stocks it seems to work fine:

2011-06-01 handle_data:45 INFO Start date: 2011-06-01 00:00:00+00:00  
2012-05-30 handle_data:61 INFO Finished recording data : 251 days  
2012-05-30 handle_data:65 INFO Have 7 complete histories  
2012-05-30 handle_data:81 INFO 3 groups found:  
2012-05-30 PRINT Cluster 1: 4707, 5061, 20486, 3149  
2012-05-30 PRINT Cluster 2: 24  
2012-05-30 PRINT Cluster 3: 18522, 5885  

One problem is not being able to look up the names of SIDs, and being limited to 10 SIDs in total means that more general analysis can't be done.

Interesting all the same :)

Perhaps someone could check it with a bunch of unrelated stocks and a couple known to co-fluctuate?
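For anyone who wants to poke at the approach outside the backtester, here's a minimal, self-contained sketch of the same pipeline (sparse inverse covariance via graphical lasso, then affinity propagation on the covariance matrix), run on synthetic data rather than real prices. Note the class is `GraphicalLassoCV` in current scikit-learn; older versions called it `GraphLassoCV`. The series names and data here are made up for the demo:

```python
import numpy as np
from sklearn import cluster, covariance

def find_co_fluctuating(series, names):
    # rows of `series` are time series of daily variations, one per name
    X = np.asarray(series, dtype=float).T       # shape: (days, n_series)
    X = X - X.mean(axis=0)
    X /= X.std(axis=0)                          # normalise each series
    # learn a sparse graph structure from the correlations
    edge_model = covariance.GraphicalLassoCV()  # GraphLassoCV before sklearn 0.20
    edge_model.fit(X)
    # cluster on the estimated covariance matrix
    _, labels = cluster.affinity_propagation(edge_model.covariance_,
                                             random_state=0)
    groups = [[] for _ in range(labels.max() + 1)]
    for i, grp_idx in enumerate(labels):
        groups[grp_idx].append(names[i])
    return groups

# synthetic demo: two pairs of co-moving series plus noise
rng = np.random.RandomState(0)
a, b = rng.randn(250), rng.randn(250)
series = [a + 0.1 * rng.randn(250), a + 0.1 * rng.randn(250),
          b + 0.1 * rng.randn(250), b + 0.1 * rng.randn(250)]
groups = find_co_fluctuating(series, ["AAA", "AAB", "BBA", "BBB"])
print(groups)
```

With clean block structure like this, the two "A" series should land in one cluster and the two "B" series in another.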

8 responses

Hello James,

I was able to add three more securities and did not have a problem with the backtest hanging:

    c.sids = []  
    c.sids.append(sid(3149)) #any more than 7 seems to make it hang?  

Also, so that all references to sids are in initialize(context), I replaced line 52 of your code above with:

if timedelta(weeks=52) > (data[c.sids[0]].datetime - c.startDate):  

However, the backtest hangs if I then try these securities instead:

    c.sids = [sid(700),sid(8229),sid(4283),sid(1267),sid(698),sid(3951),sid(5923),sid(3496),sid(7792),sid(7883)]  

Hi Grant,

Good idea with line 52. Unfortunately, the backtester now hangs whatever I do for this algo - even for the version that originally worked.


First, thanks for a great share. This is a really interesting approach.

Sorry for the hiccup with your backtest. A few of us just ran through the logs, and I think we must be swallowing an exception from your algorithm. Would you mind clicking the feedback link while your backtest is hung? It captures some information about the context that might help us.

I was able to reproduce the hang running a backtest from 7/30/2010 to 7/30/2012, so hopefully we can pinpoint the problem soon. We're on it!



When I reproduce the hang, our system logs a KeyError for line 96 of the algorithm. The error should be reported to you in the UI, and we need to figure out why it isn't, but in the meantime I thought this information might get you past the hangup.
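For anyone following along: the original lines 92-96 aren't shown in this thread, but here's a hypothetical sketch of how a grouping block like that can raise a KeyError. If `groups` is a dict and a label falls outside the keys that were initialised, the lookup fails (sid values are made up to mirror the thread's output):

```python
# hypothetical reconstruction, NOT the actual algo lines 92-96
groups = {0: [], 1: []}          # initialised for too few groups
labels = [0, 1, 2]               # a label with no matching dict key
sids = [4707, 5061, 20486]
try:
    for i, grp_idx in enumerate(labels):
        groups[grp_idx].append(sids[i])
except KeyError as err:
    print("KeyError:", err)      # the third label has no key in groups
```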


Hi fawce,

I've submitted the feedback for you anyway. I'll have a think about the algo and see if I can work out why it's broken.


Hi James,

I'm not certain, but I don't think the block of code at lines 92-96 of your algo is doing quite what you intend. I think if you replace those five lines with this:

            for i, grp_idx in enumerate(labels):  
                groups[grp_idx].append( int(c.sids[i]) )

and delete the grp_idx variable initialization from a couple of lines above there, it'll have the intended effect.

Going one step further, since the indices you're using into "groups" are sequential ascending integers, you should probably make it a list instead of a dictionary. You can do that by changing this:

            groups = dict()  
            for x in range(numGroups):  
                groups[x] = []

with this:

            groups = []  
            for x in range(numGroups):  
                groups.append([])

Hope this helps. We're still looking into why the error from the algo didn't bubble back to your browser.
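To illustrate the equivalence: when the keys are sequential integers starting at 0, a dict behaves, for this access pattern, exactly like a plain list (sid values below are taken from the thread's output just for the demo):

```python
numGroups = 3
labels = [0, 2, 1, 0]
sids = [4707, 5061, 24, 18522]

# dict version (as in the original algo)
groups_d = {x: [] for x in range(numGroups)}
# list version (as suggested)
groups_l = [[] for _ in range(numGroups)]

for i, grp_idx in enumerate(labels):
    groups_d[grp_idx].append(sids[i])
    groups_l[grp_idx].append(sids[i])

# same grouping either way; the list is just the more idiomatic container
assert list(groups_d.values()) == groups_l
```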

Hi again, James,

We've identified the bug within our application which is preventing the exception in your code from being reported back to you in the browser as it should be. We've got a fix in hand and it'll be in the next release we push.


Jonathan Kamens

Hi Jonathan,

Thanks for your updates. I had started to wonder if the code at the end was actually going into an infinite loop or something, and indeed it isn't the best-written code, especially for Python. I'm a C programmer at heart, so I'm not used to thinking at a higher level yet ;)

This works fine:

from sklearn import cluster, covariance  
from datetime import timedelta  
import numpy as np

# based on the example at:  

# use in quick backtester with 12 months worth of data

def initialize(context):  
    c = context  
    c.started = False  
    c.stopped = False  
    c.startDate = None  
    c.sids = []  
    c.history = dict()      # place to store the data  
    c.incomplete = set()  
    c.days = 0  
    # some sids to look at  
    c.sids = [sid(4697),sid(18522),sid(5061),sid(20486),sid(5885),sid(4707),sid(3149),sid(35920),sid(5005),sid(13797)]  
    # create a list for each sid record  
    for s in c.sids:  
        c.history[s] = []

def handle_data(context, data):  
    c = context  
    # more init on first call  
    if c.started == False:  
        if c.sids[0] not in data:  
            log.error("no starting date")  
        else:  
            c.startDate = data[c.sids[0]].datetime  
            log.info("Start date: %s" % (c.startDate))  
            c.started = True  
    # normal case  
    if c.started == True:  
        if c.stopped == False:  
            # record everything for a period of 12 months  
            if timedelta(weeks=52) > (data[c.sids[0]].datetime - c.startDate):  
                c.days += 1  
                for s in c.sids:  
                    if s in data:  
                        # add the day's price range to the list for this sid  
                        c.history[s].append(data[s].close - data[s].open)  
                    else:  
                        log.error("%s sid data not found!" % (str(s)))  
                        c.incomplete.add(s)  
            else:  
                log.info("Finished recording data : %s days" % (c.days))  
                c.stopped = True  
                # drop sids that were missing data on any day  
                for s in c.incomplete:  
                    del c.history[s]  
                numHistories = len(c.history)  
                log.info("Have %s complete histories" % (numHistories))

                # create a variation matrix  
                # each row represents the time-series of (close - open) prices  
                variation = np.array([ c.history[v] for v in c.history ]).astype(np.float)  
                # tell it we're looking for a graph structure  
                edge_model = covariance.GraphLassoCV()  
                X = variation.copy().T  
                X /= X.std(axis=0)  
                edge_model.fit(X)  
                # now process into clusters based on co-fluctuation  
                _, labels = cluster.affinity_propagation(edge_model.covariance_)  
                numGroups = labels.max() + 1

                log.info("%i groups found:" % (numGroups))  
                # create structure to store groups  
                groups = []  
                for x in range(numGroups):  
                    groups.append([])  
                # filter the sids into the groups  
                for i, grp_idx in enumerate(labels):  
                    groups[grp_idx].append(int(c.sids[i]))  
                # display stock sids that co-fluctuate:  
                for g in range(numGroups):  
                    print 'Cluster %i: %s' % (g + 1, ", ".join([str(s) for s in groups[g]]))
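One subtlety worth noting about the recording step: `np.array` over the histories only produces the rectangular variation matrix the clustering needs if every history has the same length, which is presumably why the algo declares a `c.incomplete` set for sids that miss data. A minimal sketch of the failure mode and the fix, with toy data and hypothetical names:

```python
import numpy as np

# toy histories of daily (close - open) values; "C" missed some days
history = {"A": [0.1, -0.2, 0.3], "B": [0.0, 0.1, -0.1], "C": [0.2]}
expected_days = 3

# ragged lists can't form a rectangular matrix, so drop
# incomplete histories before building it
incomplete = {s for s, h in history.items() if len(h) != expected_days}
for s in incomplete:
    del history[s]

variation = np.array([history[s] for s in history], dtype=float)
print(variation.shape)   # one row per complete history
```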