Differently sized arrays in CustomFactor

This block fails with "ValueError: setting an array element with a sequence."

class MyFactor(CustomFactor):  
     window_length = 1  
     inputs = [BuybackAuthorizations.previous_unit,BuybackAuthorizations.previous_amount]  
     def compute(self, today, assets, out, buyback_unit, buyback_amount):  
         out[:] = buyback_unit[0]  

This one works fine:

 class MyFactor(CustomFactor):  
     window_length = 1  
     inputs = [BuybackAuthorizations.previous_unit,BuybackAuthorizations.previous_amount]  
     def compute(self, today, assets, out, buyback_unit, buyback_amount):  
         out[:] = buyback_amount[0]  

Why is one a sequence and one not?

Using array2string() shows that BuybackAuthorizations.previous_unit seems to be giving me a giant array which looks like this:

 [None None u'$M' ..., None None None]  

How can I grab just the u'$M' (or whatever) and convert it to a scalar?

My end goal is to be able to run:

 if buyback_unit[0] == '$M': <do something>

But it complains about comparing an array to a string.

Would appreciate any help!

6 responses

out[:] expects an array of floats, but it appears that previous_unit is an array of strings, which is consistent with the error message because each string is a Python sequence. So, you will need to convert the strings to floats before assigning them to out[:], which may be some work on your part depending on what strings are in that array.

I'd guess previous_unit holds a currency symbol ('$', etc.) and a multiplier ('K', 'M', 'B', etc.) that can be mapped to, say, 1000, 1000000, and 1000000000, and then multiplied by the buyback_amount array to yield the values you want for out[:], that is, (units x amount).

See, for instance, https://stackoverflow.com/questions/35215161/most-efficient-way-to-map-function-over-numpy-array

And, you will need a function that maps the strings in the array to the values you want.
For example:

out[:] = np.array( [ u == '$M' for u in buyback_unit[0] ] )  

will yield an array that is 1.0 where the unit is '$M' and 0.0 everywhere else.

You can also consider a python dictionary where the keys are all the possible strings and the values are the floats associated with the keys.
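
A minimal sketch of the dictionary idea, using plain NumPy outside of pipeline. The table UNIT_MULTIPLIER, its keys and values, and the helper units_to_multipliers are hypothetical and only illustrate the approach; the strings that previous_unit actually contains would need to be checked first.

```python
import numpy as np

# Hypothetical unit-to-multiplier table. The keys and values are assumptions
# for illustration, not a confirmed list of what previous_unit contains.
UNIT_MULTIPLIER = {
    '$K': 1e3,
    '$M': 1e6,
    '$B': 1e9,
}


def units_to_multipliers(units):
    """Map each unit string to a float; None and unknown units become NaN."""
    return np.array([UNIT_MULTIPLIER.get(u, np.nan) for u in units], dtype=float)


# Stand-in for buyback_unit[0]: one row of an object array with missing values.
units = np.array([None, '$M', '$K', 'Mshares'], dtype=object)
amounts = np.array([np.nan, 250.0, 500.0, 3.0])

multipliers = units_to_multipliers(units)
dollars = multipliers * amounts  # float array, safe to assign to out[:]
```

Because the result is a float array with NaN for anything unmapped, it could be assigned to out[:] directly inside compute.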

The problem is that "BuybackAuthorizations.previous_unit" is an array of strings (as you noted). Factors can only output scalar (numeric) values. That is why it works with "BuybackAuthorizations.previous_amount". The error is a bit cryptic, but that's the reason: the output data cannot be a string.

So, you can work with the string data inside the compute function of a CustomFactor as much as you like, and do your comparison logic however you wish. Just make sure that when you set the factor output, each value is a scalar number (not a string).

This post may help explain the error? https://www.quantopian.com/posts/customfactor-valueerror-setting-an-array-element-with-a-sequence-dot

This post may help steer you towards a solution? https://www.quantopian.com/posts/issue-equity-dataset-available-in-pipeline-new-classifier-function-relabel

Thank you very much for the responses! I will do some more exploration as time allows and hopefully update with my findings.

This is actually the full logic of what I am trying to do. I should have posted this originally.

class PercentBought(CustomFactor):  
    window_length = 1  
    inputs = [USEquityPricing.close,BuybackAuthorizations.previous_unit,mstar.valuation.market_cap,BuybackAuthorizations.previous_amount]  
    def compute(self, today, assets, out, pricing, buyback_unit, market_cap, buyback_amount):  
        shares_outstanding = market_cap[0]/pricing[0]  
        if buyback_unit[0] == '$M':  
            total_bought = buyback_amount[0] * 1000000.0  
            percent_bought = (total_bought)/market_cap[0]  
        elif buyback_unit[0] == "Mshares":  
            percent_bought = buyback_amount[0]/shares_outstanding  
        elif buyback_unit[0] == '%':  
            percent_bought = buyback_amount[0]/100.0  
        else:  
            percent_bought = np.nan  
        out[:] = percent_bought  

But it results in this:

<ipython-input-170-e2bc33ec159a> in compute(self, today, assets, out, pricing, buyback_unit, market_cap, buyback_amount)  
      4     def compute(self, today, assets, out, pricing, buyback_unit, market_cap, buyback_amount):  
      5         shares_outstanding = market_cap[0]/pricing[0]  
----> 6         if buyback_unit[0] == '$M':  
      7             total_bought = buyback_amount[0] * 1000000.0  
      8             percent_bought = (total_bought)/market_cap[0]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()  

I'd recommend reading the CustomFactor tutorial; buyback_unit[0] is a NumPy array (one element per asset), and you are comparing this array to a string, which yields a boolean array rather than a single True or False. Some familiarity with NumPy is also needed. Here's one way to write your factor:

class PercentBought(CustomFactor):  
    window_length = 1  
    inputs = [USEquityPricing.close,BuybackAuthorizations.previous_unit,mstar.valuation.market_cap,BuybackAuthorizations.previous_amount]  
    def compute(self, today, assets, out, pricing, buyback_unit, market_cap, buyback_amount):  
        out[:] = np.nan  
        out[ buyback_unit[0] == '$M' ] = (  
            1e6 * buyback_amount[0] / market_cap[0]  
        )[ buyback_unit[0] == '$M' ]  
        out[ buyback_unit[0] == "Mshares" ] = (  
            1e6 * buyback_amount[0] * pricing[0] / market_cap[0]  
        )[ buyback_unit[0] == "Mshares" ]  
        out[ buyback_unit[0] == '%' ] = (  
            buyback_amount[0] / 1e2  
        )[ buyback_unit[0] == '%' ]  
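
For reference, the "truth value of an array" error can be reproduced with plain NumPy, outside of pipeline. This is a small sketch; the units array below is a made-up stand-in for buyback_unit[0], assuming it arrives as an object array of strings and None.

```python
import numpy as np

# Made-up stand-in for buyback_unit[0]: one entry per asset.
units = np.array([None, '$M', '%'], dtype=object)

# Comparing the array to a string is elementwise and gives a boolean array,
# not a single True/False.
mask = units == '$M'

try:
    if mask:  # bool() of a multi-element array is ambiguous, so this raises
        pass
    raised = False
except ValueError:
    raised = True
```

The mask itself is the useful part: it can be used to index out and the input arrays, which is what the factor above does.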

I have been a python programmer for many years but I obviously have a lot to learn when it comes to numpy arrays. Thank you for providing an answer, I'll try to understand it.

The first line sets the out[:] array to all NaNs. The three statements that follow use boolean array indexing and assignment; see
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.indexing.html
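
A small worked example of the same boolean-mask pattern on made-up numbers, as a sketch rather than the exact pipeline code. All values below are invented; the example assumes "Mshares" means millions of shares and treats '%' as amount / 100, following the original poster's logic.

```python
import numpy as np

# Invented one-row inputs, one entry per asset.
units = np.array(['$M', 'Mshares', '%', None], dtype=object)
amounts = np.array([50.0, 2.0, 5.0, 1.0])
prices = np.array([10.0, 20.0, 30.0, 40.0])
market_caps = np.array([1e9, 4e8, 6e8, 8e8])

# Start with NaN everywhere, then fill in only the rows each mask selects.
out = np.full(units.shape, np.nan)

is_dollars = units == '$M'
out[is_dollars] = (1e6 * amounts / market_caps)[is_dollars]

is_shares = units == 'Mshares'
out[is_shares] = (1e6 * amounts * prices / market_caps)[is_shares]

is_percent = units == '%'
out[is_percent] = (amounts / 1e2)[is_percent]

# Assets whose unit matched none of the masks (here the None entry) stay NaN.
```

Each right-hand side is computed for every asset and then filtered by the same mask used on the left, so the shapes on both sides of the assignment always match.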