Bug?! - TypeError: can't pickle builtin_function_or_method objects

I am trying to port parts of the "Alphalens - a new tool for analyzing alpha factors" research notebook, which uses multiple factors to build an alpha ranking model, into an algo that determines feature importance. Accessing the feature_importances_ attribute after fitting a random forest ML model throws a runtime error:

"TypeError: can't pickle builtin_function_or_method objects"

Could someone at Q let me know why this attribute works in research but not in the IDE algo code? There should be no difference in the results!

Thanks

feature_importances = pd.Series(clf.feature_importances_)

6 responses

The feature_importances_ attribute is currently not supported for some sklearn models (including RandomForestClassifier). We are aware of the limitation and have a fix in our queue of future improvements.

By default, feature_importances_ requires parallel processing, and we have not yet implemented this functionality in the platform.

Thanks Ernesto.

However, I am somewhat confused: I was told the research environment does not use any parallel processing, yet these functions work fine in research.

Additionally, are there any sklearn models that support feature_importances_ in the IDE algo environment, including live trading?

Any timeline for these future improvements?

Thanks

Hi Kamran,

Sorry, you are correct. feature_importances_ works properly in the research environment for RandomForestClassifier. The issue does not seem to be directly related to parallelization, but rather to the use of serialization in the feature_importances_ implementation. The IDE environment has stricter restrictions on serialization than the research environment.

Another community member ran into the same problem with RandomForestClassifier, but reported that feature_importances_ on GradientBoostingRegressor works fine in the IDE environment.

Unfortunately, we do not have a timeline yet for these improvements, but it is definitely on our radar.

Are there others aside from GradientBoostingRegressor that are usable? Ideally I'd like to use ExtraTreesClassifier, so if you can suggest something similar that is supported, it would be appreciated. Thanks.

This used to work about a year ago

Hey,

I thought I would leave this note for other folks. This is the cleanest workaround I found:

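from sklearn.ensemble import RandomForestClassifier

# X_train and Y_train are assumed to be defined already
random_forest_clf = RandomForestClassifier()
random_forest_clf.fit(X_train, Y_train)
# Collect per-tree importances instead of reading the forest-level attribute
all_importances = []
for tree in random_forest_clf.estimators_:
    all_importances.append(tree.feature_importances_)
# Average across all trees in the forest
importances = sum(all_importances) / len(random_forest_clf.estimators_)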
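
In case it helps anyone reuse the workaround above, here is a minimal sketch of the same averaging step wrapped in a helper. The function name average_tree_importances and the stand-in "trees" with made-up importance values are my own assumptions for illustration, not part of sklearn or the platform. The stand-ins are only there so the snippet runs without scikit-learn. I have not verified that this matches the forest-level attribute exactly in every sklearn version.

```python
from types import SimpleNamespace


def average_tree_importances(estimators):
    """Average feature_importances_ across the sub-estimators of an ensemble.

    `estimators` is any iterable of objects exposing a feature_importances_
    sequence, e.g. the estimators_ list of a fitted forest.
    """
    per_tree = [list(est.feature_importances_) for est in estimators]
    if not per_tree:
        raise ValueError("no fitted estimators to average over")
    n_trees = len(per_tree)
    # zip(*per_tree) groups the values feature by feature.
    return [sum(values) / n_trees for values in zip(*per_tree)]


# Hypothetical stand-in trees with made-up importances (not real model output).
# With a real fitted forest, pass random_forest_clf.estimators_ instead.
fake_trees = [
    SimpleNamespace(feature_importances_=[0.5, 0.25, 0.25]),
    SimpleNamespace(feature_importances_=[0.25, 0.5, 0.25]),
]
importances = average_tree_importances(fake_trees)
print(importances)  # [0.375, 0.375, 0.25]
```

With a fitted model, the call would be average_tree_importances(random_forest_clf.estimators_). Since it only reads estimators_ and each tree's feature_importances_, it may also be worth trying with ExtraTreesClassifier, though I have not tested that in the IDE.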