Sentiment Data: Comparing Accern and Psychsignal

Alternative data such as sentiment scores can be a good source of alpha factors. Here, I illustrate and compare Accern's DS2 news sentiment scores with Psychsignal's StockTwits scores.

I have coded an ML algo that combines 7 alpha factors, one of which is sentiment data. It passes all Q contest constraints and thresholds using default commissions and slippage. The only limitation of this algo is that the number of stocks selected is capped at 100; with more than that, I get a Timeout Exception.

This notebook has the Psychsignal StockTwits scores:

[Notebook preview unavailable]

Here's the notebook with Accern's DS2 daily dataset:

[Notebook preview unavailable]

A subtle but remarkable improvement on almost all metrics was achieved using Accern's DS2 daily news sentiment dataset. In my opinion, Accern's news aggregation and predictive analytics process is more sophisticated than Psychsignal's aggregation of StockTwits messages.

Lastly, I would like to appeal to the Quantopian team regarding the Timeout Exception limitation I am encountering. I have no doubt that I can further improve this algo if I can overcome this limitation: more diversification would bring down the volatility and drawdowns, along with the returns, but the result would be more in line with an allocation prospect.

Comments and feedback are most welcome.

Great work James!

Thanks for your feedback and deeper investigation into DS2. Do you mind talking a little more about the other 6 alpha factors you are using in this strategy?

Best,
Brad

Thanks Brad. The six other factors are what I would call key ingredients of my secret sauce, so I can't divulge what they are. However, I will describe the one factor that has the highest attribution to the predictive signal. It is the output of a deep neural network that tries to predict the next-day returns of the S&P 500 Index, using different data transformations of OHLC price action as inputs.

Thanks,

Totally understandable, no worries James. Interesting combination of a deep neural network and OHLC data, nice work.

@James: Great stuff! Thanks for sharing your tearsheets.

Regarding the Timeout Exceptions, are they occurring during your Pipeline execution in before_trading_start? If not, can you share a bit more information on the exceptions?

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

@Jamie,

Thanks for your prompt response.

Yes, the Timeout Exception occurs during BTS (before_trading_start). I get the following error message:

There was a runtime error.
TimeoutException:Call to before_trading_start timed out
Line: 155 in compute
df.dropna(inplace=True)

My current settings, which are error free, are:

NUM_LONG_POSITIONS = 50
NUM_SHORT_POSITIONS = 50
MAX_POSITION_SIZE = 0.035
In pipeline, I have a screen to select the top 100 of a factor.

However, if I change the settings to:
NUM_LONG_POSITIONS = 200
NUM_SHORT_POSITIONS = 200
MAX_POSITION_SIZE = 0.005
In pipeline, the screen selects the top 200 of a factor.

I get the Timeout Exception.
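
For readers who want to see how settings like these fit together, here is a minimal sketch, not the actual algo. It ranks a factor, takes the top names as longs and the bottom names as shorts, and caps an equal weight at MAX_POSITION_SIZE. The function name long_short_weights, the plain-dict input, and the 50% gross exposure per side are all assumptions made for illustration; the real algorithm runs inside Quantopian's pipeline and is not shown in this thread.

```python
def long_short_weights(scores, num_long, num_short, max_position_size):
    """Return {symbol: weight} for a simple long/short book.

    Hypothetical helper: `scores` maps symbol -> factor value.
    Each side targets 50% gross exposure (an assumption), split
    equally and capped at `max_position_size` per name.
    """
    if num_long <= 0 or num_short <= 0:
        raise ValueError("num_long and num_short must be positive")

    # Drop missing scores (None or NaN) so they cannot be ranked.
    clean = {s: v for s, v in scores.items() if v is not None and v == v}
    if len(clean) < num_long + num_short:
        raise ValueError("not enough scored symbols for the requested book")

    # Highest scores first.
    ranked = sorted(clean, key=clean.get, reverse=True)
    longs = ranked[:num_long]
    shorts = ranked[-num_short:]

    # Equal weight per side, capped by the position limit.
    long_weight = min(0.5 / num_long, max_position_size)
    short_weight = min(0.5 / num_short, max_position_size)

    weights = {symbol: long_weight for symbol in longs}
    weights.update({symbol: -short_weight for symbol in shorts})
    return weights


# Toy usage with made-up scores; "C" has no score and is skipped.
example_scores = {"A": 3.0, "B": 2.0, "C": float("nan"), "D": -1.0, "E": -2.0}
example_weights = long_short_weights(
    example_scores, num_long=2, num_short=2, max_position_size=0.2
)
```

Under these assumptions, neither cap in the two configurations above is binding: 50 names per side gives an equal weight of 0.01 against a 0.035 limit, and 200 per side gives 0.0025 against 0.005.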

James Villa, I presume you run the neural network outside of Quantopian and then read the results into Quantopian through Fetcher? And this Accern dataset is a paid one, right?

Hi Roy,

Yes, the neural network is run offline outside of Q, and the results are then uploaded through Q's new Self-Serve Data feature, which makes them available for pipeline use, something the Fetcher feature could not do. Accern data is also AI-based and is a paid service.

Hi James,

Thanks for the extra info on the timeouts. You're not the only one who has run into timeout issues in before_trading_start, so our engineers are doing some investigating to see what we can do to solve the problem.

On your signal, have you tried incorporating some of the other psychsignal datasets? In particular, we have psychsignal datasets that also include twitter sentiment. I wonder if looking at the sentiment from both StockTwits and Twitter might be able to add a bit more to your signal.

Hi Jamie,

Thanks for the update. Yes, I've been following the "Pipeline TimeoutException: any hope for a fix?" thread. Aside from the technical issues that Scott Sanderson explained thoroughly, I would also like to point out that code efficiency and compactness play an important role in avoiding timeouts. I learned this through trial and error. I advise authors to import only the libraries that the algo needs, to save on overhead resources, especially with ML algos, which can be compute intensive. I hope Q engineering finds a suitable solution for this problem.

For psychsignal, I used:

from quantopian.pipeline.data.psychsignal import aggregated_twitter_withretweets_stocktwits as st

For the sentiment factor, I did this: (bull_msgs - bear_msgs) / total_msgs
I will try your suggestion.
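
As a rough illustration of the ratio above, here is a minimal standalone sketch in plain Python. The names bull_msgs, bear_msgs and total_msgs are just the shorthand used in this post; the actual dataset column names may differ, and the handling of zero or missing message counts (returning NaN) is an assumption, not something stated in the thread.

```python
import math


def sentiment_score(bull_msgs, bear_msgs, total_msgs):
    """Net bullish share of messages: (bull - bear) / total.

    Returns NaN when the score is undefined (any input missing,
    or no messages at all). That choice is an assumption made
    for this sketch.
    """
    values = (bull_msgs, bear_msgs, total_msgs)
    if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
        return float("nan")
    if total_msgs == 0:
        return float("nan")
    return (bull_msgs - bear_msgs) / total_msgs


# Toy usage with made-up counts: 30 bullish and 10 bearish out of 50 messages.
example_score = sentiment_score(30, 10, 50)
```

When every message is scored as either bullish or bearish, the result lies between -1 (all bearish) and +1 (all bullish). Names with no messages on a given day come out as NaN and would need to be dropped or filled before ranking.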

Hi James,

We've recently implemented a fix that may help with this timeout -- please see the original post here for more information.
