Crowd-Sourced Stock Sentiment Using StockTwits

Hey Quantopians,

Pierce here. I work at StockTwits, a platform focused on real-time idea exchange among traders and investors.

Collaborating with Seong Lee of Quantopian, we’ve taken a sample of over 150,000 individual StockTwits messages for three highly followed stocks ($AAPL, $TSLA, and $FB) and one ETF ($SPY), aggregated them into daily signals (call it ‘bullish’ or ‘bearish’), and backtested them over the course of a year. The results were positive, and pretty awesome, so we wanted to give you a little insight into how it works.

In any given month, individual, institutional & hedge fund managers share over 1.3 million trading ideas on StockTwits. We aggregate the data to build consumer tools as well as help quantitative investors better grasp volume and trends in the overall market.

Investors on StockTwits add “bullish” and “bearish” tags to their trading ideas, implying a specific position towards the given stock, option, ETF, currency pair, or commodity. For the crowds, this is an easy way to quickly grasp StockTwits sentiment without having to spend resources on natural language processing.

To keep the initial analysis simple, we’ve aggregated the StockTwits messages (quite literally the number of bullish and bearish tags per day) into one daily signal. We use that signal to enter positions at the market open and hold them until the sentiment changes. The sentiment used to determine the trade (short, long, or null) covers that trading day only up to the time the trade is placed.

Bullish/Bearish Sentiment (-1 ~ 1): This metric is calculated from the number of bullish and bearish tweets for a given day. If the number of bullish tweets is greater than the number of bearish tweets, the security receives a score of 1; if bearish tweets outnumber bullish ones, it receives a score of -1.
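For anyone who wants to reproduce the scoring rule offline, here is a minimal sketch. The function name and the sample counts are made up for illustration, and treating a tie as 0 is an assumption, since the description above only spells out the 1 and -1 cases.

```python
def daily_sentiment_score(bullish_count, bearish_count):
    """Collapse one day's tag counts into a single score.

    Returns 1 when bullish tags outnumber bearish tags and -1 when
    bearish tags outnumber bullish tags. A tie returns 0, which is an
    assumption made for this sketch.
    """
    if bullish_count > bearish_count:
        return 1
    if bearish_count > bullish_count:
        return -1
    return 0


# Hypothetical (bullish, bearish) counts, for illustration only.
example_counts = {"AAPL": (120, 80), "TSLA": (40, 65), "SPY": (10, 10)}
scores = {
    symbol: daily_sentiment_score(bullish, bearish)
    for symbol, (bullish, bearish) in example_counts.items()
}
print(scores)
```

To stay clear of look-ahead bias, the counts passed in should include only messages posted before the trade is placed.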

Check out the attached backtest! We’d love to see if anyone can improve the algo and still avoid the look-ahead bias. Additionally, if this limited data set proves interesting to you and you’d like to access 1-3 years of raw, historical StockTwits data, send me an email at [email protected].

Keep building!
Pierce Crosby
Head of Partnerships, StockTwits

[Attached backtest, ID 55366701e82b470d4e8e067f]

46 responses

Here to answer all questions about backtest today. Holla at me.

Very interesting. But I can't run it myself:

19 Error Attempt to fetch_csv from a redirected URL. Change the URL to https://www.copy.com/error?id=1021&ref=/EIYprlaWKg9lC7Jm

Can you upload the csv file somewhere else or provide a direct link?

We were having some issues with the data link. It's been fixed and is running now.

Give it a shot.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Just to be clear, we created a new URL for the data and attached a new backtest to the original post using that URL.

Thanks for the share. Always interesting to see sentiment-based algos. Have you played with different time deltas, instead of just day-over-day triggers?

Also, if StockTwits allows users to flag their own sentiment, what can we attribute the March - April 2013 drawdown periods to? Macro headwinds?

Thanks

Ok, it's working now. The lines that puzzle me are these:

    #: For all our bulls, bears, and exits order the appropriate weight  
    for b in bulls:  
        order_target_percent(b, 1.0/len(bulls))  
    for b in bears:  
        order_target_percent(b, -1.0/len(bears))  

If you buy stocks for 100% of the portfolio, and then sell stocks for 100% of the portfolio, haven't you invested 200% of your available means? Or what's happening here?

Very cool to see such good returns, it's clearly more than I expected from such a strategy.

Hi Epic,

I wonder if that's the case. The code checks which stocks are BULLS and which are BEARS for the day. If, say, 2 are bulls and 2 are bears, your long positions will be 50% in each of the bulls and your short positions 50% in each of the bears (1.0/2 for the weights).

Our order_target_percent method (https://www.quantopian.com/help#api-order-target-percent) does all the smart ordering for you, unless you do a double order sometime in your algorithm (which isn't the case here as we only do the ordering once per day)
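To make the arithmetic in the question concrete, here is a rough sketch that works out the target weights from the snippet above and sums them two ways. It is plain Python rather than Quantopian code: `target_weights` and `gross_and_net_exposure` are hypothetical helpers, and the sketch assumes every order fills at its target and ignores price drift.

```python
def target_weights(bulls, bears):
    """Mirror the weights passed to order_target_percent in the snippet."""
    weights = {}
    for symbol in bulls:
        weights[symbol] = 1.0 / len(bulls)
    for symbol in bears:
        weights[symbol] = -1.0 / len(bears)
    return weights


def gross_and_net_exposure(weights):
    """Gross = sum of absolute weights, net = signed sum of weights."""
    gross = sum(abs(w) for w in weights.values())
    net = sum(weights.values())
    return gross, net


# Hypothetical split of the four tickers into two bulls and two bears.
weights = target_weights(bulls=["AAPL", "FB"], bears=["TSLA", "SPY"])
gross, net = gross_and_net_exposure(weights)
print(weights)
print("gross:", gross, "net:", net)
```

Under those assumptions the longs sum to 100% and the shorts to 100%, so gross exposure is 2.0 while net exposure is 0.0. Whether that counts as "200% invested" depends on whether you look at gross or net.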

Hey @gregory, time deltas are actually what we pondered the most, i.e. is it best to use daily timeframes, versus hourly, versus weekly. It would be great to run comparisons. The first take was surprisingly positive, so we'd expect further modeling to be even more successful.

To your point about the 13% drawdown in March 2013, it's very interesting because the overall market was quite positive, suggesting StockTwits' crowd was net negative during that period.

Thanks Pierce.

Could you share, for the backtest data used, how many twits per day, per stock were bullish vs bearish? How long has StockTwits been allowing users to tag sentiment in their posts? I would imagine that over time the sample sizes (total twits) would be much larger. Perhaps as the sample size grew throughout 2013 (depending on when your website implemented this feature) the accuracy of sentiment improved?

The real interesting thing about sentiment is any user that posts about their trade idea or projection could be using a variety of methodologies. For example someone could say “buy AAPL due to crossing SMA” (technical indicator) or “short BBRY as share price exceeds book value of assets” (fundamental). In either case sentiment is formed which makes sentiment a really powerful indicator given enough diverse sources… weighted correctly.

It is easy to fool yourself when you compare a strategy trading high-beta or large-swing stocks with SPY as the benchmark. It is apples vs. oranges. During your study, TSLA gained 329%. You could have moved in and out at random and handily beat the S&P500. And FB was up 96%. AAPL was only up 4%. The return of your basket of stocks, with equal dollar weighting, was 114%! So, it appears your algorithm underperformed buy-and-hold. Can you separate out the components into separate charts (e.g. AAPL v. AAPL, SPY v. SPY, etc.)? Otherwise, you risk being fooled by randomness.
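As a rough sketch of the equal-dollar buy-and-hold arithmetic mentioned above: with equal dollars in each name and no rebalancing, the basket's total return is the simple average of the individual total returns. The helper name is hypothetical, and only the three stock returns quoted in the post are used. SPY's return over the period is not stated, so this three-name figure is not expected to match the 114% quoted for the full basket.

```python
def equal_weight_buy_and_hold_return(total_returns):
    """Total return of a basket funded with equal dollars per name
    and never rebalanced: the simple average of each name's return."""
    if not total_returns:
        raise ValueError("need at least one return")
    return sum(total_returns) / len(total_returns)


# Returns quoted in the post, as fractions (329%, 96%, 4%).
stated_returns = {"TSLA": 3.29, "FB": 0.96, "AAPL": 0.04}
basket_return = equal_weight_buy_and_hold_return(list(stated_returns.values()))
print(round(basket_return, 4))
```

Comparing the algorithm against this kind of basket, rather than against SPY alone, is the apples-to-apples check being asked for.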

@Ken, good point on buy & hold return. The four $TICKERS were not, however, selected at random; they were chosen based on total message volume on StockTwits in 2013. You could separate out FB from TSLA, sure, but you could also add 20 more tickers ranked by message volume. This is just an example bucket using the largest volume tickers.

Well, just using AAPL as the only security in the algo it does beat BUY & HOLD, quite considerably, see attached.

[Attached backtest, ID 55369832a2bbb30d51abc51e]

Wow @gregory, that's awesome, you just pulled the AAPL column from the CSV and ran the same backtest?

Also @gregory, to your point: the explicit "Bullish/Bearish" tags were added as a feature in early 2012, so the total tags expressed per million messages has grown significantly. We hope to run similar backtests on 2014 data in the near future as well.

@Gregory, Can you normalize the chart to start on Feb 25? You are starting with a 14.5% head start vs. buy-and-hold.
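A small sketch of what "normalize the chart to start on Feb 25" could look like outside the backtester: drop everything before the chosen date and rescale each series so it starts at 1.0. The `rebase_from` helper and all series values below are hypothetical and are not taken from the attached backtests.

```python
from datetime import date


def rebase_from(series, start):
    """Trim a cumulative-value series to dates on or after `start`
    and rescale it so the first remaining value is 1.0.

    `series` is a list of (date, cumulative_value) pairs sorted by date.
    """
    trimmed = [(day, value) for day, value in series if day >= start]
    if not trimmed:
        raise ValueError("no observations on or after start")
    base = trimmed[0][1]
    return [(day, value / base) for day, value in trimmed]


# Hypothetical cumulative values: the algo sits in cash until signals
# exist, while buy-and-hold falls over the same stretch.
algo = [(date(2013, 1, 4), 1.00), (date(2013, 2, 22), 1.00),
        (date(2013, 2, 25), 1.00), (date(2013, 3, 1), 1.03)]
buy_hold = [(date(2013, 1, 4), 1.00), (date(2013, 2, 22), 0.88),
            (date(2013, 2, 25), 0.80), (date(2013, 3, 1), 0.82)]

start = date(2013, 2, 25)
algo_rebased = rebase_from(algo, start)
buy_hold_rebased = rebase_from(buy_hold, start)
print(algo_rebased)
print(buy_hold_rebased)
```

After rebasing, both curves start from the same point, so any gap that opened before the first signal no longer shows up as outperformance.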

I just noticed the 2013 data provided by StockTwits actually starts 2/22/2013 while the backtest started earlier on 1/4/2013. The backtest missed the downturn in the beginning, artificially lifting the algo's results.

Attached is a more "apples-to-apples" comparison where the results are much less significant.

[Attached backtest, ID 55369b749ae6b60d3d7c6ec6]

Most of the difference can be accounted for in the one week period from 4/15-4/23, where the algorithm went short. The short from 3/4 - 3/6 hurt the algorithm. For the remainder of the sample period, no advantage.

@Ken, looks like I just missed your post by 1 minute, was doing exactly that :)

@Gregory. Yes, messages crossed. Thank you for taking the time to test this. Helps immensely in analyzing the algo.

I was digging into what could explain that April timeframe too :)

Found two things in the press:
1) On the 16th, the NYT was awarded a Pulitzer for reporting on the human costs associated to Apple's supply chain http://appleinsider.com/articles/13/04/16/nyt-wins-pulitzer-for-ieconomy-investigative-series-on-apples-supply-chain
2) Generally there was some (after the fact) press focused on the volume of short interest in Apple over the course of that month
http://fortune.com/2013/05/28/short-interest-how-apple-inc-got-mugged-in-april/

It might be that StockTwits' data is expressing those signals, especially the 2nd, which might be related to the first story.

Seems problematic that the algo both exits positions and cancels open orders at market_close.

Hi Mr ruiz

Good eye! That was an artifact from a previous version of the code. In the latest version of the algorithm, we simply do regime changes and don't exit positions at the end of day.

Taking a further look at this, the algorithm will deploy leverage perhaps unintentionally under certain situations. I ran this with just 2 stocks as well.

Sample log output:

    2013-09-17 PRINT Bears:0 | Bulls:2 | Leverage:0.9994452312
    2013-09-18 PRINT Bears:1 | Bulls:1 | Leverage:1.00205033524
    2013-09-19 PRINT Bears:0 | Bulls:1 | Leverage:2.05294101129
    2013-09-20 PRINT Bears:0 | Bulls:2 | Leverage:2.17893779178
    2013-09-23 PRINT Bears:0 | Bulls:1 | Leverage:0.998815154718

If you run it with 3 or more the leverage changes, as expected. Since this algorithm holds overnight, the max leverage should be capped at 2.0x.

Trying to understand what the code is doing I look at the below snippet:

    #: For all our bulls, bears, and exits order the appropriate weight
    for b in bulls:  
        order_target_percent(b, 1.0/len(bulls))  
    for b in bears:  
        order_target_percent(b, -1.0/len(bears))  

Not sure how it is getting to 2x+ leverage?

Think I figured it out...

Stocks can be 1... -1 ... or 0!

When a stock is 0 it is held, thus it is not in the calculation for bulls/bears. My fun fix is:

    if weight == 1:
        bulls.append(context.stock_twit_mappings[c])
    elif weight == -1:
        bears.append(context.stock_twit_mappings[c])
    elif weight == 0:
        neutr.append(context.stock_twit_mappings[c])

and

    for b in bulls:  
        order_target_percent(b, 1.0/len(bulls))  
    for b in bears:  
        order_target_percent(b, -1.0/len(bears))  
    for b in neutr:  
        order_target_percent(b, 0)  

Definitely introduces some sort of bias but using the -1 weight as a sell signal seems to produce some pretty good numbers.

[Attached backtest, ID 55439bfdc9a84c0e4403f8a5]

Hey @Greg,

We noticed that too: during holding periods, leverage jumps significantly. To eliminate this, positions could be closed whenever a 0 score is produced, rather than requiring a -1 as the sell signal. Let us know what you get.
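Here is a toy model of the holdover effect discussed above, written in plain Python rather than against the Quantopian API. It assumes every order fills at its target weight and ignores price drift. `apply_orders` and `gross_leverage` are hypothetical helpers, and the two tickers are picked arbitrarily.

```python
def apply_orders(positions, bulls, bears, neutral=()):
    """Return the weights after one rebalance.

    Symbols that appear in none of the lists keep their previous
    weight, which is what happens when a 0-score stock is never ordered.
    """
    updated = dict(positions)
    for symbol in bulls:
        updated[symbol] = 1.0 / len(bulls)
    for symbol in bears:
        updated[symbol] = -1.0 / len(bears)
    for symbol in neutral:
        updated[symbol] = 0.0
    return updated


def gross_leverage(positions):
    """Sum of absolute weights, as a fraction of portfolio value."""
    return sum(abs(weight) for weight in positions.values())


# Day 1: one bull and one bear, so 100% long plus 100% short.
day1 = apply_orders({}, bulls=["AAPL"], bears=["TSLA"])

# Day 2: TSLA scores 0. Without a neutral list the short is left alone.
day2_held = apply_orders(day1, bulls=["AAPL"], bears=[])

# Day 2 with the 0 score treated as "close the position".
day2_closed = apply_orders(day1, bulls=["AAPL"], bears=[], neutral=["TSLA"])

print(gross_leverage(day1), gross_leverage(day2_held), gross_leverage(day2_closed))
```

In this toy setup, gross leverage stays at 2.0 when the 0-score position is left alone and drops to 1.0 once it is closed.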

@Ryan, can you elaborate a little on the update you made? You're canceling all previous orders at end of day?

@Pierce, I made a change to hold a stock long until it sees a -1 weight, at which point it sells all of that stock instead of going short. Any stocks in the bears list that are not in the portfolio are not affected.
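A rough sketch of the long-only rule as described in the post above, tracking only which symbols are held. How the real version sizes its positions is not stated, so sizing is left out. `long_only_holdings` is a hypothetical name and the scores are invented.

```python
def long_only_holdings(held, scores):
    """Apply one day of scores to the set of symbols held long.

    score  1 -> buy (or keep) the symbol
    score -1 -> sell it if held; never open a short
    score  0 -> leave the holding unchanged
    """
    next_held = set(held)
    for symbol, score in scores.items():
        if score == 1:
            next_held.add(symbol)
        elif score == -1:
            next_held.discard(symbol)
    return next_held


# Hypothetical three-day run.
held = set()
held = long_only_holdings(held, {"AAPL": 1, "TSLA": -1})
after_day1 = set(held)
held = long_only_holdings(held, {"AAPL": 0, "TSLA": 1})
after_day2 = set(held)
held = long_only_holdings(held, {"AAPL": -1, "TSLA": 0})
after_day3 = set(held)
print(after_day1, after_day2, after_day3)
```

A -1 on a symbol that is not held does nothing, which matches the note that bears outside the portfolio are not affected.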

It seems that Monday morning sentiment might be better than each day. Setting the bulls_and_bears function to just run on week_start instead of every_day decreases volatility and returns increase significantly over the long term. Is this information going to be made available to the individual who is just looking to do some weekend trading? I was also working on a sentiment algorithm to scan RSS feed news headlines for sentiment but this data you have seems to be more reliable than my own analysis. You actually get sentiment by getting the bullish-bearish tags whereas news headlines tend to be more factual-neutral and opinions are left to the readers.
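For testing the weekly idea on exported data outside the platform, one way to pick the rebalance days is to take the first trading day of each week. This is a plain-Python sketch, not the platform's own scheduling call. `week_start_days` is a hypothetical helper, and the Monday is omitted from the sample list to mimic a market holiday.

```python
from datetime import date


def week_start_days(trading_days):
    """Return the first trading day of each ISO week, in date order."""
    seen_weeks = set()
    first_days = []
    for day in sorted(trading_days):
        iso = day.isocalendar()
        week_key = (iso[0], iso[1])  # (ISO year, ISO week number)
        if week_key not in seen_weeks:
            seen_weeks.add(week_key)
            first_days.append(day)
    return first_days


# Hypothetical trading calendar: Monday 2013-09-02 is missing.
trading_days = [
    date(2013, 9, 3), date(2013, 9, 4), date(2013, 9, 5), date(2013, 9, 6),
    date(2013, 9, 9), date(2013, 9, 10),
]
rebalance_days = week_start_days(trading_days)
print(rebalance_days)
```

Using the first available trading day, rather than a hard-coded Monday, keeps weeks with a Monday holiday from being skipped.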

[Attached backtest, ID 5543c3ff4242e30e48220883]

@Ryan slightly different than my implementation above, but wouldn't you want to take advantage of the negative sentiment as well? Unless the trader has a reason to be long only, adding short positions to a portfolio could be beneficial. Agree?

@Ryan and Greg, the application of -1 as sold versus short becomes inherently net long, but if you're in an upward market, directionally, this could be a more accurate representation of the StockTwits crowd...

Generally speaking, the community is net bullish (on a per-sentiment-tag basis), so sentiment neutrality (i.e. -1) might actually suggest market neutral stance, versus "bearish" stance.

@Gregory I definitely agree, I was just messing around with different things and thought the results were worth sharing.

@Pierce that makes sense, but I would be interested to see what results would look like with a wider set of stocks and/or over a longer time frame. As I said, there is probably some bias introduced by not going short, since I know which overall direction the market was going in.

Hey @Jason,

The lower trading frequency as a barometer for higher returns is an interesting point (buy/sell on Monday). The sample looks at the 4 TICKERS with the biggest message volume on StockTwits (which also happened to be wildly successful in 2013). As @Ken Simpson pointed out, buy and hold of those 4 tickers for all of 2013 would have beaten the algo AND the benchmark.

Another interesting study we were thinking about doing would be to buy, sell, or stay neutral on the top 4 message-volume TICKERS on a weekly basis (rolling into NFLX one week, TSLA the next, etc.). Lemme know if you want to explore more and I'll try to help!

@Pierce,

Netflix did really well in 2013. Do you have data from 2011, when they crashed their stock by breaking their services apart? It would be interesting to see if you would have made a lot of money from those news headlines. I also like the idea of working with the 4 tickers with the greatest volume. Perhaps a whole study on all of the data (this would be computationally burdensome) to see how much data you need to make a statistically significant decision and bet on a stock?

I've also been looking into a mathematical idea of polynomial chaos. Applying it to sentiment, it would be a way to say with small amounts of data that you may be more certain about the outcome than with datasets which were larger but may contain more noise. That way you wouldn't just be relying on the largest amount of data that you have, but have a mathematical way to say you are more certain about some of the data you have. Just some ideas I've had floating around in my head.

@Jason, we do have NFLX data from 2011 certainly. I would agree with you, looking at moving averages of sentiment would give you a good sense of whether or not sentiment fell 1 or 2 standard deviations outside the mean (and whether or not to execute a trade on it).
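A minimal sketch of the standard-deviation check mentioned above, using only the standard library. "Net sentiment" here means bullish count minus bearish count, which is an assumed metric for illustration. The function names, the sample history, and the 2.0 threshold are all hypothetical.

```python
from statistics import mean, pstdev


def sentiment_zscore(history, today):
    """Z-score of today's value against a trailing window of values."""
    if len(history) < 2:
        raise ValueError("need at least two past observations")
    mu = mean(history)
    sigma = pstdev(history)  # population std dev of the window
    if sigma == 0:
        return 0.0  # flat history: treat as no deviation
    return (today - mu) / sigma


def is_outlier(history, today, threshold=2.0):
    """True when today's value sits at least `threshold` std devs from the mean."""
    return abs(sentiment_zscore(history, today)) >= threshold


# Hypothetical trailing net-sentiment readings (bullish minus bearish tags).
history = [10, 12, 8, 11, 9]
print(sentiment_zscore(history, 14), is_outlier(history, 14))
print(sentiment_zscore(history, 11), is_outlier(history, 11))
```

The window passed in should contain only days before today, so the check does not peek at the value it is judging.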

We've done a couple studies looking at the signal-to-noise ratio of StockTwits versus some structurally similar data providers, definitely a case to be made for StockTwits' quality... Be in touch! [email protected]

@Pierce, e-mail sent. Not wanting to rush or pressure you in any way, but I'm eager to try some backtesting with raw sentiment data as a source :)

@Pierce same here!

Hey @Rob and @Greg, you both have responses waiting for you. Happy trading!

@Pierce: Is it possible for anyone else to get a sample of the raw sentiment data? Does StockTwits have any plans to offer this as a service? I would love to play around with it.

Bob, feel free to email Pierce directly if you'd like to access the data in the short term. In the long term, we're working with data vendors to make this process much easier.

@Josh Thanks. Looking forward to it.

@Josh: Any updates on working with data vendors for accessing raw sentiment data? Would love to run some backtests.

@Pierce: Just sent you an email - it'd be great to get my hands on some data to backtest some ideas I have!

You can check out our data page where you'll see Accern's data available for use in Research.

We're also working with PsychSignal to incorporate their Twitter and StockTwits sentiment feeds into the platform in a similar fashion. This should be available soon.

We're also working on exposing these data sources through the new Pipeline API for use in your algos.

Thanks Josh!

You can find PsychSignal's sentiment data for both their Twitter and StockTwits sentiment feeds on Quantopian Data.

You can begin testing by going to Quantopian Data, choosing the dataset you want (StockTwits Trader Mood, Twitter & StockTwits Trader Mood, etc) and hitting Get Free Sample

These are available through the new Pipeline API

Hey guys, Pierce here from StockTwits.

Just wanted to circle back on this post - as we've recently gotten a lot more inbound - to make sure everyone can get access to this data...

While we have archived the direct data sample, our partner company, PsychSignal, now has Bullish & Bearish totals available via the Quantopian platform. You can find them here: https://www.quantopian.com/data/psychsignal/stocktwits

This is the easiest way to get free access to the historical totals for any given ticker, not JUST the sample of tickers we were using above. I hope this helps you all, and if you have any questions, keep em comin'.

Cheers,

And to piggyback on what Pierce said, this PsychSignal data is currently free to use on Quantopian, inclusive of paper trading in the contest.