Accern News and Blog Backtest Results using Quantopian (Link to PDF Report attached)

[Quantopian Update] - This algorithm is now outdated. While we haven't replicated this algorithm, we've provided a few other examples using Accern's Alphaone data feed in strategies. Check out this Earnings Drift strategy using Accern for example.

Hello Quantopians,

We recently backtested over 1.5 million news and blog articles (spanning 2.5 years) with the help of Quantopian community members. The backtest produced very positive results, and I would like to share them with you all.

The news and blog dataset was designed by Accern. Accern specializes in big data media analytics. We monitor over 20 million news and blog sources each day and provide more than 25 fields of analytics designed specifically for quantitative trading. Accern currently serves some of the largest multi-billion-dollar AUM hedge funds worldwide.

The fields of analytics used in this backtest are: Article Sentiment, Impact Score on Entity, and Overall Source Rank.

Article Sentiment (-1 to 1): This metric calculates the sentiment score of an article with respect to the company it mentions.

• A positive sentiment score means that the article was written in a positive tone towards a company.
• A negative sentiment score means that the article was written in a negative tone towards a company.
• This can be used as a directional trigger.

Overall Source Rank (0-10): This metric measures the timeliness and reposting behavior of a source; it can be used as a trust factor and a viral factor.

• A high overall source rank means that source X is usually first to release articles, and other sources usually repost the same information after source X has published it.
• A low overall source rank means that source X is usually late to release articles relative to other sources, and other sources rarely repost its information.
• This can be used as a trust filter.

Impact Score on Entity (1-100): This metric estimates whether the article will have a greater-than-1% impact on the stock price on the same trading day.

• A high impact score means that the article has a high probability of affecting the stock price by more than 1%.
• A low impact score means that the article has a low probability of affecting the stock price by more than 1%.
• This can be used as a decision maker to execute an order.
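
To make the interplay of these three fields concrete, here is a minimal sketch of one way they could be combined into a trade decision. The field names and thresholds are illustrative assumptions for this sketch, not the exact rules used in the attached report.

```python
# Illustrative only: combine the three fields described above into a simple
# trade decision. Field names and thresholds are assumptions, not the exact
# rules used in the backtest report.

def decide_trade(article_sentiment, overall_source_rank, impact_score):
    """Return 'long', 'short', or None for a single scored article."""
    # Trust filter: only act on sources that are usually first to publish.
    if overall_source_rank < 7:
        return None
    # Execution filter: only act when a >1% same-day move is considered likely.
    if impact_score < 80:
        return None
    # Directional trigger: sentiment sign decides long vs. short.
    if article_sentiment > 0.5:
        return "long"
    if article_sentiment < -0.5:
        return "short"
    return None

print(decide_trade(article_sentiment=0.8, overall_source_rank=9, impact_score=92))   # long
print(decide_trade(article_sentiment=-0.7, overall_source_rank=4, impact_score=95))  # None (low source rank)
```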

The backtest report explains this in more detail. Please review the report and share it with anyone you like. If you would like access to our 2.5 years of news and blog data, send me an email and I will provide you access. We want more of the community to conduct further tests on the data to exploit its value. We have only scratched the surface.

Accern Backtest Report

Request access to over 2.5 years of news and blog history (7.5 million articles) by sending an email to [email protected].

Best,
Kumesh Aroomoogan
Co-Founder and CEO, Accern

# Backtest ID: 55d2522205f8f40c6fe0b87f
We have migrated this algorithm to work with a new version of the Quantopian API. The code is different than the original version, but the investment rationale of the algorithm has not changed. We've put everything you need to know here on one page.
30 responses

Impressive. Thanks for sharing this.

Cheers

Lionel

Hi Kumesh,

This is exciting. Edit - Your email is in your post, my mistake!

Thanks,
Seong

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

Can we safely conclude from this data that positive news releases of a company (new product development, higher than expected earnings, buy status) tend to correlate with increasing share prices after the news release?

Since I am very inexperienced in coding (I have never coded in a particular language before), does your code immediately purchase the stock after the news release? Does it also check several criteria, such as credibility of the source (greater than 5) and article sentiment (positive), before it purchases the stock?

The integration of sentiment is really a great direction and the work looks very promising. Doesn't Quantopian only fetch at the start of the trading day? Your methodology releases you from that constraint, but I'm curious whether you looked at the implications of only seeing articles from the previous day and, as a corollary, whether longer delays in integrating sentiment data have a negative effect on returns. It also looks as if articles outside of trading hours (nights/weekends) don't get processed.

@New Trader: Yes, you can use the metric "first_mention" to purchase stock right after the release of a unique story which hasn't yet been exposed to millions of viewers. You can also combine various metrics such as source/author ranks and impact score to make decisions on important and credible stories. These metrics decrease your risk of trading on a false story (rumor).
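
As a rough illustration of the filtering described above, here is a minimal sketch that keeps only first-mention stories from highly ranked sources and authors with a high expected impact. The field names and cutoffs are assumptions for illustration, not Accern's documented schema.

```python
# Illustrative sketch: filter a stream of scored articles down to the ones
# worth acting on. Field names and cutoffs are assumptions for illustration.

def is_tradeable(article):
    return (
        article.get("first_mention", False)             # unique story, not a repost
        and article.get("overall_source_rank", 0) >= 7  # trusted, timely source
        and article.get("overall_author_rank", 0) >= 7  # trusted author
        and article.get("impact_score", 0) >= 80        # likely >1% same-day move
    )

articles = [
    {"ticker": "AAPL", "first_mention": True, "overall_source_rank": 9,
     "overall_author_rank": 8, "impact_score": 91},
    {"ticker": "AAPL", "first_mention": False, "overall_source_rank": 3,
     "overall_author_rank": 2, "impact_score": 40},
]
print([a["ticker"] for a in articles if is_tradeable(a)])  # only the first mention survives
```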

@Carlie: Thank you! And yes, you are correct. We only trade on articles that were released between 9:30 AM and 4:00 PM EST. We would be happy to discuss alternative options as well. We want to try as many options and methods as possible on this data set to understand its true value.

For those of you who have requested our historical data to conduct your own backtest, I will get back to you very soon.

What is neat about this framework is its ability to investigate many areas around sentiment. Another interesting area is the value of predictive analytics, or the "time machine scenario". For instance, marketing departments attempt to coordinate releases that create both buzz and follow-up articles. Often these are centered around predictable dates such as trade shows, earnings releases, and industry community events. Knowing the dates of these events and the value that predictors historically generate could support a predictive-model approach. This framework provides a simple way to measure the value of predictors on markets and individual stocks by simply subtracting days from the date column and re-running.
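
For readers who want to try this "time machine" test, here is a rough pandas sketch of the date-shifting step. The column names (harvested_at, ticker, article_sentiment) are hypothetical; substitute whatever timestamp column your copy of the data uses.

```python
# Shift article timestamps earlier by N days, then re-run the same backtest on
# the shifted table to estimate the value of knowing the news N days in advance.
import pandas as pd

def shift_signal_dates(articles: pd.DataFrame, days_earlier: int) -> pd.DataFrame:
    """Return a copy of the article table with timestamps moved earlier."""
    shifted = articles.copy()
    shifted["harvested_at"] = pd.to_datetime(shifted["harvested_at"]) - pd.Timedelta(days=days_earlier)
    return shifted

articles = pd.DataFrame({
    "harvested_at": ["2015-06-03 10:15:00", "2015-06-05 14:02:00"],
    "ticker": ["AAPL", "BP"],
    "article_sentiment": [0.62, -0.41],
})

print(shift_signal_dates(articles, days_earlier=1))
```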

Hi Kumesh,
Impressive work. What kind of data did you use for your backtest? Daily? Intraday? HFT? Did it include commission/slippage at all?
Also, isn't there a risk that if many people follow your algo, its returns will dissipate at some point?
Thanks for sharing.

@Carlie: That's a great observation and it would be very interesting to see once we start testing event-driven strategies. We actually did a quick test using an event-driven strategy and found many credible rumors days before the actual release of major announcements such as mergers & acquisitions, lawsuits, etc. We're currently testing intraday and will release a report on that, and then we will move on to event-driven. So stay tuned! :)

@Carl: Thank you, I appreciate it! Our data is timestamped to the second, but we traded at minute resolution and applied a trend-trading strategy. We did not close our positions at the end of the day. We only go long/short and exit positions based on the signals in our data set. The backtest does include commission and slippage as well.
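
For anyone wondering where trading costs enter a Quantopian/zipline-style backtest like this, here is a minimal skeleton of the commission and slippage configuration. The specific cost numbers are placeholders, not the values used in the report.

```python
# Minimal zipline-style skeleton showing where commission and slippage are set.
# The cost parameters below are placeholder assumptions, not the report's values.
from zipline.api import set_commission, set_slippage
from zipline.finance import commission, slippage

def initialize(context):
    # Charge a per-share commission with a minimum cost per trade.
    set_commission(commission.PerShare(cost=0.0075, min_trade_cost=1.0))
    # Simulate volume-dependent price impact when orders are filled.
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.025, price_impact=0.1))
```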

As for your last question, this is just a very simple strategy we used. Nothing special about it. You can call it a starter strategy. Many quants can exploit its value by combining it with their own strategies, combining different metrics, setting different conditions, etc. There are hundreds of combinations of metrics and condition inputs you can use with our data set, so it's very unlikely that everyone will use the same type of strategy and dissipate the returns/alpha.

There is currently a large demand for our historical data, so for those of you who have requested it, I will reach out to you by this weekend. I really appreciate your patience and enthusiasm to backtest our data! This is what we wanted from the community :)

Is there any way to figure out if the juice in the sentiment is gone? I mean, the sentiment could have had its effect and it might already be in the price.

Thanks Kumesh for sharing your results. Some time back I shared similar backtests on Quantopian using our own news and blog sentiment data generated at InfoTrie with our engine FinSentS (portal.finsents.com, or APPS FINSENTS on Bloomberg).

There are hundreds of simple and original applications of sentiment data, whether as a main strategy or as an add-on to mainstream strategies. Glad to see your own implementations.

A few additional remarks:

  • Sentiment quality is important, and can be boosted by relevant implementations of algorithms, whether "statistical with light linguistics", "hardcoded linguistics", or relying on machine/deep learning;
  • Beyond the simple polarity of sentiment, news flow (volume) is also quite interesting and should be put in parallel with actual trade order flows / price changes. It is also a good predictor of volatility, which opens the door to many additional applications;
  • Ranking is an interesting metric, but beyond free data, people in the field also look at what can be done with private or premium datasets (like Bloomberg or Dow Jones);
  • Look at aggregates: sentiment data is also very valuable (especially on stocks) in aggregate form (for instance at the index, industry, or sector level). It is also a way to overcome some of the noise which may lie in sentiment analysis done at the single-article level (an extremely complex topic...);
  • History, history, history :-) As with all backtesting, greater depth of history is key. Our 15+ years of history across various datasets has proven invaluable in refining trading strategies.

Thanks @Frederic for the reply. Glad to know about your app.
I completely agree with you that there are various sentiment-data-based implementations available. It is exciting to see that modern investors are taking these factors into account. In a nutshell, the fundamentals are still human-centric; how different information is perceived and acted upon drives the market.

  • Agreed with your remark on sentiment quality. A practical insight here is to migrate to a convolutional neural network approach as soon as possible. Deep learning is the key, although the models are difficult to train, especially when you are dealing with over 100 terabytes of data. We are trying to refrain from hardcoded linguistics, and will probably drop that altogether in our next version update.

  • News flow definitely plays an important role here, but it is really difficult to identify a baseline. For example, are 100 articles about AAPL comparable to 100 articles about BP? Maybe not. That's where we start building individual models for each security. At Accern, we go beyond that and build models for different "event" spreads as well (like M&A). Look at our "saturation" and "volume" metrics.

  • We agree, and that's why we have increased our data coverage recently. Speed is the key here, which is why we are also constantly looking for direct access to information (reports, analyses) from partner banks. As these reports are sent to firms like Bloomberg at the same time, we believe that we may have an upper hand there.

  • That's a very good point. We also aggregate sentiment by entities (e.g. stocks) as well as stories. As each article contains entity-related information, such as industry, index, sector, exchange, competitors, etc., a trader may aggregate the sentiment by any of these additional attributes on the fly (see the sketch after this list). We also have average-day sentiment information, with which you can see how the sentiment for an entity evolved over time.

  • This is one point where I have a slightly different view. I think it makes sense to run extensive backtests for strategies that utilize market data and are solely dependent on it. But when we talk about data, such as social media interactions, that has evolved significantly in the last 4-5 years, the rules of the game change. These days you will find young and old traders who trade solely on news they find on social media and forums. They are affecting the market. So you are automatically dependent on evolving algorithms that factor in relevant information, like the rate of page views, or virtual hyperlinks/relationships between sources. That, in turn, means that you cannot (should not?) backtest on 10-year-old data. That's why we favor out-of-sample or forward testing, even though we also have access to an older news corpus.
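
Here is a small pandas sketch of the on-the-fly aggregation mentioned a few bullets above: rolling article-level sentiment up to the sector level and tracking a daily average per ticker. Column names are illustrative assumptions, not Accern's exact schema.

```python
# Aggregate article-level sentiment by sector and by ticker per day.
# Column names are assumptions for illustration.
import pandas as pd

articles = pd.DataFrame({
    "date": pd.to_datetime(["2015-06-01", "2015-06-01", "2015-06-02"]),
    "ticker": ["AAPL", "MSFT", "AAPL"],
    "sector": ["Technology", "Technology", "Technology"],
    "article_sentiment": [0.6, -0.2, 0.4],
})

# Sector-level aggregate: mean sentiment per sector per day.
sector_sentiment = articles.groupby(["date", "sector"])["article_sentiment"].mean()

# Per-ticker daily average, analogous to an "average day" sentiment series.
entity_daily = articles.groupby(["date", "ticker"])["article_sentiment"].mean().unstack()

print(sector_sentiment)
print(entity_daily)
```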

All that said, I would like to emphasize that there is higher risk, technology-wise and cost-wise, involved in developing systems that not only scale well but also take into account hidden factors -- for example, adjusting the weights of sources that are isolated on the web, determining the credibility of the information posted (not only the source), or deciding whether to trust the source (domain) or the author (person) of the article. There is a lot more to explore, but we are glad to be moving in the right direction.

@Kumesh

Convolutional neural networks, why not? Many algorithms are out there! Linguistics is both very useful and precise but very difficult to scale indeed. To simplify, the tradeoff is generally the following: if you want an approach that scales fast and "cheap", you go for statistics with light linguistics; the more you want to increase your precision while keeping the ability to scale, the more you move towards machine learning and ultimately deep learning (you will then really need more servers!). But for accuracy, with the right linguists on your team you can do great things!

We have a similar approach. We build different models for various "dimensions" in the data (M&A is one), although multiplying models endlessly is a pitfall one should avoid, as it makes things difficult to understand from an external point of view (too much "black box").

Reports and analyses are easy to get from banks. I am talking about premium real-time news flows from news agencies, which are far more difficult and expensive to obtain...

Agreed.

I agree that casual (and less casual) investors rely more and more on multiple media. For the good: social media (especially Twitter), for instance, are great for analyzing large macro events (earthquakes, elections, etc.), company products (iWatch, iPhone 5, etc.), and... US stocks. But, and that is a bias my American friends often forget to consider :-), the "liquidity" of relevant signals may be very poor in many markets or languages. Our policy is therefore a case-by-case integration of social-media-derived signals for things we consider meaningful. For the rest, 15+ years of backtesting on a signal derived, for instance, from Dow Jones (which we offer) or all the major news agencies (BBG, Reuters, AP, AFP, ...) is both very powerful and stable. You will never convince a large portfolio manager or a large hedge fund with only a two-year paper backtest claiming 200% performance (even if well executed). They will, for good or bad reasons, see no difference from simply flipping a coin.

Nice example from a useful data source. Quick question: does your data extend to the pre-2008/09 timeline? It would be nice to see how the algorithm performs in a bear market. Thanks for posting this.

If I am reading this correctly, when the sentiment analysis dictates that a long position should be taken and there already is one, the algorithm simply passes. Was there a test done to see the performance if the algorithm doubles the position upon a second favorable sentiment determination?

@Bharath: The article sentiment we used was just one of the three metrics that acted as a decision factor for our trade execution. We mainly wanted to use article sentiment as a directional trigger. This means that if an article is very positive about a company, we would go into a long position, and vice versa. We could try to backtest article sentiment alone against the price movement, but we would be setting ourselves up for a lot of risk exposure. In order to minimize our risk, we needed to apply overall_source_rank, which tells us whether the information itself is trustworthy, and our impact score tells us whether the information will have some sort of impact on the company's price. That being said, we are currently working on reinventing the wheel for sentiment analysis specifically for trading. We're quite far along, and we will be sure to give you an update and show some backtest results on its performance once it's released. Stay tuned :)

@Frederic: Thanks for reflecting on some of these very important issues. Of course a linguistic approach adds to the accuracy, and you are right that it is very difficult to scale. We are just trying to automate that process. As for the premium real-time sources, we do cover them as well. They are included in the 20M+ sources we monitor, but they are fairly small compared to other low-traffic, early-information sources. And for the last point, a Bernoulli distribution? :) We are sure that we perform better than that. :)

@Udhay: Thanks for replying to this post. Sorry, but we don't have pre-2008/09 coverage in the data yet. We are in the process of adding those 2-3 years of additional history to get a performance benchmark for a bear market. Stay tuned :)

@Udhay - I can offer pre-2008 data. Simply contact me.

Thanks Kumesh. Would love to see something to help quantify the half life of news / sentiment.

Thanks Kumesh. Great work. The Accern Backtest Report link does not seem to be accessible.

@Andrew: Yes, that is correct. We also performed the test based on your second statement. The returns and alpha were higher, but the rest of the performance metrics declined a bit.

@Qiang: Thank you! The link should work. Can you try clicking it again? If not, here is the link once more: https://dl.dropboxusercontent.com/u/70792051/Accern%20Backtest/Accern%20Backtest%20Report.pdf

I apologize that I haven't gotten a chance to reply to some of your emails yet. I will get to them soon, but we are currently working with Quantopian to figure out the best approach to giving you all access to the 2.5 years of history to conduct your own backtests on the platform. I will update you on the progress.

Best,
Kumesh

Just released an article about an interesting finding in our data set: https://www.linkedin.com/pulse/accern-detects-major-story-103-minutes-before-media-kumesh-aroomoogan

Hey all,

Just a quick update: we've had a number of people looking to use Accern's data directly in their algorithms, especially for the contest.

You can do that now through Quantopian Data and Pipeline.

Here's a simple long/short algorithm that James Christopher put together to get you guys started: https://www.quantopian.com/posts/accern-alphaone-long-short

Let me know if you have any questions,
Seong

How do I forward test any of your systems, either in live trading or in Quantopian paper trading?
What do I need?

Hi Nurudeen,

We recently released a direct integration with Accern's Alphaone dataset. You can learn more about the data here: quantopian.com/data/accern/alphaone

That data can be used for out-of-sample paper trading. It is different from the data sample provided here, so we published a simple sample algorithm for you to try: https://www.quantopian.com/posts/accern-alphaone-long-short

You'll need to purchase a subscription to the data to get the most recent updates. It is a monthly subscription that you can cancel at any time.

Hope that helps,
Josh


Hi, Kumesh

It's a really impressive strategy! I found that the alpha is low while the beta is high. Does this mean that the profit from the strategy mainly comes from the market? I have two quick questions:
1. What is the performance in a bearish market?
2. What leverage level did you use for the strategy?

Is this algo rewritten using Pipeline and Quantopian 2? I followed the link above but couldn't find a strategy on the page that replicates this performance.
https://www.quantopian.com/posts/news-and-blog-sentiment-pipeline-factors-with-accern

Hi Kiran, this algorithm is now outdated. While we haven't replicated this algorithm, we've provided a few other examples using Accern's Alphaone data feed in strategies. Check out this Earnings Drift strategy using Accern for example.

# Backtest ID: 578f6e03d0ba390fa2bf287b