Sentdex News Sentiment

Overview

Sentdex is a sentiment analysis algorithm whose name combines "sentiment" and "index." It interprets the emotion expressed in online communication and translates it into machine-readable data. The resulting data shows how people around the world feel about publicly traded companies. For each stock, the sentiment scores are generated from four simple moving average (SMA) factors computed over the last 100, 250, 500, and 5000 news events. News events are pulled from over 20 sources, including The Wall Street Journal, CNBC, Forbes, Business Insider, and Yahoo Finance.
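As a rough illustration of the four SMA factors mentioned above, the sketch below averages the most recent 100, 250, 500, and 5000 entries of a per-event score list. The per-event scores, the function names, and the choice to return None for short histories are assumptions made for illustration; how Sentdex scores individual news events and combines the four factors into the final signal is not described here.

```python
# Event-count windows named in the overview.
WINDOWS = (100, 250, 500, 5000)


def trailing_sma(scores, window):
    """Mean of the most recent `window` scores, or None if too few events exist."""
    if len(scores) < window:
        return None
    return sum(scores[-window:]) / window


def sma_factors(scores):
    """Compute one simple moving average per window for a single stock."""
    return {window: trailing_sma(scores, window) for window in WINDOWS}


# Hypothetical history for one stock: 4500 neutral events, then 500 positive ones.
event_scores = [0.0] * 4500 + [1.0] * 500
factors = sma_factors(event_scores)
print(factors)
```

In this made-up history the three shorter windows see only the recent positive events, while the 5000-event window is diluted by the older neutral ones, which is why short and long windows can disagree after a change in news tone.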

Sentdex data is available via the Pipeline API, which means it can be accessed in both Research and the IDE.

Properties

  • Coverage: US only
  • Data Frequency: Daily
  • Update Frequency: Daily (updated every morning at ~7am ET)
  • Timespan: June 2013 to present
  • Point-In-Time Start: February 2016

Methodology

Point-In-Time

Starting in February 2016, Sentdex data is collected and surfaced in a point-in-time fashion on Quantopian. This corresponds to when Quantopian started downloading and storing Sentdex data on a nightly basis. Timestamps for historical data prior to February 2016 are approximated by adding 24 hours to the asof_date of each record.
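The approximation for pre-2016 records can be sketched in a few lines. The function name and the sample date below are hypothetical; the sketch only covers historical records, since records from February 2016 onward carry the time at which the data was actually collected.

```python
from datetime import datetime, timedelta


def approximate_timestamp(asof_date):
    """Approximate when a historical record became available: asof_date plus 24 hours."""
    return asof_date + timedelta(hours=24)


# Hypothetical record dated before the point-in-time start.
asof_date = datetime(2015, 3, 10)
timestamp = approximate_timestamp(asof_date)
print(asof_date, "->", timestamp)
```

Under this rule, a historical record is treated as unknown until one full day after its asof_date, which helps keep a backtest from using a score before it could plausibly have been seen.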

Usage

The sentiment dataset is a pipeline DataSet located in the quantopian.pipeline.data.sentdex module. It provides access to Sentdex news sentiment scores for US equities. The sections below show how to import the dataset and include a code example.

Import

from quantopian.pipeline.data.sentdex import sentiment

Example

This code snippet constructs and runs a pipeline that computes the mean sentiment score of each asset over the last 5 trading days. Note that this example uses the built-in SimpleMovingAverage pipeline factor to compute the 5-day mean.

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.sentdex import sentiment
from quantopian.pipeline.domain import US_EQUITIES
from quantopian.pipeline.factors import SimpleMovingAverage
from quantopian.research import run_pipeline

# Define a 5-day simple moving average sentiment factor.
mean_sentiment_5day = SimpleMovingAverage(inputs=[sentiment.sentiment_signal], window_length=5)

# Add the sentiment factor to a pipeline.
pipe = Pipeline(
    columns={
        'mean_sentiment_5day': mean_sentiment_5day,
    },
    domain=US_EQUITIES,
)

# Run the pipeline for a year and print the first few rows of the result.
df = run_pipeline(pipe, '2017-05-05', '2018-05-05')
print(df.head())

Pipeline Datasets & Columns

Datasets

sentiment - A pipeline dataset that provides access to the Sentdex sentiment signal derived from major news sources.

Fields

The sentiment dataset has the following fields (accessible as BoundColumn attributes):

  • asof_date (dtype datetime64[ns]) - The effective date of the sentiment record (date when the record first applies).
  • sentiment_signal (dtype float) - Sentiment signal determined by the Sentdex algorithm. This is a continuous value ranging from -3 to 6.