Deriving trading signals from Wikipedia page views (new feature!)

A stock's price is the result of the trading decisions of many individuals. But what if you could peek into the information-gathering process that precedes those decisions?

A recent paper uses Wikipedia page views to predict market changes.

You can try out how different Wikipedia pages affect the strategy by clicking “Clone Algorithm” and editing the specified Wikipedia page.

[Backtest attached — Clone Algorithm (321 clones)]
# (c) 2013 Thomas Wiecki, Quantopian Inc.

import numpy as np

# How many weeks to average over
delta_t = 5

def initialize(context):
    context.article = 'Opportunity cost'
    fetch_wikipedia(context.article)
    context.order_size = 10000
    context.sec_id = 8554
    context.security = sid(8554) # S&P500

    context.history = []
    context.weekly_history = []

def handle_data(context, data):
    c = context
  
    if c.article not in data:
        return
    daily_views = data[c.article]['views']
    
    # Create a window of 5 days.
    weekly_full = append_window(c.weekly_history, daily_views, 5)
    if not weekly_full:
        return
      
    # Only trade on the first trading day in the week
    if data[c.security].dt.weekday() == 0:
        weekly_views = np.mean(c.weekly_history)
        # Run a window over delta_t weeks 
        #(+1 because the last one will be the current one we want to ignore).
        full = append_window(c.history, weekly_views, delta_t+1)
        if not full:
            return
        
        # Exit any prior positions
        amount = c.portfolio['positions'][c.sec_id].amount
        order(c.security, -amount)

        # Sell if weekly_views is higher than the average of the past weeks
        if weekly_views > np.mean(c.history[:-1]):
            order(c.security, -c.order_size)
        # Buy otherwise.
        else:
            order(c.security, c.order_size)
        
def append_window(window, item, length):
    """Moving window that drops old items longer than length.

    :Arguments:
        window : List to append to.
        item : Item to append.
        length : Maximum length of the window.

    :Returns:
        True if window is full, False if len(window) < length.
    """
    window.append(item)
    if len(window) < length:
        return False
    while len(window) > length:
        window.pop(0)
    return True
Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

14 responses

This algorithm looks at page view counts of specific Wikipedia pages. The theory is that spikes in the number of page views can be used to predict a price change.

The hard part of this is collecting the data, but we did that already for you. We extracted the page viewing history of certain Wikipedia pages from http://stats.grok.se/. You can use the data by calling the function fetch_wikipedia(). As its argument it takes either the name of a single Wikipedia page or a list of Wikipedia pages. The average daily viewing history is then made available in handle_data().

For this algorithm I used the Wikipedia page 'Opportunity cost'. Once the weekly average of page views drops below the moving average over the past delta_t weeks (here delta_t == 5), we buy and hold the S&P 500 for one week. If the weekly average is higher than the moving average, we sell and re-buy the S&P 500 after one week.
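Stripped of the order-placement plumbing, the buy/sell rule described above can be sketched as a standalone function (wiki_signal is a hypothetical name; the on-platform version places orders with order()):

```python
import numpy as np

def wiki_signal(weekly_views, history, delta_t=5):
    """Return +1 (buy), -1 (sell), or 0 (not enough data yet).

    weekly_views : this week's average daily page views.
    history      : past weekly averages, oldest first.
    """
    if len(history) < delta_t:
        return 0  # need delta_t full weeks before trading
    baseline = np.mean(history[-delta_t:])
    # Rising attention -> sell; falling attention -> buy.
    return -1 if weekly_views > baseline else 1
```

The sign convention matches the algorithm: sell when attention spikes above its recent average, buy otherwise.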

Suggestions for improvement (please share improvements in this thread):

  • The authors used many different Wikipedia pages, listed here
  • Can you find a Wikipedia page that outperforms the one I found?
  • delta_t == 5 is what the authors of the paper used. It would be interesting to see how the algorithm performs when this is changed.
  • The underlying algorithm is a very basic moving average cross-over. A more clever strategy might be able to do a much better job.
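As a starting point for that last suggestion, here is a minimal dual moving-average crossover signal (a generic sketch with hypothetical parameter choices, not the method from the paper):

```python
import numpy as np

def crossover_signal(series, fast=2, slow=5):
    """+1 when the fast moving average is above the slow one, else -1.

    Returns 0 until at least `slow` observations are available.
    """
    s = np.asarray(series, dtype=float)
    if len(s) < slow:
        return 0
    return 1 if s[-fast:].mean() > s[-slow:].mean() else -1
```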

If this idea is interesting to you, please also see the example using Google page views.

'Fear' performs pretty well

[Backtest attached — Clone Algorithm (34 clones)]
# (c) 2013 Thomas Wiecki, Quantopian Inc.

import numpy as np

# How many weeks to average over
delta_t = 5

def initialize(context):
    context.article = 'Fear'
    fetch_wikipedia(context.article)
    context.order_size = 10000
    context.sec_id = 8554
    context.security = sid(8554) # S&P500

    context.history = []
    context.weekly_history = []

def handle_data(context, data):
    c = context
  
    if c.article not in data:
        return
    daily_views = data[c.article]['views']
    
    # Create a window of 5 days.
    weekly_full = append_window(c.weekly_history, daily_views, 5)
    if not weekly_full:
        return
      
    # Only trade on the first trading day in the week
    if data[c.security].dt.weekday() == 0:
        weekly_views = np.mean(c.weekly_history)
        # Run a window over delta_t weeks 
        #(+1 because the last one will be the current one we want to ignore).
        full = append_window(c.history, weekly_views, delta_t+1)
        if not full:
            return
        
        # Exit any prior positions
        amount = c.portfolio['positions'][c.sec_id].amount
        order(c.security, -amount)

        # Sell if weekly_views is higher than the average of the past weeks
        if weekly_views > np.mean(c.history[:-1]):
            order(c.security, -c.order_size)
        # Buy otherwise.
        else:
            order(c.security, c.order_size)
        
def append_window(window, item, length):
    """Moving window that drops old items longer than length.

    :Arguments:
        window : List to append to.
        item : Item to append.
        length : Maximum length of the window.

    :Returns:
        True if window is full, False if len(window) < length.
    """
    window.append(item)
    if len(window) < length:
        return False
    while len(window) > length:
        window.pop(0)
    return True

Neat, but I keep getting timeouts for most search terms. Even "fear" timed out on me. I'm not sure what the solution here is, as the reliability of results seems low and this likely depends on Wikipedia's servers, right?

The data comes from http://stats.grok.se/; I also got quite a lot of timeouts when I played with it.

You should be able to use a deque instead of a list to hold the history, which would make your state tracking a lot simpler:

from collections import deque

c.history = deque(maxlen=delta_t)

c.history.append(weekly_views)  
if len(c.history) < delta_t:  
    return  
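To illustrate why maxlen simplifies things: a full deque drops its oldest item automatically on append, so none of the manual pop(0) bookkeeping from append_window() is needed:

```python
from collections import deque

window = deque(maxlen=3)
for views in [10, 20, 30, 40]:
    window.append(views)

# The oldest item (10) was dropped automatically when the fourth
# item was appended; the window always holds at most 3 items.
print(list(window))  # -> [20, 30, 40]
```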

@John: Neat, I was aware of deque but not the maxlen kwarg. Thanks! Do you want to post an updated, simplified version?

Btw., we are also increasing the timeout for the fetching, so hopefully it'll work more reliably soon!

@Thomas If I did this properly, the backtest below should contain the simpler approach using deques.

[Backtest attached — Clone Algorithm (18 clones)]
# (c) 2013 Thomas Wiecki, Quantopian Inc.

import numpy as np

from collections import deque

# How many weeks to average over
delta_t = 5

def initialize(context):
    context.article = 'Opportunity cost'
    fetch_wikipedia(context.article)
    context.order_size = 10000
    context.sec_id = 8554
    context.security = sid(8554) # S&P500

    context.history = deque(maxlen=delta_t)
    context.weekly_history = deque(maxlen=5)

def handle_data(context, data):
    c = context
  
    if c.article not in data:
        return
    daily_views = data[c.article]['views']
    
    # Create a window of 5 days.
    c.weekly_history.append(daily_views)
    if len(c.weekly_history) < c.weekly_history.maxlen:
        return
      
    # Only trade on the first trading day in the week
    if data[c.security].dt.weekday() == 0:
        weekly_views = np.mean(c.weekly_history)
        
        # When our history buffer contains delta_t weeks of data
        if len(c.history) == c.history.maxlen:
            
            # Exit any prior positions
            amount = c.portfolio['positions'][c.sec_id].amount
            order(c.security, -amount)

            # Sell if weekly_views are higher than the past weeks
            if weekly_views > np.mean(c.history):
                order(c.security, -c.order_size)
            # Buy otherwise.
            else:
                order(c.security, c.order_size)
          
        # Append the current observation, pushing any observations
        # older than delta_t off the end of the deque.
        c.history.append(weekly_views)

@John: Yep, looks correct. Thanks! You might want to use delta_t instead of 5 though to make it parameterizable (or remove the delta_t even).

Agreed, although I do use delta_t as a parameter for context.history; context.weekly_history is hard-coded to 5 (weekdays in a week).

Sorry about the performance problems, and thank you all for bringing them up. Some of this is due to limitations we knew about; some of it is a bug. I'll start with a bit more info about what's happening under the covers. For hundreds of Wikipedia pages, particularly the ones in the paper that Thomas refers to, we maintain a local cache of the data. If a page is in the local cache, there shouldn't be a data problem.

For the ones that aren't in the local cache, we query http://stats.grok.se/. The data is organized by month, so we issue serial requests to fetch each month. Each request takes 2-5 seconds, and we give one minute for the data to load. So there is a limit on the number of "page-months" that can be loaded if the pages aren't in our cache.
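A back-of-the-envelope sketch of that budget, assuming the 5-second worst case per request (this is not the actual fetcher code):

```python
def max_page_months(total_budget_s=60, secs_per_request=5):
    """Worst-case number of serial month-requests that fit in the budget."""
    return total_budget_s // secs_per_request

# At 5 s per request, one uncached page over a one-year backtest
# (12 page-months) just barely fits in the 60 s window.
print(max_page_months())  # -> 12
```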

We found a couple problems:

  1. We have a bug where we're being case sensitive where we shouldn't be. For instance, you'll see very different behavior if you look at 'debt' v. 'Debt'. We'll fix that.
  2. We're finding sometimes the individual page calls are taking longer than 5 seconds, so we're bumping that to 10 seconds. The overall limit of 1 minute is unchanged, though.

Other things to make this better:

  1. If you have other pages you'd like us to cache, let me know at [email protected]. We're happy to cache more, but we aren't in a place where we can cache them all!
  2. We find that the website responds faster the second time it is queried, so for non-cached pages a second attempt may succeed where the first one failed.

Quick edit: overall limit is 1 minute, not 2.


I tried to live trade Wiecki's code and received this error

FunctionCalledOutsideOfInitialize:
File test_algorithm_sycheck.py:10, in initialize
File algoproxy.py:1332, in fetch_wikipedia

Anyways, I found impressive results from using these words for context.article:

  • positive
  • confident
  • defeated
  • cautious

Source: BlackRock

[Backtest attached — Clone Algorithm (31 clones)]
# (c) 2013 Thomas Wiecki, Quantopian Inc.

import numpy as np

# How many weeks to average over
delta_t = 5

def initialize(context):
    context.article = 'positive'
    fetch_wikipedia(context.article)
    context.order_size = 10000
    context.sec_id = 8554
    context.security = sid(8554) # S&P500

    context.history = []
    context.weekly_history = []

def handle_data(context, data):
    c = context
  
    if c.article not in data:
        return
    daily_views = data[c.article]['views']
    
    # Create a window of 5 days.
    weekly_full = append_window(c.weekly_history, daily_views, 5)
    if not weekly_full:
        return
      
    # Only trade on the first trading day in the week
    if data[c.security].dt.weekday() == 0:
        weekly_views = np.mean(c.weekly_history)
        # Run a window over delta_t weeks 
        #(+1 because the last one will be the current one we want to ignore).
        full = append_window(c.history, weekly_views, delta_t+1)
        if not full:
            return
        
        # Exit any prior positions
        amount = c.portfolio['positions'][c.sec_id].amount
        order(c.security, -amount)

        # Sell if weekly_views is higher than the average of the past weeks
        if weekly_views > np.mean(c.history[:-1]):
            order(c.security, -c.order_size)
        # Buy otherwise.
        else:
            order(c.security, c.order_size)
        
def append_window(window, item, length):
    """Moving window that drops old items longer than length.

    :Arguments:
        window : List to append to.
        item : Item to append.
        length : Maximum length of the window.

    :Returns:
        True if window is full, False if len(window) < length.
    """
    window.append(item)
    if len(window) < length:
        return False
    while len(window) > length:
        window.pop(0)
    return True

@Ethan: I was able to reproduce the problem. We'll fix it, thanks for reporting!

wikipedia page no longer... supported... why?