New book on Quantopian/Zipline backtesting and modeling

Hi guys,

My third book was released today. Trading Evolved is an in-depth guide to the world of Python-based backtesting. Starting with the assumption of little to no prior knowledge, I'll take you on a ride which will eventually show you how to construct advanced trading models for equities and futures.

I started writing this book a year ago after finding a lack of this type of explanatory documentation in the Python world. The tools available to us are incredibly powerful, but it is not easy for most people to really get into this way of working. It can be a daunting process to learn about Python and backtesting, both from a technical and financial perspective.

My new book is highly practical and full of source code and detailed explanations. It's my intention that by the end of this book, you will be well on your way to becoming a professional systematic trader.

You can find the new book here: https://amzn.to/31tDkwn

A considerable amount of work has gone into writing this book and I hope that the community out there will have use for it. I would greatly appreciate reviews, comments and feedback.

Thank you,

Andreas

81 responses

I just ordered the book for delivery at the start of September. I really look forward to reading it and promise to come back with feedback.

Thanks!
Fredrik

Much appreciated, Fredrik! I hope you'll like it.

Hej Andreas,

That looks awesome! I’ll most likely order it either way, but I’m curious if there’s a table of contents available, and also if the book covers alpha research and how to best avoid/limit overfitting?

Mvh,
Joakim

Tjena Joakim,

You'll find a complete table of contents here: https://www.followingthetrend.com/trading-evolved/

I don't cover those topics though. I find that Rob Carver is better than I at explaining those things, and would very much recommend his books.

ac

Tackar!

That looks like just what I need. Looking forward to going through it! I might pick up one of Rob's books too while I'm at it. :)

I already ordered the book and am looking forward to it.

Quick question - for the code in the book, will there be .py or .ipynb files available for download?

Either way, I can't wait to get my hands on this! Thanks again Andreas!

Both, but mostly Notebook files. Most of the models and demos are in Jupyter, with things like bundles and the like in .py.

All the code is in the book, with explanations, and also downloadable with inline comments in the source.

@Andreas, just ordered too, and looking forward to reading your new book. It seems to be coming out just when I have a use for such a book. Thanks for putting it out.

Andreas - your site seems to have stopped serving https. I can access the site with http, and although the browser keeps spinning I can see most of the content. Also, where is the code? I can't seem to find it on your site.

Thanks, Geoff. You're absolutely right, and I'm getting the techies on it. It's either a total coincidence, or the release of the book a few days ago actually generated so much traffic that my site broke. A self-DoS...

Once the site is operational again, and hopefully that's within a couple of hours, you'll find the code as well as random sample data here: https://www.followingthetrend.com/trading-evolved/

Oh, and as you rightly point out, http still works even if it is really slow. So you can get the code here http://www.followingthetrend.com/trading-evolved/ under the headline Downloads.

Thanks Andreas - managed to get the code. Going through your book and examples now. BTW - appreciate the insights in your books and good to have you in the python community

With a book like this, there's sure to be updates needed and errors corrected. I've started on the list already: https://www.followingthetrend.com/2019/08/trading-evolved-errata-and-updates/

I will keep a running log here of errors, updates and changes. As this list grows large enough, I'll push out an updated version of the book.

A curious new error to highlight: it seems like Zipline won't install on Conda 4.7.10. The solution is either to downgrade Conda, as shown in the article (thanks to Richard Dale), or to use pip. The latter is a bit more messy though, so downgrading Conda is easiest for now.

I just bought it and am looking forward to reading it.

@Andreas, just finished your book. Much appreciated. Great job.

I bought your book with the anticipation that you would cover the stuff I needed now, and it is exactly what you provided. Your book, for me, will be a great time-saver by opening doors to greater possibilities.

Having Zipline on my machine is to keep all my stuff on my machine as well (total and guaranteed program privacy).

Thank you for all the code examples and explanations. Be assured I will find good use of them.

Again, congratulations on a job well done.

Thank you, Guy! I'm glad you liked it.

Book writing is a hobby for me, and I'll keep doing it as long as it's fun. And getting positive feedback is the fun part. :)

I am brand new here. This is one of the first posts that I read and I am happy about that. I just bought your book!

Does the book cover how to use proprietary/own data with Zipline?

Sure does. Two chapters are dedicated to hooking up your own data source. I also show how to set up a local MySQL securities database, populate it with data and use that data for Zipline. Equities and futures are covered.

I Just purchased this book. Looks like it will provide some great information to learn from. Thank you, and will let you know what I think of it soon!

I have Python 3.6 on my PC and I don't want to try and install Conda or Zipline. Let's say right now I'm too lazy/wary of installing issues.
Is it possible to follow your book here on, and only on, Quantopian, as the title of the post suggests?

The models in the book should be possible to replicate and run on Quantopian with minor modifications.

Also, I'm sure you're aware, but you can install a Python 3.5 environment without risking any issues with your current 3.6 environment.

Bought today! Thank you very much for your great work Andreas. It seems really good, especially for people like me who are not IT programmers. The chapters on custom data with Zipline and MySql seem very interesting!

Hi Andreas,

I just purchased the book, thank you.

I wish I had seen the updates page earlier; I spent quite some time creating a Dockerfile to install Zipline with conda. I ended up just using pip.

Looking forward to the read, thanks again!

Can we access the source code if we got the book on amazon?

Great book, it looks like it was a great deal of work

@Julian Sure, you can download the code even if you didn't buy the book. You can find the link here: https://www.followingthetrend.com/trading-evolved/

@Ryan Thanks, it took a lot of time and effort to put together. An Amazon review is a good way to vote for more books like this. :)

Hi Andreas, I have downloaded the code and all the data, and processed both the stocks and futures data. I've run the stock simulations and have had a productive time learning the thought process you are taking us through.
I am now on Chapter 15 and am having a few issues running the futures backtest.
1) In the book and code you have 'BL' under agriculture, but that is not in data.zip. I've just commented it out as an easy workaround.
2) In the Zipline backtest you start from start = datetime(2001, 1, 1, 8, 15, 12, 0, pytz.UTC), but the futures in the data.zip file seem to start around 2015 (and they seem to start at different dates). I've tried changing the start date to 2015 and so on, but then I get

~/anaconda3/envs/Zipline/lib/python3.5/site-packages/zipline/data/dispatch_bar_reader.py in load_raw_arrays(self, fields, start_dt, end_dt, sids)
110 for i, asset in enumerate(assets):
111 t = type(asset)
--> 112 sid_groups[t].append(asset)
113 out_pos[t].append(i)
114

KeyError:

So I'm wondering if there is some missing data? If not, I can upload the full error logs, but I'm hoping that it's just missing data.

thanks in advance
Geoff

Hi Geoff,

The data is just random anyhow, and not even very realistic random. I merely provided that so that those who really have no access to any data can still play with Zipline. I generated this data myself with a simple Python based random walk script.
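For anyone curious what such a script might look like, here is a minimal sketch of a random-walk price generator. It is not the script used to build the book's sample data; the function name, the drift and volatility values, the seed and the start date are all assumptions chosen for illustration. It uses only the standard library.

```python
import math
import random
from datetime import date, timedelta


def random_walk_prices(start_price, n_days, daily_vol=0.01, drift=0.0,
                       seed=42, first_day=date(2015, 1, 1)):
    """Return a list of (date, close) tuples following a geometric random walk.

    All parameter defaults are illustrative assumptions, not values from the book.
    """
    rng = random.Random(seed)      # seeded so the series is reproducible
    prices = []
    price = start_price
    day = first_day
    while len(prices) < n_days:
        if day.weekday() < 5:      # skip Saturdays and Sundays
            # Multiplying by exp(r) keeps the price strictly positive.
            price *= math.exp(rng.gauss(drift, daily_vol))
            prices.append((day, round(price, 4)))
        day += timedelta(days=1)
    return prices


if __name__ == "__main__":
    for day, close in random_walk_prices(100.0, 5):
        print(day.isoformat(), close)
```

Data like this is only good for checking that the plumbing works; as noted above, it says nothing about how a model would behave on real markets.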

With the futures, it would be too much data for an easy download if I generate too far back in time. For my actual backtests, I used about 20,000 individual futures contracts.

My recommendation is this:

  • Read through the book first, until chapter 23.
  • If you want to replicate the models locally, find and subscribe to a data source for that type of data.
  • Construct your own bundle, as outlined in chapters 23 and 24.
  • Run the models on your own data, and update bundle name and instrument universe as needed.

One welcome development is that a data provider is soon releasing their own Zipline interface to their data, which would make things much easier. I have been testing Norgate's new interface, and it seems really good so far.

Thanks for taking time over your weekend to answer these questions and give advice. Norgate looks to be reasonably priced; however, it appears to have a Windows-only installer, which is not practical for the likes of me (you'll find there are a lot of Linux people in the Python world). I can get my data from other sources and it's not too hard to script.

Hi,

I bought the book, have read through about 1/3 of it, and am really enjoying how the author takes you through learning Python and the related libraries. I downloaded the sample code and data and have been able to get the samples working up through Chapter 6. For context, I've worked in the software industry for 30+ years and have lost track of how many computer languages I know, and I have used Python in the past.

So I was enjoying the book and moved on to backtesting... and then Zipline. I have spent the better part of an afternoon trying every possible workaround posted online to get Zipline working on my system. Yes, I've set up a Python 3.5 environment with Anaconda and tried all the variations on the errata page. I've had the most success with "pip install zipline", and while it seems to make it pretty far, it eventually fails with a series of "Failed building the wheel..." errors that all end with "error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/"

I have MS VS 2019 Community edition installed. I still get the error. I found an article on this at https://www.scivision.co/python-windows-visual-c-14-required/ and it basically implied installing MS VS 2015 or 2017 might help, but I don't have a license for these so at this point I am SOL and calling it quits for today.

I humbly request that the author (or someone) try installing the latest version of Python, Anaconda 3 and VS 2019 Community Edition on the latest patch of Windows 10 and publish how they made Zipline work. I'd like to know!

I'm looking forward to solving the zipline installation issue and moving on in the book :-)

Regards,

Laura P.

Hi Laura, I had similar issues installing Zipline, but downgrading Conda to 4.6.11 solved the problem. Also make sure you're running Python 3.5 and not the latest 3.7. Hope it works for you as well.
conda config --set allow_conda_downgrades true
conda install conda=4.6.11

Hi Maxim,
I managed to (sort of) fix the Zipline installation issue by 1) finding a VS 2014 link on Stack Overflow (it's an MS download, but I could not find the link through the MS website) at http://go.microsoft.com/fwlink/?LinkId=691126&fixForIE=.exe and 2) fixing the VS 2014 installation, as it was missing "rc.exe" on the PATH (instructions here: https://stackoverflow.com/questions/14372706/visual-studio-cant-build-due-to-rc-exe). After 1) and 2) I was able to run "conda install -c quantopian zipline", and Zipline and all of its required packages appeared to install. However, when I go to use Zipline in Jupyter Notebook I get an error about LRU not being available when I import anything from zipline.
I did try your suggestion of downgrading Conda, but that generated a new set of errors.

I may just clean install python+anaconda+VS14 and try over. Tomorrow.

Again, it would be nice to get a set of instructions for a fresh install of all the latest software on the latest OS. The book, errata, etc. are a bit ad hoc in their suggested approach to getting Zipline working.

Regards, Laura P

A very good book! I tried for a few days and finally got Zipline 1.3 installed... :)

For those who use Win10 and Anaconda, you can try this:
https://anaconda.org/robinszeto/env_zipline

For Zipline

I haven't read the book by Andreas, but these might help.

Python
I always install Anaconda to a directory with short name, like C:\Python36.
Then I don't use any of the special Conda aspects of it, I treat it like straight Python.
I'm using Anaconda 3 version 5.2 from https://repo.continuum.io/archive/, that is, Anaconda3-5.2.0-Windows-x86_64.exe

Compiler
Laura, correct, that link/solution you posted already, in my experience has been the simplest, quickest and most surefire way to take care of the LRU problem with a C compiler on Windows 10: http://go.microsoft.com/fwlink/?LinkId=691126&fixForIE=.exe. So I'm surprised there were further problems and would check these ...

Path
The Python-related elements of my Windows path, which work fine with Zipline:
C:\Python36;C:\Python36\DLLs;C:\Python36\Lib;C:\Python36\Library\bin;C:\Python36\Library\mingw-w64\bin;C:\Python36\Scripts;C:\Python36\bin

Dependencies
Another thing that can sometimes help, checking dependencies:

python.exe -m pip install --upgrade pip  
pip install pipdeptree  
pipdeptree -p zipline  

Supposing pipdeptree reports any conflicts, let's say, a module named zoo with a message looking something like this ...

C:\> pipdeptree -p zipline  
Warning!!! Possibly conflicting dependencies found:  
 - zoo [required: >=2.0,<3.0, installed: 1.5.10]  

... that can be resolved like:

pip install "zoo>=2.0"

Debugger
Once past all those, seems to me zipline is remarkably reliable.
If someone wants to dig into zipline with breakpoints, to my knowledge VS Code cannot, but PyCharm does.
(the name must be from snake charming since python is a snake)

The instructions above on how to get Anaconda 3 version 5.2 combined with Robin's YML files at https://anaconda.org/robinszeto/env_zipline got me past zipline installation issues. Thank you!

I created a quandl account and successfully got zipline to ingest the quandl bundle (with some warnings). Now trying to write a backtest using zipline with quandl data and am having what appears to be some errors parsing the quandl data. So progress, but still not quite there...

I did eventually get zipline working :-)

I had to apply the patches to the zipline files as outlined in “Patching the Framework” on Mr Clenow's errata page as the final fix. So the full install that worked for me was

  1. Uninstall all prior copies of Python and Anaconda.
  2. Install the VS14 build tools from http://go.microsoft.com/fwlink/?LinkId=691126&fixForIE=.exe and patch the VS 2014 installation to add the missing "rc.exe" to the PATH (instructions here: https://stackoverflow.com/questions/14372706/visual-studio-cant-build-due-to-rc-exe).
  3. Run the installer Anaconda3-5.2.0-Windows-x86_64.exe from https://repo.continuum.io/archive/
  4. Launch Anaconda Navigator and open a terminal window per the instructions in the book.
  5. Update pip to the latest version: python.exe -m pip install --upgrade pip
  6. Use conda to create the Python environment "env_zipline" using Robin Szeto's YML: https://anaconda.org/robinszeto/env_zipline
  7. Install matplotlib 3.0.0 using Anaconda Navigator.
  8. Follow the book instructions to get a Quandl account and ingest the quandl bundle using Zipline.
  9. Follow the instructions for “Patching the Framework” on Mr Clenow's errata page.
  10. Use "env_zipline" when launching Jupyter Notebook.

At that point zipline should work in Python examples per the book on a Win10 machine.

Good luck!

Update: Anaconda-3 2019.07 (the latest version) works just as well as Anaconda-3 version 5.2 in the above instructions.

Dear all,

Great to see that some of you have used my "env" to get Zipline installed.
Here is an updated one for your reference:

https://anaconda.org/robinszeto/env_zipline
https://anaconda.org/robinszeto/env_zipline/2019.09.04.052239/download/env_zipline_20190904b_office.yml

Thanks and regards,
Robin

I'm glad it worked out, Laura.

Issues like these are the main reason why everyone advised me against writing this book. The problem is that any number of things can go wrong depending on local environments and unexpected changes in software or API calls, and much of it can't be predicted or preempted.

Luckily, there are kind and helpful people like Robin and Mr. Seahawk out there to assist! Thanks guys!

Andreas

And yet you wrote the book - thank you, I'm enjoying it - so here we are!

The problems are very normal (albeit icky) software dev problems and are largely addressable with a systematic and published approach to tool versioning and package management and documenting of the install process on a clean machine.

You have a good start in your book. It would be nice to have a repository for instructions and conda YML for the top couple OS variants across Mac/Windows/Linux, maybe linked to on your website? :-) Just an idea.

Thank you for your response,

Laura P

Ha, something tells me you've got a far deeper software engineering background than I do.

Although I've been programming since the 80s and started my first IT firm in the 90s, I really have no formal background in tech. My actual background is in finance and business, and on the tech side I've always taken a 'fly by the seat of your pants' approach. Which by the way is one of my favorite Anglo-Saxon idioms since it makes no sense whatsoever. My approach is completely result-focused and that often leads to doing things less by the book than a proper software engineer would have done.

Well, if you or anyone else out there would like to contribute to such a project, I'd be happy to help out, host it and give it visibility. :)

Funny thing though. So far, the book has sold extremely well. Surprisingly well. I got a large amount of feedback, but only two really negative comments. One reader was quite upset because the book was far too difficult. The other was equally upset over the book being far too simplistic.

Hi,

The "Momentum Model" example in Ch 12 uses the bundle "ac_equities_db" (which, of course, we don't have) and changing it to "quandl" generates errors due to the symbol names you use in the index universe. An AC blessed fix for this would be appreciated.

Related, is there a better place to post errata than this thread?

I managed to get zipline working on a 2nd Win10 machine without any issues by following the instructions I posted (and updated). Yes, I have a sw eng background. BTW - "flying by the seat of your pants" is rooted in pilot lingo in reference how accurately your ass tells you some things about your flying that instruments don't capture. Modern F1 drivers talk about the same phenomenon and if you've ever raced a car around a track you quickly understand the term.

Regards, Laura

Ah, yes, I've realized that I was perhaps not clear enough on that one. I got a few questions like that already.

From chapter 12 on, you will need better data than you'll find for free. That means that you'll need to get your own data and build your own bundle to hook it up. That part is explained in chapters 23 and 24.

Initially I had those chapters up front, explaining how to make bundles and hook up your own data early on. I moved it to the back on advice from a fellow author, who pointed out that those technical chapters will make many casual readers drop off without getting to the models and trading stuff.

This is a good place to discuss, but I'm following the threads on other places as well. There's some activity around these things on my own site, followingthetrend.com, and my errata page there. If you find errors, and hopefully solutions, please let me know and I'll update the official errata page.

Thanks for the background on that phrase! Next time my wife tells me to slow down my BMW, I'll use that. "But honey, I'm flying by the seat of my pants, just like pilots and F1 drivers do!"

Hi, I am having some problems running the file Portfolio Backtest.ipynb from chapter 8. Even after modifying the file Benchmarks.py as recommended by Andreas, the code still does not execute correctly on my PC. I am running Zipline 1.3.0 with Python 3.5.6 in an Anaconda virtual environment. All previous code from the book worked correctly. The error occurs when running the "run_algorithm" function. See below. Anyone having a similar issue?

---------------------------------------------------------------------------  
ValueError                                Traceback (most recent call last)  
<ipython-input-2-b68d206bfb01> in <module>()  
    103     capital_base=10000,  
    104     data_frequency = 'daily',  
--> 105     bundle='quandl'  
    106 )

C:\Anaconda3\envs\quantopian\lib\site-packages\zipline\utils\run_algo.py in run_algorithm(start, end, initialize, capital_base, handle_data, before_trading_start, analyze, data_frequency, data, bundle, bundle_timestamp, trading_calendar, metrics_set, default_extension, extensions, strict_extensions, environ, blotter)  
    428         local_namespace=False,  
    429         environ=environ,  
--> 430         blotter=blotter,  
    431     )

C:\Anaconda3\envs\quantopian\lib\site-packages\zipline\utils\run_algo.py in _run(handle_data, initialize, before_trading_start, analyze, algofile, algotext, defines, data_frequency, capital_base, data, bundle, bundle_timestamp, start, end, output, trading_calendar, print_algo, metrics_set, local_namespace, environ, blotter)  
    227     ).run(  
    228         data,  
--> 229         overwrite_sim_params=False,  
    230     )  
    231 

C:\Anaconda3\envs\quantopian\lib\site-packages\zipline\algorithm.py in run(self, data, overwrite_sim_params)  
    760             daily_stats = self._create_daily_stats(perfs)  
    761  
--> 762             self.analyze(daily_stats)  
    763         finally:  
    764             self.data_portal = None

C:\Anaconda3\envs\quantopian\lib\site-packages\zipline\algorithm.py in analyze(self, perf)  
    474  
    475         with ZiplineAPI(self):  
--> 476             self._analyze(self, perf)  
    477  
    478     def __repr__(self):

<ipython-input-2-b68d206bfb01> in analyze(context, perf)  
     87 def analyze(context, perf):  
     88     # Use PyFolio to generate a performance report  
---> 89     returns, positions, transactions = pf.utils.extract_rets_pos_txn_from_zipline(perf)  
     90     pf.create_returns_tear_sheet(returns, benchmark_rets=None)  
     91 

ValueError: too many values to unpack (expected 3)

I am not an expert but "too many values to unpack" may imply a version mismatch in the libraries. Check your install vs. Robin Szeto's YML and pyfolio 0.9.2

Yes, my thoughts as well. You might have accidentally installed an earlier version of PyFolio.

Note that, at least last time I checked, the PyFolio package on the conda channel isn't updated. Installing through conda will give you an old version.
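To illustrate why a version mismatch can produce this exact error, here is a minimal sketch. The two extract_* functions are hypothetical stand-ins for an older and a newer release of a library function; they are not PyFolio's actual code. The point is only that unpacking into three names fails as soon as the function hands back more than three values.

```python
def extract_old_style():
    """Hypothetical stand-in for an older library version returning four values."""
    return "returns", "positions", "transactions", "extra"


def extract_new_style():
    """Hypothetical stand-in for a newer library version returning three values."""
    return "returns", "positions", "transactions"


def try_unpack(func):
    """Unpack the result into three names and report whether that worked."""
    try:
        returns, positions, transactions = func()
    except ValueError as err:
        return "failed: {}".format(err)
    return "ok"


if __name__ == "__main__":
    print(try_unpack(extract_new_style))
    print(try_unpack(extract_old_style))
```

So when a notebook written against one release is run against another, the first thing worth checking is which version is actually installed in the active environment.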

Thanks a lot for your help, Laura & Andreas! I am running pyfolio 0.5.1, so that's probably why... I can confirm it is impossible to upgrade using conda update, as you mentioned earlier, Andreas (see "Conda Installation Version Inconsistency").

That's definitely the issue. And I think I know why...

The PyFolio package on conda isn't updated for some reason, so if you install using conda you'll get 0.5.1. You need to use pip to install this package.

I've raised this a long time ago, but it's not updated yet. I was hoping to avoid using pip for the book, but for this package I had to.

I'd uninstall PyFolio and then install it again, using pip.
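Before and after reinstalling, it can help to confirm which version the active environment really sees. The following is a minimal sketch, not something from the book: the helper names are made up for illustration, and it assumes either Python 3.8+ (importlib.metadata) or, on older interpreters such as a 3.5 environment, that setuptools' pkg_resources is available.

```python
import re


def installed_version(package_name):
    """Return the installed version string of a package, or None if not installed."""
    try:
        from importlib import metadata      # available from Python 3.8
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(package_name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources                    # fallback for older interpreters
    try:
        return pkg_resources.get_distribution(package_name).version
    except pkg_resources.DistributionNotFound:
        return None


def version_tuple(version):
    """Turn the leading numeric part of '0.9.2+75.gabc' into (0, 9, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def is_older_than(package_name, minimum):
    """True/False if the package is installed, None if it is missing."""
    found = installed_version(package_name)
    if found is None:
        return None
    return version_tuple(found) < version_tuple(minimum)


if __name__ == "__main__":
    print("pyfolio:", installed_version("pyfolio"))
    print("older than 0.9.2:", is_older_than("pyfolio", "0.9.2"))
```

Run it from the same environment that Jupyter uses; a different kernel can easily see a different set of packages.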

Just to confuse matters: it appears conda will run pip to install packages if you provide a YML with a pip section. So it's possible to get all the right libraries with a single YML and conda.

Andreas, I saw your post about Ch23/24. I kind of understand why you might put these at the end of the book. IMHO, this is a technical book for a technical audience, so before Chapter 12 would have been fine for me. Anyway, I set up a trial Norgate Data account and am working through modifying your sample code to get Zipline to ingest the data. I believe you need a line of code after you read the CSV file for Zipline to work with Norgate:

df.rename(columns={"Close":"close","Open":"open","High":"high","Low":"low","Volume":"volume"},inplace=True)

Also, some validity checking of the dataframe returned by pd.read_csv() will help filter out corrupt data files (I had one in my download for some reason). I added a line to ensure the df returned from read_csv() had at least 5 columns, which is pretty minimal error checking but got me past the corrupt file. So I now have a Norgate trial data bundle working and I'm debating whether to go through Ch24 and get MySQL working or skip that step and go back to Ch12, which leads me to my question.
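A minimal sketch of that kind of check, for anyone who wants a starting point. It is not the book's bundle code: the function name and the assumed CSV layout (a date column first, then OHLCV columns) are illustrative, and the check is slightly stricter than a bare column count because it looks for the five named columns after lower-casing the headers.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["open", "high", "low", "close", "volume"]


def load_price_frame(source):
    """Read one price CSV and return a cleaned DataFrame, or None if unusable.

    `source` may be a file path or a file-like object.
    """
    try:
        df = pd.read_csv(source, index_col=0, parse_dates=True)
    except ValueError:
        # pandas' EmptyDataError and ParserError are both ValueError subclasses.
        return None
    # Normalise headers such as "Close" or " Volume" to "close" and "volume".
    df = df.rename(columns=lambda name: str(name).strip().lower())
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing or df.empty:
        return None
    return df[REQUIRED_COLUMNS]


if __name__ == "__main__":
    good = io.StringIO(
        "Date,Open,High,Low,Close,Volume\n"
        "2019-01-02,1.0,2.0,0.5,1.5,1000\n"
    )
    bad = io.StringIO("garbage line\nmore garbage\n")
    print(load_price_frame(good))
    print(load_price_frame(bad))
```

In an ingest loop, a None result would simply mean skipping that file and perhaps logging its name.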

Will Zipline's run_algorithm() run any faster if I take the extra step of importing the data into MySQL and creating a MySQL-based bundle? It looks like the ingest process will be faster, but the actual use in Python in handle_data() and data.history() will run at the same speed, as they map internally in zipline/pandas to bcolz and there's no direct linkage to MySQL after the ingest. Is this correct?

Have a great weekend :-)

Laura P

My problem has now been solved. Here is what I did.

  1. I first created a new Anaconda virtual environment for Zipline by running "conda env create -f env_zipline_20190904b_office.yml" in Anaconda Prompt. The YML file is the one Robin posted above.
  2. As it seemed that pyfolio was not included, I installed the latest version of this library using the command "pip install pyfolio".
  3. I then ran the file "Backtest Analysis.ipynb" in Jupyter Notebook again and everything is fine now :-)

Thanks so much for your help, Laura & Andreas!

Cyril

I have tried installing and uninstalling and reinstalling in various iterations, but I seem to keep ending up with a variation of the same error.

Usually I can do an ingest in Zipline as indicated in the book and don't get an "ImportError: cannot import name 'load_prices_from_csv'" until running the example code available on the book website.

I have tried the environment import method described above and seem to be getting "ImportError: cannot import name 'load_prices_from_csv'" when I try to ingest the quandl data.

(env_zipline) C:\Users\Jason>zipline ingest -b quandl
Traceback (most recent call last):  
  File "C:\Users\Jason\Anaconda3\envs\env_zipline\Scripts\zipline-script.py", line 11, in <module>  
    load_entry_point('zipline==1.3.0', 'console_scripts', 'zipline')()  
  File "C:\Users\Jason\Anaconda3\envs\env_zipline\lib\site-packages\pkg_resources\__init__.py", line 484, in load_entry_point  
    return get_distribution(dist).load_entry_point(group, name)  
  File "C:\Users\Jason\Anaconda3\envs\env_zipline\lib\site-packages\pkg_resources\__init__.py", line 2707, in load_entry_point  
    return ep.load()  
  File "C:\Users\Jason\Anaconda3\envs\env_zipline\lib\site-packages\pkg_resources\__init__.py", line 2325, in load  
    return self.resolve()  
  File "C:\Users\Jason\Anaconda3\envs\env_zipline\lib\site-packages\pkg_resources\__init__.py", line 2331, in resolve  
    module = __import__(self.module_name, fromlist=['__name__'], level=0)  
  File "C:\Users\Jason\Anaconda3\envs\env_zipline\lib\site-packages\zipline\__init__.py", line 23, in <module>  
    from . import data  
  File "C:\Users\Jason\Anaconda3\envs\env_zipline\lib\site-packages\zipline\data\__init__.py", line 2, in <module>  
    from .loader import (  
ImportError: cannot import name 'load_prices_from_csv'  

And the error in JupyterLab that has been consistent across various installation attempts:

ImportError                               Traceback (most recent call last)  
<ipython-input-2-d870b0556631> in <module>()  
      1 # Import a few libraries we need  
      2 get_ipython().run_line_magic('matplotlib', 'inline')  
----> 3 from zipline import run_algorithm  
      4 from zipline.api import order_target_percent, symbol,      schedule_function, date_rules, time_rules  
      5 from datetime import datetime

~\Anaconda3\envs\env_zipline\lib\site-packages\zipline\__init__.py in <module>()
     21 from trading_calendars import get_calendar  
     22  
---> 23 from . import data  
     24 from . import finance  
     25 from . import gens

~\Anaconda3\envs\env_zipline\lib\site-packages\zipline\data\__init__.py in <module>()
      1 from . import loader  
----> 2 from .loader import (  
      3     load_prices_from_csv,  
      4     load_prices_from_csv_folder,  
      5 )

ImportError: cannot import name 'load_prices_from_csv'  

Any help would be appreciated!

@Andreas:
I've just read the table of contents of your book. Is there an introduction on how to connect your program/algo to IB for live trading?

There is nothing about live trading in the book. Mostly because I have an allergy against lawsuits and I believe that including a chapter on live trading would give me quite a rash. Publishing a book with code for live trading would be a legal nightmare.

I am not sure if there is a misunderstanding here. OK, let me rephrase my question as follows:
In your book, is there a description of how to connect the Zipline platform or the algo program to an IB paper account?

Besides, as for backtesting, I wonder if there is a description of how to do auto-optimization. One used to be able to do this with the Quantopian notebook, but Quantopian later removed and switched off a component, and it is no longer possible to auto-optimize backtests.

In your book, is there a description of how to connect the Zipline platform or the algo program to an IB paper account?

No.

I wonder if there is a description of how to do auto-optimization.

I'm not familiar with that term.

By "auto-optimization" I mean the following:
Suppose my algo is based on SMA1 crossing over SMA2, but I am not sure which values are "best" for SMA1 and SMA2. So I first set SMA1 to 10 and SMA2 to 50 and run a backtest. Then I change SMA1 and SMA2 to other values and run the backtest again after each change. I can certainly do this manually, but it is quite tedious. If I use a for loop instead, I only have to start the backtesting once (in fact hundreds of backtests will run one after another) and at the end I can select the "best" combination of SMA1 and SMA2.

I hope you understand what I mean. One could also call this parameter optimization for backtesting. NinjaTrader has such a function. It used to be possible on Quantopian too, but Quantopian later turned it off.

Of course this could lead to overfitting, but that is another topic.
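The loop described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Zipline code and not anything from the book: the price series is a synthetic random walk, backtest_sma_cross is a toy long/flat rule standing in for a real backtest, and all function names and parameter grids are assumptions.

```python
import itertools
import math
import random


def make_prices(n=500, seed=1):
    """Synthetic random-walk closes, standing in for real market data."""
    rng = random.Random(seed)
    price, prices = 100.0, []
    for _ in range(n):
        price *= math.exp(rng.gauss(0.0, 0.01))
        prices.append(price)
    return prices


def sma(prices, window, end):
    """Simple moving average of the `window` prices ending at index `end` (inclusive)."""
    return sum(prices[end - window + 1:end + 1]) / window


def backtest_sma_cross(prices, fast, slow):
    """Total return of a long/flat crossover rule; a toy stand-in for a real backtest."""
    equity = 1.0
    for t in range(slow - 1, len(prices) - 1):
        # Decide with data up to day t, then earn the return from day t to t + 1.
        if sma(prices, fast, t) > sma(prices, slow, t):
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0


def sweep(prices, fast_values, slow_values):
    """Run one backtest per (fast, slow) pair and return the results, best first."""
    results = []
    for fast, slow in itertools.product(fast_values, slow_values):
        if fast >= slow:
            continue  # skip pairs where the "fast" average is not actually faster
        results.append((backtest_sma_cross(prices, fast, slow), fast, slow))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    ranked = sweep(make_prices(), fast_values=[5, 10, 20], slow_values=[50, 100, 200])
    for total_return, fast, slow in ranked[:3]:
        print("fast={:>3} slow={:>3} return={:+.2%}".format(fast, slow, total_return))
```

With a real framework, the toy backtest function would be replaced by something that runs a full backtest for the given parameters and returns a single score. The overfitting caveat applies in full: the top row of such a table is the combination that happened to fit this particular history best.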

On the question of connecting Zipline to an IB paper account:
It used to be possible to trade live here on Quantopian. Since Quantopian turned off that function, many people have been looking for alternatives, me included. I've heard that Zipline is one of them, but at that time Zipline was still in development, and I am not sure whether it is mature enough now. I thought you would know Zipline well. It doesn't matter if you haven't described this in your book; I just want to know whether one can use Zipline for live trading.

Thomas, Have you looked at IBridgePy http://www.ibridgepy.com/ ?

Hi, I've enjoyed the book to the last page and am going back through and working with the examples. The symbols CU and NE are listed as currencies and TW is shown as an equity in the book, but I don't seem to have these in the couple data feeds I'm using. What are they? In general, some comments in the code next to the futures symbols would make them easier to map to other data feeds.

Best, LP

@Lee
Yes, I am now using IBridgePy, but I find it is not stable and robust enough. My algo breaks quite often because of this or that problem. The documentation for backtesting is poor.

But IBridgePy is easy to use and it is free.

CU is the Euro currency futures and NE is New Zealand currency futures. TW is MSCI Taiwan index futures.

The ticker symbols may appear slightly differently for some data providers.

Yes, mapping the codes is quite a joy :-p Two more: BL and LR?

Are you aware of any sort of Zipline upgrade that does incremental data updates instead of the full ingest?

BL is milling wheat. LR is Robusta.

I'm not sure, but I think the upcoming Norgate Data plugin for Zipline works with incremental updates. I don't know how they solved it though.

A nice FT article on hedge funds following the trend. Unfortunately it's behind a paywall.

https://www.ft.com/content/916ed2e0-d63f-11e9-a0bd-ab8ec6435630

I completed my first reading. Congrats again on this great book, Andreas. I created my first custom bundle and it seems OK. Without your help it would have taken months! Of course there is still a lot of homework to do to use and customize all the valuable source code from the book.
I am also new to MySQL and found this part a bit light for a beginner. I have issues running the code in chapter 24, probably because of my low level. Any book or materials to recommend to start learning MySQL combined with Python?
It would also have been great to focus on how to use Pipeline with custom bundles. Everything I have read on this topic is very technical and difficult to understand for someone with an intermediate level in Python. Maybe Andreas will need to write another book :-)
Any interest in building a community around this book? I am not sure the Q forum is the best place to discuss all these topics, but maybe it is...
Feedback welcome.

Andreas, thank you. LR - of course - thank you. I still can't map BL to Norgate Data futures - the closest seems to be KE, "KC HRW Wheat" but I am not confident it's the right mapping. Any comments?

Also, great news about the potential ND plugin to zipline. Have been contemplating finding a way to do fast and incremental "zipline ingests", more like "zipline snack" (or maybe nibble?) but would be happy if someone else solves the problem.

Cyril - you are in the deep end of the pool now! 1) as far as I can see, you are not required to use MySQL to work with the samples in the book, it's just a very useful tool for all the reasons Andreas lists in Ch24. I'm using MySQL for a variety of reasons beyond what the book covers for stock/futures metadata for example. 2) keep in mind that MySQL is a universe of learning unto itself and just about any book on MySQL will help you because you need to learn how to set up the database and run queries, all of which have little to do with Python or zipline. So find a book (or website) on MySQL that you like and just start experimenting. I should mention that with all things Python, getting the right versions of the MySQL libraries installed can be a challenge.

The idea to start a community/forum around the book other than this Q thread is good. But where?

-Laura P

Greetings all !

Andreas, with Laura Peterson's help I was finally able to install Zipline and get it working. She came out of nowhere and responded to one of my questions on Anaconda Community giving me the missing piece of the puzzle. Laura, again, thanks very much for your help.

I seem to be part of a very small group using Python on a Mac. Most of the fixes noted here and elsewhere are not relevant to the Mac OS, and some of the issues I experienced on my Mac cannot be answered by Windows users. Andreas, I continue to believe that posting the YML's you used to test your code on Windows, Mac OS and Linux machines would be very helpful to your readers.

I am an accountant, not a coder. So much of the code displayed in the book is directionally understandable, but I doubt I will be able to generate my own code after reading this book. Nevertheless, I am enjoying reading the book and the ideas presented.

I do have a few immediate questions:

  1. Do the returns using the strategies in the book include dividends?

  2. If so, how are the dividend declaration and payment dates captured and used in determining total returns?

  3. Is there a way to separate dividends from Total Return?

Thanks for your help.

Ed

Thanks Laura! You are right, I must learn to swim by myself, but the depth of the pool is a bit frightening :-) This book was the best I could get because it is written by a senior manager coming from the finance industry and sharing the same language. Happy to discuss here, as the main topic is zipline / quantopian, unless Andreas finds a better way to communicate. I found a couple of libraries on GitHub to improve communication between zipline and local CSV files, but I will have to test them.

The MySql chapter nearly got cut. I had a couple of technical reviewers of the manuscript telling me that it's irrelevant to the core topic and would only confuse people. They may very well be right, and it's certainly not required. But I did take their advice and moved the chapter to the end of the book, instead of the middle where it originally was.

I find MySql really useful for maintaining a local securities database. You don't really need to go very deep into this topic to have use for it. You might want to look into the Head First book series on topics like this. I find that series really great for getting into a totally new technical subject. It's not written for software engineers, but rather people with a more casual background.

Discussion forum: I'd be happy to take suggestions. I had a similar thought, and a few people have suggested it.

At first, I considered setting up a vBulletin or similar on my own server, but I had some bad experiences in the past trying to police a site like that. Second, on suggestion I looked into using GitHub for the code sharing and discussion, but it doesn't seem like a great place for this kind of purpose.

I'm sure there must be a great existing site somewhere, allowing creation of sub communities with all kinds of built in functionality. I normally try my best to stay far away from the whole social media scene, and I've lost track of the players in the space. If anyone knows of a good site for this kind of purpose, please let me know.

Thank you very much for your answer, Andreas. Writing a chapter on MySQL was an excellent idea, and we can easily feel that your initial plan was to spend more time on this topic. Thank you for recommending the Head First books!
Regarding the forum, the best option would be to use this thread, but I am afraid it will become complex to update and read. I am not sure that talking about zipline is the main goal of this community. It could be better to use a Google Group dedicated to zipline. I would love to have the opinion of the Quantopian team!
Moderation is a full-time job without any added value and with potential legal hurdles, so if you wish to create an independent forum, it is probably best to keep it private and small. Once again, Google Groups seems to be a good option, but I will ask my children tonight as they are my best social network advisors :-)

Hi Andreas

Just to let you know, I solved the technical problem myself. This little tip could be helpful to anyone who is a Python beginner, so I will share it here. You need to patch benchmark.py and loader.py as the book says to make "Your First Zipline Backtest" work. However, you can't patch them and then run the test while Python is running. You need to exit everything first, then patch, then restart the whole Python/zipline environment, set the quandl API key, ingest the quandl bundle, and then run the test. Then it will work.

Also I notice that I need to set quandl API key and ingest quandl bundle EACH TIME I start python zipline backtesting. Once I quit Anaconda, everything is lost.

Now I am onto Portfolio Backtest. Hopefully everything will work fine from here onwards. Fingers Crossed.

Andreas, since yesterday, I cannot load/access your website: www.followingthetrend.com and your book Errata and Updates page. It says "The connection has timed out. The server at www.followingthetrend.com is taking too long to respond." Could you please have a look at that? Thank you.

Henry

Ordered the book. Thank you!

Ordered and started reading.

For the market data part, I am wondering if you have come across some good source of fundamental data apart from just market data.

The best free fundamental data sources I've found must be screen scraped with something like the Python BeautifulSoup package (which I certainly can't recommend as a practice).

The book website is slow and does timeout at times. Patience is a virtue.

The quandl API key issue that Henri Luk had could be any number of things. One possible solution is to set the environment variable at the system level (how you do this depends on whether you are running Windows/Mac/Linux).
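
As a rough illustration of the environment variable approach: zipline's quandl bundle reads the key from QUANDL_API_KEY. On Windows, running `setx QUANDL_API_KEY "your-key"` once in a command prompt should store it for newly opened sessions. The small check below (the function name is hypothetical) can go at the top of a notebook or script to confirm that Python actually sees the key before you ingest.

```python
import os


def get_quandl_api_key(environ=os.environ):
    """Return the Quandl key from the environment, or raise a clear error."""
    key = environ.get("QUANDL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "QUANDL_API_KEY is not set in this session; set it at the "
            "system level and open a new terminal or notebook."
        )
    return key


if __name__ == "__main__":
    try:
        get_quandl_api_key()
        print("QUANDL_API_KEY is visible to Python")
    except RuntimeError as err:
        print(err)
```

If the message says the key is missing even after setting it, the terminal or Anaconda session was most likely opened before the variable was stored.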

Personally, it seems like I've spent way too much time on data source issues, but I have mostly solved them. I have a nice automated system that runs overnight to obtain equity, futures, and fundamental data, cache it, and insert it into MySQL and Zipline, plus a minimal logging facility so I can check the correctness of the data. It's a work in progress, and the samples in the book were helpful in getting me started.

Hi Laura

Thank you for your advice. I am running Windows 10 Pro 64-bit. I am a beginner with Python. Would you mind elaborating on how I can set the environment variable at the system level, please?

YES! What you are doing with your data capturing is exactly how I want to do it for myself. I am still slowly working through the book at the moment (up to around page 200), so I am nowhere near that yet. When I try to do that later, may I ask you for help if I get stuck somewhere, please?

Thank you for your help Laura.

Henry

Hi Andreas, I just bought your book and read all the comments, so I am excited to see what's in store :)

Henri, as with many things the internet is a better source of "how to" than I can type here and that is true of setting a system environment var.

Pulling together a custom overnight job is not too difficult. You make sure all the python applets you want to execute run cleanly from the command line, put them all in a single batch (.bat) file - including the call to anaconda "activate" to set the Python environment - and schedule it using the Windows "Task Scheduler". You may have to play around with the batch file and the task scheduler job properties a bit to get it to work the way you want. The process on a Mac or Linux is a bit different but well documented on the interwebs.
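
One variation on the batch file idea, sketched here as an assumption rather than as Laura's actual setup: keep the .bat file down to the "activate" call plus a single Python runner, and let the runner execute the steps in order and stop at the first failure. The inline `-c` commands below are placeholders standing in for your own scripts.

```python
import subprocess
import sys


def run_steps(commands):
    """Run each command in order; stop at the first one that fails."""
    completed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print("step failed:", cmd, file=sys.stderr)
            break
        completed.append(cmd)
    return completed


# Placeholder steps; replace with e.g. [sys.executable, "your_script.py"].
DEMO_STEPS = [
    [sys.executable, "-c", "print('download data')"],
    [sys.executable, "-c", "print('update database')"],
    [sys.executable, "-c", "print('ingest bundle')"],
]

if __name__ == "__main__":
    done = run_steps(DEMO_STEPS)
    print(len(done), "of", len(DEMO_STEPS), "steps completed")
```

Stopping at the first failure keeps a broken download from being inserted into the database or ingested further down the line.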

Very helpful thread!

I've gotten most of the code examples to work. I was able to ingest the futures bundle correctly (or so it appeared) using the random futures data provided.
However, despite patching the framework, when I try to run the code for the basic futures momentum model, I get lookup errors for dates not being in DatetimeIndex. I thought that after I re-indexed the dates to match the valid session dates this would be solved.

Anyone see something I'm missing?
I was able to ingest the random_stock_data bundle and run the models that relied on that bundle just fine. So it's something with my futures bundle...


LookupError Traceback (most recent call last)
in ()
310 capital_base=starting_portfolio,
311 data_frequency = 'daily',
--> 312 bundle='random_futures_data' )

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/utils/run_algo.py in run_algorithm(start, end, initialize, capital_base, handle_data, before_trading_start, analyze, data_frequency, data, bundle, bundle_timestamp, trading_calendar, metrics_set, default_extension, extensions, strict_extensions, environ, blotter)
429 local_namespace=False,
430 environ=environ,
--> 431 blotter=blotter,
432 )

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/utils/run_algo.py in _run(handle_data, initialize, before_trading_start, analyze, algofile, algotext, defines, data_frequency, capital_base, data, bundle, bundle_timestamp, start, end, output, trading_calendar, print_algo, metrics_set, local_namespace, environ, blotter)
228 ).run(
229 data,
--> 230 overwrite_sim_params=False,
231 )
232

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/algorithm.py in run(self, data, overwrite_sim_params)
754 try:
755 perfs = []
--> 756 for perf in self.get_generator():
757 perfs.append(perf)
758

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/gens/tradesimulation.py in transform(self)
204 for dt, action in self.clock:
205 if action == BAR:
--> 206 for capital_change_packet in every_bar(dt):
207 yield capital_change_packet
208 elif action == SESSION_START:

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/gens/tradesimulation.py in every_bar(dt_to_use, current_data, handle_data)
132 metrics_tracker.process_commission(commission)
133
--> 134 handle_data(algo, current_data, dt_to_use)
135
136 # grab any new orders from the blotter, then clear the list.

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/utils/events.py in handle_data(self, context, data, dt)
214 context,
215 data,
--> 216 dt,
217 )
218

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/utils/events.py in handle_data(self, context, data, dt)
233 """
234 if self.rule.should_trigger(dt):
--> 235 self.callback(context, data)
236
237

in daily_trade(context, data)
196 fields=['close','volume'],
197 frequency='1d',
--> 198 bar_count=250,
199 )
200

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/protocol.pyx in zipline._protocol.check_parameters.call_.assert_keywords_and_call (zipline/_protocol.c:3747)()

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/_protocol.pyx in zipline._protocol.BarData.history (zipline/_protocol.c:10190)()

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/data/data_portal.py in get_history_window(self, assets, end_dt, bar_count, frequency, field, data_frequency, ffill)
965 else:
966 df = self._get_history_daily_window(assets, end_dt, bar_count,
--> 967 field, data_frequency)
968 elif frequency == "1m":
969 if field == "price":

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/data/data_portal.py in _get_history_daily_window(self, assets, end_dt, bar_count, field_to_use, data_frequency)
804
805 data = self._get_history_daily_window_data(
--> 806 assets, days_for_window, end_dt, field_to_use, data_frequency
807 )
808 return pd.DataFrame(

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/data/data_portal.py in _get_history_daily_window_data(self, assets, days_for_window, end_dt, field_to_use, data_frequency)
827 field_to_use,
828 days_for_window,
--> 829 extra_slot=False
830 )
831 else:

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/data/data_portal.py in _get_daily_window_data(self, assets, field, days_in_window, extra_slot)
1115 days_in_window,
1116 field,
-> 1117 extra_slot)
1118 if extra_slot:
1119 return_array[:len(return_array) - 1, :] = data

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/data/history_loader.py in history(self, assets, dts, field, is_perspective_after)
547 dts,
548 field,
--> 549 is_perspective_after)
550 end_ix = self._calendar.searchsorted(dts[-1])
551

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/data/history_loader.py in _ensure_sliding_windows(self, assets, dts, field, is_perspective_after)
399
400 assets = self._asset_finder.retrieve_all(assets)
--> 401 end_ix = find_in_sorted_index(cal, end)
402
403 for asset in assets:

/Users/calebjohnson/anaconda3/envs/zip35/lib/python3.5/site-packages/zipline/utils/pandas_utils.py in find_in_sorted_index(dts, dt)
139 ix = dts.searchsorted(dt)
140 if ix == len(dts) or dts[ix] != dt:
--> 141 raise LookupError("{dt} is not in {dts}".format(dt=dt, dts=dts))
142 return ix
143

LookupError: 2001-01-02 00:00:00+00:00 is not in DatetimeIndex(['2015-01-02', '2015-01-05', '2015-01-06', '2015-01-07',
'2015-01-08', '2015-01-09', '2015-01-12', '2015-01-13',
'2015-01-14', '2015-01-15',
...
'2020-10-05', '2020-10-06', '2020-10-07', '2020-10-08',
'2020-10-09', '2020-10-12', '2020-10-13', '2020-10-14',
'2020-10-15', '2020-10-16'],
dtype='datetime64[ns, UTC]', length=1460, freq='C')
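
Reading the last line of the traceback: the date being looked up (2001-01-02) falls before the first session in the index shown (2015-01-02), so the lookup fails no matter how the sessions in between were re-indexed. Why the index starts in 2015 is not something the traceback settles. The sketch below only shows how to classify a date against a session index before running a backtest. The function name is hypothetical, and the weekday-only index is a stand-in for the real calendar.

```python
import pandas as pd


def locate_in_sessions(sessions, dt):
    """Classify `dt` relative to a sorted, UTC-aware DatetimeIndex of sessions."""
    dt = pd.Timestamp(dt)
    dt = dt.tz_localize("UTC") if dt.tzinfo is None else dt.tz_convert("UTC")
    if dt < sessions[0]:
        return "before first session"
    if dt > sessions[-1]:
        return "after last session"
    return "ok" if dt in sessions else "not a session"


# Stand-in for the index in the error message: weekdays only, holidays ignored.
sessions = pd.bdate_range("2015-01-02", "2020-10-16", tz="UTC")

print(locate_in_sessions(sessions, "2001-01-02"))
print(locate_in_sessions(sessions, "2015-01-05"))
```

If the backtest start date comes back as "before first session", either the start date or the date range covered by the bundle is the thing to look at first.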

Hi Laura (and anyone else who has faced this issue!), Did you have to modify the "random futures data" bundle (or whatever bundle you used to load your futures contracts) to ensure that the futures contracts loaded in the correct order to ensure the proper rolling of continuous futures contracts? Right now my continuous futures contracts are rolling on a yearly basis (from one "F" contract to the next "F", for example) and that seems to be a result of the order in which the contracts were ingested. Can you share how you were able to overcome this issue? (Caleb, this might be related to your issue above too...? Unless you have already encountered/addressed the issue I am describing?) Thanks for your input!