Thanks for the quick response, James. You are absolutely right that this particular example does not show a difference between SPY and the benchmark on the first data point. However, with other time spans you can observe a difference, e.g., when selecting 12/14/2016 to 12/16/2016.
I like your analysis that the difference might stem from different treatment of after-hours changes. However, I implemented your suggestion of opening the position in the morning and closing it in the evening: every day, the algorithm buys at 9:32 am and sells at 4:00 pm. Unfortunately, the graphs still don't match up (see below). I am wondering what I am missing here.
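
To make the after-hours idea concrete, here is a minimal Python sketch of how an intraday-only strategy can drift away from simply holding the same instrument. The prices are made up for illustration (they are not real SPY data), and the function names are hypothetical, not part of any backtesting API.

```python
# Hypothetical (open, close) prices, invented for illustration only.
days = [
    (100.0, 101.0),
    (102.0, 101.5),
    (101.0, 103.0),
]


def buy_and_hold_return(days):
    """Return from buying at the first open and holding until the last close."""
    first_open = days[0][0]
    last_close = days[-1][1]
    return last_close / first_open - 1.0


def intraday_only_return(days):
    """Compounded return from buying at each open and selling at each close."""
    growth = 1.0
    for open_price, close_price in days:
        growth *= close_price / open_price
    return growth - 1.0


def overnight_only_return(days):
    """Compounded return of the gaps from each close to the next open."""
    growth = 1.0
    for (_, prev_close), (next_open, _) in zip(days, days[1:]):
        growth *= next_open / prev_close
    return growth - 1.0


hold = buy_and_hold_return(days)
intraday = intraday_only_return(days)
overnight = overnight_only_return(days)

print(f"buy and hold:   {hold:.4%}")
print(f"intraday only:  {intraday:.4%}")
print(f"overnight gaps: {overnight:.4%}")
```

In this toy setup the intraday and overnight pieces multiply back to the buy-and-hold result, so any gap between the two curves equals the overnight moves the intraday strategy skips. It does not account for commissions, slippage, or the two minutes between the open and a 9:32 am fill, any of which could also contribute to a mismatch.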