Sorry about the performance problems, and thank you all for bringing them up. Some of this comes from limitations we knew about, and some of it is a bug. I'll start with a bit more detail about what's happening under the covers. For hundreds of Wikipedia pages, particularly the ones in the paper that Thomas refers to, we maintain a local cache of the data. If a page is in the local cache, there shouldn't be a problem loading its data.
For the ones that aren't in the local cache, we query http://stats.grok.se/. The data is organized by month, so we issue serial requests, one per month. Each request takes 2-5 seconds, and we allow one minute in total for the data to load. So there is a limit on the number of "page-months" that can be loaded when the pages aren't in our cache: at those speeds, roughly 12 to 30.
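To make the budget concrete, here is a minimal sketch of the kind of loop described above: serial requests, one per page-month, each with its own timeout, all sharing one overall deadline. This is an illustration, not our actual code. The function name `load_page_months`, the `fetch` callable, and the parameter names are all hypothetical, and the default numbers are just the ones mentioned in this post.

```python
import time
from typing import Callable, Dict, List, Tuple

PageMonth = Tuple[str, str]  # (page title, month label); format is an assumption


def load_page_months(
    page_months: List[PageMonth],
    fetch: Callable[[str, str, float], dict],
    per_request_timeout: float = 10.0,  # per-call limit (assumed value)
    overall_budget: float = 60.0,       # total time allowed (assumed value)
    clock: Callable[[], float] = time.monotonic,
) -> Dict[PageMonth, dict]:
    """Fetch page-months one at a time until the overall budget runs out."""
    deadline = clock() + overall_budget
    loaded: Dict[PageMonth, dict] = {}
    for page, month in page_months:
        remaining = deadline - clock()
        if remaining <= 0:
            # Out of time: return whatever was loaded so far.
            break
        # Never let a single call run past the overall deadline.
        timeout = min(per_request_timeout, remaining)
        loaded[(page, month)] = fetch(page, month, timeout)
    return loaded
```

Because the requests run one after another, the number of page-months that fit is roughly the overall budget divided by the time per request, which is why slow responses cut the loadable range so sharply.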
We found a couple of problems:
- We have a bug where we're case-sensitive when we shouldn't be. For instance, you'll see very different behavior if you look at 'debt' vs. 'Debt'. We'll fix that.
- We're finding that individual page calls sometimes take longer than 5 seconds, so we're bumping the per-call limit to 10 seconds. The overall limit of 1 minute is unchanged, though.
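For the case-sensitivity bug, one possible shape of a fix is to normalize the page title before looking it up. The sketch below is an assumption about how that could work, not the actual patch: `normalize_title` is a hypothetical helper, and it only uppercases the first character and swaps spaces for underscores, which matches how English Wikipedia treats titles (the rest of the title stays case-sensitive).

```python
def normalize_title(title: str) -> str:
    """Return a canonical form of a page title for lookups (hypothetical helper)."""
    # Trim whitespace and use underscores, as in Wikipedia page URLs.
    title = title.strip().replace(" ", "_")
    # Uppercase only the first character; the rest is left untouched.
    return title[:1].upper() + title[1:]


# 'debt' and 'Debt' now map to the same key.
cache = {"Debt": {"cached": True}}  # stand-in for the local cache
print(cache.get(normalize_title("debt")))
print(cache.get(normalize_title("Debt")))
```

With a step like this in front of both the cache lookup and the remote query, 'debt' and 'Debt' would behave the same way.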
Other things to make this better:
- If you have other pages you'd like us to cache, let me know at [email protected]. We're happy to cache more, but we aren't in a place where we can cache them all!
- We find that querying the website a second time is faster. So when you're looking at non-cached pages, a request that failed on the first attempt may succeed on the second.
Quick edit: overall limit is 1 minute, not 2.