Last week I was doing some performance work with a client,
and one of the big improvements we made was making HTTP requests
in parallel. If your server needs to hit two or three APIs before
it can render (the bane of the mashup crowd), then making
sequential requests can take a huge bite out of your
performance. With the client, the solution needed to be in PHP,
but this evening I decided to whip up a similar solution for
Python (someone might have suggested it as well).
Using the threading and urllib modules, it
turned out to be a fairly straightforward task.
My (very basic) strategy was to create a thread for each
request, poll the threads until they finished, and then
return the received data as a list of two-tuples of the
form (url, data). The one other item on my wish list
was a single timeout shared by all of the threads
(again, to keep things feeling snappy).
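The original listing for multi_get.py did not survive, so here is a minimal sketch of the strategy described above. It is not the author's code. It targets Python 3, where the fetching functions live in urllib.request, whereas the original was written for Python 2. The sketch starts one thread per URL, polls until every thread has finished or one shared deadline passes, and returns (url, data) tuples in input order. Several pieces are hypothetical: the class name URLThread, the fetch and poll_interval parameters (the fetch hook only makes it easy to substitute a fake downloader), and the choice to report requests that are still running at the deadline with data=None.

```python
import threading
import time
import urllib.request


def _urlopen_fetch(url, timeout):
    # Download the whole body. `timeout` here is urllib's per-socket-operation
    # timeout; the overall deadline is enforced by multi_get's polling loop.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


class URLThread(threading.Thread):
    """Fetches a single URL in the background (hypothetical helper)."""

    def __init__(self, url, timeout, fetch):
        # Daemon threads don't keep the interpreter alive if a request hangs.
        super().__init__(daemon=True)
        self.url = url
        self.timeout = timeout
        self.fetch = fetch
        self.response = None

    def run(self):
        try:
            self.response = self.fetch(self.url, self.timeout)
        except Exception:
            # Treat any failure (DNS error, HTTP error, socket timeout) as "no data".
            self.response = None


def multi_get(uris, timeout=2.0, fetch=_urlopen_fetch, poll_interval=0.01):
    """Fetch all `uris` in parallel and return a list of (url, data) tuples.

    `timeout` is one deadline shared by every request; anything still
    running when it expires is reported with data=None.
    """
    threads = [URLThread(uri, timeout, fetch) for uri in uris]
    for t in threads:
        t.start()

    deadline = time.monotonic() + timeout
    # Poll until every thread has finished or the shared deadline passes.
    while any(t.is_alive() for t in threads) and time.monotonic() < deadline:
        time.sleep(poll_interval)

    # Only trust the response of threads that have actually finished.
    return [(t.url, None if t.is_alive() else t.response) for t in threads]
```

The threads are daemons, so a request that overruns the deadline keeps running in the background but does not prevent the interpreter from exiting.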
Using it looks like this (let's say the code is stored
in a file named multi_get.py for the following example):
    from multi_get import multi_get

    sites = ['http://msn.com/', 'http://yahoo.com/', 'http://google.com/']
    requests = multi_get(sites, timeout=1.5)
    for url, data in requests:
        print("received this data %s from this url %s" % (data, url))
I did some comparison testing against a straightforward sequential script that fetched the same URLs one after another, and the results were what one would expect. On my connection MSN took 1.14 seconds to load, while Yahoo and Google took between 0.1 and 0.2 seconds (who knows, maybe they were rejecting the user agent with an error page ;). The parallel script retrieved everything within ten or twenty milliseconds of the slowest single response, while the sequential script took considerably longer: roughly the sum of all the response times.
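The sequential script isn't preserved either. The sketch below reproduces the shape of the comparison without touching the network. It replaces real downloads with time.sleep calls whose latencies are modeled on the numbers above: 1.14 seconds for MSN, and 0.15 seconds for Yahoo and Google as a midpoint of the reported range. These latencies are assumptions, not measurements. The function names are also hypothetical. For brevity, the parallel version waits with Thread.join rather than polling.

```python
import threading
import time

# Simulated per-site latencies in seconds (assumed, loosely based on the
# timings above; Yahoo and Google use a midpoint of 0.1-0.2 s).
LATENCIES = {
    "http://msn.com/": 1.14,
    "http://yahoo.com/": 0.15,
    "http://google.com/": 0.15,
}


def fake_fetch(url):
    """Stand-in for a real HTTP GET: just sleeps for the site's latency."""
    time.sleep(LATENCIES[url])
    return "<html>%s</html>" % url


def sequential_get(urls):
    # One request after another: total time is roughly the sum of latencies.
    return [(url, fake_fetch(url)) for url in urls]


def parallel_get(urls):
    # One thread per request: total time is roughly the largest latency.
    results = {}

    def worker(url):
        results[url] = fake_fetch(url)  # each thread writes its own key

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [(url, results[url]) for url in urls]


def timed(fn, urls):
    """Return (elapsed_seconds, result) for fn(urls)."""
    start = time.monotonic()
    result = fn(urls)
    return time.monotonic() - start, result
```

Calling timed(sequential_get, list(LATENCIES)) should report a little over 1.44 seconds (the sum of the simulated latencies). Calling timed(parallel_get, list(LATENCIES)) should report a little over 1.14 seconds (the largest one).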
As you build your next Django or TurboGears mashup (or even plan your
next doomed foray into screen scraping Google), give parallel requests
a try and see just how helpful they can be.
(You could also approach this problem using the asyncore module, but it started throwing some weird errors my way and I went with the threaded approach. Given that the threaded approach performs admirably, I decided to leave it as it is; using asyncore should be almost identical but you'd have to write a bit more glue code for the HTTP aspects, especially if you wanted to get more complex than GET requests.)