BossArray for list-like Yahoo search results
It's been a while since I've worked on any small fun projects just for my own enjoyment. This is something I've been wanting to put together for a couple of weeks, and I finally had a couple of chunks of time to do it. I am pleased to announce boss_array.py, a pleasant wrapper around the Yahoo BOSS Mashup Framework that allows search results to be treated like a regular list. Let's start off with an example.
>>> from boss_array import BossArray
>>> x = BossArray("Tokyo cheap hotel")
>>> x[0]
{u'dispurl': u'<b>directrooms.com</b>/japan/<b>hotels</b>/<wbr><b>tokyo</b>-<b>hotels</b>/price1.htm', u'title': u'<b>Tokyo</b> <b>Hotels</b> in Japan - DirectRooms', u'url': u'http://directrooms.com/japan/hotels/tokyo-hotels/price1.htm', u'abstract': u'<b>Tokyo</b> discount <b>hotel</b> reservations from DirectRooms. Save <b>...</b> The <b>hotel</b> is located on the eastern side of <b>Tokyo</b>, Nihonbashi area, <b>...</b> 5 >all <b>Tokyo</b> <b>hotels</b> on 1 <b>...</b>', u'clickurl': u'http://directrooms.com/japan/hotels/tokyo-hotels/price1.htm', u'date': u'2008/05/03', u'size': u'53291'}
>>> len(x)
298267
>>> x[0:10]
u"All ten results would display, removed for readability."
>>> x[20:60]
u"Results 20-60 would display..."
>>> x[0:200]
u"Results 0-200 would display..."
There are a few convenient things happening here:
You specify the query once when you create the BossArray, and afterwards it remembers your search terms.
It allows very easy access to the number of search results (via the len function).
It allows you to retrieve more than 50 results at once. The search API is limited to 50 results per query, but the BossArray breaks large requests apart into multiple queries. At the moment they are processed sequentially, so it can get a bit slow if you are retrieving a very large quantity of results at once.
All search results are cached by the BossArray. That means if you retrieve x[0:20] and then retrieve x[5], it doesn't require an HTTP request. If you attempted to retrieve x[5:15], it would use the cached copy as well. It also works in more complex situations. Consider this:
>>> _ = x[0:50]
>>> _ = x[100:150]
>>> _ = x[0:200]
BossArray is smart enough to handle that correctly. In the final lookup, x[0:200], it will use the cached results from 0-50 and 100-150, and perform two queries to fill in the missing gaps between 50-100 and 150-200.
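To make the gap-filling idea concrete, here is a minimal sketch of how the missing ranges could be worked out. It is not the code from boss_array.py: the function name plan_fetches, its parameters, and the use of a set of cached indices are all assumptions made for illustration. Only the 50-results-per-query limit comes from the description above.

```python
def plan_fetches(cached, start, stop, max_per_query=50):
    """Return the (offset, count) queries needed to cover [start, stop).

    `cached` is any container of result indices that were already
    fetched. This is a hypothetical helper, not part of boss_array.py.
    """
    queries = []
    i = start
    while i < stop:
        if i in cached:
            # Already have this result, nothing to request.
            i += 1
            continue
        # Start of a gap: extend it until we reach a cached index,
        # the end of the slice, or the per-query limit.
        gap_start = i
        while i < stop and i not in cached and i - gap_start < max_per_query:
            i += 1
        queries.append((gap_start, i - gap_start))
    return queries


# Same scenario as above: 0-50 and 100-150 are cached, then x[0:200].
cached = set(range(0, 50)) | set(range(100, 150))
print(plan_fetches(cached, 0, 200))
# [(50, 50), (150, 50)]
```

Each tuple would become one request to the search API, so the final lookup costs two queries instead of four.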
Usage
You've already seen the basic usage in the examples above, but here are a few more. First we'll open the eleventh search result (index 10) in a web browser.
>>> from boss_array import BossArray
>>> x = BossArray("Python")
>>> import webbrowser
>>> webbrowser.open(x[10]['url'])
Next, let's look at displaying all the URLs for the first one hundred search results.
>>> from boss_array import BossArray
>>> x = BossArray("Restaurants near Suitengumae Station")
>>> urls = [ a['url'] for a in x[:100] ]
>>> for url in urls:
...     print url
...
u"url 1"
u"url 2"
u"..."
Basically, you use it like a Python list.
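For the curious, list-like behaviour of this kind usually comes from implementing __len__ and __getitem__. The sketch below shows the general shape under stated assumptions: LazyResults, the fetch callback, and the total argument are hypothetical names invented here, and the real BossArray talks to the BOSS framework rather than to a stand-in function. To stay short it fetches one result per request instead of batching.

```python
class LazyResults(object):
    """List-like view over a paged search backend (illustrative only)."""

    def __init__(self, query, fetch, total):
        self.query = query    # remembered, so callers never repeat it
        self._fetch = fetch   # assumed signature: fetch(query, start, count) -> list
        self._total = total   # assumed: total hit count reported by the backend
        self._cache = {}      # result index -> result

    def __len__(self):
        return self._total

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() resolves None and negative bounds for us.
            return [self._get(i) for i in range(*key.indices(self._total))]
        if key < 0:
            key += self._total
        if not 0 <= key < self._total:
            raise IndexError("result index out of range")
        return self._get(key)

    def _get(self, index):
        if index not in self._cache:
            # One result per request keeps the sketch small.
            self._cache[index] = self._fetch(self.query, index, 1)[0]
        return self._cache[index]


calls = []  # records every simulated request


def fake_fetch(query, start, count):
    """Stand-in for a real search request; returns made-up results."""
    calls.append((start, count))
    return [{"url": "http://example.com/%s/%d" % (query, i)}
            for i in range(start, start + count)]


results = LazyResults("python", fake_fetch, total=1000)
urls = [r["url"] for r in results[:3]]
second = results[1]  # served from the cache, no new request
```

After this runs, calls holds three entries, one per result in the slice, and the later results[1] lookup adds none.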
Setup
Setting up BossArray is as simple as setting up the Yahoo BOSS Mashup Framework, and then starting Python from a directory containing a config.json file (as explained in the Yahoo BOSS Mashup Framework setup instructions).
Repository and Download
You can download and contribute to BossArray at its GitHub repository.