BossArray for list-like Yahoo search results
It's been a while since I've worked on any small, fun projects just for my own enjoyment. This is one I've been wanting to put together for a couple of weeks, and I finally had a couple of chunks of time to do it. I am pleased to announce boss_array.py, a pleasant wrapper around the Yahoo BOSS Mashup Framework that allows search results to be treated like a regular list. Let's start off with an example.
>>> from boss_array import BossArray
>>> x = BossArray("Tokyo cheap hotel")
>>> x[0]
{u'dispurl': u'<b>directrooms.com</b>/japan/<b>hotels</b>/<wbr><b>tokyo</b>-<b>hotels</b>/price1.htm', u'title': u'<b>Tokyo</b> <b>Hotels</b> in Japan - DirectRooms', u'url': u'http://directrooms.com/japan/hotels/tokyo-hotels/price1.htm', u'abstract': u'<b>Tokyo</b> discount <b>hotel</b> reservations from DirectRooms. Save <b>...</b> The <b>hotel</b> is located on the eastern side of <b>Tokyo</b>, Nihonbashi area, <b>...</b> 5 >all <b>Tokyo</b> <b>hotels</b> on 1 <b>...</b>', u'clickurl': u'http://directrooms.com/japan/hotels/tokyo-hotels/price1.htm', u'date': u'2008/05/03', u'size': u'53291'}
>>> len(x)
298267
>>> x[0:10]
u"All ten results would display, removed for readability."
>>> x[20:60]
u"Results 20-60 would display..."
>>> x[0:200]
u"Results 0-200 would display..."
There are a couple of convenient things happening here:
- You specify the query once when you create the BossArray, and afterwards it remembers your search terms.
- It gives very easy access to the number of search results (via the len function).
- It lets you retrieve more than 50 results at once. The search API is limited to 50 results per query, but the BossArray will break large requests apart into multiple queries. At the moment they are processed sequentially, so it can get a bit slow if you are retrieving a very large quantity of results at once.
- All search results are cached by the BossArray. That means if you retrieve x[0:20] and then retrieve x[5], it doesn't require an HTTP request. If you attempted to retrieve x[5:15], it would use the cached copy as well. It also works in more complex situations. Consider this:
>>> _ = x[0:50]
>>> _ = x[100:150]
>>> _ = x[0:200]
BossArray is smart enough to handle that correctly. In the final lookup, x[0:200], it will use the cached results from 0-50 and 100-150, and perform two queries to fill in the missing gaps between 50-100 and 150-200.
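To make the gap-filling idea concrete, here is a minimal sketch of how the missing spans for a slice could be worked out and then split into queries of at most 50 results. This is not the actual boss_array.py source: the names missing_ranges, split_into_pages, and PAGE_SIZE are hypothetical, and the cache is assumed to be a simple set of already-fetched indices.

```python
PAGE_SIZE = 50  # assumed per-query limit, taken from the description above


def missing_ranges(cached, start, stop):
    """Return (start, stop) spans inside [start, stop) that are not cached.

    `cached` is assumed to be a set of result indices already fetched.
    """
    gaps = []
    gap_start = None
    for i in range(start, stop):
        if i not in cached:
            if gap_start is None:
                gap_start = i  # a new gap opens here
        elif gap_start is not None:
            gaps.append((gap_start, i))  # the gap closes just before i
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, stop))  # gap runs to the end of the slice
    return gaps


def split_into_pages(start, stop, page_size=PAGE_SIZE):
    """Break one span into consecutive spans of at most page_size results."""
    return [(s, min(s + page_size, stop)) for s in range(start, stop, page_size)]


# Mirror the example above: 0-50 and 100-150 are cached, then x[0:200] is requested.
cached = set(range(0, 50)) | set(range(100, 150))
gaps = missing_ranges(cached, 0, 200)
queries = [page for gap in gaps for page in split_into_pages(*gap)]
```

With the cache state from the example, gaps comes out as the two spans 50-100 and 150-200, and each fits in a single query. A larger uncached span, such as 0-120, would be split into three sequential queries.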
Usage
You've already seen the usage in the examples above, but here are a few more. First we'll open the eleventh search result (index 10) in a web browser.
>>> from boss_array import BossArray
>>> x = BossArray("Python")
>>> import webbrowser
>>> webbrowser.open(x[10]['url'])
Next, let's look at displaying all the URLs for the first one hundred search results.
>>> from boss_array import BossArray
>>> x = BossArray("Restaurants near Suitengumae Station")
>>> urls = [ a['url'] for a in x[:100] ]
>>> for url in urls:
...     print url
...
u"url 1"
u"url 2"
u"..."
Basically, you use it like a Python list.
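For the curious, list-like behavior of this kind usually comes from implementing __len__ and __getitem__. The sketch below is an illustration only, not the real BossArray code: LazyResults, fake_fetch, and the fetch(start, count) signature are hypothetical stand-ins, and it fetches one result at a time for simplicity rather than batching.

```python
class LazyResults(object):
    """Hypothetical list-like wrapper that fetches and caches results on demand."""

    def __init__(self, fetch, total):
        self._fetch = fetch    # assumed callable: fetch(start, count) -> list
        self._total = total    # total number of results reported by the search
        self._cache = {}       # index -> result

    def __len__(self):
        return self._total

    def _get(self, index):
        # Only hit the backend when the index has not been seen before.
        if index not in self._cache:
            for offset, item in enumerate(self._fetch(index, 1)):
                self._cache[index + offset] = item
        return self._cache[index]

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() clamps the slice to the known length.
            return [self._get(i) for i in range(*key.indices(self._total))]
        if key < 0:
            key += self._total
        if not 0 <= key < self._total:
            raise IndexError("result index out of range")
        return self._get(key)


calls = []  # records every backend request, to show the cache working


def fake_fetch(start, count):
    """Stand-in for a real search request; returns made-up result dicts."""
    calls.append((start, count))
    return [{'url': 'http://example.com/%d' % i} for i in range(start, start + count)]


results = LazyResults(fake_fetch, 1000)
first_urls = [r['url'] for r in results[:3]]
again = results[1]  # served from the cache, no new fetch
```

Because slices return plain lists and single indices return one result, the wrapper drops into list comprehensions and for loops just like the examples above.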
Setup
Setting up BossArray is as simple as setting up the Yahoo BOSS Mashup Framework and then starting Python from a directory containing a config.json file (as explained in the Yahoo BOSS Mashup Framework setup instructions).
Repository and Download
You can download and contribute to BossArray at its GitHub repository.