Irrational Exuberance!

BossArray for list-like Yahoo search results

July 28, 2008. Filed under python, boss

It's been a while since I've worked on a small, fun project just for my own enjoyment. This one is something I've wanted to put together for a couple of weeks, and I finally had a couple of chunks of time to do it. I am pleased to announce boss_array.py, a pleasant wrapper around the Yahoo BOSS Mashup Framework that allows search results to be treated like a regular list. Let's start off with an example.

>>> from boss_array import BossArray
>>> x = BossArray("Tokyo cheap hotel")
>>> x[0]
{u'dispurl': u'<b>directrooms.com</b>/japan/<b>hotels</b>/<wbr><b>tokyo</b>-<b>hotels</b>/price1.htm', u'title': u'<b>Tokyo</b> <b>Hotels</b> in Japan - DirectRooms', u'url': u'http://directrooms.com/japan/hotels/tokyo-hotels/price1.htm', u'abstract': u'<b>Tokyo</b> discount <b>hotel</b> reservations from DirectRooms. Save <b>...</b> The <b>hotel</b> is located on the eastern side of <b>Tokyo</b>, Nihonbashi area, <b>...</b> 5 &gt;all <b>Tokyo</b> <b>hotels</b> on 1 <b>...</b>', u'clickurl': u'http://directrooms.com/japan/hotels/tokyo-hotels/price1.htm', u'date': u'2008/05/03', u'size': u'53291'}
>>> len(x)
298267
>>> x[0:10]
u"All ten results would display, removed for readability."
>>> x[20:60]
u"Results 20-60 would display..."
>>> x[0:200]
u"Results 0-200 would display..."

There are a couple of convenient things happening here:

  1. You specify the query once when you create the BossArray, and afterwards it remembers your search terms.

  2. It allows very easy access to the number of search results (via the len function).

  3. It allows you to retrieve more than 50 results at once. The search API is limited to 50 results per query, but BossArray breaks large requests apart into multiple queries. At the moment they are processed sequentially, so it can get a bit slow if you are retrieving a very large quantity of results at once.

  4. All search results are cached by the BossArray. That means if you retrieve x[0:20] and then retrieve x[5], the second lookup doesn't require an HTTP request. Retrieving x[5:15] would use the cached copy too. It also works in more complex situations. Consider this:

    >>> _ = x[0:50]
    >>> _ = x[100:150]
    >>> _ = x[0:200]
    

    BossArray is smart enough to handle that correctly. In the final lookup, x[0:200], it will use the cached results from 0-50 and 100-150, and perform two queries to fill in the gaps at 50-100 and 150-200.
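
The gap-filling and 50-result batching described in the list above can be sketched roughly as follows. This is only an illustration of the idea, not the code in boss_array.py: the names missing_ranges, split_into_batches, and MAX_PER_QUERY are made up for this example, and the cache is assumed to be a plain dict keyed by result index.

```python
MAX_PER_QUERY = 50  # per-query limit of the search API, as mentioned above


def missing_ranges(cached, start, stop):
    """Return (start, stop) pairs inside [start, stop) that are not in `cached`.

    `cached` is assumed to be a dict (or set) keyed by result index.
    """
    gaps = []
    gap_start = None
    for i in range(start, stop):
        if i not in cached:
            if gap_start is None:
                gap_start = i          # a new gap begins here
        elif gap_start is not None:
            gaps.append((gap_start, i))  # the gap ended just before i
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, stop))   # gap runs to the end of the request
    return gaps


def split_into_batches(gap, size=MAX_PER_QUERY):
    """Split one (start, stop) gap into pieces no longer than `size`."""
    start, stop = gap
    return [(s, min(s + size, stop)) for s in range(start, stop, size)]


# Pretend results 0-50 and 100-150 were fetched earlier.
cached = {}
for i in list(range(0, 50)) + list(range(100, 150)):
    cached[i] = {"url": "cached-%d" % i}

gaps = missing_ranges(cached, 0, 200)
print(gaps)                           # [(50, 100), (150, 200)]
print(split_into_batches((0, 120)))   # [(0, 50), (50, 100), (100, 120)]
```

Each tuple in the final list would correspond to one request to the search API, which is why a very large uncached slice turns into many sequential queries.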

Usage

You've already seen the basic usage above, but here are a few more examples. First we'll open the eleventh search result in a web browser.

>>> from boss_array import BossArray
>>> x = BossArray("Python")
>>> import webbrowser
>>> webbrowser.open(x[10]['url'])

Next, let's look at displaying the URLs for the first one hundred search results.

>>> from boss_array import BossArray
>>> x = BossArray("Restaurants near Suitengumae Station")
>>> urls = [ a['url'] for a in x[:100] ]
>>> for url in urls:
...     print url
u"url 1"
u"url 2"
u"..."

Basically, you use it like a Python list.
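
For readers curious how an object can behave like a list in the first place, here is a minimal sketch of the general technique: a class that defines `__len__` and `__getitem__` and fetches results lazily. It is a hypothetical stand-in, not BossArray itself. The class name LazyResults, the fetch(start, count) callable, and fake_fetch are all assumptions made for this example, and it ignores the 50-results-per-query limit to stay short.

```python
class LazyResults(object):
    """Hypothetical list-like wrapper; not the real BossArray code."""

    def __init__(self, fetch, total):
        self._fetch = fetch    # assumed callable(start, count) -> list of results
        self._total = total    # total number of hits reported by the backend
        self._cache = {}       # result index -> result
        self.requests = 0      # number of times fetch was called

    def __len__(self):
        return self._total

    def _load(self, start, stop):
        # Fetch every uncached index in [start, stop), one call per contiguous run.
        # Assumes fetch returns exactly `count` items.
        i = start
        while i < stop:
            if i in self._cache:
                i += 1
                continue
            end = i
            while end < stop and end not in self._cache:
                end += 1
            for offset, item in enumerate(self._fetch(i, end - i)):
                self._cache[i + offset] = item
            self.requests += 1
            i = end

    def __getitem__(self, key):
        if isinstance(key, slice):
            indices = range(*key.indices(self._total))
            if indices:
                # Loads the whole span, even if the slice has a step.
                self._load(min(indices), max(indices) + 1)
            return [self._cache[i] for i in indices]
        if key < 0:
            key += self._total
        if not 0 <= key < self._total:
            raise IndexError("result index out of range")
        self._load(key, key + 1)
        return self._cache[key]


def fake_fetch(start, count):
    # Stand-in for a real search request.
    return [{"url": "http://example.com/%d" % i} for i in range(start, start + count)]


results = LazyResults(fake_fetch, total=1000)
first_twenty = results[0:20]   # triggers one fetch
fifth = results[5]             # served from the cache
print("%d results, %d request(s)" % (len(results), results.requests))
# 1000 results, 1 request(s)
```

Because indexing, slicing, and len all go through these two methods, calling code can treat the wrapper like any other sequence while the fetching stays hidden.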

Setup

Setting up BossArray is as simple as setting up the Yahoo BOSS Mashup Framework and then starting Python from a directory containing a config.json file (as explained in the framework's setup instructions).

Repository and Download

You can download and contribute to BossArray at its GitHub repository.