Irrational Exuberance!

Yahoo's Build Your Own Search Service in Django

July 10, 2008. Filed under django, boss

In this tutorial we are going to look at building a simple Django application that integrates with the Yahoo BOSS search framework. More specifically, we're going to use the BOSS Mashup Framework.

First, let's address the most pressing question: what the hell is Yahoo BOSS? BOSS stands for Build Your Own Search Service, and it gives us a fairly low-level interface to Yahoo's search engine. We can use it not just to search our own site, but to search pretty much anything. The BOSS Mashup Framework, which is what we are going to use, is open to any developer and has very few restrictions.

Fussy Details

First, let's get all the little configuration stuff out of the way. There is a fair bit, but none of it is very difficult. As a warning, the BOSS Mashup Framework requires Python 2.5 and won't work with earlier versions without some changes.[1]

  1. Sign up for a BOSS App ID.

  2. Create a new Django project; let's call it my_search.

    django-admin.py startproject my_search
    
  3. From inside the new my_search directory, create a Django app; let's name it yahoo_search.

    python2.5 manage.py startapp yahoo_search
    
  4. Download the Python library for controlling BOSS.

  5. Unzip it into the my_search/yahoo_search folder, and rename it to boss.

    unzip boss_mashup_framework_0.1.zip
    rm boss_mashup_framework_0.1.zip
    mv boss_mashup_framework_0.1 boss
    
  6. Yahoo didn't do a great job of packaging something that just works, so we have to go through a few steps to build the framework. (These sub-steps are lifted almost directly from the included README file, so it's not that they didn't document it; it's just a bit of a pain to get working.) In Yahoo's defense, I suspect the awkward packaging is because they ran into some incompatible licenses.

    1. Install simplejson if you don't already have it. You can check whether it's installed by starting a Python 2.5 prompt and typing

      import simplejson
      

      If that raises an ImportError, download simplejson and install it from inside the unpacked directory:

      python2.5 setup.py build
      python2.5 setup.py install
      
    2. Create the folder my_search/yahoo_search/boss/deps/.

    3. Download dict2xml and xml2dict into the deps folder. Then, from inside deps, extract them, remove the .tgz files, and return to the boss directory:

      tar -xzvf dict2xml.tgz
      tar -xzvf xml2dict.tgz
      rm *.tgz
      cd ..
      
    4. Now we can finally build the framework.

      python2.5 setup.py build
      python2.5 setup.py install
      
    5. Next, we have to update the settings in boss/config.json. I only changed the first three settings: appid, email, and org. The appid is the one you were given upon signing up for BOSS.

    6. Check that it all worked by running (from within the boss directory):

      python2.5 examples/ex3.py
      
    7. From here on, things deviate from the README a bit: we're going to move yos (and, if you want to keep them for reference, examples) into our yahoo_search directory, move config.json into our my_search directory, and delete everything else.

      mv config.json ../../
      mv yos ../
      mv examples ../
      cd ..
      rm -r boss
      

Okay, now we're all done with the setup, and are ready to move on to putting together a simple Django application that uses the BOSS Mashup Framework.
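Before writing any Django code, it can be worth sanity-checking that config.json actually has the three settings from step 6.5 filled in. The helper below is my own sketch, not part of the BOSS framework. It assumes config.json is a JSON object with "appid", "email", and "org" at its top level, and it only catches missing or empty values, not placeholder text. It uses the standard json module (Python 2.6+); on Python 2.5 you would import simplejson instead, which provides the same load function.

```python
import json
import os

# Assumption: config.json is a JSON object whose top level includes the
# "appid", "email" and "org" settings described in step 6.5.
REQUIRED_KEYS = ("appid", "email", "org")


def missing_config_keys(path):
    """Return the required settings that are absent or empty in the file."""
    with open(path) as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def check_config(path):
    """Print what is wrong with the config file; return True if it looks usable."""
    if not os.path.exists(path):
        print("no config file at %s" % os.path.abspath(path))
        return False
    missing = missing_config_keys(path)
    if missing:
        print("fill in these settings: %s" % ", ".join(missing))
        return False
    return True
```

Run it from the my_search directory with check_config("config.json"). Whether yos itself imports still has to be checked with python2.5, as in step 6.6.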

Defining our App

Now that we have all the setup out of the way, we need to decide exactly what our app is going to do. To begin with, we're going to do something really simple: search Yahoo News for the terms submitted through a form. (Fear not: this is poised to turn into a multi-part series where we gradually put together a more interesting app.)

Yep. As simple as you can get. We'll make it more interesting afterwards, when we have something that works.

URLs

First, let's edit our project's urls.py to include the URLs from our yahoo_search app. my_search/urls.py should look like this:

from django.conf.urls.defaults import *
urlpatterns = patterns('',
    (r'^', include('my_search.yahoo_search.urls')),
)

However, we haven't actually created my_search/yahoo_search/urls.py yet, so let's do that real quick.

from django.conf.urls.defaults import *
urlpatterns = patterns('',
    (r'^$', 'my_search.yahoo_search.views.index'),
)

As you can see from urlpatterns, we're only going to have one view, index, and it will handle everything for us.
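To make the interplay between the two urls.py files concrete, here is a toy model in plain Python. It's a simplification I wrote for illustration, not Django's actual resolver. It assumes, as Django does, that patterns are matched against the path without its leading slash, and that include() hands whatever the outer pattern didn't consume to the app's own patterns.

```python
import re

# Toy model only: each entry is (regex, target), where target is either a
# view name (a string) or a nested list standing in for include().
APP_PATTERNS = [
    (r'^$', 'my_search.yahoo_search.views.index'),  # yahoo_search/urls.py
]
PROJECT_PATTERNS = [
    (r'^', APP_PATTERNS),  # my_search/urls.py: include(...)
]


def _match(path, patterns):
    """Walk the patterns in order; recurse into 'included' lists."""
    for regex, target in patterns:
        match = re.match(regex, path)
        if match is None:
            continue
        if isinstance(target, list):
            # include(): pass the unmatched rest of the path to the app.
            view = _match(path[match.end():], target)
            if view is not None:
                return view
        else:
            return target
    return None


def resolve(path):
    """Return the view name that `path` would reach, or None."""
    if path.startswith('/'):
        path = path[1:]  # patterns never see the leading slash
    return _match(path, PROJECT_PATTERNS)
```

In this model, resolve('/') reaches the index view while resolve('/about/') returns None, which corresponds to Django answering with a 404 until we add more patterns.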

The index view

Now we're going to write the index view. Open my_search/yahoo_search/views.py and start with all the imports we're going to need.

from django.shortcuts import render_to_response
from django import newforms as forms
from yos.boss import ysearch
from yos.yql import db

We're going to use render_to_response to render templates, newforms to ask the user for their search terms, ysearch to retrieve data from BOSS, and db to turn those results into something a bit more manageable.

Writing the search function

Now let's write a simple search function we'll use for querying BOSS.

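def search(terms):
    # Ask BOSS for the top 10 news results matching terms.
    data = ysearch.search(terms, vertical="news", count=10)
    # Convert the raw response into a db table; its rows attribute
    # is a list of dictionaries, one per result.
    news = db.create(data=data)
    return news.rows

Brief Aside

If you wanted to search Yahoo's web results instead of its news results, you'd simply change the line

data = ysearch.search(terms, vertical="news", count=10)

to

data = ysearch.search(terms, count=10)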

For a news search, the search function returns a list of dictionaries that look like this:

{
  u'sourceurl': u'http://www.channelweb.com/',
  u'language': u'en english',
  u'title': u'Google Works With eBay And PayPal To Curtail Phishing',
  u'url': u'http://www.crn.com/security/208808698?cid=ChannelWebBreakingNews',
  u'abstract': u'Google Gmail requires eBay and PayPal to use DomainKeys to authenticate mail in an anti-phish effort',
  u'clickurl': u'http://www.crn.com/security/208808698?cid=ChannelWebBreakingNews',
  u'source': u'ChannelWeb',
  u'time': u'22:26:08',
  u'date': u'2008/07/11'
}

The search function is very basic, but it will be enough for this initial version of the application. Let's move on.
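The date and time fields come back as plain strings. If you later want to sort or compare results chronologically, a small helper can combine them into a datetime. This is my own sketch, assuming every row uses the YYYY/MM/DD and HH:MM:SS formats shown in the sample above. The rows don't say which timezone the time is in, so the result is a naive datetime.

```python
from datetime import datetime


# Assumes 'date' looks like 2008/07/11 and 'time' like 22:26:08, as in the
# sample row; no timezone is attached because the row doesn't state one.
def published_at(row):
    """Combine a result row's date and time strings into a naive datetime."""
    return datetime.strptime(row['date'] + ' ' + row['time'],
                             '%Y/%m/%d %H:%M:%S')


def newest_first(rows):
    """Return a new list of rows ordered from most to least recent."""
    return sorted(rows, key=published_at, reverse=True)
```

For example, newest_first(search("iPhone")) would reorder the rows by publication time, whatever order BOSS returned them in.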

A simple newform

Next we need to create a (very) simple newform that we will use to ask our users for their search terms.

class SearchForm(forms.Form):
    search_terms = forms.CharField(max_length=200)

That's all we'll need for now; carry on. (I said it was simple.)

Actually implementing the index view

Okay, now let's stop for a moment and consider what the index view needs to accomplish.

  1. It needs to check if there are any incoming POST parameters.
  2. If there are POST parameters, it needs to validate them using SearchForm, and then use search to put together the results.
  3. It needs to use render_to_response to render a template containing a SearchForm and any search results (if applicable).

Okay, translating that into Python we get our index function:

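def index(request):
    results = None
    if request.method == "POST":
        # Bind the submitted data to the form and validate it.
        form = SearchForm(request.POST)
        if form.is_valid():
            search_terms = form.cleaned_data['search_terms']
            results = search(search_terms)
    else:
        # First visit (GET): show an empty form.
        form = SearchForm()
    # results stays None unless a valid search was submitted.
    return render_to_response('yahoo_search/index.html',
                              {'form': form, 'results': results})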

Admittedly, we haven't written the index.html template yet; that will be our next task. Beyond that, this is a pretty standard Django view.
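If you want to exercise that control flow without a running Django project or a BOSS appid, one approach is to pull the decision logic into plain functions and pass the search function in. The sketch below does that. validate() is a hypothetical stand-in that imitates SearchForm's single required CharField(max_length=200), and fake_search() returns made-up rows shaped like the BOSS news dictionary above; neither is part of Django or BOSS.

```python
MAX_LENGTH = 200  # mirrors forms.CharField(max_length=200)


def validate(post):
    """Hypothetical stand-in for SearchForm: return cleaned data or None."""
    terms = post.get('search_terms', '')
    if not terms or len(terms) > MAX_LENGTH:
        return None  # required field missing, or too long
    return {'search_terms': terms}


def index_context(method, post, search):
    """Mirror index(): compute the results that would go into the template."""
    results = None
    if method == 'POST':
        cleaned = validate(post)
        if cleaned is not None:
            results = search(cleaned['search_terms'])
    return {'results': results}


def fake_search(terms):
    """Offline stub returning made-up rows with the same keys as BOSS news."""
    return [{
        'title': 'Example story about %s' % terms,
        'clickurl': 'http://example.com/story',
        'url': 'http://example.com/story',
        'abstract': 'Placeholder abstract.',
        'source': 'Example News',
        'sourceurl': 'http://example.com/',
        'date': '2008/07/11',
        'time': '12:00:00',
        'language': 'en english',
    }]
```

Passing search in as an argument is the design choice that makes this testable: the real view would hand the BOSS-backed search function to the same logic, then pass the resulting dict (plus the form) to render_to_response.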

Filling in the index.html template

First, we need to create the template directory for our yahoo_search app. From inside the my_search/yahoo_search directory:

mkdir templates
mkdir templates/yahoo_search

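Then create the file templates/yahoo_search/index.html and open it in your editor. This is going to be a simple template, containing only an input box for searching and a listing of the results. (Django's app directories template loader only looks in this templates folder if the app is listed in INSTALLED_APPS, so add 'my_search.yahoo_search' to INSTALLED_APPS in my_search/settings.py if you haven't already.)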

It'll look like this:

<html> <head>
<title>My Search</title>
</head>
<body>
<h1>My Search</h1>

<form action="/" method="post">
<table>
  {{ form }}
  <tr><td><input type="submit" value="Search"></td></tr>
</table>
</form>

{% if results %}
<ol>
  {% for result in results %}
  <li>
  {% comment %}
  Notice we are using {{ result.clickurl }} instead of
  {{ result.url }}. You might wonder why we are doing
  that, and the answer is pretty simple: because that's
  what Yahoo asks us to do.
  http://developer.yahoo.com/search/boss/boss_guide/univer_api_query.html#url_vs_clickurl
  {% endcomment %}
  <span class="title">
    <a href="{{ result.clickurl }}">{{ result.title }}</a>
  </span>
  <span class="date"> {{ result.date }} </span>
  <span class="time"> {{ result.time }} </span>
  <span class="source">
    <a href="{{ result.sourceurl }}">{{ result.source }}</a>
  </span> 
  <p class="abstract"> {{ result.abstract }} </p>
  </li>
  {% endfor %}
</ol>
{% endif %}
</body> </html>
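Django's template language does the looping for you, but if you'd like to see roughly what the results section produces (say, to check the markup, or to write a quick test without Django), here's a plain-Python approximation. It's my own sketch, not how Django renders templates, and it uses Python 3's html module. It escapes every value with html.escape; that's a choice I made for the sketch, since whether template variables are auto-escaped depends on which Django version you're running.

```python
from html import escape

# Field order matches the %s slots in ITEM below.
FIELDS = ('clickurl', 'title', 'date', 'time', 'sourceurl', 'source', 'abstract')
ITEM = ('<li>'
        '<span class="title"><a href="%s">%s</a></span>'
        '<span class="date"> %s </span>'
        '<span class="time"> %s </span>'
        '<span class="source"><a href="%s">%s</a></span>'
        '<p class="abstract"> %s </p>'
        '</li>')


def render_results(results):
    """Roughly mirror the {% if results %}<ol>...</ol>{% endif %} block."""
    if not results:
        return ''
    items = [ITEM % tuple(escape(row.get(key, '')) for key in FIELDS)
             for row in results]
    return '<ol>%s</ol>' % ''.join(items)
```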

Download Zip of Files

If you haven't been keeping up, or if your code is behaving strangely, you can grab a zip of all these files. Just unzip these somewhere, fill in the first three entries (your BOSS appid, email, and org) in my_search/config.json, and you'll be ready to take a look at the app in the next step.

Update 7/12/2008: Unfortunately, because of the way the BOSS library is built, it isn't enough to simply copy over the yos folder; instead, you need to follow the installation steps for the BOSS Framework listed above (step #6). Specifically, you need to work through those steps and finish with:

python2.5 setup.py build
python2.5 setup.py install

It's a bit of a pain, and I'll see if I can clean things up to make it simpler.

Seeing it work

Now that we've finished building the app, let's fire it up.

python2.5 manage.py runserver

Navigate over to http://127.0.0.1:8000/, and you'll see a friendly search box waiting for you. Type in a search term, hit enter, and voila, you'll see a list of your results. I searched for iPhone and got a page of results like this:

[Screenshot: a Django app using the BOSS Mashup Framework.]

One gotcha I'll point out is that the helper library Yahoo supplied expects config.json to be in the current working directory of the Python process. That will be true for your development setup (as long as you run manage.py from inside my_search), but won't necessarily be the case on your deployment server. I believe the best solution here would be to add the contents of config.json to your project's settings.py file and tweak the yos/boss/ysearch.py file to load the settings using django.conf.settings instead of from disk.
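Here's a rough sketch of the settings-based approach. BOSS_CONFIG is a setting name I made up for this example (neither Django nor the BOSS framework defines it), and I haven't shown the edit to yos/boss/ysearch.py itself, since how that module reads config.json isn't covered here. The helper prefers a BOSS_CONFIG dict on whatever settings object you pass in (in a project, django.conf.settings) and otherwise falls back to a config.json next to the helper's own file, so it no longer depends on the working directory. It uses the standard json module; on Python 2.5 you'd import simplejson instead.

```python
import json
import os


# Hypothetical helper: BOSS_CONFIG is an invented setting name, e.g.
# BOSS_CONFIG = {'appid': '...', 'email': '...', 'org': '...'} in settings.py.
def load_boss_config(settings=None, fallback_path=None):
    """Prefer settings.BOSS_CONFIG; otherwise read config.json from a fixed path."""
    config = getattr(settings, 'BOSS_CONFIG', None)
    if config is not None:
        return dict(config)  # copy so callers can't mutate the settings
    if fallback_path is None:
        # Resolve relative to this file, not the process's working directory.
        here = os.path.dirname(os.path.abspath(__file__))
        fallback_path = os.path.join(here, 'config.json')
    with open(fallback_path) as f:
        return json.load(f)
```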

Let me know if you have any questions, and I'll try to answer them. Time permitting, I'll continue with another segment or two working on building a slightly more compelling search service than what we have created so far.

Update 7/12/2008: Thanks to Wayne's comments, I was able to simplify the search function quite a bit. Specifically, he pointed out that I was using the library to prepend ynews$ to all the dictionaries' keys, then getting upset that it was there and removing it manually. Whoops.


  1. I accidentally installed it under Python 2.4 at first, and the first problem it ran into was the renaming of the ElementTree package between 2.4 and 2.5. I didn't go any further with that, so I'm unsure whether anything else causes problems.