Yahoo's Build your Own Search Service in Django
In this tutorial we are going to look at building a simple Django application that integrates with the Yahoo BOSS search framework. More specifically we're going to be using the BOSS Mashup Framework.
First, let's address the most pressing question: what the hell is Yahoo BOSS? BOSS stands for Build Your Own Search Service, and it gives us a fairly low-level interface to Yahoo's search engine, not just for searching our own site, but for searching pretty much anything. The BOSS Mashup Framework, which is what we are going to be using, is open to any developer and has very few restrictions.
Fussy Details
First, let's get all the little configuration stuff out of the way. There is a fair bit of it, but none of it is very difficult. As a warning, the BOSS Mashup Framework requires Python 2.5 and won't work with previous versions without some changes.[1]
Create a new Django project; let's call it my_search.
django-admin.py startproject my_search
Create a Django app inside my_search; let's name it yahoo_search. From inside the my_search directory:
python2.5 manage.py startapp yahoo_search
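Unzip the BOSS Mashup Framework archive (boss_mashup_framework_0.1.zip) into the my_search/yahoo_search folder and rename the unpacked directory to boss. From inside my_search/yahoo_search:
unzip boss_mashup_framework_0.1.zip
rm boss_mashup_framework_0.1.zip
mv boss_mashup_framework_0.1 boss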
Yahoo didn't do a great job of packaging something that just works, so we have to go through a few steps to build the framework. (These sub-instructions are lifted almost directly from the included README file, so it's not that they didn't document it, just that it's a bit of a pain to get working.) In Yahoo's defense, I suspect the reason for the 'bad' packaging is that they ran into some incompatible licenses.
Install Simple JSON if you don't already have it. You can check whether it is installed by starting a Python 2.5 prompt and typing:
import simplejson
If that raises an ImportError, download Simple JSON, then build and install it from inside its unpacked directory:
python2.5 setup.py build
python2.5 setup.py install
Create the folder my_search/yahoo_search/boss/deps/. Download dict2xml and xml2dict into it, then, from inside deps, extract them, remove the .tgz files, and return to the boss directory:
tar -xzvf dict2xml.tgz
tar -xzvf xml2dict.tgz
rm *.tgz
cd ..
Now we can finally build and install the framework (from within the boss directory):
python2.5 setup.py build
python2.5 setup.py install
Next, we have to update the settings in boss/config.json. I only changed the first three settings: appid, email, and org. The appid is the one you were given upon signing up for BOSS.
Check that it all worked by running (from within the boss directory):
python2.5 examples/ex3.py
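Beyond running ex3.py, you can confirm that Python 2.5 is able to import the pieces the app relies on. This is a minimal sketch; missing_modules is a hypothetical helper name, and checking only the top-level yos package (rather than its submodules) is a deliberate choice to keep the check from depending on the framework's internals.

```python
def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # simplejson is a build dependency; yos is the package views.py imports from.
    for name in missing_modules(["simplejson", "yos"]):
        print("cannot import %s" % name)
```

Save it as, say, check_deps.py and run it with python2.5; no output means both imports succeeded.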
From here on, things are going to deviate from the README a bit: we're going to move examples and yos into our yahoo_search directory, move config.json into our my_search directory, and get rid of everything else (you might want to keep the examples folder for your own benefit). From within the boss directory:
mv config.json ../../
mv yos ../
mv examples ../
cd ..
rm -r boss
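If you want to double-check the result of those moves, here is a small sketch that reports missing or leftover paths. check_layout and the two path lists are hypothetical names; the paths simply mirror the mv and rm commands above, checked relative to the my_search directory.

```python
import os
import sys

# Paths (relative to the my_search project directory) that the
# mv commands above should leave in place.
EXPECTED_PATHS = [
    "config.json",
    os.path.join("yahoo_search", "yos"),
]
# After `rm -r boss`, the unpacked framework directory should be gone.
REMOVED_PATHS = [
    os.path.join("yahoo_search", "boss"),
]


def check_layout(project_dir):
    """Return a list of layout problems; an empty list means it looks right."""
    problems = []
    for rel in EXPECTED_PATHS:
        if not os.path.exists(os.path.join(project_dir, rel)):
            problems.append("missing: " + rel)
    for rel in REMOVED_PATHS:
        if os.path.exists(os.path.join(project_dir, rel)):
            problems.append("still present: " + rel)
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for problem in check_layout(sys.argv[1]):
        print(problem)
```

Run it as python2.5 check_layout.py /path/to/my_search; no output means the layout matches what the commands above should produce.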
Okay, now we're all done with the setup, and are ready to move on to putting together a simple Django application that uses the BOSS Mashup Framework.
Defining our App
Now that we have all the setup out of the way, we need to decide exactly what our app is going to do. To begin with (but fear not, this is poised to turn into a multi-part series where we gradually put together a more interesting app), we're going to do something really simple: search Yahoo News for the terms submitted through a form.
Yep. As simple as you can get. We'll make it more interesting afterwards, when we have something that works.
URLs
First, let's edit our project's urls.py to include the URLs from our yahoo_search app. my_search/urls.py should look like this:
from django.conf.urls.defaults import *
urlpatterns = patterns('',
    (r'^', include('my_search.yahoo_search.urls')),
)
However, we haven't actually created my_search/yahoo_search/urls.py yet, so let's do that now:
from django.conf.urls.defaults import *
urlpatterns = patterns('',
    (r'^$', 'my_search.yahoo_search.views.index'),
)
As urlpatterns shows, we're only going to have one view, index, and it is going to handle everything for us.
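To see why r'^' in the project URLconf combined with r'^$' in the app URLconf means index handles only the site root, here is a simplified model of the matching in plain Python. It is a sketch, not Django's actual resolver; PROJECT_URLS, APP_URLS and resolve are names made up for illustration.

```python
import re

# Simplified model of the two URLconfs above; only the regex matching is mirrored.
PROJECT_URLS = [
    (r'^', 'my_search.yahoo_search.urls'),  # the include(...) in my_search/urls.py
]
APP_URLS = [
    (r'^$', 'my_search.yahoo_search.views.index'),  # my_search/yahoo_search/urls.py
]


def resolve(path):
    """Return the view that would handle `path`, or None (a 404)."""
    # URL patterns are matched against the path without its leading slash.
    if path.startswith('/'):
        path = path[1:]
    for pattern, _included in PROJECT_URLS:
        match = re.match(pattern, path)
        if match:
            # include() hands the unmatched remainder to the app's patterns.
            remainder = path[match.end():]
            for app_pattern, view in APP_URLS:
                if re.match(app_pattern, remainder):
                    return view
    return None
```

So resolve('/') finds the index view, while something like resolve('/news/') returns None, which in Django would be a 404.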
The index view
Now we're going to write the index view. Start by opening my_search/yahoo_search/views.py and adding all the imports we're going to need:
from django.shortcuts import render_to_response
from django import newforms as forms  # later Django releases renamed this: from django import forms
from yos.boss import ysearch
from yos.yql import db
We're going to use render_to_response to render templates, newforms to ask our users for their search terms, ysearch to retrieve data from BOSS, and db to turn the retrieved results into something a bit more manageable.
Writing the search function
Now let's write a simple search function that we'll use for querying BOSS.
def search(terms):
    # Ask BOSS for up to 10 results from the news vertical
    data = ysearch.search(terms, vertical="news", count=10)
    # Load the response into a yos.yql table; its rows are plain dicts
    news = db.create(data=data)
    return news.rows
Brief Aside
If you wanted to search Yahoo's web results instead of its news results, you'd simply change the line
data = ysearch.search(terms, vertical="news", count=10)
to
data = ysearch.search(terms, count=10)
The data returned by the search function is a list of dictionaries that look like this:
{
    u'sourceurl': u'http://www.channelweb.com/',
    u'language': u'en english',
    u'title': u'Google Works With eBay And PayPal To Curtail Phishing',
    u'url': u'http://www.crn.com/security/208808698?cid=ChannelWebBreakingNews',
    u'abstract': u'Google Gmail requires eBay and PayPal to use DomainKeys to authenticate mail in an anti-phish effort',
    u'clickurl': u'http://www.crn.com/security/208808698?cid=ChannelWebBreakingNews',
    u'source': u'ChannelWeb',
    u'time': u'22:26:08',
    u'date': u'2008/07/11'
}
The search function is very basic, but it will be enough for this initial version of the application. Let's move forward.
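Each news row carries its date and time as two separate strings. If you later want to sort results by recency, a hypothetical helper like the one below could combine them into one datetime. It assumes the 'YYYY/MM/DD' and 'HH:MM:SS' formats seen in the sample row above; the sample doesn't state a timezone, so the values are treated as naive datetimes.

```python
from datetime import datetime


def result_datetime(result):
    """Combine a row's 'date' (YYYY/MM/DD) and 'time' (HH:MM:SS) into a datetime."""
    return datetime.strptime(result['date'] + ' ' + result['time'],
                             '%Y/%m/%d %H:%M:%S')


def newest_first(results):
    """Return the result dicts sorted with the most recent story first."""
    return sorted(results, key=result_datetime, reverse=True)
```

In the view this would look like results = newest_first(search(search_terms)), and it only makes sense for rows that actually carry date and time, like the news sample above.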
A simple newform
Next we need to create a (very) simple newform that we will use to ask our users for their search terms.
class SearchForm(forms.Form):
    search_terms = forms.CharField(max_length=200)
That's all we need for now; carry on. (I said it was simple.)
Actually implementing the index view
Okay, now let's stop for a moment and consider what the index view needs to accomplish.
- It needs to check whether there are any incoming POST parameters.
- If there are POST parameters, it needs to validate them using SearchForm, and then use search to put together the results.
- It needs to use render_to_response to render a template containing a SearchForm and any search results (if applicable).
Translating that into Python, we get our index function:
def index(request):
    results = None
    if request.method == "POST":
        # Bind the submitted data to the form and validate it
        form = SearchForm(request.POST)
        if form.is_valid():
            search_terms = form.cleaned_data['search_terms']
            results = search(search_terms)
    else:
        # Plain GET request: show an empty search form
        form = SearchForm()
    return render_to_response('yahoo_search/index.html',
                              {'form': form, 'results': results})
Admittedly, we haven't written the index.html template yet; that will be our next task. Beyond that, this is a pretty standard Django view.
Filling in the index.html template
First, we need to create the template directory for our yahoo_search app. From inside the my_search/yahoo_search directory:
mkdir templates
mkdir templates/yahoo_search
Then create the file templates/yahoo_search/index.html and open it up in your editor. This is going to be a simple template, containing only an input box for searching and a listing of the results.
It'll look like this:
<html>
  <head>
    <title>My Search</title>
  </head>
  <body>
    <h1>My Search</h1>
    <form action="/" method="post">
      <table>
        {{ form }}
        <tr><td><input type="submit" value="Search"></td></tr>
      </table>
    </form>
    {% if results %}
      <ol>
        {% for result in results %}
          <li>
            {% comment %}
              Notice we are using {{ result.clickurl }} instead of
              {{ result.url }}. You might wonder why we are doing
              that, and the answer is pretty simple: because that's
              what Yahoo is asking us to do.
              http://developer.yahoo.com/search/boss/boss_guide/univer_api_query.html#url_vs_clickurl
            {% endcomment %}
            <span class="title">
              <a href="{{ result.clickurl }}">{{ result.title }}</a>
            </span>
            <span class="date">{{ result.date }}</span>
            <span class="time">{{ result.time }}</span>
            <span class="source">
              <a href="{{ result.sourceurl }}">{{ result.source }}</a>
            </span>
            <p class="abstract">{{ result.abstract }}</p>
          </li>
        {% endfor %}
      </ol>
    {% endif %}
  </body>
</html>
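For render_to_response to find yahoo_search/index.html under the app's templates folder, Django's app_directories template loader has to know about the app, and that loader only searches the templates/ directories of apps listed in INSTALLED_APPS. A settings.py excerpt along these lines should do it; the four django.contrib entries are what startproject typically generated at the time, so keep whatever your generated file already lists and just add the last line.

```python
# my_search/settings.py (excerpt)
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    # Listing our app lets the app_directories template loader find
    # my_search/yahoo_search/templates/yahoo_search/index.html.
    'my_search.yahoo_search',
)
```

This also assumes TEMPLATE_LOADERS still includes the app_directories loader, as the generated settings.py of that era did by default.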
Download Zip of Files
If you haven't been keeping up, or if your code is behaving strangely, you can grab a zip of all these files. Just unzip them somewhere, fill in the first three entries (your BOSS appid, email, and org) in my_search/config.json, and you'll be ready to take a look at the app in the next step.
Update 7/12/2008: Unfortunately, because of the way the BOSS library has been built, it isn't enough to simply copy over the yos folder; instead, you will need to follow the installation steps for the BOSS Framework listed above (step #6). Specifically, you need to work through those steps and finish with:
python2.5 setup.py build
python2.5 setup.py install
It's a bit of a pain, and I'll see if I can clean things up to make it simpler.
Seeing it work
Now that we've finished building the app, let's fire it up:
python2.5 manage.py runserver
Navigate over to http://127.0.0.1:8000/ and you'll see a friendly search box waiting for you. Type in a search term, hit enter, and voila, you'll see a list of your results. I searched for iPhone and got a page of results like this:
One gotcha I'll point out is that the helper library Yahoo has supplied expects config.json to be in the directory Python is being run from. That will be true for your development setup, but won't necessarily be the case on your deployment server. I believe the best solution would be to add the contents of config.json to your project's settings.py file and tweak the yos/boss/ysearch.py file to load the settings using django.conf.settings instead of from disk.
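A minimal sketch of that idea, under some loud assumptions: BOSS_CONFIG is a hypothetical setting you would add to settings.py (holding appid, email, org, and any other entries from config.json), load_boss_config is a hypothetical helper, and how you wire it into yos/boss/ysearch.py depends on how that file currently reads config.json, which isn't shown here. The function takes the settings object as a parameter so it can be passed django.conf.settings or anything else with a BOSS_CONFIG attribute.

```python
try:
    import json  # Python 2.6 and later
except ImportError:
    import simplejson as json  # Python 2.5, as used in this tutorial


def load_boss_config(settings=None, fallback_path=None):
    """Return the BOSS settings (appid, email, org, ...) as a dict.

    Prefers a BOSS_CONFIG dict on the settings object (for example
    django.conf.settings); otherwise reads config.json from an explicit
    path instead of from the current working directory.
    """
    config = getattr(settings, "BOSS_CONFIG", None)
    if config is not None:
        return dict(config)
    if fallback_path is None:
        raise ValueError("no BOSS_CONFIG setting and no config.json path given")
    f = open(fallback_path)
    try:
        return json.load(f)
    finally:
        f.close()
```

With BOSS_CONFIG = {'appid': '...', 'email': '...', 'org': '...'} in settings.py, a tweaked ysearch.py could call load_boss_config(django.conf.settings), so deployment no longer depends on which directory the server process starts in.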
Let me know if you have any questions, and I'll try to answer them. Time permitting, I'll continue with another segment or two working on building a slightly more compelling search service than what we have created so far.
Update 7/12/2008: Thanks to Wayne's comments, I was able to simplify the search function quite a bit. Specifically, he pointed out that I was using the library to prepend ynews$ to all the dictionaries' keys, then getting upset it was there and removing it manually. Whoops.
[1] I accidentally installed it under Python 2.4 at first, and the first problem it ran into was ElementTree's import path, which differs between 2.4 (the separate elementtree package) and 2.5 (xml.etree in the standard library). I didn't go any further than that, so I'm not sure whether anything else causes problems.