In this tutorial we are going to look at building a simple Django application that integrates with the Yahoo BOSS search framework. More specifically, we're going to use the BOSS Mashup Framework.
First, let's address the most pressing question: what the hell is Yahoo BOSS? BOSS stands for Build your Own Search Service, and it gives us a fairly low-level interface to Yahoo's search engine, not just for searching our own site, but for searching pretty much anything. The BOSS Mashup Framework, which is what we are going to be using, is open to any developer and has very few restrictions.
Fussy Details
First, let's get all the little configuration stuff out of the way. There is a fair bit, but none of it is very difficult. As a warning, I'll point out that the BOSS Mashup Framework requires Python 2.5 and won't work with earlier versions without some changes.[1]
Create a new Django project; let's call it my_search.
django-admin.py startproject my_search
Create a Django app inside my_search (run this from the my_search directory); let's name it yahoo_search.
python2.5 manage.py startapp yahoo_search
Unzip the BOSS Mashup Framework archive into the my_search/yahoo_search folder, and rename the result to boss.
unzip boss_mashup_framework_0.1.zip
rm boss_mashup_framework_0.1.zip
mv boss_mashup_framework_0.1 boss
Yahoo didn't do a great job of packaging something that just works, so we have to go through a few steps to build the framework. (These sub-instructions are lifted almost directly from the included README file, so it's not that they didn't document it, just that it's a bit of a pain to get working.) In Yahoo's defense, I think the reason they did a 'bad' job of packaging is that they probably ran into some incompatible licenses.
Install Simple JSON if you don't have it installed. You can check whether you have it by entering a Python 2.5 prompt and typing:
import simplejson
If that raises an ImportError, download Simple JSON and install it (from inside its unpacked directory):
python2.5 setup.py build
python2.5 setup.py install
Create the folder my_search/yahoo_search/boss/deps/.
Download dict2xml and xml2dict into the deps folder, then, from inside deps, extract them, remove the .tgz files, and return to the boss directory:
tar -xzvf dict2xml.tgz
tar -xzvf xml2dict.tgz
rm *.tgz
cd ..
Now we can finally build the framework:
python2.5 setup.py build
python2.5 setup.py install
Next, we have to update the settings in boss/config.json. I only changed the first three settings: appid, email, and org. The appid is the one you were given when you signed up for BOSS.
Check that it all worked by running (from within the boss directory):
python2.5 examples/ex3.py
From here on, things are going to deviate from the README a bit. We're going to move examples and yos into our yahoo_search directory, move config.json into our my_search directory, and get rid of everything else (well, you might want to keep the examples folder for your own benefit). From within the boss directory:
mv config.json ../../
mv yos ../
mv examples ../
cd ..
rm -r boss
Okay, now we're all done with the setup, and are ready to move on to putting together a simple Django application that uses the BOSS Mashup Framework.
Defining our App
Now that we have all the setup out of the way, we need to decide exactly what our app is going to do. To begin with, we're going to do something really simple: search Yahoo News using the search terms from a posted form. (Fear not, this is poised to turn into a multi-part series where we gradually put together a more interesting app.)
Yep. As simple as you can get. We'll make it more interesting afterwards, when we have something that works.
URLs
First, let's edit our project's urls.py to include the URLs from our yahoo_search app. my_search/urls.py should look like this:
from django.conf.urls.defaults import *
urlpatterns = patterns('',
(r'^', include('my_search.yahoo_search.urls')),
)
However, we haven't actually created my_search/yahoo_search/urls.py yet, so let's do that real quick.
from django.conf.urls.defaults import *
urlpatterns = patterns('',
(r'^$', 'my_search.yahoo_search.views.index'),
)
As you can see from urlpatterns, we're only going to have one view, index, and it is going to handle everything for us.
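To see why a single view is enough, here is a simplified model of how Django walks these two URLconfs. It is a sketch, not Django's actual resolver: the project pattern r'^' matches every path and hands the whole remainder to the app's patterns, where r'^$' only matches once nothing is left, i.e. at the site root. Representing include() as a nested list is an assumption made purely for illustration.

```python
import re

# Hypothetical stand-ins for the two urlpatterns shown above.
APP_URLS = [(r'^$', 'my_search.yahoo_search.views.index')]
PROJECT_URLS = [(r'^', APP_URLS)]  # a nested list plays the role of include()


def resolve(path, patterns=PROJECT_URLS):
    """Return the dotted view name for `path`, or None (Django would 404)."""
    if path.startswith('/'):
        path = path[1:]  # URLconf patterns never see the leading slash
    return _resolve(path, patterns)


def _resolve(path, patterns):
    for regex, target in patterns:
        match = re.search(regex, path)
        if match is None:
            continue
        if isinstance(target, list):
            # include(): strip the matched prefix, try the nested patterns
            found = _resolve(path[match.end():], target)
            if found is not None:
                return found
        else:
            return target
    return None
```

With this model, resolve("/") gives the index view, while resolve("/about/") gives None, because the leftover "about/" doesn't match r'^$'.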
The index view
Now we're going to write the index view. Start by opening my_search/yahoo_search/views.py and adding all the imports we're going to need.
from django.shortcuts import render_to_response
from django import newforms as forms
from yos.boss import ysearch
from yos.yql import db
We're going to use render_to_response to render templates, newforms to ask our user for their search terms, ysearch to retrieve data from BOSS, and db to format those retrieved results into something a bit more manageable.
Writing the search function
Now let's write a simple search function we'll use for querying BOSS.
def search(terms):
    # Fetch up to 10 news results for the search terms from BOSS.
    data = ysearch.search(terms, vertical="news", count=10)
    # Let the yos.yql db module turn the raw response into rows.
    news = db.create(data=data)
    return news.rows
Brief Aside
If you wanted to search Yahoo's web results instead of their news, you'd simply change the line
data = ysearch.search(terms, vertical="news", count=10)
to
data = ysearch.search(terms, count=10)
The data returned by the search function is a list of dictionaries that look like this:
{
u'sourceurl': u'http://www.channelweb.com/',
u'language': u'en english',
u'title': u'Google Works With eBay And PayPal To Curtail Phishing',
u'url': u'http://www.crn.com/security/208808698?cid=ChannelWebBreakingNews',
u'abstract': u'Google Gmail requires eBay and PayPal to use DomainKeys to authenticate mail in an anti-phish effort',
u'clickurl': u'http://www.crn.com/security/208808698?cid=ChannelWebBreakingNews',
u'source': u'ChannelWeb',
u'time': u'22:26:08',
u'date': u'2008/07/11'
}
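Because each row is a plain dictionary, post-processing the results is ordinary Python. As one illustration, here is a hypothetical helper (not part of the BOSS framework) that orders news rows newest first. Judging from the sample above, it assumes date is formatted YYYY/MM/DD and time HH:MM:SS; it ignores time zones, since the sample doesn't state one.

```python
from datetime import datetime


def published_at(row):
    """Combine a row's 'date' and 'time' strings into one datetime."""
    return datetime.strptime(row["date"] + " " + row["time"], "%Y/%m/%d %H:%M:%S")


def newest_first(rows):
    """Return a new list of result rows, most recent first."""
    return sorted(rows, key=published_at, reverse=True)


if __name__ == "__main__":
    rows = [
        {"title": "older", "date": "2008/07/11", "time": "22:26:08"},
        {"title": "newer", "date": "2008/07/12", "time": "01:05:00"},
    ]
    print([row["title"] for row in newest_first(rows)])
```

In the view you could then hand newest_first(search(search_terms)) to the template instead of the raw rows.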
The search function is very basic, but will be enough for this initial version of the application. Let's move forward.
A simple newform
Next we need to create a (very) simple newform that we will use to ask our users for their search terms.
class SearchForm(forms.Form):
    search_terms = forms.CharField(max_length=200)
That's all we'll need for now; carry on. (I said it was simple.)
Actually implementing the index view
Okay, now let's stop for a moment and consider what the index view needs to accomplish:
- It needs to check if there are any incoming POST parameters.
- If there are POST parameters, it needs to validate them using SearchForm, and then use search to put together the results.
- It needs to use render_to_response to render a template containing a SearchForm and any search results (if applicable).
Okay, translating that into Python, we get our index function:
def index(request):
    results = None
    if request.method == "POST":
        # Bind the submitted data and only search if it validates.
        form = SearchForm(request.POST)
        if form.is_valid():
            search_terms = form.cleaned_data['search_terms']
            results = search(search_terms)
    else:
        # A plain GET request just shows an empty search form.
        form = SearchForm()
    return render_to_response('yahoo_search/index.html',
                              {'form': form, 'results': results})
Admittedly we haven't written the index.html template yet; that will be our next task. Beyond that, this is a pretty standard Django view.
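If you want to exercise this branching without running Django, one option is to mirror it in a plain function and pass in a fake search callable. The sketch below is hypothetical (build_context is not part of Django or BOSS) and only approximates SearchForm's checks: a required search_terms value of at most 200 characters, matching max_length=200. Django's own validation and error messages differ in detail.

```python
def build_context(method, post_data, search, max_length=200):
    """Mirror the index view: search only on a valid POST.

    `post_data` is any dict-like object and `search` any callable that
    takes the search terms and returns a list of result rows.
    """
    results = None
    errors = []
    if method == "POST":
        terms = post_data.get("search_terms", "")
        if not terms:
            errors.append("search_terms is required")
        elif len(terms) > max_length:
            errors.append("search_terms is longer than %d characters" % max_length)
        else:
            results = search(terms)
    return {"results": results, "errors": errors}


if __name__ == "__main__":
    def fake_search(terms):
        return [{"title": "result for " + terms}]

    print(build_context("POST", {"search_terms": "iPhone"}, fake_search))
```

A GET request, like the real view, produces no results and no errors, which is what lets the template show a blank form.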
Filling in the index.html template
First, we need to create the template directory for our yahoo_search app. From inside the my_search/yahoo_search directory:
mkdir templates
mkdir templates/yahoo_search
Then create the file templates/yahoo_search/index.html and open it up in your editor. This is going to be a simple template, containing only an input box for searching and a listing of the results.
It'll look like this:
<html> <head>
<title>My Search</title>
</head>
<body>
<h1>My Search</h1>
<form action="/" method="post">
<table>
{{ form }}
<tr><td><input type="submit" value="Search"></td></tr>
</table>
</form>
{% if results %}
<ol>
{% for result in results %}
<li>
{% comment %}
Notice we are using {{ result.clickurl }} instead of
{{ result.url }}. You might wonder why we are doing
that, and the answer is pretty simple: because that's
what Yahoo is asking us to.
http://developer.yahoo.com/search/boss/boss_guide/univer_api_query.html#url_vs_clickurl
{% endcomment %}
<span class="title">
<a href="{{ result.clickurl }}">{{ result.title }}</a>
</span>
<span class="date"> {{ result.date }} </span>
<span class="time"> {{ result.time }} </span>
<span class="source">
<a href="{{ result.sourceurl }}">{{ result.source }}</a>
</span>
<p class="abstract"> {{ result.abstract }} </p>
</li>
{% endfor %}
</ol>
{% endif %}
</body> </html>
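For render_to_response to find yahoo_search/index.html inside the app's templates folder, Django's app_directories template loader has to be enabled (it is in the TEMPLATE_LOADERS that startproject generates) and the app has to be listed in INSTALLED_APPS; otherwise you get a TemplateDoesNotExist error. A sketch of the relevant part of my_search/settings.py follows. The contrib apps shown are typical startproject defaults and may differ in your Django version, so keep whatever yours already lists and just add the last entry.

```python
# my_search/settings.py (excerpt)
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    # Our app: lets the app_directories loader search
    # my_search/yahoo_search/templates/ for yahoo_search/index.html
    'my_search.yahoo_search',
)
```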
Download Zip of Files
If you haven't been keeping up, or if your code is behaving strangely, you can grab a zip of all these files. Just unzip these somewhere, fill in the first three entries (your BOSS appid, email, and org) in my_search/config.json, and you'll be ready to take a look at the app in the next step.
Update 7/12/2008: Unfortunately, because of the way the BOSS library has been built, it isn't enough to simply copy over the yos folder; instead, you will need to follow the installation steps for the BOSS Framework listed above (step #6). Specifically, you need to work through those steps and finish with:
python2.5 setup.py build
python2.5 setup.py install
It's a bit of a pain, and I'll see if I can clean things up to make it simpler.
Seeing it work
Now that we've finished building the app, let's fire it up (from the my_search directory):
python2.5 manage.py runserver
Navigate over to http://127.0.0.1:8000/, and you'll see a friendly search box waiting for you. Type in a search term, hit enter, and voilà, you'll see a list of your results. I searched for iPhone and got back a page of iPhone news results.
One gotcha I'll point out is that the helper library Yahoo has supplied expects config.json to be in the current working directory of the Python process. That will be true for your development setup, since we run manage.py from my_search, where config.json lives, but it won't necessarily be the case on your deployment server. I believe the best solution here would be to add the contents of config.json to your project's settings.py file and tweak the yos/boss/ysearch.py file to load the settings using django.conf.settings instead of from disk.
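A minimal sketch of that idea, under stated assumptions: the setting names BOSS_APPID, BOSS_EMAIL and BOSS_ORG are hypothetical, and load_boss_config is not part of yos, so you would still have to edit yos/boss/ysearch.py to call something like it. It prefers values from a settings object (such as django.conf.settings) and otherwise falls back to reading config.json from an explicit directory rather than the current working directory.

```python
import os

try:
    import json
except ImportError:  # Python 2.5: use the simplejson package instead
    import simplejson as json

BOSS_KEYS = ("appid", "email", "org")


def load_boss_config(settings=None, config_dir=None):
    """Return the BOSS settings as a dict.

    Uses BOSS_APPID / BOSS_EMAIL / BOSS_ORG from `settings` when all three
    are defined; otherwise reads config.json from `config_dir` (default:
    the directory containing this file, not the process's working directory).
    """
    names = ["BOSS_" + key.upper() for key in BOSS_KEYS]
    if settings is not None and all(hasattr(settings, n) for n in names):
        return dict((key, getattr(settings, n)) for key, n in zip(BOSS_KEYS, names))
    if config_dir is None:
        config_dir = os.path.dirname(os.path.abspath(__file__))
    handle = open(os.path.join(config_dir, "config.json"))
    try:
        return json.load(handle)
    finally:
        handle.close()
```

With BOSS_APPID = "..." and friends added to settings.py, the patched library code could call load_boss_config(django.conf.settings) and no longer care where the server process was started.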
Let me know if you have any questions, and I'll try to answer them. Time permitting, I'll continue with another segment or two working on building a slightly more compelling search service than what we have created so far.
Update 7/12/2008: Thanks to Wayne's comments, I was able to simplify the search function quite a bit. Specifically, he pointed out that I was using the library to prepend ynews$ to all the dictionaries' keys, then getting upset it was there and removing it manually. Whoops.
[1] I accidentally installed it under Python 2.4 at first, and the first problem it runs into is the renaming of the ElementTree package between 2.4 and 2.5. I didn't go any further with that, so I'm unsure whether anything else causes problems.
Thanks Will,
that was exactly what I was planning to do when I found the time to do it. You saved me quite a hassle.
I'm wondering how Yahoo's authentication scheme and the two URLs you have to provide when signing up for the API key fit in?
The authentication scheme and two URLs provided when you sign up for the API play no role whatsoever once you have signed up. However, you need to be able to verify a domain with Yahoo (by placing a static file at the root of the domain you're trying to register for), so you'll need to use a domain you have control over.
That's good to hear. I already verified my API key via the static file, so I'm all set.
I just gave it a quick try. Downloaded the zip, moved it to a fairly recent trunk export and got the following error.
Exception Value: Could not import my_search.yahoo_search.views. Error was: No module named yos.crawl
Couldn't find any references to yos.crawl in the views.py
I'm a Django newbie, so it's very likely that I'm missing something trivial.
Gerd,
I'm not exactly sure, but it seems quite possible that the problem has to do with installing the BOSS framework, and if you work through the subset of instructions on installing BOSS (this is step #6 in the setup instructions) everything should be okay.
Sorry that things aren't quite working as planned.
Okay, I took a look at what was going wrong and fixed it up, it will now work correctly from my download.
Will,
downloaded the new version and gave it a try. I am now getting:
Could not import my_search.yahoo_search.views. Error was: No module named util.typechecks
Am going to look into this tomorrow. BTW, I'm using a Django svn checkout at revision 7823.
Just copied the util files from the mashup framework and am up and running
Thanks a lot
Great Post!
Was wondering how long it would take for someone to cover this :) Awesome!
This is really great. Well done integrating the BOSS Mashup Framework with Django! The BOSS page needs to link to this ASAP.
BTW, there's no need to strip the prefix out of the field names (the part before the "$").
the "name" parameter is optional (useful for joining)
Hi, this is a wonderful tutorial. I learnt most of Python and BOSS from here.
I followed the steps exactly as above, but I'm stuck with the following error when I try to open it through the browser:
Exception Type: TemplateDoesNotExist
Exception Value: yahoo_search/index.html
Exception Location: C:\Python25\Lib\site-packages\django\template\loader.py in find_template_source, line 73
I would greatly appreciate any input on fixing this problem.
Thanks
Is there a way to use my own database of web pages using yahoo's BOSS Mashup Framework?
It kind of depends on what you mean. If you want to mix your own results with Yahoo!'s results, then that is definitely possible. Just mix 'em in as you please. If you want to use the mashup framework to access your pages, that is more complicated.
Fundamentally the mashup framework is just a wrapper around an API hosted by Yahoo, so you would need to recreate a similar API for your pages, and then you would need to make the query syntax and result schema match the current API's syntax/schema or modify the mashup code to work with your new syntax/schema, at which point you're probably better off rolling your own wrapper library for your own system.