For quite some time I've been wanting to put together
a pluggable Django application for querying Yahoo! BOSS.
In itself doing that is pretty trivial though, so the app needed
to include some kind of special sauce to sweeten the deal.
I hope you'll find the taste agreeable.
springsteen provides a trivial wrapper for Yahoo! BOSS,
but goes further and provides a simple framework for
building distributed search networks. If you dream
of a world where every blog network is searchable, and each
niche has its own vertical search, then springsteen is for you.
Let's start with some examples.
Querying BOSS for Web Results
springsteen has prebuilt views for searching Yahoo! BOSS for
web, images and news results, making this the simplest use case.
Once those views are hooked into your project's urls.py,
navigate to http://yourproject.com/search/web/
(or /search/images/ or /search/news/) and you'll
immediately have a search page waiting for you.
The search results for Web--as well as all implemented
services--are cached using the caching backend specified
in your settings.py file. The speed benefits of caching
Yahoo! BOSS may be fairly minimal, but for more
exotic services (and frequent searches) the caching may
become more of a feature.
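To make the caching idea concrete, here is a minimal sketch of the pattern, not springsteen's actual code. A plain dictionary stands in for the Django cache backend, and fetch_results, cache_key and cached_search are hypothetical names invented for this example.

```python
import hashlib

# Assumption: a plain dict stands in for the configured Django cache backend.
_cache = {}
calls = []  # records which lookups actually reached the "remote" service


def fetch_results(service, query):
    """Hypothetical stand-in for a slow request to a remote search service."""
    calls.append((service, query))
    return [{"title": "%s result for %s" % (service, query)}]


def cache_key(service, query):
    # Hash the query so the key stays short and free of awkward characters.
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return "search:%s:%s" % (service, digest)


def cached_search(service, query):
    """Return cached results when present, otherwise fetch and store them."""
    key = cache_key(service, query)
    if key not in _cache:
        _cache[key] = fetch_results(service, query)
    return _cache[key]
```

Repeating the same search for the same service only reaches the remote service once; a different service or query gets its own cache entry.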
To clean up the appearance, override either
the springsteen/base.html or springsteen/results.html
templates. (You can also override the *_result.html templates
to customize different result types.)
BOSS Results with Site Restrict
If you only want web results on a single site (the poor man's
site search), you can subclass the springsteen.services.Web
class (you could restrict news or images by subclassing the
springsteen.services.News and springsteen.services.Images
classes).
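As a rough sketch of the subclassing idea, and only that: the WebSearch class below is a hypothetical stand-in rather than the real springsteen.services.Web, and it assumes the search backend understands a site: operator inside the query string. The build_query method and the example.com domain are likewise made up for illustration.

```python
class WebSearch(object):
    """Hypothetical stand-in for a web search service class."""

    def __init__(self, query):
        self.query = query

    def build_query(self):
        # The base class sends the query through untouched.
        return self.query


class SiteRestrictedSearch(WebSearch):
    """Restricts every query to one site by appending a site: operator."""

    site = "example.com"  # assumption: swap in your own domain

    def build_query(self):
        return "%s site:%s" % (self.query, self.site)
```

The point is that the restriction lives in one small subclass, so the same trick applies to any service class that builds its query in one place.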
One of the frequent mistakes I've made as a web developer
is to make http requests sequentially when they could have
been done concurrently. springsteen aims to aggregate
numerous search services, so it needs to be able to request
and process them in parallel.
To perform concurrent requests, simply specify multiple services.
(Note that the below values defined in settings are not
standard, but you can put them in your settings.py if that's
how you like to organize globals.)
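For a feel of what requesting services in parallel looks like, here is a small sketch using a thread pool from the standard library. This is my own illustration of the technique, not a copy of how springsteen does it internally; make_service and search_all are hypothetical names, and the services are fakes that sleep instead of making HTTP requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_service(name, delay):
    """Build a hypothetical service: a callable that waits, then returns results."""
    def run(query):
        time.sleep(delay)  # simulates network latency
        return [{"source": name, "title": "%s: %s" % (name, query)}]
    return run


def search_all(services, query):
    """Run each service in its own thread; return result lists in service order."""
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        futures = [pool.submit(service, query) for service in services]
        # Collecting in submission order keeps the output order predictable.
        return [future.result() for future in futures]
```

With this shape the total wait is roughly that of the slowest service rather than the sum of all of them.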
By default results from services are stacked on
one another. For example, results from the above
my_search would return all results from Images
and then begin showing results from Web.
Ranking results is the hardest part of search,
and springsteen won't solve that. Instead it'll
give you the levers to do it yourself. For most
small scale situations it should be possible to
write fairly concise ranking logic, specific
to the services you're querying, that will outperform
any generic genius that springsteen might try to supply.
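As one example of concise ranking logic, the sketch below swaps the default stacking for a round-robin merge, so the top result from every service shows up before anyone's second result. The interleave function is a hypothetical helper, not part of springsteen, and how you would plug it in is left open.

```python
from itertools import chain, zip_longest

_MISSING = object()  # sentinel marking slots where a shorter list ran out


def interleave(*result_lists):
    """Round-robin merge: each service's first result, then each second, and so on."""
    merged = chain.from_iterable(zip_longest(*result_lists, fillvalue=_MISSING))
    return [result for result in merged if result is not _MISSING]
```

Something this simple can already beat stacking when one service returns many results and another only a few.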
Exposing Results via a Springsteen Service
Because springsteen is all about aggregating search
services, it will be gradually extended to understand
new formats. However, sometimes you just want to expose
new data to springsteen, and haven't already decided
on a format.
For those situations, you can use a Springsteen Service.
Cool name aside, they are about as simple as it gets.
Let's imagine that you can somehow get search results
in CSV format (no, it doesn't make sense; it's an example).
Perhaps your data looks like this:
title, url, text
abc, http://yadayad/abc/, some text here
efg, http://yadayad/efg/, some text here as well
and you have a function csv_search which returns
relevant rows. You could expose that via a Springsteen Service.
springsteen already knows how to display results from
a Springsteen Service, so integration is rather concise.
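Here is a sketch of the data-shaping half of that job, under loud assumptions: csv_search below is a toy stand-in for your real search function, and the JSON envelope (a total_results count plus a results list) is my guess at a reasonable format, not a documented Springsteen Service schema. Wiring to_json into a Django view is left out.

```python
import csv
import io
import json

CSV_DATA = """title, url, text
abc, http://yadayad/abc/, some text here
efg, http://yadayad/efg/, some text here as well
"""


def csv_search(query):
    """Toy stand-in: return rows whose title or text contains the query."""
    reader = csv.DictReader(io.StringIO(CSV_DATA), skipinitialspace=True)
    return [dict(row) for row in reader
            if query in row["title"] or query in row["text"]]


def to_json(rows):
    # Assumed envelope: a result count plus the list of result dictionaries.
    return json.dumps({"total_results": len(rows), "results": rows})
```

Each row becomes a dictionary with title, url and text keys, which is the kind of flat structure that is easy for another service to consume.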
Accessing a Custom Service
It's always easiest when you can get partners
to expose a service in the format you want
(in this case, a SpringsteenService),
but sometimes you have to get in there and
parse the data yourself.
In springsteen.services both the
SpringsteenService and BossSearch
classes provide examples of interfacing
with different data formats.
The key point is to write a run method
that retrieves results and converts them into
a Python list of dictionaries. If you want
to render them with one of the existing template
fragments (web, news, image or springsteen results)
then you should add the corresponding value to
the source key for each result's dictionary.
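A bare-bones sketch of such a run method follows. The class is a hypothetical stand-in that does not subclass springsteen's real Service, fetch fakes a partner's pipe-delimited response instead of making an HTTP request, and the "web" value for the source key is an assumption about the expected string.

```python
class PipeDelimitedService(object):
    """Hypothetical service that parses a partner's pipe-delimited results."""

    def __init__(self, query):
        self.query = query
        self.results = []

    def fetch(self):
        # Stand-in for an HTTP request to a partner's API.
        return ("First|http://example.com/1|About %s\n"
                "Second|http://example.com/2|More" % self.query)

    def run(self):
        """Retrieve raw data and convert it into a list of dictionaries."""
        for line in self.fetch().splitlines():
            title, url, text = line.split("|")
            self.results.append({
                "title": title,
                "url": url,
                "text": text,
                "source": "web",  # assumed value; picks the template fragment
            })
        return self.results
```

Whatever the upstream format is, the parsing stays inside run and everything downstream only ever sees a list of dictionaries.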
Let me know if it proves challenging to follow
the existing examples, and I'll gladly provide
a complete walkthrough of subclassing Service.
The Future of springsteen
At the moment the core of springsteen is
nearly complete; I just need to refactor slightly
to facilitate inserting custom ranking logic.
Beyond that, there are an infinite number of
services that springsteen would like to
know how to query and display.
There is a working example of both exposing data
via a Springsteen Service and querying and
aggregating results, and hopefully it'll be
polished enough to reveal by this upcoming Monday.
I hope that springsteen and its vision of
distributed search by small-time players is something
that you find exciting. I know I'm excited about the
prospect of creating targeted and relevant search boxes
powered not by thousands of commodity servers
in datacenters but instead by my VPS, and yours.