Findjango, A Week of Progress

March 16, 2009. Filed under django, findjango

Findjango, the Django vertical search I've been working on, has been in the open for about one week. It's moving in a good direction, and--I'd like to say--improving at a good pace. I'm really interested in seeing Findjango become a community tool, so let me give a quick overview of the impediments to that happening, and also what I'm doing to wear them down.

Over the first week there have been about 2500 queries, a number of results flagged for relevancy tuning, and one kind soul who decided to run a load testing script against it with fifty concurrent clients.

Expanding Content

I've tried to keep content extremely Django-focused, and one of the side effects of doing so is that there simply isn't enough content available for some queries. Search for Marty Alchin's Pro Django and you get 258 extremely relevant results. Search for tutorial and you get 4218 exceedingly mediocre results. Search for performance and you get 3402 awful results.

To improve upon this situation, I spent much of Sunday getting to know whoosh (a pure-Python full-text indexer and searcher) and feedparser, and now have the ability to persistently index RSS (or Atom) feeds. That is to say, as of yesterday all content that passes through the Django Community feed, This Week in Django feed, and a couple of smaller feeds has been--and will remain--indexed for searching.
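As a rough illustration of what "persistently index" means here, the sketch below merges each fetch of a feed into a store keyed by URL, so entries stay searchable after they scroll out of the feed. This is not Findjango's code. To stay dependency-free it swaps in the standard library's xml.etree.ElementTree and a JSON file in place of feedparser and whoosh, it only understands RSS 2.0 item elements, and the function names and file layout are invented for the example.

```python
import json
import os
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return a list of {'url', 'title', 'text'} dicts from RSS 2.0 XML."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        url = (item.findtext("link") or "").strip()
        if not url:
            continue  # without a link there is nothing to key the entry on
        items.append({
            "url": url,
            "title": (item.findtext("title") or "").strip(),
            "text": (item.findtext("description") or "").strip(),
        })
    return items


def update_index(index_path, xml_text):
    """Merge feed entries into a JSON file keyed by URL.

    Returns the number of entries that were not in the index before.
    """
    index = {}
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            index = json.load(f)

    added = 0
    for entry in parse_rss_items(xml_text):
        if entry["url"] not in index:
            added += 1
        # Overwrite so edited posts replace their older copy.
        index[entry["url"]] = entry

    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return added
```

Running update_index on a schedule against the same file is what makes the index outlive the feed's own short memory. A real full-text indexer would replace the JSON file.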

This approach won't allow me to pick up older content, but it is one useful tool in the toolkit, and going forward will make it possible to index content for sites that are unable to expose a native search API to Findjango.

Please let me know if you have an RSS feed with Django content which isn't already picked up by the Django community feed, and I'll add it as well!

The next step will be to set up a small web-crawling process using Scrapy and make it possible to extract older or non-feed-based content. This is probably a week or two away.

(I'm also working in a few situations to get APIs opened up to Findjango, and that is still the best way to give Findjango access to your content.)

Increasing Awareness

The other major stumbling block for Findjango is that the people who would benefit most from using it--those who are new to developing with Django--have the smallest opportunity to discover it.

If I were Yahoo! or Google, I'd be wheeling and dealing on search deals, but since my budget is markedly smaller, I've come up with a different strategy, one which will hopefully be helpful for both site-owners and Findjango.

As I mentioned last week, I am opening up an API to Findjango.

curl 'http://findjango.com/service/?query=app%20engine'
curl 'http://findjango.com/service/?query=app%20engine&start=10'

The API returns JSON results in this format:

{
  "total_results": 200,
  "start": 0,
  "count": 10,
  "results": [
    {"title": "a", "url": "http://etc", "text": "yada yada"}
  ]
}
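For anyone who would rather call the service from Python than curl, here is a minimal sketch using only the standard library. It assumes the response is valid JSON with the fields shown above, and that count is the page size, which is my reading of the sample rather than a documented guarantee. The helper names (build_url, parse_response, next_start, fetch_page) are made up for this example.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

SERVICE_URL = "http://findjango.com/service/"


def build_url(query, start=0):
    """Build a service URL; spaces are encoded as %20 like the curl examples."""
    params = {"query": query}
    if start:
        params["start"] = start
    return SERVICE_URL + "?" + urlencode(params, quote_via=quote)


def parse_response(body):
    """Decode a response body into a dict."""
    return json.loads(body)


def next_start(page):
    """Return the start offset of the following page, or None when done.

    Assumes 'count' is the number of results per page.
    """
    nxt = page["start"] + page["count"]
    return nxt if nxt < page["total_results"] else None


def fetch_page(query, start=0, timeout=10):
    """Fetch one page of results over HTTP."""
    with urlopen(build_url(query, start), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))
```

Paging through a result set is then a loop: call fetch_page, handle page["results"], and feed next_start(page) back in until it returns None.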

You can use it to add a Django-specific search to your own site. I realize that alone isn't necessarily enough added value, so here is something of a special sauce as extra enticement: you can pass your domain along using the site parameter, and results from your site will bubble to the top of the result set.

This means--as long as you let me know where I can find an RSS feed for your site or you expose a search API--you can use Findjango to power an enhanced site search for your Django-related site.

curl 'http://findjango.com/service/?query=tutorial&site=lethain.com'
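To picture what "bubble to the top" does to a result list, here is a small sketch that reproduces the visible effect on the client side. It only illustrates the ordering and says nothing about how the service ranks internally. The function names, and the choice to count subdomains such as www as part of the site, are assumptions made for the example.

```python
from urllib.parse import urlparse


def is_from_site(url, site):
    """True if url's host is the site itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    site = site.lower()
    return host == site or host.endswith("." + site)


def boost_site(results, site):
    """Move results from `site` to the front, keeping relative order.

    sorted() is stable and False sorts before True, so results from
    the site come first and both groups keep their original ranking.
    """
    return sorted(results, key=lambda r: not is_from_site(r["url"], site))
```

Because the sort is stable, results from your own site keep their relevance order among themselves, and so do the results that follow them.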

For a lot of sites, I think this will allow Findjango to be genuinely useful, and I'll be working at putting together some default templates and/or templatetags to make using Findjango to power site-search increasingly trivial.

Needs A Better Design

The new design should be in place before Findjango turns two weeks old. Sorry that you've been forced to deal with the current iteration for so long, but things will improve soon.


The above overview wasn't an exhaustive list of changes over the past week (integration with the GitHub search API, minor DjangoCodeSearch integration, Findjango became an autodiscoverable search engine in Firefox, and so on), but hopefully it gives some reassurance that Findjango is moving in a good direction, and evolving towards a useful solution.

As always, feedback is welcome.