Findjango, the Django vertical search I've been
working on, has been in the open for about one week. It's moving in
a good direction, and--I'd like to say--improving at a good pace. I'm really
interested in seeing Findjango become a community tool, so let
me give a quick overview of the impediments to that happening,
and also what I'm doing to wear them down.
Over the first week there have been about 2500 queries,
a number of results flagged for relevancy tuning, and
one kind soul who decided to run a load testing script against
it with fifty concurrent clients.
I've tried to keep content extremely Django-focused,
and one of the side effects of doing so is that there
simply isn't enough content available for some queries.
Search for Marty Alchin's Pro Django and you get
258 extremely relevant results. Search for tutorial
and you get 4218 exceedingly mediocre results. Search for
performance and you get 3402 awful results.
To improve upon this situation, I spent much of Sunday
getting to know whoosh
(a pure-Python full-text indexer and searcher) and
feedparser, and now have the ability
to persistently index RSS (or Atom) feeds. That is
to say, as of yesterday all content that passes through
the Django Community feed, This Week in Django feed,
and a couple of smaller feeds has been--and will remain--indexed.
This approach won't allow me to pick up older
content, but it is one useful tool in the toolkit, and going
forward will make it possible to index content for sites that
are unable to expose a native search API to Findjango.
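
As a rough illustration of the feed indexing described above, here is a minimal sketch of how feedparser and whoosh can be combined. It is not Findjango's actual code: the schema fields (`url`, `title`, `content`), the helper names, and the choice to skip entries without a link are all assumptions made for the example.

```python
import os


def entry_to_document(entry):
    """Map one parsed feed entry (dict-like) to index fields.

    Returns None for entries without a link, since the link is used
    as the unique key in this sketch.
    """
    link = entry.get("link")
    if not link:
        return None
    return {
        "url": link,
        "title": entry.get("title", ""),
        "content": entry.get("summary", ""),
    }


def index_feed(feed_url, index_dir):
    """Fetch a feed and add or update its entries in a whoosh index."""
    # Third-party imports are kept local so entry_to_document stays
    # usable without whoosh or feedparser installed.
    import feedparser
    from whoosh import index
    from whoosh.fields import ID, TEXT, Schema

    # Hypothetical schema: the URL is the unique key for each document.
    schema = Schema(
        url=ID(stored=True, unique=True),
        title=TEXT(stored=True),
        content=TEXT,
    )
    os.makedirs(index_dir, exist_ok=True)
    if index.exists_in(index_dir):
        ix = index.open_dir(index_dir)
    else:
        ix = index.create_in(index_dir, schema)

    writer = ix.writer()
    count = 0
    for entry in feedparser.parse(feed_url).entries:
        doc = entry_to_document(entry)
        if doc is None:
            continue
        # update_document replaces any document with the same unique url,
        # so re-polling a feed does not create duplicates.
        writer.update_document(**doc)
        count += 1
    writer.commit()
    return count
```

Because the index lives on disk and entries are keyed by URL, calling `index_feed` on a schedule keeps accumulating content after it has scrolled out of the feed itself, which is the "persistent" part.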
Please let me know if you have an RSS feed with Django content
which isn't already picked up by the Django community feed,
and I'll add it as well!
The next step will be to set up a small
web-crawling process using Scrapy and make it possible
to extract older or non-feed-based content. This is probably
a week or two away.
(I'm also working in a few situations to get APIs opened up to
Findjango, and that is still the best way to give Findjango access
to your content.)
The other major stumbling block for Findjango is that
the people who would benefit most from using it--those
who are new to developing with Django--have the smallest
opportunity to discover it.
If I were Yahoo! or Google, I'd be wheeling and dealing with
search deals, but since my budget is markedly smaller, I've
come up with a different strategy, one which will hopefully
be helpful for both site-owners and Findjango.
As I mentioned last week, I am opening up an API to Findjango.
The API returns JSON-formatted results.
You can use it to add a Django-specific search to your
own site. I realize that alone isn't necessarily enough added
value, so here is something of a special sauce as extra enticement:
you can pass your domain along as a parameter,
and results from your site will bubble to the top of the result set.
This means--as long as you let me know where I can find an RSS feed
for your site or you expose a search API--you can use Findjango to power
an enhanced site search for your Django-related site.
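
To make the "bubble to the top" behavior concrete, here is one way such a boost could be sketched. This is an assumption about the general technique, not Findjango's implementation: the function name and the shape of each result (a dict with a `url` key) are hypothetical.

```python
from urllib.parse import urlparse


def boost_domain(results, domain):
    """Return results reordered so hits from `domain` come first.

    `results` is assumed to be a list of dicts with a "url" key,
    already sorted by relevance.
    """
    domain = domain.lower()

    def is_from_domain(result):
        host = (urlparse(result["url"]).hostname or "").lower()
        # Match the domain itself and any of its subdomains.
        return host == domain or host.endswith("." + domain)

    # sorted() is stable, so relevance order is preserved within each
    # group; False sorts before True, hence the negation.
    return sorted(results, key=lambda r: not is_from_domain(r))
```

A stable sort keeps things simple: results from the requesting site move up as a block, and everything else keeps the order the search engine originally gave it.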
For a lot of sites, I think this will allow Findjango to be
genuinely useful, and I'll be working at putting together
some default templates and/or templatetags to make using
Findjango to power site-search increasingly trivial.
Needs A Better Design
The new design should be in place before Findjango turns two
weeks old. Sorry that you've been forced to deal with the
current iteration for so long, but things will improve soon.
The above overview wasn't an exhaustive list of changes over the past week
(integration with the GitHub search API, minor DjangoCodeSearch integration, Findjango becoming an autodiscoverable search engine in Firefox, and so on),
but hopefully it gives some reassurance that Findjango is moving in a good
direction, and evolving towards a useful solution.
As always, feedback is welcome.