Welcome to the final installment of this series, which has looked at the advantages of Django's loose coupling philosophy. First, we looked at replacing Django's templating system with Jinja2. Next, we looked at using SQLAlchemy instead of Django's ORM, and in this third segment we are going to look at using CouchDB instead of a traditional relational database for data storage.
CouchDB is one of the more exciting projects--measured in number of semi-plausible daydreams it has inspired--that I've run into in the past year. It is an ideological rival to relational databases: instead of focusing on highly detailed schemas (long ints, varchars of length 25, and blobs of binary data, oh my), it deals in documents. These documents may share identical fields, but may also contain dissimilar fields, allowing the same flexibility as Google's BigTable. Beyond that, CouchDB brings some other unexpected goodies to the table, like document versioning[1].
Before we get started you'll need to install CouchDB, which won't be covered in this tutorial (install on OS X, install on Windows, install on Ubuntu).
Our Project
In this tutorial we're going to use CouchDB to store the data for a simple Django webapp which will allow us to post and edit simple text documents online. Before we get started we have a bit of setup to do.
First we need to have Django installed. We'll be using the SVN version, which can be installed like this (note that you'll need to replace SITE-PACKAGES-DIR with your Python's site-packages directory):
svn co http://code.djangoproject.com/svn/django/trunk/ django-trunk
ln -s `pwd`/django-trunk/django SITE-PACKAGES-DIR/django
Next, we need to install the Python library, couchdb-python, which interfaces with CouchDB.
easy_install couchdb
Third, we need to create a Django project (for those who don't want to go through all the steps themselves, you can download a zip file containing the code created in this tutorial).
django-admin.py startproject comfy_django_example
You may not have django-admin.py in your system path, in which case you'll need to run it from inside your local checkout of Django. For me that looks like:
~/svn/django-trunk/django/bin/django-admin.py startproject comfy_django_example
Now, we need to create a Django application inside of our comfy_django_example project.
cd comfy_django_example
python manage.py startapp couch_docs
We're almost done getting set up; we just need to configure a few settings. Go ahead and open up the settings.py file in the comfy_django_example folder. We'll make this change:
INSTALLED_APPS = (
    'comfy_django_example.couch_docs',
)
We'll also edit the comfy_django_example/urls.py file to look like this:
urlpatterns = patterns('comfy_django_example',
    (r'^', include('couch_docs.urls')),
)
Getting to Know CouchDB
Before we go on and start fleshing out the webapp, let's spend a few minutes playing with CouchDB at the Python command line.
First, we need to start running CouchDB. If you have symlinks properly set up, it should be as simple as:
some-computer:~ user$ couchdb
Apache CouchDB 0.8.0-incubating (LogLevel=info)
Apache CouchDB is starting.
Apache CouchDB has started. Time to relax.
Now fire up a Python interpreter.
python
First you need to create an instance of Server which represents the local CouchDB server running on your computer.
>>> from couchdb import *
>>> s = Server('http://127.0.0.1:5984/')
>>> s
<Server 'http://127.0.0.1:5984/'>
>>> len(s)
0
Now let's create a couple of databases, iterate through all the databases, and then delete them.
>>> s.create('users')
>>> s.create('docs')
>>> len(s)
2
>>> for x in s:
...     print x
...
docs
users
>>> del s['users']
>>> del s['docs']
>>> len(s)
0
Now let's create another database, and actually store some data in it.
>>> db = s.create('docs')
>>> len(db)
0
>>> db.create({'type':'Document','title':'Document One','txt':"This is some text."})
u'fd179491f0d95268eb1761e0439cf3e2'
>>> len(db)
1
We can also create named documents.
>>> db['manifesto'] = {'type':'Document','title':'Personal Manifesto','txt':'I strongly believe in something. I think.'}
>>> db['manifesto']
<Document u'manifesto'@u'818144524' {u'txt': u'I strongly believe in something. I think.', u'type': u'Document', u'title': u'Personal Manifesto'}>
Retrieving and extracting data from documents is easy as well.
>>> a = db['manifesto']
>>> a['title']
u'Personal Manifesto'
>>> a['title'] = "Ehm. Lame title."
>>> a
<Document u'manifesto'@u'818144524' {u'txt': u'I strongly believe in something. I think.', u'type': u'Document', u'title': 'Ehm. Lame title.'}>
If we want to run a specific query, we write a JavaScript function for it. For example, let's say we wanted to retrieve all documents with a title longer than 4 characters; we'd write this JavaScript:
function(d) {
    if (d.title.length > 4) emit(d.title, null);
}
Now, to run that query in Python we have to do this:
>>> func_str = "function(d) { if (d.title.length>4) emit(d.title,null); }"
>>> for row in db.query(func_str):
...     print row.key
...
Document One
Personal Manifesto
It's a bit hard to understand what's going on in that query, so let's break it down. emit is used for returning the data you want; the first argument is the key in the result set, and the second argument is the value in the result set. The next example should help clarify things.
>>> func_str = "function(d) { if (d.title.length>4) emit(d.title,d); }"
>>> results = db.query(func_str)
>>> results
<ViewResults <TemporaryView 'function(d) { if (d.title.length>4) emit(d.title,d); }' None> {}>
>>> for row in results:
...     print row.key
...     print row.value
...
Document One
{u'txt': u'This is some text.', u'_rev': u'709275850', u'_id': u'fd179491f0d95268eb1761e0439cf3e2', u'type': u'Document', u'title': u'Document One'}
Personal Manifesto
{u'txt': u'I strongly believe in something. I think.', u'_rev': u'818144524', u'_id': u'manifesto', u'type': u'Document', u'title': u'Personal Manifesto'}
We can also access the data using list comprehensions.
>>> [ x.key for x in results]
[u'Document One', u'Personal Manifesto']
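If the key/value mechanics still feel abstract, here is a rough pure-Python imitation of what a temporary view does, which you can run without a CouchDB server. Note that run_view and long_titles are names made up for this illustration and are not part of couchdb-python; the real thing runs your JavaScript function on the server and orders rows using CouchDB's own collation rules, which plain Python sorting only approximates.

```python
def run_view(docs, map_func):
    """Apply map_func to every document and collect whatever it emits."""
    rows = []

    def emit(key, value):
        # Each call to emit adds one (key, value) row to the result set.
        rows.append((key, value))

    for doc in docs:
        map_func(doc, emit)
    # View rows come back ordered by key; sorting on the key approximates that.
    return sorted(rows, key=lambda row: row[0])


def long_titles(doc, emit):
    # Python counterpart of:
    #   function(d) { if (d.title.length>4) emit(d.title,null); }
    if len(doc.get('title', '')) > 4:
        emit(doc['title'], None)


# Hypothetical sample documents, loosely modelled on the ones created above.
docs = [
    {'_id': 'manifesto', 'title': 'Personal Manifesto'},
    {'_id': 'abc', 'title': 'Document One'},
    {'_id': 'tiny', 'title': 'Hi'},
]
keys = [key for key, value in run_view(docs, long_titles)]
```

The document titled 'Hi' never calls emit, so it simply doesn't show up in the results; that is all "filtering" means in a view.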
Now for the last example we'll walk through all the steps at once.
>>> from couchdb import *
>>> s = Server('http://127.0.0.1:5984/')
>>> db = s.create('software')
>>> db['FireFox'] = {'type':'browser','title':'FireFox'}
>>> db['Safari'] = {'type':'browser','title':'Safari'}
>>> db['Aquamacs'] = {'type':'editor','title':'Aquamacs'}
>>> len(db)
3
>>> only_browsers = 'function(d) { if (d.type == "browser") emit(d.title,d); }'
>>> [ x.key for x in db.query(only_browsers) ]
[u'FireFox', u'Safari']
>>> del s['software']
With that command line experimentation under our belt, it's pretty easy to imagine how to use CouchDB for most of your data storing, retrieval and manipulation needs. Now on to our coup de grace[2]: integrating CouchDB with Django.
Integrating CouchDB with Django
We're going to put together a very simple application. It'll have two views: the index view will display a list of all available documents and allow you to create a new document, and the detail view will allow you to edit an existing document.
First, let's create the couch_docs/urls.py file.
from django.conf.urls.defaults import *
urlpatterns = patterns('comfy_django_example.couch_docs.views',
(r'^doc/(?P<id>\w+)/','detail'),
(r'^$','index'),
)
Next, we need to edit the couch_docs/views.py file. We'll start with the imports and also some code to create a CouchDB database named docs if it doesn't already exist.
from django.http import Http404,HttpResponseRedirect
from django.shortcuts import render_to_response
from couchdb import Server
from couchdb.client import ResourceNotFound
SERVER = Server('http://127.0.0.1:5984')
# Create the docs database only if the server doesn't already have one.
if 'docs' not in SERVER:
    SERVER.create('docs')
If you were following best practices you'd probably want to create a COUCHDB_SERVER entry in your settings.py file and then do something like this:
from django.conf import settings
SERVER = Server(getattr(settings,'COUCHDB_SERVER','http://127.0.0.1:5984'))
But for the time being we'll stick with the simpler, albeit less flexible, solution of specifying the server in the views.py file itself. Now, let's write the views.
The index view does two things. On a GET request it displays all the existing documents in the database, and on a POST request it creates a new document and redirects to that document's detail view.
def index(request):
    docs = SERVER['docs']
    if request.method == "POST":
        title = request.POST['title'].replace(' ','')
        docs[title] = {'title':title,'text':""}
        return HttpResponseRedirect(u"/doc/%s/" % title)
    return render_to_response('couch_docs/index.html',{'rows':docs})
Notice that we're passing rows as extra context for the index.html template in the exact same way we'd pass data queried using the Django ORM.
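One wrinkle worth knowing about: the index view only strips spaces from the title, while the URL pattern for the detail view only matches \w+ characters. A title like "foo-bar" would therefore be stored but never be reachable at /doc/foo-bar/. A minimal sketch of a stricter cleanup follows; make_doc_id is a hypothetical helper name of my own, not something Django or couchdb-python provides.

```python
import re


def make_doc_id(title):
    """Remove every character that the detail URL pattern would not match."""
    # The detail URL captures the id with \w+, so anything outside \w
    # (spaces, hyphens, punctuation) is dropped rather than just spaces.
    return re.sub(r'\W+', '', title)


example_id = make_doc_id('My First Doc!')
```

If you used something like this in place of the .replace(' ','') call, you would also want to reject titles that come back as an empty string, since an empty id cannot be routed either.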
The detail view has two functions as well. If it gets a GET request, then it will display the document (along with a form for editing it), and on POST requests it will update the document. We'll also want to throw a 404 error code if someone requests a document that doesn't exist.
def detail(request,id):
    docs = SERVER['docs']
    try:
        doc = docs[id]
    except ResourceNotFound:
        raise Http404
    if request.method =="POST":
        doc['title'] = request.POST['title'].replace(' ','')
        doc['text'] = request.POST['text']
        docs[id] = doc
    return render_to_response('couch_docs/detail.html',{'row':doc})
Please note that the docs[id] = doc line is not optional: the stored document will not be updated without it, despite our having updated the document's keys. I'm uncertain whether that is an oversight in the library or an intentional decision to cut back on HTTP requests, but it can be a confusing gotcha.
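A way to picture this gotcha is to think of a fetched document as a local copy. The sketch below uses a small in-memory stand-in to mimic that behaviour; FakeDatabase is entirely hypothetical and is not couchdb-python's Database class, it only imitates the "nothing is saved until you assign it back" behaviour described above.

```python
import copy


class FakeDatabase(object):
    """In-memory stand-in that hands out copies, like a remote store would."""

    def __init__(self):
        self._stored = {}

    def __getitem__(self, doc_id):
        # Reading returns a copy, so edits to it stay local.
        return copy.deepcopy(self._stored[doc_id])

    def __setitem__(self, doc_id, doc):
        # Only an explicit assignment changes what is stored.
        self._stored[doc_id] = copy.deepcopy(doc)


db = FakeDatabase()
db['manifesto'] = {'title': 'Personal Manifesto'}

doc = db['manifesto']
doc['title'] = 'Better Title'
unchanged = db['manifesto']['title']  # the stored copy has not changed yet

db['manifesto'] = doc                 # the write-back step
updated = db['manifesto']['title']
```

With the real library the write-back is what triggers the request to the server, which is why leaving it out of the detail view silently loses your edits.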
Now we just have to write the templates. First, create the template directories in the comfy_django_example/couch_docs/ folder.
cd comfy_django_example/couch_docs/
mkdir templates
mkdir templates/couch_docs
cd templates/couch_docs/
And then create the index.html template.
<html> <head>
<title>Comfy Django</title>
</head>
<body>
<h1>CouchDB in Django</h1>
<form method="post" action=".">
<table>
<tr>
<td> Title for new document </td>
<td><input type="text" name="title"></td>
<td><input type="submit"></td>
</tr>
</table>
</form>
<hr>
<ol>
{% for row in rows %}
<li><a id="title" href="/doc/{{ row }}/">{{ row }}</a></li>
{% endfor %}
</ol>
</body> </html>
And the detail.html template as well.
<html> <head>
<title>CouchDB in Django: {{ row.title }}</title>
</head>
<body>
<h1>CouchDB in Django: {{ row.title }}</h1>
<a href="/">Return to index</a>
<table>
<tr>
<td> Title </td>
<td id="title">{{ row.title }}</td>
</tr>
<tr>
<td> Text </td>
<td id="text">{{ row.text }}</td>
</tr>
</table>
<hr>
<form method="post" action=".">
<table>
<tr>
<td> Title for new document </td>
<td><input type="text" name="title" value="{{ row.title }}"></td>
</tr>
<tr>
<td> Text </td>
<td><textarea name="text">{{ row.text }}</textarea></td>
</tr>
<tr>
<td><input type="submit"></td>
</tr>
</table>
</form>
</body> </html>
And that sums up our new web app. Go ahead and give it a whirl.
couchdb &
cd comfy_django_example
python manage.py runserver
And then navigate to http://127.0.0.1:8000 to see it in action. It's simple, but it should be enough to get your creative juices flowing a bit about what you could do by combining Django and CouchDB. Especially as CouchDB continues to mature, I think this'll be a potent pairing.
Downloading a Copy of the Tutorial
You can download the zip file containing this project here, but for the most up-to-date version it'll be easiest to check the tutorial's repository on GitHub.
Why Would We Ever Actually Do This?
So, that's an important question. Is there a situation where using Django with CouchDB makes substantially more sense than using it with PostgreSQL or MySQL? Are we actually gaining something useful while we forsake the Django Admin (and the sessions framework, etc)?
There are a number of reasons why CouchDB is legitimately worth considering despite the inconveniences. Consider a wiki that allowed users to download a copy of its contents and modify them. If it was implemented with CouchDB, the users could then upload their contents and merge them into the database. The implementer wouldn't need to write custom code for this; it's part of CouchDB's feature set. If some of the new data was bad, you could then use the versioning system to roll back to a previous revision (unless it had already gotten deleted by compacting the database... ahem).
Also, consider something like FreeBase, which creates schema seemingly on the fly for different types of data. It would be extremely unpleasant to do that in a relational database, because you'd be constantly creating and modifying schema. But in CouchDB, if you wanted to add another field for a piece of data, then you'd just add the data, and it would just work. Having flexible (or, really, non-existent) schema makes it easy to implement some kinds of applications that would require a few drops of the genius elixir to solve with rigidly defined schema[3].
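To make the "just add the data" point concrete, here is a small sketch using plain Python dictionaries in place of stored documents. The sample documents and the titles_with_field helper are made up for illustration; the idea is only that one document can carry an extra field without anything being declared up front, and that queries then need to tolerate the field being absent elsewhere.

```python
# Three documents of the same rough kind; only one carries an 'engine' field.
docs = [
    {'type': 'browser', 'title': 'FireFox'},
    {'type': 'browser', 'title': 'Safari', 'engine': 'WebKit'},
    {'type': 'editor', 'title': 'Aquamacs'},
]


def titles_with_field(docs, field):
    """Return the sorted titles of the documents that define the given field."""
    return sorted(d['title'] for d in docs if field in d)


with_engine = titles_with_field(docs, 'engine')
with_type = titles_with_field(docs, 'type')
```

The flip side is visible here too: code that assumed every document had an engine field would fail on two of the three, which is the kind of problem the relational camp likes to point out.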
Finally, there is the recurring thread from Damien Katz that--once they start optimizing--CouchDB may become impressively fast.
Can We Have Our Cake and Eat it Too?
Yes. Yes, you can. You could use an SQLite or PostgreSQL database to manage your sessions and users, and only use CouchDB to store the document data. In that way, you'd get to use the Django Admin for much of your data, and could still take advantage of CouchDB's document-based storage for the applicable pieces of your application.
The additional complexity from doing so wouldn't be terribly high, at least for the initial programmer. However, scaling such an application would require more thought and effort than scaling an application with only relational databases or that only used CouchDB. It would also be more difficult to efficiently deploy such a service with limited resources (like on a small VPS). These technical hurdles wouldn't be insurmountable, but probably wouldn't be a lot of fun to deal with either.
Wait, That isn't the Cake I Wanted To Eat
Oh. So you want to implement the sessions framework on top of CouchDB? You could certainly custom roll an authentication framework using CouchDB, and it is probably possible--although likely requiring a superhuman effort to the extent of translating SQL statements into commands for CouchDB--to create a Django backend that seamlessly used CouchDB instead of a traditional relational database.
If anyone does want to undertake that project, they should probably wait until CouchDB reaches 1.0 and the API stabilizes. But... even then I imagine the project would be something of a hellish quagmire that would make children out of men and break spirits faster than being asked to implement Perl regular expressions with a pushdown automaton.
Ending The Loose Coupling in Django Series
As this article winds to a close, so does the series encompassing it which looked at Jinja2, SQLAlchemy and now CouchDB and how Django's philosophy of loose coupling allowed them to easily integrate with Django. For me, one of the enduring thoughts after writing these entries is that the loose coupling certainly isn't perfect, but it'll let you do whatever you want if you're willing to pay the toll in time.
And, for my time, the prices at the toll ain't half bad.
[1] Which in some ways makes it seem a bit similar to a versioning system. I know GitHub created a pastie-like service implemented on top of git, and I also recall an article, perhaps by Linus, suggesting that wikis could be beneficially implemented on top of git. Perhaps a Ruby on Rails backend using Git for storage or a Django backend using Mercurial for storage would be a fun project to play with. That said, it's a bit different from a versioning system because it doesn't try all that hard to maintain old revisions, and will delete them in order to compact the size of the database.
[2] I'd be willing to consider any argument that suggested I am misusing the phrase coup de grace here. I mostly just like how it sounds and have run out of connecting words ('next', 'finally', 'thirdly', 'one hundredthly', etc.) to use in this tutorial.
[3] Although I'm sure that relational database supporters will point out that removing rigidity from schema introduces some problems as well. I think that's a pretty fair statement to make, but perhaps not as important as they might suggest.
Will, this is an excellent article. Thanks for taking the time to research and write it. Would you consider adding this to the CouchDB wiki (http://wiki.apache.org/couchdb/)?
You asked for feedback: The only glaring thing I can see is using db.query(func) for queries. In CouchDB terms this means that you are using a temporary view. While flexible and useful in development, the results of a temporary view are created each time you use it.
CouchDB can also make permanent views. Permanent views' results are cached and updated incrementally, resulting in way faster response times. You might not notice a big difference with just a handful of documents in the database, but you will later. In fact, permanent views are the recommended way to run queries.
See http://wiki.apache.org/couchdb/HttpViewApi for how to create permanent views. You can also use Futon that you can reach at http://127.0.0.1:5984/_utils/ in your installation.
I understand that just using db.query(func) makes for an easy introduction and permanent views add to complexity, but it'd be cool if you can add a bit of information about them in the article as I expect many to read this one :)
Thanks again, excellent work!
Cheers
Jan
--
Excellent article. Just to complete it, I've started a django app to use couchdb with django based on couchdb-python : http://code.google.com/p/couchdb-django/
For now it's just a start and will be finished around the end of the week. What you have already is :
I'm currently working on making the code simple: no more use of httplib2, and adding the missing things that will ease integration with django, like user backend and form generation. Feel free to participate if you want ;)
Thanks, very useful info. I have no time to dig into CouchDB approach, but this article stitched missing links in my head :-)
Hello.
You have one or more filed complaints at the bureau of language inspection (henceforth BLI).
Attached is a comprehensive listing of complaints as of the 19th of August, 2008.
Great article, I've been searching for info for a while on how to integrate couchdb with Django. Anyway - my take on "coup de grace", I don't think it's an incorrect usage, but I would prefer another french phrase in this context - "piece de resistance" conveys the finishing touch, which is what I think you mean.
You should really mention that couchdb has no security model and, as a result, is inappropriate for production applications with security implications.
I love couchdb and use it for my pet projects at home all the time, but unfortunately, it doesn't have all the parts in place for the day job just yet.
Nevertheless, I look forward to the day when they do finish the security model.
@david Use an authenticating HTTP proxy if you need to. Yes, you shouldn't use CouchDB publicly, then again, traditional databases are not used publicly either.
Yet another awesome post. Thanks a lot for sharing all this, each new post reminds me one topic I'd like to investigate and fortunately you made the first step!
ps: about "coup de grâce", that's not exactly the right place to put it, it means "the latest thing which will kill you", as suggested by Vernon, "plat de résistance" (not "pièce") is probably more appropriated. If you love the first expression, you can put it in SQL+CouchDB use.</my 2 French cents>
Another great article.
The timing of this post is excellent as well, as CouchDB was just featured on the FLOSS Weekly podcast. I've yet to listen to the episode, but thought those of us looking for some additional intro to CouchDB stuff might find it useful.
hw can v design a UI for retrieving the views from the database and displaying the records