Irrational Exuberance!

Overview of Using Django on the Google App Engine

June 17, 2008. Filed under django, google-app-engine

The most startling thing about developing with Django on the Google App Engine is how similar the simple things are. On the App Engine, a concise front page view might look like this:

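from django.shortcuts import render_to_response
from django.template import RequestContext
from snippet.models import Snippet
def index(request):
    # the five most recently entered snippets, newest first
    recent = Snippet.all().order("-entry_time")[:5]
    return render_to_response(
        'snippets/index.html',
        {'recent': recent},
        RequestContext(request, {}))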

Translating this to a normal Django view involves changing only one line: Snippet.all().order("-entry_time")[:5] becomes Snippet.objects.order_by('-entry_time')[:5]. That is the comforting side of the Google App Engine: in most cases someone used to Django can simply start coding, with occasional glances at the documentation. However, that doesn't really tell the whole story. Fortunately, I'm in a storytelling mood, so we'll delve a bit deeper than the superficial "It's the Same Damn Thing" angle[1].

Actually, it turns out that really addressing the differences here involves telling two stories: the first is a short one about the Google App Engine platform, and the second is a longer one looking at Django on the Google App Engine.

The Google App Engine Platform

The platform is the real attraction of the App Engine, because Google has made it amazingly simple to develop and deploy applications. If the Google App Engine had instead been free access to world-class dedicated servers, I don't think it would be nearly as appealing to developers[2] as the present incarnation.

Easy Deployment

Quite simply, the platform handles all the messiness of deployment for you (what was my PostgreSQL username again, what did I name the tables, and why did SSH stop accepting my key for automatic login anyway?). It's difficult to believe how easy the deployment process is. You fill in a simple YAML file, and then run a single command at the command line. Or, you can use an even simpler GUI to handle the deployment. The upload and syncing may take a minute or two, but your involvement is all of five seconds long, and utterly painless.

BigTable

Not only does GAE free you from most of the difficulties of deployment, it also shields you from many of the details of managing scaling. You won't be setting up servers or sharding your database; instead, you'll just pay more money to Google. Or at least that is the dream scenario of using BigTable instead of a cluster of relational databases. Exactly how well the scaling will work out remains to be seen, and the keys to successful scaling (or at minimum the keys to failure) will still be firmly in the hands of the developers and their application design choices.

This will be an interesting area to watch once several GAE applications become successful and start placing higher demands on the platform. Also--as a brief aside--there is definitely an open niche for a few rocking tutorials on how to design models for BigTable.

Google Accounts & Mail

Another benefit, although probably far less important in the long run, is the ability to integrate with Google Accounts and send email via Google's mail servers. For applications oriented towards technical users, the former is a great feature, since most people in that group will already have a Google account[3]. In less technical user groups, I suspect the penetration of Google accounts is much lower, but it will still be a helpful option.

The ability to send email through Google's servers is a nice little benefit as well, and one so necessary for many webapps that using GAE without it would have been rather difficult. Other than saying "I'm glad I won't have to set up Postfix," there isn't much more to say on that account.

Now that we've looked briefly at the Google App Engine platform, let's move on to the joys and sorrows of deploying Django on the Google App Engine.

Django on the Google App Engine

The previous segment began by saying the platform is the real attraction of the App Engine, but inevitably the source of its gifts is also the source of its inconveniences and frustrations. Django and GAE are not bad bedfellows, but GAE does have the tendency to steal the sheets sometimes.

The larger a framework grows, the less agile it becomes. People often bemoan this point about Ruby on Rails with glum frowns and comments like "It works really well when you do what it wants you to." Django has largely avoided that fate because of its design goal of having interchangeable components (use your own templating system, use your own ORM, etc.), but it is not immune to the underlying problem: standardization makes it easy to repeat, but hard to adapt.

Even though it's possible to use only a subset of Django, doing so invalidates your existing experience with the pieces that are being replaced. Thus Django developers coming over to GAE have to adjust their eyes to grok the new environment, even though it is filled with mostly familiar sights.

Models and BigTable

The biggest change for Django developers is the model framework: Django's ORM has been replaced with something of Google's design. This change is necessary because the models no longer live in a relational database but in BigTable, and it is also the place where attempts to port an application from Django@Elsewhere to Django@GoogleAppEngine will run into the most resistance.

For example, let's look at part of the Snippet model I use in my simple syntax highlighting app on GAE.

from appengine_django.models import BaseModel
from google.appengine.ext import db
CHOICES = ('Scheme', 'Python', 'Ruby')  # etc, etc
class Snippet(BaseModel):
    title = db.StringProperty(required=True)
    highlighter = db.StringProperty(required=True, default='Python', choices=CHOICES)
    content = db.TextProperty(required=True)
    url_hash = db.StringProperty()
    parent_hash = db.StringProperty()

Instead of storing a direct link to the parent Snippet, I am storing the parent's hash. This means I can build a link to the parent Snippet without doing an additional query. But how would I get the parent's title? Well, one way to do it would be like this:

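from django.shortcuts import render_to_response
from django.template import RequestContext
from models import Snippet
def view_snip(request, hash):
    # .get() returns the first matching entity (or None), not a Query
    snip = Snippet.all().filter("url_hash =", hash).get()
    # a second query, made only to read the parent's title
    parent = Snippet.all().filter("url_hash =", snip.parent_hash).get()
    parent_title = parent.title if parent else None
    return render_to_response(
        'snippets/snippet_detail.html',
        {'object': snip, 'parent_title': parent_title},
        RequestContext(request, {}))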

But that is the relational database way, and since we're using BigTable, it turns out that it is now known as the wrong way. Instead, we 'retrieve' the parent's title by preemptively storing it in the child. Thus we would change the model to look like this:

class Snippet(BaseModel):
    title = db.StringProperty(required=True)
    highlighter = db.StringProperty(required=True, default='Python', choices=CHOICES)
    content = db.TextProperty(required=True)
    url_hash = db.StringProperty()
    parent_hash = db.StringProperty()
    # copy of the parent's title, filled in when this Snippet is created
    parent_title = db.StringProperty()

We would then do the necessary fetching once, and only once, when the Snippet is created.

# assuming parent_hash variable exists in environment
snip = form.save(commit=False)
# one lookup at creation time, instead of one on every later view
parent = Snippet.all().filter("url_hash =", parent_hash).get()
snip.parent_hash = parent_hash
snip.parent_title = parent.title
snip.put()

This feels a little awkward for someone used to relational databases--I mean, what the hell, why are you caching data in the model itself?--but this is how BigTable is best utilized: duplicate data to avoid extra lookups. For data that changes very frequently, it may become necessary to fetch that data frequently (at which point you would start using a different kind of caching: memcached), but often simple data redundancy is flexible enough to substantially reduce the number of lookups performed.
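To make the saving concrete, here is a minimal sketch of both read paths. It assumes a plain Python dict as a stand-in for the datastore; `FakeStore`, `create_snippet`, `view_normalized` and `view_denormalized` are hypothetical names for illustration, not App Engine APIs. The store counts lookups so the two approaches can be compared directly.

```python
# Hypothetical in-memory stand-in for the datastore (NOT the App Engine API):
# a dict keyed by url_hash, plus a counter of how many lookups were made.
class FakeStore:
    def __init__(self):
        self.rows = {}
        self.lookups = 0

    def put(self, snip):
        self.rows[snip["url_hash"]] = snip

    def get(self, url_hash):
        self.lookups += 1
        return self.rows.get(url_hash)


def create_snippet(store, url_hash, title, parent_hash=None):
    """Write path: look the parent up once and copy its title into the child."""
    snip = {"url_hash": url_hash, "title": title,
            "parent_hash": parent_hash, "parent_title": None}
    if parent_hash is not None:
        parent = store.get(parent_hash)
        snip["parent_title"] = parent["title"] if parent else None
    store.put(snip)
    return snip


def view_normalized(store, url_hash):
    """Relational-style read: one lookup for the snippet, one for its parent."""
    snip = store.get(url_hash)
    parent = store.get(snip["parent_hash"]) if snip["parent_hash"] else None
    return snip["title"], (parent["title"] if parent else None)


def view_denormalized(store, url_hash):
    """BigTable-style read: the parent's title is already on the child."""
    snip = store.get(url_hash)
    return snip["title"], snip["parent_title"]
```

After creating a parent and a child, viewing the child costs two lookups with `view_normalized` and one with `view_denormalized`; the extra lookup has moved to `create_snippet`, where it happens exactly once.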

Certainly, this data redundancy also brings with it certain costs. For example, what do you do if you change the title of a Snippet? Well, you have to change the title for the Snippet itself, and for each of its children snippets. The code might look something like this:

def update_title(hash, title):
    # rename the parent itself...
    parent = Snippet.all().filter("url_hash =", hash).get()
    parent.title = title
    parent.put()
    # ...then rewrite every child that holds a copy of its title
    for snip in Snippet.all().filter("parent_hash =", hash):
        snip.parent_title = title
        snip.put()

Which is, in all fairness, pretty ugly. The cost and benefit of this kind of data duplication depend entirely on how much your data is duplicated (if the average Snippet has zero children, then it has no performance cost, although there is a programmer cost for writing the extra code) and how often you make changes (if a Snippet can't change its title, all of a sudden the inconveniences become irrelevant).
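A back-of-the-envelope model can put rough numbers on that trade-off. It is a hypothetical simplification: a read and a write each count as one operation, and the query that finds the children is ignored. Under those assumptions the difference between the two designs is `child_views - title_changes * avg_children`, so copying the title pays off whenever children are viewed more often than renames multiplied by the number of children.

```python
def operation_counts(child_views, title_changes, avg_children):
    """Rough datastore operation counts for one parent Snippet and its children.

    Hypothetical simplifications: a read and a write each count as one
    operation, and the query that locates the children is not counted.
    Returns (normalized, denormalized).
    """
    # Normalized: every child view needs a second lookup for the parent's
    # title; renaming the parent is a single write.
    normalized = 2 * child_views + title_changes
    # Denormalized: every child view is one lookup, but renaming the parent
    # also rewrites each child that carries a copy of the title.
    denormalized = child_views + title_changes * (1 + avg_children)
    return normalized, denormalized


def denormalizing_pays_off(child_views, title_changes, avg_children):
    # normalized - denormalized == child_views - title_changes * avg_children
    return child_views > title_changes * avg_children
```

With titles that never change, the copy halves the lookups for child pages; with a parent that has twenty children and is renamed often, the rewrites quickly dominate.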

I think the most important thing to remember here is that different caching and design patterns will be necessary to use BigTable effectively. This is the least cosmetic difference caused by using Django on Google App Engine, and one that needs to be carefully considered while designing an application to run on GAE.

django.contrib.* and Middleware

Many of the middlewares and django.contrib apps are also unavailable on Google App Engine. The authentication and sessions frameworks in particular will be missed. As one would expect, Google does provide workable--and in some places perhaps more useful--replacements, but it's yet another piece of Django that has been replaced and will have to be relearned by switchers to GAE.

Unlike the mismatch between BigTable and relational databases, the difference here is one of details rather than concepts. A ported app will need to have chunks rewritten, but the apps themselves--for the most part--won't need to be redesigned.

Django Helper Project

There are many links to, and vague but optimistic praises of, the Django Helper project, which aims to abstract away the differences between Django@Elsewhere and Django@GAE. In the end, I think it's a somewhat misguided--although well-intentioned and at times helpful--effort. The problem is that creating a layer of abstraction on top of two different things leaves you with only the subset of features both share. Imagine trying to create a library that abstracted away whether you were using Git or Mercurial for your repository: you'd lose out on the unique features of both, and end up with something less compelling than either on its own.
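A toy illustration of that lowest-common-denominator effect follows. It assumes two hypothetical in-memory query classes that mirror the `order()`/`order_by()` split from the first example; none of these classes are the real Django, App Engine, or Helper APIs. The portable wrapper can only offer operations that both backends can implement.

```python
# Two toy query backends (hypothetical, in-memory; not the real Django or
# App Engine classes) that mirror the order_by()/order() split from earlier.
class RelationalQuery:
    def __init__(self, rows):
        self.rows = list(rows)

    def order_by(self, field):
        key, desc = field.lstrip("-"), field.startswith("-")
        return RelationalQuery(sorted(self.rows, key=lambda r: r[key], reverse=desc))

    def __getitem__(self, index):
        return self.rows[index]

    def with_parent_titles(self, titles):
        # Stands in for a join: something only the relational side offers.
        return [dict(r, parent_title=titles.get(r["parent_hash"])) for r in self.rows]


class DatastoreQuery:
    def __init__(self, rows):
        self.rows = list(rows)

    def order(self, prop):
        key, desc = prop.lstrip("-"), prop.startswith("-")
        return DatastoreQuery(sorted(self.rows, key=lambda r: r[key], reverse=desc))

    def fetch(self, limit):
        return self.rows[:limit]


class PortableQuery:
    """Wraps either backend, but can only offer what *both* support."""

    def __init__(self, backend):
        self.backend = backend

    def ordered(self, field):
        if isinstance(self.backend, RelationalQuery):
            return PortableQuery(self.backend.order_by(field))
        return PortableQuery(self.backend.order(field))

    def first(self, n):
        if isinstance(self.backend, RelationalQuery):
            return self.backend[:n]
        return self.backend.fetch(n)

    # No with_parent_titles(): DatastoreQuery has nothing to map it onto, so
    # the portable layer drops it, and relational users lose it too.
```

Both backends return the same rows through `PortableQuery`, but the join-like method simply cannot be part of the shared interface.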

My experience is that the helper doesn't quite work as advertised in a variety of ways, and it makes more sense to do what you can with Django, and then use the Google framework components to supplement Django, rather than pouring a third blend of the two into the koolaid.

Ending Thoughts

Really, I had a very positive experience developing with the Google App Engine, and I'm looking forward to refining my existing project and to trying out new projects on it as well. It eliminates the pain of deploying webapps, which is usually the most difficult--or at least the most capital-intensive--part of pushing out a new project. Before App Engine, I had two choices for pushing out projects on a small budget (mostly small personal projects):

  1. put it on a VPS and pay $20-$40 per month, or
  2. put it on a shared host and accept often abysmal performance.

Now, Google App Engine has given me a third option with most of the benefits of the first two, and for that I am grateful. Whether the barrier to web applications needed to be lowered even further is a discussion for another day, but GAE is certainly an exciting experiment to observe and to participate in.

If you have tried developing with GAE, what were your impressions? Do you think it will be a viable platform for webapps?


  1. One of my students recently delivered a speech titled 世界のみんなは同じだ, or All The World's People Are the Same. It's a pleasant thought, usually one we arrive at somewhere between being paralyzed by lack of detail and being paralyzed by excessive detail, but not one that seems particularly true. The same goes for the distinction between Django on Google App Engine (DOGAE?) and Django in Other Places (DIOP?): it may seem the same, but it isn't.

  2. And that is who the Google App Engine really seems to be aimed at: small groups of developers who lack the capital, the desire, or the technical prowess to handle the deployment and scaling of their web application.

  3. Well, except for the ones who have decided Google is becoming a terrifying creature to be avoided. I'm pretty sure those people won't be that happy with using a Google App Engine application anyway...