Irrational Exuberance!

Making Django Apps Run On and Off GAE

March 10, 2009. Filed under django, google-app-engine

There is a young and growing cottage industry for bridging the gap between Django on the Google App Engine and standard Django deployments. The oldest entrant is the Google App Engine Helper for Django, which helps port existing Django applications to GAE. More recently, django-gae2django, which supports porting projects in the opposite direction, has been gaining attention.

Neither of these is a drop-in replacement, though; both require some special tweaks or modifications to your code. Indeed, given the different capabilities and designs of Bigtable and relational databases, it seems likely to me that a one-size-fits-all drop-in fix will never exist.

That said, it is possible to write pluggable Django applications that play nicely with both normal and GAE deployments and degrade gracefully when necessary. This is the strategy I used when porting django-springsteen to GAE.

Yet Another Layer Of Abstraction

The pain that this method inflicts is that it requires yet another layer of abstraction. Here was my first attempt at porting Springsteen's use of caching to GAE.

# springsteen/utils.py
try:
    import google.appengine.api
    # Degrade gracefully: store nothing and always report a miss (None).
    def cache_get(key):
        return None
    def cache_put(key, value, duration):
        pass
except ImportError:
    import django.core.cache
    def cache_get(key):
        return django.core.cache.cache.get(key)
    def cache_put(key, value, duration):
        django.core.cache.cache.set(key, value, duration)

In this first attempt I was willing to settle for graceful degradation, which on GAE meant simply not caching anything. Elsewhere in my code I used these functions like this:

from springsteen.utils import cache_get, cache_put
cache_put("yada", "yadayada", 100)
ab = cache_get("yada")
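With the stubs in place, `cache_get` returns None for every key, so callers have to treat None as a cache miss and recompute the value. The helper below is a hypothetical illustration (not Springsteen code) of that get-or-compute calling pattern. It runs against no-op stand-ins for the first attempt's GAE branch, where it stays correct but gains no speedup.

```python
# Stand-ins for the no-op GAE branch of the first attempt: nothing is
# stored, and every lookup returns None (a cache miss).
def cache_get(key):
    return None

def cache_put(key, value, duration):
    pass

def get_or_compute(key, duration, compute):
    """Hypothetical helper: return the cached value, or compute and store it."""
    value = cache_get(key)
    if value is None:
        # Treat None as a miss. A side effect is that a legitimately cached
        # None can't be told apart from a miss and will be recomputed.
        value = compute()
        cache_put(key, value, duration)
    return value
```

Under these stubs `compute` runs on every call; with a real cache behind the same two functions, only the first call within `duration` seconds would pay that cost, barring eviction.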

After testing on GAE I quickly realized that graceful degradation was inadequate for Springsteen, so I filled in the stubbed cache_get and cache_put using GAE's memcache API.

# springsteen/utils.py
try:
    # Import the memcache submodule explicitly; importing the
    # google.appengine.api package alone isn't guaranteed to load it.
    from google.appengine.api import memcache
    def cache_get(key):
        return memcache.get(key)
    def cache_put(key, value, duration):
        memcache.set(key, value, duration)
except ImportError:
    import django.core.cache
    def cache_get(key):
        return django.core.cache.cache.get(key)
    def cache_put(key, value, duration):
        django.core.cache.cache.set(key, value, duration)
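Neither branch helps when running unit tests outside both GAE and a configured Django project. One option, sketched here as an assumption rather than something Springsteen does, is a third, in-process fallback with the same get/set shape. `LocalCache` is a hypothetical name; unlike memcache it isn't shared between processes, and it treats `duration` simply as seconds until expiry.

```python
import time

class LocalCache(object):
    """Hypothetical per-process cache with a memcache-like get/set shape."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be tested
        self._data = {}       # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, duration):
        # duration is seconds from now; no special case for 0 is attempted,
        # since memcache and Django give that value their own meanings.
        self._data[key] = (value, self._clock() + duration)


# In utils.py this would sit behind whatever check decides that neither
# memcache nor Django's cache is usable.
_local = LocalCache()

def cache_get(key):
    return _local.get(key)

def cache_put(key, value, duration):
    _local.set(key, value, duration)
```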

The next adaptation I made was less generic and more specific to Springsteen. Springsteen will optionally log all queries for later analysis, and by default uses Python's logging module to accomplish this.

However, while logging queries to a queries.log file makes them easy to parse and analyze later, logging them to the log viewer in the GAE admin console makes analyzing the data essentially impossible.

Instead, I decided to have the GAE version write queries to the datastore (GAE's Bigtable-backed storage), where they could be queried later to perform analysis and generate statistics.

I used a variant of the above code to accomplish that.

# springsteen/utils.py
from django.conf import settings
try:
    # Setup utilities for App Engine deployment
    import google.appengine.api
    if getattr(settings, 'SPRINGSTEEN_LOG_QUERIES', False):
        from google.appengine.ext import db

        class QueryLog(db.Model):
            text = db.StringProperty()

        def log_query(msg):
            logged = QueryLog()
            logged.text = msg.lower()
            logged.put()
    else:
        def log_query(msg):
            pass

except ImportError:
    # Setup utilities for normal Django deployment
    import logging, os

    if getattr(settings, 'SPRINGSTEEN_LOG_QUERIES', False):
        def get_logger(name, file):
            logger = logging.getLogger(name)
            hdlr = logging.FileHandler(os.path.join(settings.SPRINGSTEEN_LOG_DIR, file))
            formatter = logging.Formatter('%(message)s') 
            hdlr.setFormatter(formatter)
            logger.addHandler(hdlr)
            logger.setLevel(logging.INFO)
            return logger

        QUERY_LOGGER = get_logger('findjango','queries.log')

        def log_query(msg):
            QUERY_LOGGER.info(msg.lower())
        
    else:
        def log_query(msg):
            pass

Using that pattern I was able to enable query logging in a way suited to each deployment scenario, or disable it entirely, depending on the value of SPRINGSTEEN_LOG_QUERIES in settings.py.

Usage of the above code is what one might expect:

from springsteen.utils import log_query
log_query("python")
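Because the normal-deployment branch writes each record with the bare '%(message)s' format, queries.log ends up holding one lowercased query per line, which is what makes later analysis easy. Here is a minimal sketch of such analysis; `top_queries` is a hypothetical helper, not part of Springsteen, and it accepts any iterable of lines so an open file works directly.

```python
import collections

def top_queries(lines, n=10):
    """Hypothetical helper: the n most frequent queries in an iterable of lines.

    Assumes the '%(message)s' formatter from utils.py, so each non-blank
    line is one already-lowercased query.
    """
    counts = collections.Counter(line.strip() for line in lines if line.strip())
    return counts.most_common(n)
```

For example, `top_queries(open(os.path.join(settings.SPRINGSTEEN_LOG_DIR, 'queries.log')))` would rank the logged queries. On GAE the same function could in principle be fed `(log.text for log in QueryLog.all())` instead.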

With a little imagination you can see how one might extend this concept to create wrappers for creating, editing, querying, and deleting data on the two platforms.
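As a rough sketch of that extension, an app could define a small storage interface and choose an implementation at import time, the same way the cache and logging wrappers do. Only a hypothetical in-memory backend (`MemoryStore`) is shown here so the example runs anywhere; GAE (`db.Model`) and Django ORM backends would need to provide the same methods, and those are left as an assumption.

```python
import itertools

class MemoryStore(object):
    """Hypothetical in-memory backend: create, get, update, query, delete."""

    def __init__(self):
        self._rows = {}                 # key -> dict of fields
        self._ids = itertools.count(1)  # simple auto-incrementing keys

    def create(self, **fields):
        key = next(self._ids)
        self._rows[key] = dict(fields)
        return key

    def get(self, key):
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def update(self, key, **fields):
        self._rows[key].update(fields)

    def query(self, **filters):
        # Equality filters only, since both the GAE datastore and the
        # Django ORM support those.
        return [
            (key, dict(row))
            for key, row in sorted(self._rows.items())
            if all(row.get(name) == value for name, value in filters.items())
        ]

    def delete(self, key):
        self._rows.pop(key, None)
```

Keeping the interface this narrow is the point: each app exposes only the operations it actually needs, which is what lets a backend emulate them cheaply.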

It isn't pretty--nor is it free--but neither is it particularly complex.

Cost and Benefit

The biggest problem with this approach is that it doesn't let developers use the functionality they're used to; instead, every application reinvents its own minimal version of the wheel.

On the other hand, it means that someone with no experience with the Google App Engine can deploy applications to GAE and also normal setups without any modifications on their part.

Also, each application gets to degrade gracefully or emulate functionality in whatever way best suits its own performance and functionality requirements. This allows finer-grained optimization than the more ambitious Django-GAE and GAE-Django projects can offer.

I don't expect this approach to catch on, but for certain reusable apps with relatively minor incompatibilities, it's a pretty reasonable way to reduce the deployment burden on your users.