A Django Middleware for Google Analytics (repost)

06/14/2007

This is a transplant from the original Irrational Exuberance, and was written in mid 2007: nearly two years ago.

Part of the beauty of Django is that it was designed to maximize flexibility. As developers we have multiple places we can hook into Django, and these hooks allow access to most of Django's moving parts. The first hook a Django developer is introduced to is the urls.py file where we map urls onto our applications.The urls.py file is the fundamental Django hook, but sometimes Django's second hook--the Middleware--can offer us real gains in simplicity. In this article, we will first look at how to use the GoogleAnalyticsMiddleware to inject calls to Google Analytics into your pages, and then we'll take a step-by-step look at how it was built.The source for the GoogleAnalyticsMiddleware is available here (it is under a Lesser GNU license).<!--more-->

Using the Google Analytics Middleware

Using this Middleware is very simple. You will need to place the analytics.py file somewhere in your Django path, and you'll need to place several variables into your settings.py file.The easiest way to ensure the file is in your Django path is to place it either within your Django source directory (if you installed Django via the python setup.py install method then you'll need to place it within your site-packages folder), or within your Django application's folder. It is preferable to put analytics.py in the middleware folder because then all applications can access it.Next we need to add it to the list of Middlewares in your settings.py file. Your settings.py file should look something like this (this Middleware does not depend on any others, so its position is not signifigant):

MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.doc.XViewMiddleware',
    'myproject.analytics.GoogleAnalyticsMiddleware',
)

If you went ahead and placed it within the django.middleware folder, you'd want to line to be

'django.middleware.analytics.GoogleAnalyticsMiddleware',

instead.Next you need to add the global variable ANALYTICS_ID to your settings.py file. Its value should be your Google Analytics id.

ANALYTICS_ID = 'abcd1234'

Often you don't want to track data about yourself, so to facilitate that there is a second, optional, global variable you may declare: ANALYTICS_IGNORE_ADMIN. If you set that equal to True then pages rendered for a logged in admin will not inject the Google Analytics javascript (you don't skew your results, don't have to load the javascript, good times are had by all). If you want to use it this should be inserted into your settings.py file:

ANALYTICS_IGNORE_ADMIN = True

If you don't want to disable tracking for admins, then you don't have to do anything, as the middleware will be injected into all pages by default.Thats it for setting it up. If you want to test it without sending real data to Google Analytics there is a line within init that indicates you should uncomment it such testing.

Developing the Middleware

The first thing we'll need to do is look at the Middleware API. Middleware objects do not have any specific class they need to extends, and they must implement (at least) one of three methods:

process_request(self, request)
process_view(self, request, view_func, view_args, view_kwargs)
process_response(self, request, response)

Process_request is for intercepting requests before Django decides on the view to render (this could be used to reject unauthorized users). Process_view is for intercepting a view before it is rendered and displaying something else instead (perhaps, an error message, or perhaps you want to send a one-time message to users the next time they log in). Process_response is used for modifying the content that will be displayed.For injecting the Analytics code, the process_request is just what we need.

Designing our Middleware

When designing a Middleware its important to streamline it as much as possible. Because this function will be called everytime a page is loaded it ought to be fairly efficient. This means we should design our middleware to do as much preprocessing as possible (and thus avoid as much overhead as possible in the process_response method).As such our init method will contain most of the thinking code. The logic to our init method is as follows:
  • Get the values of ANALYTICS_ID and ANALYTICS_IGNORE_ADMIN from settings.py.
  • Unless there is no ANALYTICS_ID, prebuild the html representing our javascript call to Analytics.
  • If there is no value for ANALYTICS_ID, set process_response to return the response without any modification.
  • Elif ANALYTICS_IGNORE_ADMIN is true, set process_response to inject the Analytics code unless an admin is logged in.
  • Else process_response should always inject the Analytics code.
The first thing we need to know is how to grab information from settings.py. This is accomplished like this:

from django.conf import settings
id = settings.ANALYTICS_ID
ignore = settings.ANALYTICS_IGNORE_ADMIN

Pretty simple, thanks to some metaprogramming magic courtesy of the Django team. The next thing we'll need is a simple function to build the html string based on our id. This will look like this:

@staticmethod
def form_analytics_string(id):
    return """&lt;script src="http://www.google-analytics.com/urchin.js" type="text/javascript"&gt;&lt;/script&gt;&lt;script type="text/javascript"&gt;  _uacct = "%s";  urchinTracker();&lt;/script&gt;""" % ( str(id) )

This is a simple injection of a value into a Python String. It uses the staticmethod decorator because it doesn't require access to the instance's details (using the staticmethod decorator means that it will not automatically be passed self as its first argument).

Slight Detour: some support methods

Before we finish implementing the logic of the init method, we need to make several support methods that will greatly simplify things for us.We will be making three support methods, one to inject the Analytics html into a response (for when there is an ANALYTICS_ID), one to simply return the response unaltered (for when there is no ANALYTICS_ID), and the last is to curry a check if the current user is an admin onto a function it is passed.The function to return it unaltered is by far the simplest, so lets start there.

def return_unaltered(self, request, response):
    return response

Inserting the Analytics code into the response is slightly more complex, but still reasonable.

def insert_analytics(self, request, response):
    content = response.content
    index = content.upper().find('&lt;/BODY&gt;')
    if index == -1: return response
    response.content = content[:index] +
        self.html +
       context[index;]
    return response

We get the content from the response, scan it to see if it has a closing body tag, and if it does we insert the Analytics code immediately before it (this way the analytics script won't slow down the page's loading time). Not too bad.The last support function we need to build is for currying the is_user_admin? check onto the insert_analytics method (the two will only be curried together if the ANALYTICS_IGNORE_ADMIN variable is True, but we'll be doing that logic in our init method once we go back and code it).(If you're wondering what currying is, its a functional programming concept, and you can see an explanation here.)This code is slightly confusing, but it lets us use the code from insert_analytics regardless of whether or not we are ignoring admin users.

@staticmethod
def ignore_admin(func):
    def ignore_if_admin(request, response):
        if request.user.is_authenticated() and request.user.is_staff:
            return response
        else:
            return func(request, response)    return: ignore_if_admin

First lets note that ignore_admin is a static function (it simply manipulates a function, and is not associated with a particular instance of the class).What happens in the ignore_admin method is that it returns a copy of the inner function ignore_if_admin specialized on the parameter 'func'. Thus the returned function remembers the value of 'func' even though it isn't ever explicitly passed it as a parameter (this is a common functional programming idiom).

Returning to the constructor

Now that we have written our support methods we can go back and finish writing our constructor.

def __init__(self):
    try:
        id = settings.ANALYTICS_ID
        self.html = self.form_analytics_string(id)
        self.process_response = self.insert_analytics
        try:
            ignore = settings.ANALYTICS_IGNORE_ADMIN
            if ignore is True:
                self.process_response = self.ignore_admin(
                    self.process_response)
        except AttributeError:
            pass    except AttributeError:
        self.process_response = self.return_unaltered

In the init method, we first try to get the id from the settings file, if we fail that then we set process_response to our return_unaltered method. Otherwise we return the insert_analytics method, curried within the ignore_admin method if appropriate.Thanks to the support methods the init is fairly and easy to follow. Good stuff.

Stepping Away from the Trees

Now lets look at the entirety of the code that we have pieced together.

from django.conf import settings
class GoogleAnalyticsMiddleware(object):
    def __init__(self):
        'Constructor for GoogleAnalyticsMiddleware'
        try:
            id = settings.ANALYTICS_ID
            self.html = self.form_analytics_string(id)
            self.process_response = self.insert_analytics
            try:
                ignore = settings.ANALYTICS_IGNORE_ADMIN
                if ignore is True:
                    self.process_response = self.ignore_admin(
                        self.process_response)
            except AttributeError:
                pass
        except AttributeError:
            self.process_response = self.return_unaltered

    def insert_analytics(self, request, response):
        content = response.content
        index = content.upper().find('&lt;/BODY&gt;')
        if index == -1: return response
        response.content =  content[:index] +
            self.html +
            content[index:]
        return response

    def return_unaltered(self, request, response):
        return response

    @staticmethod
    def ignore_admin(func):
        def ignore_if_admin(request, response):
            if request.user.is_authenticated() and request.user.is_staff:
                return response
            else:
                return func(request, response)
        return ignore_if_admin

    @staticmethod
    def form_analytics_string(id):
        return """&lt;script src="http://www.google-analytics.com/urchin.js" type="text/javascript"&gt;&lt;/script&gt;&lt;script type="text/javascript"&gt;  _uacct = "%s";  urchinTracker();&lt;/script&gt;""" % ( str(id) )

All in all we have written a moderately useful Middleware in just under fifty lines of code (ignoring whitespace, as one always does when trying to make an illadvised point about number of lines). Not a bad accomplishment. Now that we have written one Middleware, the next time we'll understand the fundamentals and can write something a bit more interesting.Remember that the completed source code is available here, and it (unlike the code we just finished building) actually contains comments.

All Rights Reserved, Will Larson 2007 - 2014.