Setting Up Django-Rcsfield

October 15, 2008. Filed under django, python.

As I continued working on a project of mine, I slowly came to the realization that I was going to need to version some of the data I was dealing with. After spending some time considering the possibilities, I remembered a tweet from Kevin a few weeks back about django-rcsfield supporting just that. (I can't find the specific tweet where he mentioned it, sorry.)

So I embarked on a small journey to get django-rcsfield installed and working correctly. The project's wiki has a quick start guide, but it wasn't quite enough to get me started, so I'll share my notes from the process.

Setting up django-rcsfield

  1. First you need to grab an SVN checkout:

    svn checkout http://django-rcsfield.googlecode.com/svn/trunk/ django-rcsfield
    
  2. Then you need to set up a symlink into your site-packages directory.

    ln -s `pwd`/django-rcsfield/rcsfield/ /Library/Python/2.5/site-packages/rcsfield
    

    (I think my favorite thing about Leopard is that the site-packages directory is finally in an easy-to-remember location. Adding packages to the site-packages dir used to be like looking for eggs on Easter morning.)

  3. I used the backend based on git-python, which requires that you have Git installed, as well as git-python.

    sudo port install git
    git clone git://gitorious.org/git-python/mainline.git
    mv mainline git-python
    cd git-python
    sudo python setup.py install
    

    The django-rcsfield project also supports Bazaar and Subversion backends. Looking at the code, adding support for Mercurial would probably be a one-to-three-hour hack session, depending on how many things went horribly wrong. (Mostly it would take that long because there doesn't seem to be much documentation for using Mercurial from the command line, although I believe the source itself is well documented.)

  4. Now open up your project's settings.py file and add these lines:

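    import os
    ROOT_PATH = os.path.dirname(os.path.abspath(__file__))  # only if not already defined
    RCS_BACKEND = 'gitcore'  # name of the backend module, not of the VCS
    GIT_REPO_PATH = os.path.join(ROOT_PATH, 'git')  # where the Git repository will live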
    

    Notice that the value for RCS_BACKEND is the name of the module containing the backend you want to use. Since it wasn't clear what to put there, I initially specified 'git', which led to a bewildering debug session where I tried to find the error in rcsfield.backends.gitcore when that file wasn't actually being loaded. Instead, it was loading the actual git-python module (git), and both modules have a .commit attribute, which helped perpetuate the confusion in my mind. (Of course, if I had read the error message carefully to begin with, I would have realized what was going wrong right away. Whoops.)

    If you want to use the Bazaar backend, you would specify:

    RCS_BACKEND = 'bzr'
    BZR_WC_PATH = 'some/path/here'
    

    and if you wanted to use the Subversion backend:

    RCS_BACKEND = 'svn'
    SVN_WC_PATH = 'some/path/here'
    

    Note, though, that it is not necessary to initialize the repository yourself, at least for the Git backend (and I assume for the others as well).

  5. Now you will need to modify your model to include an RcsTextField. (Note that the QuickStart Guide provides an excellent snippet covering this part of configuration.) Something like:

    from django.db import models
    from rcsfield.fields import RcsTextField
    from rcsfield.manager import RevisionManager
    
    class Note(models.Model):
        title = models.CharField(max_length=200)
        date = models.DateTimeField()
        slug = models.SlugField(max_length=200, unique=True)
        text = RcsTextField()  # drop-in replacement for models.TextField
        objects = RevisionManager()
    

    Anything you can do without RcsTextField, you can do with RcsTextField; it really does slot in cleanly. If you already have a custom manager, you can simply choose a different name for the RevisionManager:

    objects = MyCustomManager()
    revisioned_objects = RevisionManager()
    

    and just remember to use the appropriate one.

  6. I can't say for certain that you must reset your app at this point, but I didn't have any success otherwise. Fortunately, you can dump your existing data beforehand and reload it afterward, so the reset doesn't cause too many issues.

    To be clear: I changed a model from using TextField to using RcsTextField, dumped out the data, reset the app, and was able to load the data back in successfully without editing the data dump. This means that replacing a TextField with an RcsTextField is quite painless. That makes sense: both are stored the same way in the database table, and RcsTextField's magic happens outside of the database.

    You also need to run syncdb, even if you are not adding any new structures to the database.

    python manage.py dumpdata myapp > tmp.json
    python manage.py reset myapp
    python manage.py syncdb
    python manage.py loaddata tmp.json
    
  7. At this point everything should work, unless you're using OS X. On OS X, Git wasn't able to locate the files the backend was creating, so it couldn't add them to the repository. I worked around that by editing line 71 in rcsfield/gitcore.py.

    I changed line 71 from:

    repo.git.add(key)
    

    to

    repo.git.add(os.path.join(self.repo_path, key))
    

    I was initially tempted to submit a patch, but it turns out that the original code works correctly on Ubuntu, so I suspect the issue should instead be patched in either git-python or Git itself. Then again, it is equally possible that this is simply a symptom of different versions of Git being installed by MacPorts and Aptitude.

    So there is something screwy occurring, but it would take some investigation to figure out exactly what/when/where/who/why and, most importantly, how to fix it appropriately.

  8. Now when you edit the rcsfields, you should see changes occur in the Git repository you specified in GIT_REPO_PATH. Success is sweet. (Unless you ran into different issues and are angry. Then my success is bitter. I swear.)
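Looking back at the RCS_BACKEND mix-up in step 4, one way a backend loader could avoid it is to import the configured name only as a submodule of its own backends package, so that 'git' can never fall through to the top-level git-python module. This is a hypothetical sketch, not how django-rcsfield actually loads its backends; load_backend is an illustrative name, and the default package rcsfield.backends is taken from the module path mentioned in step 4.

```python
import importlib
from types import ModuleType


def load_backend(name: str, package: str = "rcsfield.backends") -> ModuleType:
    """Import the RCS backend called ``name`` from inside ``package`` only.

    Hypothetical helper: because the import is always fully qualified,
    RCS_BACKEND = 'git' would look for <package>.git and fail loudly,
    instead of silently picking up the unrelated top-level ``git`` module.
    """
    qualified = "%s.%s" % (package, name)
    try:
        return importlib.import_module(qualified)
    except ModuleNotFoundError as exc:
        if exc.name != qualified:
            # The package itself, or something the backend imports
            # (e.g. git-python), is missing: re-raise so the real cause shows.
            raise
        raise ImportError(
            "No RCS backend named %r; looked for module %s" % (name, qualified)
        ) from exc
```

With this approach the setting stays a bare module name (RCS_BACKEND = 'gitcore'), and a typo or a VCS name like 'git' produces an error that names the exact module that was looked for.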

Performance Thoughts

My current feeling is that the performance here is slightly worse than desirable. The commit to the RCS backend happens in a post_save signal handler, so consider a request like this:

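# super sloppy code alert, trying to
# focus on the issue at hand...
from django.http import HttpResponse

def update_obj(request):
    obj = Obj.objects.get(slug=request.POST['slug'])  # Obj: some model with an RcsTextField
    obj.text = request.POST['text']
    obj.save()  # post_save fires here, so the RCS commit runs before we return
    return HttpResponse("Updated %s successfully." % obj.title)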

We not only save the object to the database; the post_save hook is also triggered, all before the HttpResponse is sent. Because the database still holds the current version of the field, the update to the repository doesn't need to finish before the response is returned to the user. (I'll concede that there may be some applications in which this is not true, but I suspect that for the vast majority it is.)

This means that saving an object with an RcsTextField is substantially slower than saving an object without one. Whether or not that is an issue depends heavily on your user interface and your application's overall design, but it is worth keeping in mind.
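To make the cost concrete, here is a stdlib-only stand-in for what happens during obj.save(). Django's real post_save is a django.dispatch.Signal whose send() calls each connected receiver in turn, in the current thread. The Note class, the post_save_receivers list and commit_to_repo here are hypothetical, and the sleep merely pretends that the repository commit is slow.

```python
import time
from typing import Callable, List

# Hypothetical stand-in for Django's post_save signal: a list of receivers
# that save() calls one after another, in the same thread.
post_save_receivers: List[Callable[[object], None]] = []


class Note:
    """Hypothetical model whose save() mimics Model.save() sending post_save."""

    def __init__(self, text: str) -> None:
        self.text = text

    def save(self) -> None:
        # ... the UPDATE against the database would happen here ...
        for receiver in post_save_receivers:
            receiver(self)  # save() cannot return until each receiver does


committed: List[str] = []


def commit_to_repo(instance: Note) -> None:
    """Hypothetical receiver; the sleep stands in for a slow VCS commit."""
    time.sleep(0.2)
    committed.append(instance.text)


post_save_receivers.append(commit_to_repo)

start = time.perf_counter()
Note("first draft").save()
elapsed = time.perf_counter() - start
print("save() took %.2f s" % elapsed)  # includes the time spent "committing"
```

Whatever the commit costs is added directly to the time save() takes, and therefore to the response time of update_obj.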

As for me, if redesigning the UI doesn't sufficiently relieve the feeling of slowness, then I have two thoughts on how to modify the current solution to provide a faster feel:

  1. Move the commit to the repository into its own thread. Threads are evil, yada yada yada, but this seems like a situation where threading out the commit would provide a substantial boost in responsiveness. (This is the 'kind of hacky but might be great' solution.)

  2. Add the commit to a task queue whose tasks are handled by a different service. (This is the "we're scalability engineers, whee" solution.)
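Both ideas boil down to handing the commit off instead of doing it inline. Here is a minimal in-process sketch, assuming a hypothetical commit_to_repo(key, text) that wraps the real backend call: the post_save receiver only enqueues work, and a single background thread drains the queue. This is closer to option 1 than to a real task-queue service, but the receiver side would look much the same with option 2.

```python
import logging
import queue
import threading
import time

commit_queue = queue.Queue()  # holds (key, text) pairs waiting to be committed
committed = []                # records what the hypothetical backend "committed"


def commit_to_repo(key, text):
    """Hypothetical slow commit; stands in for the real RCS backend call."""
    time.sleep(0.1)
    committed.append((key, text))


def commit_worker():
    """Drain the queue forever, committing one item at a time."""
    while True:
        key, text = commit_queue.get()
        try:
            commit_to_repo(key, text)
        except Exception:
            # Log and keep going, so one bad commit doesn't kill the worker.
            logging.exception("RCS commit for %s failed", key)
        finally:
            commit_queue.task_done()


threading.Thread(target=commit_worker, daemon=True).start()


def on_post_save(key, text):
    """What a post_save receiver would do instead of committing inline."""
    commit_queue.put((key, text))  # returns immediately


start = time.perf_counter()
on_post_save("note-1", "first draft")
enqueue_seconds = time.perf_counter() - start

commit_queue.join()  # demo only: wait for the worker so the result can be checked
print(committed)
```

A single worker also serializes commits, which matters because Git takes a lock on its index while adding files, so several per-request threads committing to the same repository at once could fail. The trade-off is that the thread is a daemon: commits still sitting in the queue when the process exits are lost, which is one argument for the separate service in option 2.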

Anyway, django-rcsfield is a great little project, and I highly recommend playing around with it a bit and seeing if you have any projects that might benefit from adding it to the mix.