October 16, 2008.
I've spent a bit of time with AuditTrail over the past day, since I first discovered it, and I've been quite pleased with it. However, my app makes a large number of changes, and I was beginning to experience a bit of database bloat because of the growing number of audits.
After a day of usage, one of my models had about 180 revisions. Each revision is small, but it was pretty clear that I couldn't ignore the situation without causing myself some serious headaches in the relatively near future. (Recording only diffs is a nice advantage for something like django-rcsfield, which can get by with much less space.)
Fortunately, depending on how you're using revisions, there is a
fairly simple solution to this dilemma: throw the excess revisions
away. I didn't want to perform extra database lookups every time
a new revision was created, so I decided that adding a custom
manage.py command would be an adequate solution
(which I could periodically run from a cron job).
So I set up the skeleton for a management command:
    cd my_app
    mkdir management
    cd management
    touch __init__.py
    mkdir commands
    cd commands
    touch __init__.py
    emacs clean_audit_trails.py
At first I intended to go with a very specific set of rules for picking the revisions to keep:
But then I started actually writing that code, and my enthusiasm for that approach swiftly dwindled. Instead I decided I could accomplish roughly what I wanted much more concisely by using a simple backoff to determine the cutoffs for dates.
Depending on the type of backoff you use, you can control the spacing of the revisions you save.
    >>> def mult_backoff(x):
    ...     return x * 10
    ...
    >>> [ mult_backoff(x) for x in xrange(0,10) ]
    [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
    >>> def exp_backoff(x):
    ...     return x * x
    ...
    >>> [ exp_backoff(x) for x in xrange(0,10) ]
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
You could also do an additive backoff, etc. For my needs the multiplicative backoff worked well. Starting from 60 seconds and multiplying by ten each time, it follows this pattern: 1 minute, 10 minutes, 1 hour 40 minutes, roughly 17 hours, roughly 7 days, roughly 10 weeks, and so on.
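The arithmetic behind the sixty-second, times-ten schedule is easy to check. Here is a small sketch in Python 3; `geometric_cutoffs` is a hypothetical helper name used only for illustration and is not part of the management command.

```python
import datetime


def geometric_cutoffs(start=60, factor=10, steps=6):
    """Return `steps` cutoffs in seconds: start, start*factor, start*factor**2, ..."""
    return [start * factor ** n for n in range(steps)]


for seconds in geometric_cutoffs():
    # str(timedelta) renders the duration in a readable form, e.g. "0:10:00"
    print("%8d s = %s" % (seconds, datetime.timedelta(seconds=seconds)))
```

The third step comes out as 1:40:00 rather than a round hour, which is why the later intervals land on slightly awkward values.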
Here is the implementation of the clean_audit_trails command:
    from django.core.management.base import NoArgsCommand
    from my_app.models import MyModel
    import datetime

    class Command(NoArgsCommand):
        # no trailing comma here, or help becomes a one-element tuple
        help = 'Removes excess AuditTrail history for MyModel.'
        args = ''

        def handle_noargs(self, **options):
            print "Removing unwanted audit trails..."
            # if you let the backoff grow too large,
            # it'll turn into a long int and datetime.timedelta
            # cannot be instantiated with a long int
            max_age = 60000
            objects = MyModel.objects.select_related().all()
            remove = 0
            now = datetime.datetime.now()
            for obj in objects:
                # the backoff restarts at 60 seconds for every object
                backoff = 60
                cutoff = datetime.timedelta(seconds=backoff)
                for trail in obj.history.all():
                    diff = now - trail._audit_timestamp
                    if backoff > max_age or diff < cutoff:
                        # younger than the current cutoff, or past max_age
                        trail.delete()
                        remove = remove + 1
                    else:
                        # keep this revision and widen the gap to the next one
                        backoff = backoff * 10
                        cutoff = datetime.timedelta(seconds=backoff)
            print "Removed %d audit trails." % remove
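To see which revisions that loop keeps without touching a database, the selection logic can be pulled out into a plain function. This is only a sketch in Python 3: `select_for_removal` is a hypothetical name, and it assumes the history is iterated newest first, which the original command does not state explicitly.

```python
import datetime


def select_for_removal(timestamps, now, start=60, factor=10, max_age=60000):
    """Split timestamps (assumed newest first) into (kept, removed) lists.

    Mirrors the command's loop: a revision is removed when it is younger
    than the current cutoff or when the backoff has outgrown max_age;
    otherwise it is kept and the backoff is multiplied by `factor`.
    """
    kept, removed = [], []
    backoff = start
    cutoff = datetime.timedelta(seconds=backoff)
    for stamp in timestamps:
        age = now - stamp
        if backoff > max_age or age < cutoff:
            removed.append(stamp)
        else:
            kept.append(stamp)
            backoff *= factor
            cutoff = datetime.timedelta(seconds=backoff)
    return kept, removed


# Made-up revision ages in seconds, newest first.
now = datetime.datetime(2008, 10, 16, 12, 0, 0)
ages = [30, 90, 120, 700, 800, 7000, 70000, 700000]
stamps = [now - datetime.timedelta(seconds=a) for a in ages]

kept, removed = select_for_removal(stamps, now)
print([int((now - s).total_seconds()) for s in kept])  # [90, 700, 7000, 70000]
```

Note that with `max_age = 60000` at most four revisions survive per object: once the backoff reaches 600000 seconds, everything older is removed.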
Note that the code is assuming a model that looks like this:
    from django.db import models
    import audit

    class MyModel(models.Model):
        title = models.CharField(max_length=200)
        text = models.TextField()
        history = audit.AuditTrail()
Using it is the same as any other management command:
python manage.py clean_audit_trails
With a little meta-magic you could probably put together
a versatile tool based on this that isn't hardcoded to
clean a specific model, and uses a backoff method specified
in the project's settings.
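One way such a tool might pick up its backoff is to store a dotted path as a string and resolve it at run time. The sketch below is Python 3 and entirely hypothetical: `load_callable` and `cutoffs` are illustrative names, not part of AuditTrail or of the command above.

```python
import datetime
import importlib


def load_callable(dotted_path):
    """Resolve a 'package.module.name' string to the object it names."""
    module_path, _, name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, name)


def cutoffs(backoff, start=60, limit=60000):
    """Yield cutoffs in seconds, applying `backoff` until `limit` is passed."""
    value = start
    while value <= limit:
        yield value
        grown = backoff(value)
        if grown <= value:
            # guard against a backoff that would loop forever
            raise ValueError("backoff must return a larger value each step")
        value = grown


# Resolving a dotted path; a real project would point at its own function.
assert load_callable("datetime.timedelta") is datetime.timedelta

times_ten = lambda seconds: seconds * 10     # multiplicative backoff
plus_hour = lambda seconds: seconds + 3600   # additive backoff

print(list(cutoffs(times_ten)))                 # [60, 600, 6000, 60000]
print(list(cutoffs(plus_hour, limit=10000)))    # [60, 3660, 7260]
```

Recent Django versions ship `django.utils.module_loading.import_string`, which does the same dotted-path resolution, so `load_callable` would not be needed there.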