October 16, 2008.
I've spent a bit of time with AuditTrail over the past day, since I first discovered it, and I've been quite pleased with it. However, my app makes a large number of changes, and I was beginning to experience a bit of database bloat because of the growing number of audits.
After a day of usage, one of my models had about 180 revisions, and while each revision itself is small, it was pretty clear that I wasn't going to be able to ignore the situation without causing myself some serious headaches in the relatively near future (of course, being able to only record diffs is a nice advantage for something like django-rcsfield, which would be able to get by with much less space).
Fortunately, depending on how you're using revisions, there is a fairly simple solution to this dilemma: throw the excess revisions away. I didn't want to perform extra database lookups every time a new revision was created, so I decided that adding an extension to manage.py (which I could run periodically from a cron job) would be an adequate solution.
So I set up the skeleton for a management command:
cd my_app
mkdir management
cd management
touch __init__.py
mkdir commands
cd commands
touch __init__.py
emacs clean_audit_trails.py
At first I intended to go with a very specific set of rules for picking the revisions to keep:
But then I started actually writing that code, and my enthusiasm for that approach swiftly dwindled. Instead I decided I could accomplish roughly what I wanted much more concisely by using a simple backoff to determine the cutoffs for dates.
Depending on the type of backoff you use, you can control the spacing of the revisions you keep.
>>> def mult_backoff(x):
...     return x * 10
...
>>> [ mult_backoff(x) for x in xrange(0,10) ]
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
>>> def exp_backoff(x):
...     return x * x
...
>>> [ exp_backoff(x) for x in xrange(0,10) ]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
You could also do an additive backoff, etc. For my needs the multiplicative backoff worked well. Starting from 60 seconds and multiplying the previous cutoff by ten each time, it follows this pattern: 1 minute, 10 minutes, 100 minutes, about 17 hours, about 7 days, about 10 weeks, and so on.
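To make the spacing concrete, here is a minimal sketch that generates the cutoff ages for a multiply-by-ten backoff. The helper name cutoffs and its parameters are my own for illustration; the defaults simply mirror the 60 second start and the 60000 second ceiling used in the command.

```python
import datetime


def cutoffs(start_seconds=60, factor=10, max_age=60000):
    """Yield successive cutoff ages, multiplying by `factor` each step.

    Stops once the next backoff would exceed `max_age` (in seconds).
    """
    backoff = start_seconds
    while backoff <= max_age:
        yield datetime.timedelta(seconds=backoff)
        backoff *= factor


for cutoff in cutoffs():
    print(cutoff)
# 0:01:00
# 0:10:00
# 1:40:00
# 16:40:00
```

Raising max_age lets the sequence run further out (the next two steps would land at a little under 7 days and a little under 10 weeks).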
Here is the implementation of the clean_audit_trails management command:
from django.core.management.base import NoArgsCommand
from my_app.models import MyModel
import datetime

class Command(NoArgsCommand):
    help = 'Removes excessive Reversion history for Notes.'
    args = ''

    def handle_noargs(self, **options):
        print "Removing unwanted audit trails..."
        # if you let the backoff grow too large,
        # it'll turn into a long int and datetime.timedelta
        # cannot be instantiated with a long int
        max_age = 60000
        objects = MyModel.objects.select_related().all()
        remove = 0
        now = datetime.datetime.now()
        for obj in objects:
            # each object starts over with a 60 second window
            backoff = 60
            cutoff = datetime.timedelta(seconds=backoff)
            for trail in obj.history.all():
                diff = now - trail._audit_timestamp
                if backoff > max_age or diff < cutoff:
                    # younger than the current window, or past max_age
                    trail.delete()
                    remove = remove + 1
                else:
                    # keep this one and widen the window for the next
                    backoff = backoff * 10
                    cutoff = datetime.timedelta(seconds=backoff)
        print "Removed %d audit trails." % remove
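If you want to check which revisions the loop would keep before pointing it at real data, the same decision logic can be pulled out into a plain function. This is only a sketch: split_revisions is a name I made up, and it assumes the timestamps arrive newest first, which is how I'd expect obj.history.all() to be ordered.

```python
import datetime


def split_revisions(timestamps, now, start=60, factor=10, max_age=60000):
    """Return (keep, delete) lists using the same rule as the command.

    `timestamps` is assumed to be ordered newest first.
    """
    keep, delete = [], []
    backoff = start
    cutoff = datetime.timedelta(seconds=backoff)
    for stamp in timestamps:
        age = now - stamp
        if backoff > max_age or age < cutoff:
            # younger than the current window, or past max_age
            delete.append(stamp)
        else:
            # keep it and widen the window for the next revision
            keep.append(stamp)
            backoff *= factor
            cutoff = datetime.timedelta(seconds=backoff)
    return keep, delete


now = datetime.datetime(2008, 10, 16, 12, 0, 0)
ages = (30, 90, 300, 700, 5000, 7000)  # seconds old, newest first
stamps = [now - datetime.timedelta(seconds=s) for s in ages]
keep, delete = split_revisions(stamps, now)
print([int((now - s).total_seconds()) for s in keep])
# [90, 700, 7000]
```

Note that anything younger than the first window (60 seconds here) is discarded, including the most recent revision if it is that fresh.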
Note that the code is assuming a model that looks like this:
from django.db import models
import audit

class MyModel(models.Model):
    title = models.CharField(max_length=200)
    text = models.TextField()
    history = audit.AuditTrail()
Using it is the same as any other management command:
python manage.py clean_audit_trails
With a little meta-magic you could probably put together a versatile tool based on this that isn't hardcoded to clean a specific model, and uses a backoff method specified in the project's settings.py.
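As a rough starting point for the configurable part, the backoff could be named in settings.py as a dotted path and resolved at run time. Everything here is hypothetical: the load_backoff helper, the AUDIT_TRAIL_BACKOFF setting name, and the my_app.backoffs module are inventions for the sake of the sketch, and importlib requires Python 2.7 or later.

```python
import importlib


def load_backoff(dotted_path):
    """Resolve a dotted path such as 'my_app.backoffs.mult_backoff'
    to the callable it names."""
    module_path, _, name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, name)


# In a real project the path would come from something like
# settings.AUDIT_TRAIL_BACKOFF (a hypothetical setting). A stdlib
# function stands in here so the sketch runs on its own.
backoff = load_backoff("operator.neg")
print(backoff(5))
# -5
```

The management command would then call the loaded function to compute the next cutoff instead of hardcoding backoff * 10.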