I've spent a bit of time with AuditTrail over the past day, since I first discovered it, and I've been quite pleased with it. However, my app makes a large number of changes, and I was beginning to experience a bit of database bloat because of the growing number of audits.
After a day of usage, one of my models had about 180 revisions, and while each revision itself is small, it was pretty clear that I wasn't going to be able to ignore the situation without causing myself some serious headaches in the relatively near future (of course, being able to only record diffs is a nice advantage for something like django-rcsfield, which would be able to get by with much less space).
Fortunately, depending on how you're using revisions, there is a
fairly simple solution to this dilemma: throw the excess revisions
away. I didn't want to perform extra database lookups every time
a new revision was created, so I decided that adding an
extension to manage.py would be an adequate solution
(which I could periodically activate with a cronjob).
So I set up the skeleton for a management command:
cd my_app
mkdir management
cd management
touch __init__.py
mkdir commands
cd commands
touch __init__.py
emacs clean_audit_trails.py
At first I intended to go with a very specific set of rules for picking the revisions to keep:
- All revisions in the past hour,
- The first revision older than one hour,
- The first revision older than one day,
- The first revision older than one week,
- The first revision older than one month, and so on...
But then I started actually writing that code, and my enthusiasm for that approach swiftly dwindled. Instead I decided I could accomplish roughly what I wanted much more concisely by using a simple backoff to determine the cutoffs for dates.
Depending on the type of backoff you use, you can control the spacing of the revisions that get saved.
>>> def mult_backoff(x):
...     return x * 10
...
>>> [ mult_backoff(x) for x in xrange(0,10) ]
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
>>> def exp_backoff(x):
...     return x * x
...
>>> [ exp_backoff(x) for x in xrange(0,10) ]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
You could also do an additive backoff, etc. For my needs the multiplicative backoff worked well. Starting from 60 seconds and multiplying by ten, it follows this pattern: 1 minute, 10 minutes, 100 minutes, roughly 16 hours, roughly 7 days, roughly 10 weeks, and so on.
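As a quick sanity check on that pattern, here is a minimal sketch that prints the cutoffs as timedelta values. The function name cutoffs and its parameters are made up for illustration; only the starting value of 60 seconds and the factor of ten come from the post.

```python
import datetime


def cutoffs(start=60, factor=10, steps=6):
    """Return `steps` cutoffs, starting at `start` seconds and
    multiplying by `factor` each time (hypothetical helper)."""
    value = start
    result = []
    for _ in range(steps):
        result.append(datetime.timedelta(seconds=value))
        value *= factor
    return result


for delta in cutoffs():
    # timedelta prints as [D days, ]H:MM:SS
    print(delta)
```

The third cutoff comes out as 1:40:00 and the fifth as just under 7 days, so any rounded figures for the pattern are approximations of these values.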
Here is the implementation of the clean_audit_trails
management command:
from django.core.management.base import NoArgsCommand
from my_app.models import MyModel
import datetime

class Command(NoArgsCommand):
    # no trailing comma here, or help becomes a tuple
    help = 'Removes excess audit trail history for MyModel.'
    args = ''

    def handle_noargs(self, **options):
        print "Removing unwanted audit trails..."
        # if you let the backoff grow too large,
        # it'll turn into a long int and datetime.timedelta
        # cannot be instantiated with a long int
        max_age = 60000
        objects = MyModel.objects.select_related().all()
        remove = 0
        now = datetime.datetime.now()
        for obj in objects:
            backoff = 60
            cutoff = datetime.timedelta(seconds=backoff)
            for trail in obj.history.all():
                diff = now - trail._audit_timestamp
                if backoff > max_age or diff < cutoff:
                    trail.delete()
                    remove += 1
                else:
                    # keep this revision and widen the window
                    backoff = backoff * 10
                    cutoff = datetime.timedelta(seconds=backoff)
        print "Removed %d audit trails." % remove
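To get a feel for which revisions that loop keeps, here is a rough standalone sketch of the same decision logic with no Django involved. It assumes the history is iterated newest first; split_revisions and the sample ages are invented for the example.

```python
import datetime


def split_revisions(timestamps, now, start=60, factor=10, max_age=60000):
    """Mirror the command's inner loop on plain datetimes.

    `timestamps` is assumed to be ordered newest first.
    Returns (kept, removed).
    """
    kept, removed = [], []
    backoff = start
    for stamp in timestamps:
        age = now - stamp
        if backoff > max_age or age < datetime.timedelta(seconds=backoff):
            removed.append(stamp)
        else:
            # first revision older than the current cutoff: keep it
            kept.append(stamp)
            backoff *= factor
    return kept, removed


now = datetime.datetime(2008, 1, 1, 12, 0, 0)
ages = [30, 90, 300, 900, 7200, 90000, 900000]  # seconds, made-up data
stamps = [now - datetime.timedelta(seconds=a) for a in ages]
kept, removed = split_revisions(stamps, now)
kept_ages = [int((now - s).total_seconds()) for s in kept]
print(kept_ages)
```

With these sample ages the revisions at 90, 900, 7200 and 90000 seconds survive. Anything younger than the current cutoff is dropped, and so is everything encountered once the backoff has grown past max_age.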
Note that the code is assuming a model that looks like this:
from django.db import models
import audit

class MyModel(models.Model):
    title = models.CharField(max_length=200)
    text = models.TextField()
    history = audit.AuditTrail()
Using it is the same as any other management command:
python manage.py clean_audit_trails
With a little meta-magic you could probably put together
a versatile tool based on this that isn't hardcoded to
clean a specific model, and uses a backoff method specified
in the project's settings.py.
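One possible shape for the configurable part, as a sketch only: look the backoff up by name in a small registry. The names BACKOFFS, AUDIT_BACKOFF and revisions_to_remove are hypothetical, and the ages are plain seconds so the example runs without Django.

```python
# Hypothetical registry of backoff strategies; each takes the current
# cutoff in seconds and returns the next one.
BACKOFFS = {
    "multiplicative": lambda seconds: seconds * 10,
    "additive": lambda seconds: seconds + 3600,
}

# Stand-in for a value that would live in settings.py (assumed name).
AUDIT_BACKOFF = "multiplicative"


def revisions_to_remove(ages, backoff_name=AUDIT_BACKOFF,
                        start=60, max_age=60000):
    """Return the ages (seconds, newest first) that would be deleted."""
    grow = BACKOFFS[backoff_name]
    backoff = start
    doomed = []
    for age in ages:
        if backoff > max_age or age < backoff:
            doomed.append(age)
        else:
            # this revision is kept; move on to the next cutoff
            backoff = grow(backoff)
    return doomed


ages = [30, 90, 300, 900, 7200, 90000, 900000]  # made-up sample data
print(revisions_to_remove(ages))
print(revisions_to_remove(ages, "additive"))
```

Swapping the strategy changes the spacing of the survivors: with the same sample data the additive variant drops the 900 second revision but keeps the oldest one.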
Hi Wil,
This will prove useful, thanks! However there is a small typo in handle_noargs when instantiating objects (as ojects).
Cheers,
Colin
Nice catch Colin. I fixed it (and eventually the changeset will make it through the caching ;).
I was initially going to quibble and say that exp_backoff isn't actually exponential. Intuitively, its growth is slower than I'm used to. However, Wikipedia defines exponential growth as growth proportional to a variable's value, and writing the function in terms of the previous value makes that clear, though the exponent is small.
I found it clearer in a functional style:
Did you ever find a solution to this problem with Audit Trail?
http://code.djangoproject.com/wiki/AuditTrail#Caveats