Taming AuditTrail Proliferation
I've spent a bit of time with AuditTrail over the past day, since I first discovered it, and I've been quite pleased with it. However, my app makes a large number of changes, and I was beginning to experience a bit of database bloat because of the growing number of audits.
After a day of usage, one of my models had about 180 revisions. Each revision is small, but it was pretty clear that I wasn't going to be able to ignore the situation without causing myself some serious headaches in the relatively near future. (Recording only diffs is a nice advantage of something like django-rcsfield, which can get by with much less space.)
Fortunately, depending on how you're using revisions, there is a fairly simple solution to this dilemma: throw the excess revisions away. I didn't want to perform extra database lookups every time a new revision was created, so I decided that adding an extension to manage.py (which I could periodically run from a cronjob) would be an adequate solution.
So I set up the skeleton for a management command:
cd my_app
mkdir management
cd management
touch __init__.py
mkdir commands
cd commands
touch __init__.py
emacs clean_audit_trails.py
At first I intended to go with a very specific set of rules for picking the revisions to keep:
- All revisions in the past hour,
- The first revision older than one hour,
- The first revision older than one day,
- The first revision older than one week,
- The first revision older than one month, and so on...
But then I started actually writing that code, and my enthusiasm for that approach swiftly dwindled. Instead I decided I could accomplish roughly what I wanted much more concisely by using a simple backoff to determine the cutoffs for dates.
Depending on the type of backoff you use, you can control the spacing of the revisions you save.
>>> def mult_backoff(x):
...     return x * 10
...
>>> [ mult_backoff(x) for x in xrange(0,10) ]
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
>>> def exp_backoff(x):
...     # strictly speaking x * x grows quadratically; 2 ** x would be exponential
...     return x * x
...
>>> [ exp_backoff(x) for x in xrange(0,10) ]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
You could also do an additive backoff, etc. For my needs the multiplicative backoff worked well. Starting from 60 seconds and multiplying by ten, it follows this pattern: 1 minute, 10 minutes, 100 minutes (1 hour 40 minutes), about 17 hours, about 7 days, about 10 weeks, and so on.
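To check those intervals, here is a minimal sketch that prints the successive cutoffs as timedelta values. The function name backoff_cutoffs and its parameters are made up for illustration, and it is written for Python 3, unlike the Python 2 snippets in the rest of this post.

```python
import datetime


def backoff_cutoffs(start_seconds=60, factor=10, steps=6):
    """Return the successive cutoffs of a multiplicative backoff as timedeltas."""
    cutoffs = []
    seconds = start_seconds
    for _ in range(steps):
        cutoffs.append(datetime.timedelta(seconds=seconds))
        seconds *= factor
    return cutoffs


for cutoff in backoff_cutoffs():
    print(cutoff)
# 0:01:00
# 0:10:00
# 1:40:00
# 16:40:00
# 6 days, 22:40:00
# 69 days, 10:40:00
```

Changing factor or start_seconds is an easy way to try out a different spacing before pointing anything at real data.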
Here is the implementation of the clean_audit_trails management command:
from django.core.management.base import NoArgsCommand
from my_app.models import MyModel
import datetime


class Command(NoArgsCommand):
    help = 'Removes excessive Reversion history for Notes.'
    args = ''

    def handle_noargs(self, **options):
        print "Removing unwanted audit trails..."
        # if you let the backoff grow too large,
        # it'll turn into a long int and datetime.timedelta
        # cannot be instantiated with a long int
        max_age = 60000
        objects = MyModel.objects.select_related().all()
        remove = 0
        now = datetime.datetime.now()
        for obj in objects:
            # each object starts over with a one minute cutoff
            backoff = 60
            cutoff = datetime.timedelta(seconds=backoff)
            for trail in obj.history.all():
                diff = now - trail._audit_timestamp
                if backoff > max_age or diff < cutoff:
                    trail.delete()
                    remove = remove + 1
                else:
                    # keep this revision and push the cutoff back tenfold
                    backoff = backoff * 10
                    cutoff = datetime.timedelta(seconds=backoff)
        print "Removed %d audit trails." % remove
Note that the code assumes a model that looks like this:
from django.db import models
import audit


class MyModel(models.Model):
    title = models.CharField(max_length=200)
    text = models.TextField()
    history = audit.AuditTrail()
Using it is the same as any other management command:
python manage.py clean_audit_trails
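Before running the command against real data, it can help to see which revisions the loop would keep. The sketch below pulls the selection logic out into a plain function that works on ages in seconds instead of model instances. The name split_revisions and its parameters are hypothetical, it is written for Python 3, and it assumes the history manager yields revisions newest first, which I have not verified against AuditTrail itself.

```python
def split_revisions(ages, start=60, factor=10, max_age=60000):
    """Split revision ages (in seconds, newest first) into (kept, deleted).

    Mirrors the loop in the management command: a revision is deleted when
    it is younger than the current cutoff, or once the cutoff has grown
    past max_age. Otherwise it is kept and the cutoff is multiplied by factor.
    """
    backoff = start
    kept, deleted = [], []
    for age in ages:
        if backoff > max_age or age < backoff:
            deleted.append(age)
        else:
            kept.append(age)
            backoff *= factor
    return kept, deleted


ages = [5, 30, 90, 120, 500, 700, 5000, 7000, 70000, 700000, 7000000]
kept, deleted = split_revisions(ages)
print(kept)     # [90, 700, 7000, 70000]
print(deleted)  # [5, 30, 120, 500, 5000, 700000, 7000000]
```

Under these assumptions the loop keeps one revision per cutoff window. It also removes everything younger than the first cutoff and everything that comes after the cutoff exceeds max_age, so it is worth checking that this matches what you want to retain.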
With a little meta-magic you could probably put together a versatile tool based on this that isn't hardcoded to clean a specific model and uses a backoff method specified in the project's settings.py.
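As a rough sketch of the settings-driven part, the backoff could be named by a dotted path and resolved at runtime. Everything here is an assumption for illustration: the setting name AUDIT_TRAIL_BACKOFF, the path my_app.backoffs.times_ten, and the helper names load_backoff and cutoff_sequence. It is written for Python 3 and uses only the standard library; inside a Django project, django.utils.module_loading.import_string does the same job as load_backoff.

```python
import importlib


def load_backoff(dotted_path):
    """Resolve a dotted path such as 'my_app.backoffs.times_ten' to a callable."""
    module_path, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


def cutoff_sequence(backoff, start=60, max_age=60000):
    """List the cutoffs (in seconds) produced by repeatedly applying backoff."""
    cutoffs = []
    seconds = start
    while seconds <= max_age:
        cutoffs.append(seconds)
        next_seconds = backoff(seconds)
        if next_seconds <= seconds:
            # guard against backoff functions that never grow
            raise ValueError("backoff must return a larger value each step")
        seconds = next_seconds
    return cutoffs


def times_ten(seconds):
    """The multiplicative backoff used in this post."""
    return seconds * 10


# In settings.py (hypothetical): AUDIT_TRAIL_BACKOFF = "my_app.backoffs.times_ten"
# In the command: backoff = load_backoff(settings.AUDIT_TRAIL_BACKOFF)
print(cutoff_sequence(times_ten))  # [60, 600, 6000, 60000]
```

Here the backoff function maps the current cutoff to the next one, which matches how the command multiplies its backoff variable. A function of the step index, like the ones in the interpreter session earlier, would need a slightly different loop.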