Syntax Highlighting with MarkDown, and a pinch of Automagick for Django

Published on July 14, 2007. Tags: django, python

There are a handful of good resources on implementing syntax highlighting and MarkDown support for content in Django. When I started work on my blog, I based my code on the example I found here. It is a well-done article, and it ends up with a usable system (the core of which I reuse here). This article, however, looks at extending its functionality a bit and also simplifying the code.

If you were to implement that article's code, you would end up writing MarkDown that looks like this:

### My article

* My list entry one
* My list entry two

<code class="python">def x (a, b):
    return a * b</code>

That is a reasonable solution, but I find the code element a bit clunky, so here is my alternative implementation of syntax highlighting with MarkDown for Django. Once it's implemented, that bit of MarkDown will be written like this:

### My article

* My list entry one
* My list entry two

@@ python
def x (a, b):
    return a * b
@@

Your mileage may vary, but I think that's a big improvement in both readability and writability. Furthermore, the MarkDown++ solution shown here gives access to all of Pygments' lexers, so you can highlight languages like html+django, ruby, scheme, or apache.
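MarkDown++ handles the block detection and the Pygments call for you. The sketch below only illustrates the idea of pulling @@-delimited blocks out of a post. It is standard-library Python 3 and assumes the block syntax shown above: a line starting with "@@" followed by a lexer name opens the block, and a bare "@@" line closes it. The functions extract_code_blocks and replace_code_blocks are hypothetical names, and html.escape stands in for the Pygments highlighting step. This is not MarkDown++'s actual implementation.

```python
import html
import re

# Assumed syntax: "@@ <lexer-name>" on its own line opens a block,
# and a bare "@@" line closes it.
CODE_BLOCK = re.compile(
    r"^@@[ \t]+(?P<lang>\S+)[ \t]*\n(?P<code>.*?)\n@@[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_code_blocks(text):
    """Return (lexer_name, code) pairs for every @@-delimited block."""
    return [(m.group("lang"), m.group("code")) for m in CODE_BLOCK.finditer(text)]


def replace_code_blocks(text):
    """Swap each block for escaped HTML; a real renderer would call Pygments here."""
    def render(match):
        lang = html.escape(match.group("lang"))
        code = html.escape(match.group("code"))
        return '<pre class="%s">%s</pre>' % (lang, code)
    return CODE_BLOCK.sub(render, text)


sample = """### My article

@@ python
def x (a, b):
    return a * b
@@
"""

print(extract_code_blocks(sample))
print(replace_code_blocks(sample))
```

The regular expression requires the closing "@@" to stand alone on its line. Because of that, the opening line of a following block ("@@ ruby", say) can never be mistaken for a terminator.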

Implementation

The first step is to download MarkDown++. You can either add it to your Python path or, as I did, place it as a file inside the Django application that uses it. In the second case, you import it like so:

import myproject.myapp.markdownpp as markdownpp

Next we need a model to hold our content:

class Entry(models.Model):
    title = models.CharField(maxlength=200)
    body = models.TextField()
    body_html = models.TextField(blank=True, null=True)
    use_markdown = models.BooleanField(default=True)

Then give it a save method:

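    def save(self):
        # Render the MarkDown once, at save time, and cache the HTML.
        if self.use_markdown:
            self.body_html = markdownpp.markdown(self.body)
        else:
            # Without MarkDown, the body is used as-is.
            self.body_html = self.body
        super(Entry, self).save()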

And that's it: now you can use the MarkDown++ syntax for code highlighting. You will also need to save a copy of this CSS file and place it somewhere in your media folder where your templates can load it. It contains the styles for the syntax coloring and shouldn't interfere with your existing CSS.

Taking it a bit further

You now have working MarkDown and code syntax highlighting, but we can spruce things up a bit more.

Often you'll have files (images or otherwise) that you reference in your content. Unfortunately, writing the reference links by hand is error-prone:

[myfile]: http://www.lethain.com/media/lifeflow/myfile.jpg

and if you write a number of them, it gets tedious. Wouldn't it be great if the entries wrote their references themselves? Well, they can, and we're going to make them (and there was much rejoicing across the land).

First, let's add a Resource model:

class Resource(models.Model):
    title = models.CharField(maxlength=50)
    markdown_id = models.CharField(maxlength=50)
    content = models.FileField(upload_to="myapp/resource")

Now we need to make a few modifications to our Entry from earlier:

class Entry(models.Model):
    title = models.CharField(maxlength=200)
    body = models.TextField()
    body_html = models.TextField(blank=True, null=True)
    use_markdown = models.BooleanField(default=True)

    def save(self):
        if self.use_markdown:
            pieces = [self.body]
            # Append one MarkDown reference definition per Resource so the
            # body can link to any of them by its markdown_id.
            for res in Resource.objects.all():
                ref = u'\n\n[%s]: %s "%s"\n\n' % (
                    res.markdown_id,
                    res.get_content_url(),
                    res.title,
                )
                pieces.append(ref)
            content = u"\n".join(pieces)
            self.body_html = markdownpp.markdown(content)
        else:
            self.body_html = self.body
        super(Entry, self).save()

Now, when you write an article, you can use the markdown_id of any Resource you have created.
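The text that save() appends is just a set of MarkDown reference definitions, one per Resource. Here is the same loop as a standard-library Python 3 sketch that runs without Django. The Res namedtuple and the with_references function are hypothetical stand-ins for Resource and the save() logic, and the title "My file" is made up for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a Resource: only the fields save() reads.
Res = namedtuple("Res", "markdown_id url title")


def with_references(body, resources):
    """Return body with one MarkDown reference definition appended per resource."""
    pieces = [body]
    for res in resources:
        pieces.append(u'\n\n[%s]: %s "%s"\n\n' % (res.markdown_id, res.url, res.title))
    return u"\n".join(pieces)


resources = [
    Res("myfile", "http://www.lethain.com/media/lifeflow/myfile.jpg", "My file"),
]
text = with_references("Here is the picture: ![My file][myfile]", resources)
print(text)
```

A MarkDown processor then resolves ![My file][myfile] against the appended definition, so the entry itself never has to spell out the URL.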

Looking at the code, you'll probably think to yourself: "Why not have it import only the Resources that the Entry actually uses? Just throw in a quick many-to-many relationship and...." In fact, that is what I did, but it quickly becomes a very unpleasant solution. Let me explain.

If you have a ManyToMany field linking your Entry to Resource instances, you'll do something like this to get the related resources:

refs = entry.resources.all()

Which is great, really easy and all that jazz. Only it doesn't work. The problem is that saving your Entry also saves its new relationships to Resources. At the moment the Entry itself is saved, the updated list of related Resources is not yet available. So you'd think you could do something like this:

def save(self):
    super(Entry, self).save()
    res = self.resources.all()
    # ... build body_html from res ...
    super(Entry, self).save()

But unfortunately you can't. For whatever reason, the change doesn't reach the database quickly enough to be visible at that point. (Disclaimer: I do most of my development on SQLite3, which has abysmal database locking. Perhaps this approach would work better on PostgreSQL, but I doubt it.)
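To see the save-order problem described above without Django or a database, here is a toy model in standard-library Python 3. The hypothetical save_with_relations function saves the entry first and only then records its relationships. As a result, the first render sees none of them, and only a later save picks them up. ToyEntry and save_with_relations are illustrative names, not Django APIs.

```python
class ToyEntry:
    """Hypothetical stand-in for Entry; the relationship is a plain list."""

    def __init__(self, body):
        self.body = body
        self.resources = []  # plays the role of the many-to-many relation
        self.body_html = None

    def save(self):
        # Render from whatever relationships exist at this moment.
        refs = ", ".join(self.resources) or "none"
        self.body_html = "%s | refs: %s" % (self.body, refs)


def save_with_relations(entry, new_resources):
    """Hypothetical form handler: entry first, relationships second."""
    entry.save()
    entry.resources = list(new_resources)


entry = ToyEntry("See [myfile]")
save_with_relations(entry, ["myfile"])
print(entry.body_html)  # See [myfile] | refs: none
entry.save()            # a second save now sees the relationship
print(entry.body_html)  # See [myfile] | refs: myfile
```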

Alright, now you're thinking, "Let's just use the dispatcher to listen for the post_save signal, and then save the Entry a second time." And you're right, that works, sort of, but not as cleanly as you might think. The crux of the problem is that if you resave immediately from the post_save handler, the relationships still won't be updated. You actually have to wait 2-3 seconds for the change to become available.

I don't particularly recommend using it, but for the sake of completeness here is my current solution to this updating problem:

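import time, thread
from django.db import models
from django.dispatch import dispatcher
from django.db.models import signals

# ... your models go here ...

def resave_object(sender, instance, signal, *args, **kwargs):
    def do_save():
        # Give the many-to-many changes time to become visible,
        # then save the instance again so save() sees them.
        time.sleep(3)
        try:
            instance.save()
        except:
            # Errors in the background thread are silently ignored.
            pass
    # Key identifying this particular object.
    key = unicode(instance) + unicode(instance.id)
    # resave_hist[key] is True when the next post_save for this object
    # should schedule a resave; objects not seen yet default to True.
    try:
        should_resave = resave_hist[key]
    except KeyError:
        resave_hist[key] = True
        should_resave = True
    if should_resave is True:
        # The upcoming delayed save must not schedule another one.
        resave_hist[key] = False
        thread.start_new_thread(do_save, ())
    else:
        # This was the delayed resave; re-arm for the next real edit.
        resave_hist[key] = True

resave_hist = {}
dispatcher.connect(resave_object, signal=signals.post_save, sender=Entry)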

This is very hacky, but it is the only solution I have come up with that works (other than simply adding all resources to every entry). The crux of it is that it uses a separate thread to wait for three seconds and then resave the Entry. The resave_hist dictionary flips between True and False for each object. Because of that, the delayed save does not schedule yet another save, which prevents an infinite save loop.
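The bookkeeping in resave_object is easier to follow in isolation. The sketch below rewrites just that part in standard-library Python 3. It assumes a fake save function stands in for Entry.save() and re-fires the handler the way a real post_save would. It also swaps thread.start_new_thread plus time.sleep for threading.Timer, which runs a function once after a delay. ResaveGuard and on_post_save are hypothetical names, and the delay is shortened to 0.01 seconds for the demonstration.

```python
import threading


class ResaveGuard:
    """Tracks, per object key, whether the next post_save should schedule a resave."""

    def __init__(self):
        self._due = {}
        self._lock = threading.Lock()

    def should_resave(self, key):
        # Same rule as resave_hist: unseen keys default to True, and the
        # flag flips on every call, so saves alternate between
        # "schedule a resave" and "this was the resave, stop".
        with self._lock:
            due = self._due.get(key, True)
            self._due[key] = not due
            return due


def on_post_save(guard, key, resave, delay=3.0):
    """Schedule resave() after delay seconds unless this save was itself a resave."""
    if guard.should_resave(key):
        timer = threading.Timer(delay, resave)
        timer.daemon = True  # don't keep the process alive just for this
        timer.start()
        return timer
    return None


# Demonstration with a fake save that re-fires the handler, as post_save would.
guard = ResaveGuard()
saves = []
done = threading.Event()


def fake_save():
    saves.append("saved")
    if on_post_save(guard, "entry-1", fake_save, delay=0.01) is None:
        done.set()


on_post_save(guard, "entry-1", fake_save, delay=0.01)  # the original save
done.wait(1.0)
print(saves)  # ['saved']: exactly one delayed resave, then the loop stops
```

Because the flag flips on every call, the delayed save is recognized as the echo of the first one and schedules nothing. That is what stops the loop.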

Onward and Upward

Pre-creating reference links so you don't have to write them is an idea that has legs. You can create automatic links to previous and next articles, to the blogs on your blogroll, or to whatever else you link to regularly. Give it some thought; maybe you can hand some of your writing workflow over to the software.

I'd be interested to see if anyone has better solutions for loading only the relevant references. Let me know if you have any questions.