Cleanly Extending Python Markdown with Syntax Highlighting

Published on January 9, 2008. Tags: python, markdown

When I first started working on my own blog, one of the first things I tackled was code syntax highlighting for my entries. I even wrote an article about my hacky solution.

The crux of the challenge is extending Markdown to have a syntax that indicates a block should be highlighted. I'm still pretty comfortable with the syntax I chose:

@@ python
def add(a, b):
    return a + b
@@

It's nothing fancy, but it gets the job done. However, my first attempt at extending Python-Markdown to render that syntax correctly was kind of horrific. It worked, I mean it worked okay, but damn if it didn’t munge the entire Python-Markdown library while it did it.

That is a forgivable sin in some situations, but the implementors of Python-Markdown went out of their way to make it extensible... so I felt a bit dirty about it. As I have been working a lot on my blogging software, I decided that now was the time to fix my previous silliness.

Let's get to work.

Step 1: Get a new copy of Markdown

My old copy of Markdown was crippled and in tears after my first modifications, so I had to get a fresh copy. You'll also want to grab a copy of pygments while you're at it.

easy_install pygments
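
To confirm the install worked, a quick check like the one below should print highlighted HTML. It is only a minimal sketch, written for a current Python and assuming a Pygments version that provides highlight, get_lexer_by_name, and HtmlFormatter (the same names imported later in this article). The sample snippet and the variable names are just for illustration.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

# Illustrative input; any short snippet will do.
sample = "def add(a, b):\n    return a + b\n"

formatter = HtmlFormatter()
# Look the lexer up by its short name, then render the snippet as HTML.
html = highlight(sample, get_lexer_by_name("python"), formatter)
# The HTML only carries CSS class names; the colors come from these rules.
css = formatter.get_style_defs(".highlight")

print(html)
```

The generated markup has no inline colors, so the page also needs the rules held in css (or an equivalent stylesheet) before anything looks highlighted.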

Step 2: Write the Damn Thing

There is a full-featured example in the Markdown library (search for FOOTNOTE to jump to it), which is a boon. Whenever confusion finds you, go look at it for guidance.

Now we need to make a new module to put our code in. It doesn't need to be (and shouldn't be) in the same file as markdown.py. I named mine code.py, since I have it in a folder named markup. If you are placing yours in a folder with a less suggestive name, you may want to pick a more descriptive name.

The first thing you need to write is a preprocessor. Preprocessors need to define one function:

def run(self, lines):
    # do things
    return lines

The Markdown library splits all the lines on "\n" and then feeds you the result. If you want to operate on the text as a blob, then you have to rejoin it yourself:

blob = u"\n".join(lines)
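
As a sketch of that pattern, here is a tiny preprocessor that joins the lines, works on the whole text, and splits it again before returning. The class name CollapseBlankLinesPreprocessor and what it does are made up for illustration; only the run(self, lines) shape comes from the library.

```python
import re


class CollapseBlankLinesPreprocessor(object):
    """Hypothetical preprocessor that treats the document as one blob."""

    def run(self, lines):
        # Rejoin the lines Markdown handed us into a single string.
        blob = u"\n".join(lines)
        # Work across line boundaries: squeeze runs of blank lines down to one.
        blob = re.sub(u"\n{3,}", u"\n\n", blob)
        # Markdown expects a list of lines back, so split again.
        return blob.split(u"\n")


result = CollapseBlankLinesPreprocessor().run([u"a", u"", u"", u"", u"b"])
print(result)
```

The important part is the last line of run: whatever you do to the blob, hand back a list of lines.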

So our class is going to be called CodeBlockPreprocessor (catchy, I know), and it's going to have this run method:

def run(self, lines):
    new_lines = []
    seen_start = False
    lang = None
    block = []
    for line in lines:
        if line.startswith("@@") and not seen_start:
            # Opening marker: the rest of the line names the lexer.
            lang = line.strip("@ ")
            seen_start = True
        elif line.startswith("@@") and seen_start:
            # Closing marker: highlight everything collected so far.
            lexer = get_lexer_by_name(lang)
            content = "\n".join(block)
            highlighted = highlight(content, lexer, HtmlFormatter())
            new_lines.append("\n%s\n" % highlighted)
            lang = None
            block = []
            seen_start = False
        elif seen_start:
            # Inside a block: collect the line instead of passing it on.
            block.append(line)
        else:
            new_lines.append(line)
    return new_lines

We walk through all the lines looking for the start of a code block, which is marked by two consecutive at symbols (@@) at the beginning of a line. Once we find one, we collect lines until we hit the closing marker. If there is no closing marker, everything after the opening marker is discarded, which is a bit ungraceful but won't let any undesirables through either. Then we use Pygments to color the code in between the start and end, using the lexer named on the opening line of the block (for example, @@ ruby uses ruby, and @@ html+django uses html+django).
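
The walk described above can be tried without Pygments installed. The sketch below mirrors the same state machine as a standalone function, written for a current Python. The names split_code_blocks, render_block, and fake_render are invented for illustration, and the callback simply stands in for the Pygments call.

```python
def split_code_blocks(lines, render_block):
    """Replace @@-delimited blocks with render_block(lang, code)."""
    new_lines = []
    block = []
    lang = None
    in_block = False
    for line in lines:
        if line.startswith("@@") and not in_block:
            # Opening marker: the rest of the line names the language.
            lang = line.strip("@ ")
            in_block = True
        elif line.startswith("@@") and in_block:
            # Closing marker: render what we collected and reset the state.
            new_lines.append(render_block(lang, "\n".join(block)))
            block, lang, in_block = [], None, False
        elif in_block:
            block.append(line)
        else:
            new_lines.append(line)
    # An unclosed block never reaches the closing branch, so its text is dropped.
    return new_lines


def fake_render(lang, code):
    # Stand-in for Pygments so the example has no dependencies.
    return '<pre class="%s">%s</pre>' % (lang, code)


closed = split_code_blocks(
    ["intro", "@@ python", "x = 1", "@@", "outro"], fake_render)
unclosed = split_code_blocks(
    ["intro", "@@ python", "x = 1", "outro"], fake_render)
print(closed)
print(unclosed)
```

The second call shows the ungraceful case: with no closing marker, both the code and the trailing "outro" line vanish from the output.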

After we finish the run method, we just have to write some generic code, and soon we'll have a clean extension to Python-Markdown.

First we need to do some imports at the top of our file:

import re
import sys
from ddmarkup import markdown
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name

Then we need to write a simple class that we'll use to register our new preprocessor.

class CodeExtension:
    def extendMarkdown(self, md):
        index = md.preprocessors.index(markdown.HTML_BLOCK_PREPROCESSOR)
        preprocessor = CodeBlockPreprocessor()
        preprocessor.md = md
        md.preprocessors.insert(index, preprocessor)

This is about as simple as classes get. You take an instance of the Markdown class, and then you add an instance of CodeBlockPreprocessor to its list of preprocessors (before the HTML_BLOCK_PREPROCESSOR).
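
The ordering is the only subtle part, so here is the index-then-insert move on its own. FakeMarkdown and the string names in its list are hypothetical stand-ins, not the real Markdown class or its real preprocessor list. The sketch only shows where the new entry lands.

```python
class FakeMarkdown(object):
    """Hypothetical stand-in for a Markdown instance; not the real class."""

    def __init__(self):
        # Placeholder names; the real list holds preprocessor objects.
        self.preprocessors = ["HEADER", "HTML_BLOCK", "REFERENCE"]


md = FakeMarkdown()
# Find the slot currently held by the HTML block preprocessor...
index = md.preprocessors.index("HTML_BLOCK")
# ...and insert ours there, which pushes HTML_BLOCK one position later.
md.preprocessors.insert(index, "CODE_BLOCK")

print(md.preprocessors)
```

Because insert places the new item at the given index, our preprocessor runs first, so the HTML it emits already exists by the time the HTML block preprocessor sees the text.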

Lastly, we need to make a function to call markdown using our new preprocessor.

def render(text):
    md = markdown.Markdown()
    codeExtension = CodeExtension()
    codeExtension.extendMarkdown(md)
    md.source = text
    return unicode(md)

We create an instance of Markdown, add our extension, and then render away. If we want to, we can make it accept a file name from the command line as well:

if __name__ == '__main__':
    print render(file(sys.argv[1]).read())

Although it seemed like more effort than it was worth the first time I modified Python-Markdown, it's really a well-designed library, and a good example of designing libraries so that others can cleanly extend them. Give its code a read sometime.