Rewriting Parameterized URLs with Nginx
This past Sunday (and part of this morning) my goal was to finally kill off my old WordPress blog, repost any valuable content to this blog, and--the step I had been fearing--set up redirects between the old blog and the new one.
The redirects ended up being a bit tricky to handle using Nginx, because my WordPress URLs looked like this:
http://willarson.com/blog/?p=3
http://willarson.com/blog/?p=41
Not exactly pretty URLs. When people talk about rewriting pretty URLs, they usually mean taking a URL in this format:
http://willarson.com/blog/3/
and converting it into the ?p=3 format, but going the other direction is rarely discussed, especially with Nginx, whose rewriting module receives less coverage than Apache's mod_rewrite. After some time perusing the Nginx Http Core Module documentation, I was able to contrive a solution.
This is the (slightly generalized) server definition for willarson.com:
server {
    listen *:80;
    server_name www.willarson.com willarson.com;
    # Capture the value of the old WordPress ?p= parameter from the query string.
    if ($args ~ "p=(.*)") {
        set $p $1;
        # Nginx appends the original query string to the redirect target.
        rewrite ^/blog/?(.*)$ /redirect/$p/ permanent;
    }
    access_log /home/django/domains/willarson.com/log/access.log;
    error_log /home/django/domains/willarson.com/log/error.log;
    location / {
        root /home/django/domains/willarson.com/public;
        index index.html;
    }
}
The key here is the set $p $1; line. On new versions of Nginx (newer than the one packaged with Ubuntu Intrepid, at any rate) you could skip that step and use $arg_p instead (possibly it needs to be $arg_P; the docs are ambiguous on that).
It, in its hideous glory, rewrites a URL like
http://willarson.com/blog/?p=17
to
http://willarson.com/redirect/17/?p=17
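To make that mapping concrete, here is a rough Python sketch of what the if/rewrite pair above does to a request. It only illustrates the two regular expressions and the appended query string; it is not how Nginx works internally, and the function name simulate_rewrite is made up for this example.

```python
import re


def simulate_rewrite(path, args):
    """Return the redirect target for a request, or None if no rule applies.

    `path` is the request path without the query string and `args` is the
    raw query string, roughly what Nginx exposes as $uri and $args.
    """
    # Mirrors: if ($args ~ "p=(.*)") { set $p $1; ... }
    match = re.search(r"p=(.*)", args)
    if match is None:
        return None
    p = match.group(1)

    # Mirrors: rewrite ^/blog/?(.*)$ /redirect/$p/ permanent;
    if re.match(r"^/blog/?(.*)$", path) is None:
        return None

    # The replacement does not end in "?", so the old query string
    # is appended to the new location.
    return "/redirect/%s/?%s" % (p, args)


if __name__ == "__main__":
    print(simulate_rewrite("/blog/", "p=17"))
```

Note that the pattern captures everything after p=, so a query string with extra parameters would carry them into the redirect path as well. For a blog whose URLs only ever had the single p parameter, that does not come up.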
The resulting URL is pretty awful, but it is actually sufficient for my very basic needs. From there I wrote an equally awful Python script to generate redirect pages.
import sys
# Destination URL for the redirect page, passed as the first argument.
url = sys.argv[1]
# The same URL fills all four %s slots in the page template.
print("""
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<link rel="canonical" href="%s">
<script type="text/javascript">
<!--
window.location = "%s"
//-->
</script>
<meta http-equiv="Refresh" content="5; url=%s">
</head>
<body>
<h2>You will be redirected shortly. Sorry for the inconvenience.</h2>
<a href="%s">Click here to transition more quickly.</a>
</body>
</html>
""" % (url, url, url, url))
(I decided to use the new rel="canonical" link element because it is new and shiny, and it will hopefully help move all of the old blog's Google rank to the new blog. We'll see.)
Then I decided on the mappings from the old blog to the new blog for each entry. The old blog only had 27 or so articles: 9 were worth reposting, 5 more were worth specific redirects to newer content, and the rest were redirected to the front page of the new blog. Basically, anything with traffic got a custom redirect to something relevant, and word-junk got the generic redirect to the front page.
Then for each page I did something like this:
cd ~/domains/willarson.com/public/
mkdir 17
python create_redirect http://lethain.com/etc/etc/ > 17/index.html
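With more than a handful of posts, the same steps could be driven from a mapping instead of being typed by hand. The sketch below is one way to do that, not the script used above: the write_redirects function, the shortened TEMPLATE, and the dictionary layout (old post ID to new URL) are all assumptions made for illustration.

```python
import html
from pathlib import Path

# Shortened, hypothetical stand-in for the redirect page shown earlier.
TEMPLATE = """<html>
<head>
<link rel="canonical" href="{url}">
<meta http-equiv="Refresh" content="5; url={url}">
</head>
<body>
<a href="{url}">Click here to transition more quickly.</a>
</body>
</html>
"""


def write_redirects(mapping, public_dir):
    """Write <public_dir>/<post id>/index.html for each mapping entry.

    `mapping` maps an old post ID to the URL it should redirect to.
    Returns the list of files written.
    """
    public_dir = Path(public_dir)
    written = []
    for post_id, url in sorted(mapping.items()):
        page_dir = public_dir / str(post_id)
        # Equivalent of `mkdir 17`, but safe to run more than once.
        page_dir.mkdir(parents=True, exist_ok=True)
        page = page_dir / "index.html"
        # Escape the URL so characters like & stay valid inside attributes.
        safe_url = html.escape(url, quote=True)
        page.write_text(TEMPLATE.format(url=safe_url), encoding="utf-8")
        written.append(page)
    return written
```

Calling write_redirects({17: "http://lethain.com/etc/etc/"}, "public") would create public/17/index.html, matching the manual steps above.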
You can try it out.
And that's all he wrote.