Deployment Scripts With BeautifulSoup

June 9, 2008. Filed under python, ptd.

While working on Processed Tower Defense, it became pretty clear that the code was getting too large to keep in a single file. Sure, the computer didn't think it was particularly large, but it was starting to make finding things really difficult, especially since we use text editors (I use Emacs, Peter uses TextMate) instead of IDEs.

Peter solved the problem easily enough by breaking the monolithic ptd.js into five or six files, but in doing so a new problem popped up: how to put humpty dumpty together again when deploying the game?

Fortunately, Python and BeautifulSoup came to my rescue and made it possible to put together a simple but powerful deployment script. First I added a containing div for all the loaded JavaScript files, which looked like this:

<div id="js_imports">
<script src="processing.js" type="text/javascript" charset="utf-8" ></script>
<script src="jsfprocessing.js" type="text/javascript" charset="utf-8" ></script>
<script src="jquery-1.2.6.min.js" type="text/javascript" charset="utf-8" ></script>
<script src="game/creep_waves.js" type="text/javascript" charset="utf-8" ></script>
<script src="game/terrain.js" type="text/javascript" charset="utf-8" ></script>
<script src="game/util.js" type="text/javascript" charset="utf-8" ></script>
<script src="game/creeps.js" type="text/javascript" charset="utf-8" ></script>
<script src="game/ui_modes.js" type="text/javascript" charset="utf-8" ></script>
<script src="game/weapons.js" type="text/javascript" charset="utf-8" ></script>
<script src="game/ptd.js" type="text/javascript" charset="utf-8" ></script>
</div>

Then the deployment script finds every script tag and merges the imported files together:

from BeautifulSoup import BeautifulSoup
html = open('ptd.html', 'r')
soup = BeautifulSoup(html.read())
html.close()
merged_js = open('deploy/ptd.js', 'w')
tags = soup.findAll("script")
for tag in tags:
    if tag.has_key('src'):
        src_file = open(tag['src'],'r')
        merged_js.write(src_file.read())
        src_file.close()
merged_js.close()

The findAll method from BeautifulSoup makes it easy to locate all the script tags in ptd.html. Then we iterate through and append the contents of any imported script files to the new merged_js file we are creating.
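
If you want to try the tag-collection step without BeautifulSoup installed, here is a minimal sketch that swaps in the standard library's html.parser module (Python 3) for findAll. It is not the script used for PTD: the names ScriptSrcCollector and script_sources are made up for illustration, and the sketch only gathers the src paths, mirroring the has_key('src') check above.

```python
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; inline scripts have no src
        # and are skipped.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def script_sources(html):
    """Return the src paths of all external scripts found in html."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources
```

Calling script_sources on the contents of ptd.html should give back the script paths in the order they appear in the page, which is also the order they need to be merged in.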

Interruption from our scheduled program.

We can rewrite this loop with Python list comprehensions to get a more concise and pleasant feel to our code.

def merge(file):
    src = open(file,'r')
    merged_js.write(src.read())
    src.close()
[merge(tag['src']) for tag in tags if tag.has_key('src')]

If we weren't good programmers who close files they open we could compress this into a single line:

[merged_js.write(open(tag['src'],'r').read()) for tag in tags if tag.has_key('src')]

Unfortunately, we are thoughtful programmers who don't open up files without closing them afterwards. Regardless, I think it's usually a fun and occasionally valuable exercise to consider rewriting your loops with list comprehensions to remember the syntax and admire their quiet density.
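
One way to stay concise without leaving files open is Python's with statement (built in from Python 2.6, and available in 2.5 through from __future__ import with_statement). The sketch below is only an illustration, not the script used for PTD: merge_files and its parameter names are invented here, and it takes a plain list of paths rather than BeautifulSoup tags.

```python
def merge_files(src_paths, dest_path):
    """Write the contents of each file in src_paths, in order, to dest_path."""
    with open(dest_path, "w") as merged:
        for path in src_paths:
            # Each source file is closed as soon as its with block exits,
            # even if read() or write() raises an exception.
            with open(path, "r") as src:
                merged.write(src.read())
```

With the tags from the loop above, the call would look something like merge_files([tag['src'] for tag in tags if tag.has_key('src')], 'deploy/ptd.js'), which keeps the list comprehension for what it is good at (building a list) and leaves the file handling to the with blocks.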

Returning to where we were...

Okay, so we just merged the imported JavaScript files into one file. Now we want to replace that entire js_imports div with a single import of the new merged file. This is also pretty easy.

from BeautifulSoup import BeautifulSoup, Tag
# html is a string containing raw html
soup = BeautifulSoup(html)
js_imports_div = soup.find(id="js_imports")
merged_src = Tag(soup, "script")
merged_src["src"] = u"ptd.js"
merged_src["charset"] = "utf-8"
merged_src["type"] = "text/javascript"
js_imports_div.replaceWith(merged_src)

We simply search the soup for js_imports, and then use the replaceWith method to insert a new Tag that we create. And with that we've fixed up our html file and merged our JavaScript file, but still have a few more things to tidy up.

First, we need to write our fixed up html to file:

deploy_html = open('deploy/ptd.html', 'w')
deploy_html.write(str(soup))
deploy_html.close()

Which turns out to be easy enough. After that we have one last step, to minify the JavaScript file to save space.

from jsmin import jsmin
merged_js = open('deploy/ptd.js', 'r')
js = merged_js.read()
merged_js.close()
minified_js = open('deploy/ptd.js', 'w')
minified_js.write(jsmin(js))
minified_js.close()

Which will minify the deploy/ptd.js file, saving us a bit of space. It may feel a bit backwards to minify the entire file at once, instead of minifying the pieces before merging them into the merged_js file, but in my experience minifying piece by piece and then merging can very easily lead to the code being broken.

Now that the relatively complex aspects of the deployment script are handled, everything else is easily accomplished with a simple shell script. Your shell script might end up looking something like this:

echo "Beginning deploy script..."
rm -rf deploy/*
echo "Merging & minifying javascript imports..."
python scripts/build.py
echo "Copying files..."
cp -r assets deploy/assets/
cp LICENSE deploy/LICENSE
echo "Deploy script completed! Files are in deploy/"
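
If you would rather keep everything in Python, the cleanup and copy steps of that shell script can be sketched with the standard library's shutil module. This is just one possible arrangement, not the script used for PTD: clean_deploy and copy_static_files are hypothetical names, and the assets and LICENSE paths are simply the ones from the shell script above.

```python
import os
import shutil


def clean_deploy(deploy_dir):
    """Remove everything in deploy_dir, leaving an empty directory behind."""
    # Roughly `rm -rf deploy/*`, except the directory itself is recreated too.
    shutil.rmtree(deploy_dir, ignore_errors=True)
    os.makedirs(deploy_dir)


def copy_static_files(project_root, deploy_dir):
    """Copy the assets directory and the LICENSE file into deploy_dir."""
    # copytree requires that the destination directory does not exist yet,
    # which holds right after clean_deploy has run.
    shutil.copytree(os.path.join(project_root, "assets"),
                    os.path.join(deploy_dir, "assets"))
    shutil.copy(os.path.join(project_root, "LICENSE"),
                os.path.join(deploy_dir, "LICENSE"))
```

The order matters in the same way it does in the shell version: call clean_deploy first, then run the merge and minify steps, then copy_static_files, so the freshly built ptd.js is not wiped out.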

Since remembering the ease of html manipulation with BeautifulSoup, I've been on something of a craze, converting all of my static webpages to use simple build scripts along these lines. For example, ptdef.com is generated by running a second deploy script on the results of the first PTD deploy script, which extracts the inlined JavaScript and injects it into the ptdef.com template, along with the contents of the game div.

Also, my extremely simple personal website at willarson.com is a simple html template which has some Markdown content rendered and then injected into it to generate the deployed version. That build script is a lazy twenty-four lines. It's short enough to look at briefly as a quick nightcap:

import markdown
from BeautifulSoup import BeautifulSoup

def main():
    main_file = open("main.md", 'r')
    main_md = main_file.read()
    main_file.close()
    main_html = markdown.markdown(main_md)
    index_file = open("index.html", 'r')
    index_html = index_file.read()
    index_file.close()
    index_soup = BeautifulSoup(index_html)
    main_div = index_soup.find(id="main")
    main_div.replaceWith(main_html)
    merged_file = open("deploy/index.html",'w')
    merged_file.write(str(index_soup))
    merged_file.close()

if __name__ == "__main__":
    main()

which is managed by a simple shell script:

echo "Building deploy copy of 'willarson.com'..."
echo "Removing old files..."
rm -rf deploy/*
echo "Running scripts/build.py..."
python scripts/build.py
echo "Copying media files..."
cp -r media deploy/
echo "Finished building deploy copy of 'willarson.com'!"

Anyway, I hope that others can use some of the ideas here to make their lives simpler by building flexible and convenient deploy scripts for their projects and websites.

Let me know if you have any problems or questions.