Creating Multi-Worksheet Excel Docs with xlwt

11/04/2012

This tutorial walks through using the xlwt library to create a multi-worksheet Excel document in Python. We'll be using the example of calculating our AWS server expenses, and in adition to xlwt, we'll also use requests to download the latest EC2 pricing data from Amazon.

I picked this example because I did exactly that earlier this week, using the much less enjoyable search-copy-paste method. (Which sadly means my pricing information is already outdated.)

Preparation

Before we get started, let's create a virtualenv to install our Python dependencies into. (If you haven't used virtualenv before take a gander here to get up to speed, it's a straightforward way to manage packages requirements for different projects.)

mkdir -p ~/code/
cd ~/code
virtualenv --no-site-packages aws_expenses

Next, install these dependencies into our new virtualenv:

cd ~/code/aws_expenses
. bin/activate
pip install requests==0.14.2 xlwt==0.7.4

Now we can quickly verify that the install succeeded:

~/code/aws_expenses/bin/python

Finally, verify that the imports work:

>>> import xlwt
>>> xlwt.__VERSION__
'0.7.4
>>> import requests
>>> requests.__version__
'0.14.2'

Now we're ready to get started.

Retrieve AWS Pricing via API

First, let's start by reverse-engineering the AWS EC2 pricing data into a useful format:

>>> import requests
>>> url = "http://aws.amazon.com/ec2/pricing/pricing-on-demand-instances.json"
>>> pricing = requests.get(url).json
>>> pricing.keys()
[u'vers', u'config']
>>> pricing['vers']
0.01
>>> pricing['config'].keys()
[u'regions', u'rate', u'currencies', u'valueColumns']
>>> type(pricing['config']['regions']
<type 'list'>
>>> pricing['config']['regions'][0].keys()
[u'region', u'instanceTypes']
>>> pricing['config']['regions'][0]['region']
u'us-east'
>>> pricing['config']['regions'][0]['instanceTypes'][0].keys()
[u'type', u'sizes']
>>> type(pricing['config']['regions'][0]['instanceTypes'][0]['sizes'])
<type 'list'>
>>> pricing['config']['regions'][0]['instanceTypes'][0]['sizes'][0]
{u'valueColumns': [
    {u'prices': {u'USD': u'0.065'},
     u'name': u'linux'},
    {u'prices': {u'USD': u'0.115'},
     u'name': u'mswin'}
   ],
 u'size': u'sm'}

OK, so after swimming around in that ocean, we've cobbled together enough information to plan out the data we want.

Let's make three simplifying assumption:

  1. we're only interested in the us-east region,
  2. we're only interested in linux machines,
  3. we're only interested in USD prices.

Within those simplifying assumptions, a reasonable way for us to represent the data we're interested in is a list of 3-tuples, with this information:

(instanceType, size, price)

For example, a small standard instance would be represented like:

('stdODI', 'sm', '0.115')

We can accomplish that using this function:

EC2_URL = "http://aws.amazon.com/ec2/pricing/pricing-on-demand-instances.json"
def prices(region_name="us-east", os="linux", currency="USD"):
    prices = requests.get(EC2_URL).json
    region = next((x for x in prices['config']['regions'] \
                   if x['region'] == region_name))
    price_tuples = []
    for instance_type in region['instanceTypes']:
        instance_name = instance_type['type']
        for size in instance_type['sizes']:
            size_name = size['size']
            size_price = next((x for x in size['valueColumns'] \
                               if x['name'] == os))['prices'][currency]
            price_tuples.append((instance_name, size_name, size_price))
    return price_tuples

Usage looks like:

>>> prices()
[(u'stdODI', u'sm', u'0.065'),
 (u'stdODI', u'med', u'0.130'),
 ...
]

Now that we can retrieve have the AWS pricing data, let's move that data into Excel.

Writing AWS Pricing to Excel

Let's start out by writing an Excel file with just the header row for prices:

import xlwt
wb = xlwt.Workbook()
ws = wb.add_sheet('EC2 Pricing')

headers = ('Type', 'Size', 'Price/hour (USD)', 'Price/month (USD)')
for i, header in enumerate(headers):
    ws.row(0).write(i, header)

wb.save('aws.xls')

We're using enumerate to track position within the headers--and will do the same for the actual pricing data--because the xlwt interfaces don't support append operations.

Now let's look at the snippet to write our pricing data beneath the headers:

# this is the prices function we wrote above
aws_prices = prices()

# this converts (0,0) to A1, (1,0) to B1 and so on
from xlwt.Utils import rowcol_to_cell

for i, (instance, size, price) in enumerate(aws_prices, start=1):
    ws.row(i).write(0, instance)
    ws.row(i).write(1, size)
    ws.row(i).write(2, float(price))

    # this generates the formula C0 * (30 * 24),
    # calculating the monthly rate from the hourly rate
    hourly_cell = rowcol_to_cell(i, 2)
    hours_in_month = 24 * 30
    ws.row(i).write(3, xlwt.Formula("%s * %s" % (hourly_cell, hours_in_month)))

Now we're generating a full Excel spreadsheet from the AWS data, with a working formula.

(See the full code thus far on Github.)

Making the Cross-Worksheet References

The final step is creating a second sheet using this data to project expenses. First, let's create the headers for our new Instances worksheet:

headers = ('Type', 'Size', 'Num Instances', 'Cost/month (USD)')
for i, header in enumerate(headers):
    ws.row(0).write(i, header)

Next, let's look at how we'll create the cross-worksheet reference to the EC2 Pricing worksheet:

for i, (instance, size, _) in enumerate(aws_prices, start=1):
    ws.row(i).write(0, instance)
    ws.row(i).write(1, size)

    # instead of cost, we're writing the number of EC2 instances
    ws.row(i).write(2, 0)

    # multiplying monthly price by number of instances
    instances_cell = rowcol_to_cell(i, 2)
    monthly_price_cell = rowcol_to_cell(i, 3)
    formula = "'EC2 Pricing'!%s * %s" % (monthly_price_cell, instances_cell)
    ws.row(i).write(3, xlwt.Formula(formula), style)

Now we can play around with our infrastructure costs just by changing the values in the Num Instances column. The last piece is to add a row which sums the total infrastructure cost across all instances.

Using Excel's SUM function, this ends up being quite simple:

# calculate total infrastructure cost
num_rows = len(ws.rows)
ws.row(num_rows).write(2, "Total")

first_price_cell = rowcol_to_cell(1, 3)
last_price_cell = rowcol_to_cell(num_rows-1, 3)
total_formula = "SUM(%s:%s)" % (first_price_cell, last_price_cell)
ws.row(num_rows).write(3, xlwt.Formula(total_formula), style)

Finally, we need to call this new write_instances function as part of creating our workbook. Note that you must create the EC2 Pricing worksheet before populating the Instances worksheet because xlwt verifies existance of cross-referenced worksheets.

It only validates the worksheet exists--not whether the specific cross-referenced rows exist--so you can still create Instances before EC2 Pricing as long as you wait to call write_instances until after both exist.

All together, exercising our functions looks like:

wb = xlwt.Workbook()
ws1 = wb.add_sheet('Instances')
ws2 = wb.add_sheet('EC2 Pricing')

aws_prices = prices()
write_prices(ws2, aws_prices)
write_instances(ws1, aws_prices)

wb.save('aws.xls')

With that, all the pieces have come together, and we have created a multiple worksheet Excel document using xlwt.

The full code for the script is available as excel.py on Github.

You can also see the generated spreadsheet on Google Docs.

If you're looking to go further with xlwt, take a look at their examples repository on Github.

All Rights Reserved, Will Larson 2007 - 2014.