Creating Multi-Worksheet Excel Docs with xlwt
This tutorial walks through using the xlwt library to create a multi-worksheet Excel document in Python. We’ll be using the example of calculating our AWS server expenses, and in adition to xlwt, we’ll also use requests to download the latest EC2 pricing data from Amazon.
I picked this example because I did exactly that earlier this week, using the much less enjoyable search-copy-paste method. (Which sadly means my pricing information is already outdated.)
Preparation
Before we get started, let’s create a virtualenv to install our Python dependencies into. (If you haven’t used virtualenv before take a gander here to get up to speed, it’s a straightforward way to manage packages requirements for different projects.)
:::shell
mkdir -p ~/code/
cd ~/code
virtualenv --no-site-packages aws_expenses
Next, install these dependencies into our new virtualenv:
:::shell
cd ~/code/aws_expenses
. bin/activate
pip install requests==0.14.2 xlwt==0.7.4
Now we can quickly verify that the install succeeded:
:::shell
~/code/aws_expenses/bin/python
Finally, verify that the imports work:
:::pycon
>>> import xlwt
>>> xlwt.__VERSION__
'0.7.4
>>> import requests
>>> requests.__version__
'0.14.2'
Now we’re ready to get started.
Retrieve AWS Pricing via API
First, let’s start by reverse-engineering the AWS EC2 pricing data into a useful format:
:::pycon
>>> import requests
>>> url = "http://aws.amazon.com/ec2/pricing/pricing-on-demand-instances.json"
>>> pricing = requests.get(url).json
>>> pricing.keys()
[u'vers', u'config']
>>> pricing['vers']
0.01
>>> pricing['config'].keys()
[u'regions', u'rate', u'currencies', u'valueColumns']
>>> type(pricing['config']['regions']
<type 'list'>
>>> pricing['config']['regions'][0].keys()
[u'region', u'instanceTypes']
>>> pricing['config']['regions'][0]['region']
u'us-east'
>>> pricing['config']['regions'][0]['instanceTypes'][0].keys()
[u'type', u'sizes']
>>> type(pricing['config']['regions'][0]['instanceTypes'][0]['sizes'])
<type 'list'>
>>> pricing['config']['regions'][0]['instanceTypes'][0]['sizes'][0]
{u'valueColumns': [
{u'prices': {u'USD': u'0.065'},
u'name': u'linux'},
{u'prices': {u'USD': u'0.115'},
u'name': u'mswin'}
],
u'size': u'sm'}
OK, so after swimming around in that ocean, we’ve cobbled together enough information to plan out the data we want.
Let’s make three simplifying assumption:
- we’re only interested in the
us-east
region, - we’re only interested in
linux
machines, - we’re only interested in
USD
prices.
Within those simplifying assumptions, a reasonable way for us to represent the data we’re interested in is a list of 3-tuples, with this information:
:::python
(instanceType, size, price)
For example, a small standard instance would be represented like:
:::python
('stdODI', 'sm', '0.115')
We can accomplish that using this function:
:::python
EC2_URL = "http://aws.amazon.com/ec2/pricing/pricing-on-demand-instances.json"
def prices(region_name="us-east", os="linux", currency="USD"):
prices = requests.get(EC2_URL).json
region = next((x for x in prices['config']['regions'] \
if x['region'] == region_name))
price_tuples = []
for instance_type in region['instanceTypes']:
instance_name = instance_type['type']
for size in instance_type['sizes']:
size_name = size['size']
size_price = next((x for x in size['valueColumns'] \
if x['name'] == os))['prices'][currency]
price_tuples.append((instance_name, size_name, size_price))
return price_tuples
Usage looks like:
:::pycon
>>> prices()
[(u'stdODI', u'sm', u'0.065'),
(u'stdODI', u'med', u'0.130'),
...
]
Now that we can retrieve have the AWS pricing data, let’s move that data into Excel.
Writing AWS Pricing to Excel
Let’s start out by writing an Excel file with just the header row for prices:
:::python
import xlwt
wb = xlwt.Workbook()
ws = wb.add_sheet('EC2 Pricing')
headers = ('Type', 'Size', 'Price/hour (USD)', 'Price/month (USD)')
for i, header in enumerate(headers):
ws.row(0).write(i, header)
wb.save('aws.xls')
We’re using enumerate to track position within the headers–and will do the same for
the actual pricing data–because the xlwt
interfaces don’t support append operations.
Now let’s look at the snippet to write our pricing data beneath the headers:
:::python
# this is the prices function we wrote above
aws_prices = prices()
# this converts (0,0) to A1, (1,0) to B1 and so on
from xlwt.Utils import rowcol_to_cell
for i, (instance, size, price) in enumerate(aws_prices, start=1):
ws.row(i).write(0, instance)
ws.row(i).write(1, size)
ws.row(i).write(2, float(price))
# this generates the formula C0 * (30 * 24),
# calculating the monthly rate from the hourly rate
hourly_cell = rowcol_to_cell(i, 2)
hours_in_month = 24 * 30
ws.row(i).write(3, xlwt.Formula("%s * %s" % (hourly_cell, hours_in_month)))
Now we’re generating a full Excel spreadsheet from the AWS data, with a working formula.
(See the full code thus far on Github.)
Making the Cross-Worksheet References
The final step is creating a second sheet using this data to project expenses.
First, let’s create the headers for our new Instances
worksheet:
:::python
headers = ('Type', 'Size', 'Num Instances', 'Cost/month (USD)')
for i, header in enumerate(headers):
ws.row(0).write(i, header)
Next, let’s look at how we’ll create the cross-worksheet reference to the
EC2 Pricing
worksheet:
:::python
for i, (instance, size, _) in enumerate(aws_prices, start=1):
ws.row(i).write(0, instance)
ws.row(i).write(1, size)
# instead of cost, we're writing the number of EC2 instances
ws.row(i).write(2, 0)
# multiplying monthly price by number of instances
instances_cell = rowcol_to_cell(i, 2)
monthly_price_cell = rowcol_to_cell(i, 3)
formula = "'EC2 Pricing'!%s * %s" % (monthly_price_cell, instances_cell)
ws.row(i).write(3, xlwt.Formula(formula), style)
Now we can play around with our infrastructure costs just by changing the values
in the Num Instances
column. The last piece is to add a row which sums
the total infrastructure cost across all instances.
Using Excel’s SUM
function, this ends up being quite simple:
:::python
# calculate total infrastructure cost
num_rows = len(ws.rows)
ws.row(num_rows).write(2, "Total")
first_price_cell = rowcol_to_cell(1, 3)
last_price_cell = rowcol_to_cell(num_rows-1, 3)
total_formula = "SUM(%s:%s)" % (first_price_cell, last_price_cell)
ws.row(num_rows).write(3, xlwt.Formula(total_formula), style)
Finally, we need to call this new write_instances
function as part
of creating our workbook. Note that you must create the EC2 Pricing
worksheet before populating the Instances
worksheet because xlwt
verifies existance of cross-referenced worksheets.
It only validates the worksheet exists–not whether the specific cross-referenced rows exist–so you can
still create Instances
before EC2 Pricing
as long as
you wait to call write_instances
until after both exist.
All together, exercising our functions looks like:
:::python
wb = xlwt.Workbook()
ws1 = wb.add_sheet('Instances')
ws2 = wb.add_sheet('EC2 Pricing')
aws_prices = prices()
write_prices(ws2, aws_prices)
write_instances(ws1, aws_prices)
wb.save('aws.xls')
With that, all the pieces have come together, and we have created a multiple worksheet Excel document using xlwt.
The full code for the script is available as excel.py on Github.
You can also see the generated spreadsheet on Google Docs.
If you’re looking to go further with xlwt, take a look at their examples repository on Github.