Python Datastructures Backed by Redis

09/05/2010

I've been working with Redis quite a bit at work lately, and I've really taken to it. At some point it occurred to me that it would be terribly straightforward to use Redis as the backend for the most common Python datastructures: lists and dictionaries. Not only would it be easy, it would also offer real benefits: readability, distribution across processes/threads/machines, and the familiarity of the existing interfaces.
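Just how straightforward? The redis_dict code isn't reproduced here, so what follows is only a minimal sketch of one way a dictionary could sit on top of Redis, not necessarily how redis_dict does it. Each dictionary maps onto a single Redis hash, and `collections.abc.MutableMapping` supplies `get`, `update`, `items` and friends once the five core methods exist. `HashDict` and `FakeHashClient` are hypothetical names. The client is assumed to expose redis-py's `hget`/`hset`/`hdel`/`hkeys`/`hlen`/`hexists`, and the in-memory fake lets the sketch run without a server.

```python
from collections.abc import MutableMapping


class HashDict(MutableMapping):
    """A dict-like view over one Redis hash (hypothetical sketch).

    `client` is assumed to expose redis-py's hash commands:
    hget, hset, hdel, hkeys, hlen, hexists.
    """

    def __init__(self, client, name="config"):
        self.client = client
        self.name = name  # the Redis key that holds the hash

    def __getitem__(self, key):
        value = self.client.hget(self.name, key)
        if value is None:  # Redis returns nil for a missing field
            raise KeyError(key)
        return value

    def __setitem__(self, key, value):
        self.client.hset(self.name, key, value)

    def __delitem__(self, key):
        if not self.client.hdel(self.name, key):  # hdel returns the count removed
            raise KeyError(key)

    def __iter__(self):
        return iter(self.client.hkeys(self.name))  # snapshot of field names

    def __len__(self):
        return self.client.hlen(self.name)

    def __contains__(self, key):
        return bool(self.client.hexists(self.name, key))


class FakeHashClient:
    """In-memory stand-in for the redis-py hash commands used above."""

    def __init__(self):
        self.data = {}

    def hget(self, name, key):
        return self.data.get(name, {}).get(key)

    def hset(self, name, key, value):
        h = self.data.setdefault(name, {})
        is_new = key not in h
        h[key] = str(value)  # Redis stores field values as strings
        return int(is_new)

    def hdel(self, name, key):
        return 1 if self.data.get(name, {}).pop(key, None) is not None else 0

    def hkeys(self, name):
        return list(self.data.get(name, {}))

    def hlen(self, name):
        return len(self.data.get(name, {}))

    def hexists(self, name, key):
        return key in self.data.get(name, {})
```

Pointing it at a real server should only require swapping in `redis.Redis(decode_responses=True)` from redis-py, so that values come back as `str` rather than `bytes`.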

The code resulting from these ideas is available on GitHub. Before you run off, let's consider a few simple but powerful applications of this approach: a shared configuration mechanism and ye olde publisher-consumer example.

Shared Configuration

Once you start working at scale, managing shared configuration data becomes an increasingly important problem. Fortunately, it's a solvable problem. Here is one solution.

import redis_dict

# create and edit the configuration
x = redis_dict.RedisDict()
x['mysql-master'] = "127.0.0.0:1999"
x['mysql-slaves'] = "127.0.0.1:2000,127.0.0.2:2001"
x['mysql-slaves'] = x['mysql-slaves'] + ",127.0.0.3:2002"

# read the config on your machines
x = redis_dict.RedisDict()
master = x['mysql-master'].split(':')
slaves = [ y.split(':') for y in x['mysql-slaves'].split(',') ]

Pretty straightforward. Long-running processes would need to check in occasionally to see if configuration had changed, which is a drag, but doable.
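That check-in can be as simple as caching a snapshot and re-reading it once it gets stale. `PolledConfig` below is a hypothetical helper, not part of redis_dict. `load` is any zero-argument callable that returns a dict, and the clock is injectable so the refresh logic can be exercised without sleeping.

```python
import time


class PolledConfig:
    """Re-read shared config at most once every `interval` seconds (sketch)."""

    def __init__(self, load, interval=30.0, clock=time.monotonic):
        self.load = load            # e.g. lambda: dict(some_redis_backed_mapping)
        self.interval = interval
        self.clock = clock
        self.snapshot = load()      # initial read at startup
        self.loaded_at = clock()

    def get(self, key, default=None):
        now = self.clock()
        if now - self.loaded_at >= self.interval:
            self.snapshot = self.load()  # "check in" with the shared store
            self.loaded_at = now
        return self.snapshot.get(key, default)
```

The interval trades staleness against load on Redis. Loading a whole snapshot at once also keeps a process from mixing old and new values between refreshes, provided `load` itself is atomic. If the config lives in a single Redis hash, redis-py's `hgetall(name)` reads all of it with one command.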

Distributed Producers and Consumers

At a certain point, most architectures are in dire need of a message queue. Many of them end up with a weird queue implemented on top of MySQL. Many others end up using RabbitMQ, which doesn't always quite live up to its motto of "Messaging that just works." Let's examine a different approach.

import redis_list

# producer
import random
x = redis_list.RedisList("producer-queue")
for _ in range(100):
    x.append(random.randint(0, 101))

# consumer
x = redis_list.RedisList("producer-queue")
while True:
    n = x.pop(blocking=True)  # blocking pop: wait for the next item
    print("Popped %s" % (n,))

There could be a large number of producers and consumers, and they wouldn't need to be on the same machine. Instead of playing with Python threads or multiprocessing, you could just spawn as many as you want from the command line. Cute, eh?
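This works because the queue semantics map directly onto Redis list commands. RPUSH appends at the tail, LPOP takes from the head, and BLPOP blocks until an item arrives, which is what lets consumers on any machine simply wait. Below is a minimal sketch under those assumptions, not redis_list's actual code. `RedisQueue`, its optional `maxlen` cap (enforced with LTRIM, which discards the oldest items), and the threaded `FakeListClient` stand-in are all hypothetical, and the client is assumed to expose redis-py's `rpush`, `lpop`, `blpop`, `llen` and `ltrim`. Unlike Python's `list.pop()`, `pop()` here takes from the head so the list behaves as a FIFO queue.

```python
import threading
import time


class RedisQueue:
    """FIFO queue over a Redis list (hypothetical sketch)."""

    def __init__(self, client, name, maxlen=None):
        self.client = client
        self.name = name
        self.maxlen = maxlen  # optional cap, enforced with LTRIM

    def append(self, value):
        self.client.rpush(self.name, value)  # add at the tail
        if self.maxlen is not None:
            # keep only the newest `maxlen` items; older ones are discarded
            self.client.ltrim(self.name, -self.maxlen, -1)

    def pop(self, blocking=False, timeout=0):
        """Remove and return the head item; None if empty or on timeout."""
        if not blocking:
            return self.client.lpop(self.name)
        # timeout=0 means wait forever; result is (key, value) or None
        result = self.client.blpop([self.name], timeout=timeout)
        return None if result is None else result[1]

    def __len__(self):
        return self.client.llen(self.name)


class FakeListClient:
    """Thread-safe in-memory stand-in for the redis-py list commands above."""

    def __init__(self):
        self.lists = {}
        self.cond = threading.Condition()

    def rpush(self, name, *values):
        with self.cond:
            lst = self.lists.setdefault(name, [])
            lst.extend(str(v) for v in values)  # Redis stores strings
            self.cond.notify_all()  # wake any blocked blpop callers
            return len(lst)

    def lpop(self, name):
        with self.cond:
            lst = self.lists.get(name)
            return lst.pop(0) if lst else None

    def blpop(self, keys, timeout=0):
        deadline = None if timeout == 0 else time.monotonic() + timeout
        with self.cond:
            while True:
                for key in keys:  # first non-empty list wins
                    if self.lists.get(key):
                        return (key, self.lists[key].pop(0))
                if deadline is None:
                    self.cond.wait()
                else:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        return None
                    self.cond.wait(remaining)

    def llen(self, name):
        with self.cond:
            return len(self.lists.get(name, []))

    def ltrim(self, name, start, stop):
        # Redis semantics: inclusive stop, negative indices count from the end
        with self.cond:
            lst = self.lists.get(name, [])
            n = len(lst)
            if start < 0:
                start = max(start + n, 0)
            if stop < 0:
                stop += n
            self.lists[name] = lst[start:stop + 1] if stop >= start else []
            return True
```

Values come back as strings, as they would from a redis-py client created with `decode_responses=True`. Against a real server, the RPUSH/LTRIM pair in `append` could be sent through a single redis-py `pipeline()` so that no other client sees the list between the two commands.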

A well-configured Redis setup could fail writes over to a slave if the master went down, allowing the system to keep functioning without waiting to recover the master and without losing data beyond any writes the slave had not yet received (Redis replication is asynchronous); these properties are hard to replicate with RabbitMQ. For the more ambitious, it would be straightforward to keep the queue stable by limiting its maximum size with the Redis LTRIM command.

For the most ambitious, one could follow a Kestrel-like strategy: monitor the size of the queue, pull messages off it when it grows too large, write them to a file, and repopulate the queue from that file once it recovers. This could be done without losing message ordering by using two Redis lists, an input queue and a processing queue. Each message is moved from the input queue either to the processing queue, if the processing queue is below its minimum size, or otherwise to the serialized file, and nothing is moved directly from the input queue to the processing queue while data remains on disk.
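The ordering argument in that scheme is easy to get wrong, so here is a small simulation of the mover under stated assumptions. The two Redis lists are modeled as `collections.deque` objects. With Redis, each `popleft`/`append` pair would become an LPOP followed by an RPUSH, or a single atomic LMOVE for list-to-list moves on Redis 6.2+. `FileSpill` is a hypothetical append-only file of newline-free messages, and `min_size` is the processing queue's minimum size from the description above.

```python
from collections import deque


class FileSpill:
    """FIFO of messages stored in a file, one per line (sketch).

    Assumes messages are strings without newlines.
    """

    def __init__(self, path):
        self.path = path
        self.read_offset = 0      # byte offset of the oldest unread message
        self.count = 0
        open(path, "wb").close()  # start with an empty file

    def append(self, msg):
        with open(self.path, "ab") as f:
            f.write(msg.encode("utf-8") + b"\n")
        self.count += 1

    def popleft(self):
        if self.count == 0:
            raise IndexError("pop from empty spill")
        with open(self.path, "rb") as f:
            f.seek(self.read_offset)
            line = f.readline()
            self.read_offset = f.tell()
        self.count -= 1
        if self.count == 0:
            open(self.path, "wb").close()  # reclaim disk space once drained
            self.read_offset = 0
        return line.rstrip(b"\n").decode("utf-8")

    def __len__(self):
        return self.count


def rebalance(input_q, processing_q, spill, min_size):
    """Move messages input -> (processing | spill) without reordering them."""
    # 1. Refill from disk first: spilled messages are older than anything
    #    still waiting in the input queue.
    while spill and len(processing_q) < min_size:
        processing_q.append(spill.popleft())
    # 2. Route new messages. While anything remains on disk, new messages
    #    must queue behind it, so they go to disk too.
    while input_q:
        msg = input_q.popleft()
        if spill or len(processing_q) >= min_size:
            spill.append(msg)
        else:
            processing_q.append(msg)
```

Refilling from disk before routing new input is what preserves ordering. Anything on disk is newer than everything already in the processing queue and older than everything still in the input queue, so new messages wait behind the spill until it drains.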

Ok. Go build something.

All Rights Reserved, Will Larson 2007 - 2014.