Irrational Exuberance!

Refactoring & Testing Our Dynamo Clone

January 2, 2010. Filed under erlangdistributed-systems

Time was a bit scarce recently. As such, here is a foray into testing and refactoring before we cover physical, logical and vector clocks in the next entry.

Picking up where we left off, in this post we're going to undertake some refactoring to clean up the codebase before proceeding further with our implementation. Our goals are four:

  • simplify the kvs:store/1 function which has grown unweildly,
  • add unittests via EUnit,
  • seperate the client interface and server implementation into two files,
  • rework our message-passing based server as an Erlang gen_server.

Off we go.

Cleaning Up store

The kvs:store/1 function has gotten embarassingly long. Initially this may have been justifiable to keep explanation simple, but that time has passed.

The function contains both the pattern matching to disambiguate message types, and also the implementation of how to respond to each type of message. A quick simplification is to move each response into its own function.

The pattern-matching portion is simplified to:

store(Store = #kvs_store{data=Data, 
          pending_reads=Reads, pending_writes=Writes}) ->
    receive
        {Sender, retrieve, Client, Key} ->
            message_retrieve(Store, 
                Sender, Client, Key);
        %% and so on..

Where each response function1 looks like:

message_retrieve(Store, Sender, Client, Key) ->
    Sender ! {self(), retrieved, Client, 
        Key, proplists:get_value(Key, Data)},
    store(Store).

This work reduces the code into more easily consumed chunks, but it will also turn out to be instrumental in our rewriting the server as a gen_server later.

The changeset for this work is available here if you'd like to see the complete changes.

Adding Tests

Refactoring without tests is driving without road signs. After changing a substantial piece of functionality it is time-intensive to check that existing functionality remains intact. Where time intensive is a euphemism for humans do it badly and inevitably screw up.

Erlang provides a number of testing facilities2, the simplest of which is eunit, which we'll be using to test our project. EUnit is comparable to [phpunit] or [junit], except (perhaps) requiring a bit less boilerplate code.

Our initial eunit tests will be rather simple, but we can enhance them as we add further functionality down the line.

-module(kvs_tests).
-include_lib("eunit/include/eunit.hrl").
 
%% @doc test response when empty kvs process group
not_running_test() ->
    kvs:stop(),
    {error, not_running} = kvs:get(b),
    {error, not_running} = kvs:set(c, 100)    
 
%% @doc test most basic functionality
basic_test() ->
    started = kvs:start(3),
    {ok, undefined} = kvs:get(a),
    {ok, updated} = kvs:set(a, 5),
    {ok, 5} = kvs:get(a),
    stopped = kvs:stop().
 
%% @doc test performing many concurrent reads
many_reads_test() ->
    started = kvs:start(3),
    {ok, updated} = kvs:set(b, 100),
    Read = fun() -> {ok, 100} = kvs:get(b) end,
    lists:foreach(fun(_) -> spawn(Read) end,
		  lists:seq(0, 100)),
    stopped = kvs:stop().
 
%% @doc test performing many concurrent writes
many_writes_test() ->
    started = kvs:start(3),
    Write = fun() -> {ok, updated} = kvs:set(a, 101) end,
    lists:foreach(fun(_) -> spawn(Write) end,
		  lists:seq(0, 100)),
    stopped = kvs:stop().

Running the tests is straightforward, and can be done via a command-line script supplied by eunit, or via the Erlang shell.

Eshell V5.7.2  (abort with ^G)
1> c(kvs).
{ok,kvs}
2> c(kvs_tests).
{ok,kvs_tests}
3> kvs_tests:test().
kvs_tests: not_running_test...*failed*
::error:badarg
  in function kvs:get/2
  in call from kvs_tests:not_running_test/0


=======================================================
  Failed: 1.  Skipped: 0.  Passed: 3.
error

Well, looks like I need to make some tweaks to the refactor, but now we know that the tests are actually running.

Separation of Interface and Implementation

We'll move through this fairly quickly because it isn't particularly instructional, but the points of interest are:

  • splitting the current kvs.erl into kvs.erl and kvs_server.erl,
  • maintain the existing client interface,
  • because we have tests we can safely make changes, regression testing,
  • not adding any new functionality,
  • we keep using kvs process group; doesn't have to correspond to file.

The changeset for these changes is available here.

After the refactor comes to a close, it's time to rerun the tests.

5> c(kvs).
{ok,kvs}
6> c(kvs_server).
{ok,kvs_server}
8> c(kvs_tests).
{ok,kvs_tests}
9> kvs_tests:test().
kvs_tests: not_running_test...*failed*
::error:function_clause
  in function lists:'-filter/2-lc$^0/1-0-'/2
    called as '-filter/2-lc$^0/1-0-'({error,{no_such_group,kvs}},
              #Fun<kvs.1.52371442>)
  in call from kvs:stop/0
  in call from kvs_tests:not_running_test/0


=======================================================
  Failed: 1.  Skipped: 0.  Passed: 3.
error
10> 

Ahh, the kvs_tests:not_running_test is still having an issue, but everything else is passing. We've preserved the quirks and the functionality from before the refactor: a hallmark of a successful refactor.

From Message Passing to gen_server

The final stage of this refactor is to transition from our message-passing based implementation to one using an OTP gen_server. gen_server stands for generic server and is exactly what it sounds like. gen_servers are functionally equivalent to the message-passing based approach we've used thus far, but they have a few important advantages:

  • they eliminate boiler-plate code,
  • they play well with the rest of the Erlang OTP utilities like supervisors, hierarchy trees, etc,
  • they are familiar to other Erlang programmers so it is faster to understand a program's overall structure.

Most substantial Erlang projects either start out or end with at least one gen_server.

As we go through this rewrite, note that we'll keep the client interface constant, so that the hypothetical user is sheltered from any changes, and so that we can continue running our existing tests. As a bit of a spoiler, after a great deal of turmoil the combined size of kvs.erl and kvs_server.erl will decrease from 200 to 189 lines of code.

Now let's look at our first change: updating the implementation of the client interface for starting kvs servers in kvs.erl.

%% old start function
start(N) ->
    pg2:create(kvs),
    lists:foreach(fun(_) ->
		    Store = #kvs_store{data=[],
			       	 pending_reads=[],
			         pending_writes=[]},
		    pg2:join(kvs, spawn(kvs_server, store, [Store]))
		  end, lists:seq(0, N)),
    started.

%% new start function
start(N) ->
    lists:foreach(fun(_) ->
                   kvs_server:start_link().
                  end, lists:seq(0, N)),
    started.

We make similar changes to kvs:stop.

%% old stop function
stop() ->
    lists:foreach(fun(Pid) ->
			  pg2:leave(kvs, Pid),
			  Pid ! stop
		  end, pg2:get_members(kvs)),
    stopped.

%% new stop function
stop() ->
    lists:foreach(fun(Pid) ->
                          gen_server:call(Pid, stop)
                  end, pg2:get_members(kvs)),
    stopped.

Next we need to update the implementation of the client interface for updating and retrieving values. Two things worth noticing are:

  • we get the send & block on receive construct for free from gen_server/call,
  • we no longer need to explicitly pass along the sender's process id, as we'll get that for free as well.
%% old get implementation
get(Key, Timeout) ->
    Pid = pg2:get_closest_pid(kvs),
    Pid ! {self(), get, Key},
    receive
	{Pid, got, Value} ->
	    {ok, Value};
	{error, Error} ->
	    {error, Error}
    after 
	Timeout ->
	    {error, timeout}
    end.

%% new get implementation
get(Key, Timeout) ->
    Pid = pg2:get_closest_pid(kvs),
    gen_server:call(Pid, {get, Key}, Timeout).

And next the same changes for updating values.

%% old set implementation
set(Key, Val, Timeout) ->
    Pid = pg2:get_closest_pid(kvs),
    Pid ! {self(), set, Key, Val},
    receive
	{Pid, received, {set, Key, Val}} ->
	    {ok, updated};
	{error, Error} ->
	    {error, Error}
    after
	Timeout ->
	    {error, timeout}
    end.

%% new set implementation
set(Key, Val, Timeout) ->
    Pid = pg2:get_closest_pid(kvs),
    gen_server:call(Pid, {set, Key, Val}, Timeout).

Now that we've completed the changes to kvs.erl it's time to move onto kvs_server.erl. It would be rather dull to look at all of the changes required to convert into a gen_server (there is a changeset of everything which may be instructive). Rather, we'll take a high-level approach and note some of the interesting details.

In the old implementation we relied upon the store/1 function to dispatch incoming messages to the correct function, but the gen_server performs the dispatch for us, instead asking us to implement the handle_call/3 function.

%% excerpt from old implementation
store(Store) ->
    receive
        {Sender, get, Key} ->
            message_get(Store, Sender, Key);
        %% some other pattern matches
        stop ->
            ok
    end.

%% excerpt from new implementation
%% @doc handle explicit stop requests
handle_call(stop, Sender, State) ->
    {stop, stopped, stopped, State};

handle_call({get, Key}, Sndr, State=#kvs_store{pending_reads=Read}) ->
    %% implementation...

Above we are using handle_call, but there is a similar function named handle_cast which is also used in the new kvs_server.erl implementation. The difference between the two is that handle_call is a synchronous operation from the client's perspective, and handle_cast is an asynchronous operation from the client's perspective.

Note that handle_call needs only be a synchronous operation from the client's perspective, and that it is possible to have the server store the client request in its state, receive other incoming requests, and then return a response to the first request (this implementation makes extensive use of this pattern).

There are many other aspects of the gen_server implementation we could dive into, but we could get stuck in this investigation for a very long time. Rather, from our perspective the goal was to restructure the code so that we can make surgical algorithmic changes as the series progresses.

The full changeset of transitioning to a gen_server is available here.

Phew. Now that we've shuffled all of that code we've gotten out of the business of working against the OTP, and are now finally starting to work with it. Several entries from now after we've worked further on the core implementation, we'll return to this line of work and take advantage of some other Erlang goodness (in particular supervisors), but for the time being it is good to be ready to get on with some algorithms.

Next up, we'll address logical clocks, vector clocks and properly ordering updates.


  1. Note that the naming of the functions as message_??? is entirely arbitrary.

  2. The Erlang test_server is perhaps the most profoundly flexible and most profoundly confusing piece of testing software ever written. I have tried to grok its glory at several times, but still find it challenging to use at best.