Thoughts on Dating Erlang

Published on April 30, 2008.

Recently I have been using Erlang for a project, and it's been a fairly good experience so far. The project involves putting together a server that parses data from a custom packet format I have defined, and here are some thoughts.

Message Passing

Upon parsing an incoming packet, the server needs to write the parsed data to a database. For a number of reasons, I wanted to try having only one process talking to the database (a SQL database, likely Postgres, probably not a distributed database) at once. I once attempted to do something of that sort with Python, and it got messy quite quickly. Fortunately, the message passing paradigm makes this very easy to implement.

I have a looping data-store process that waits for incoming messages and stores the data they contain.

%% Listens for messages, and then stores contained data.
data_store() ->
    receive
	{store, Packet} ->
	    store(parse(Packet))
    end,
    data_store().

Then I need to make that process available for other processes to talk to by registering it in my startup code.

register(data_store_PID, spawn(fun() -> data_store() end)).

With that set up, any process that wants to store data can simply send a message to the data store, and the queueing and concurrency are all handled with nary a mutex or semaphore.

handle_data(Data) ->
    data_store_PID ! {store, Data}.
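An Erlang mailbox behaves roughly like a thread-safe queue drained by a single consumer. As a rough comparison only, here is a Python sketch of the same single-writer pattern. It makes several assumptions: the standard library's `queue.Queue` stands in for the mailbox, a plain list named `stored` stands in for the database, and the `.strip()` call is a placeholder for the post's `store(parse(Packet))`.

```python
import queue
import threading

# Hypothetical stand-in for the database: only the writer thread appends to it.
stored = []

def data_store(mailbox):
    """Single consumer, loosely mirroring the Erlang data_store/0 loop."""
    while True:
        message = mailbox.get()            # blocks until a message arrives, like `receive`
        if message is None:                # stop sentinel (the Erlang loop has none)
            break
        tag, packet = message
        if tag == "store":
            stored.append(packet.strip())  # placeholder for store(parse(Packet))

mailbox = queue.Queue()
writer = threading.Thread(target=data_store, args=(mailbox,))
writer.start()

def handle_data(data):
    # Any thread may "send"; the queue serializes access to the single writer.
    mailbox.put(("store", data))

producers = [threading.Thread(target=handle_data, args=(f"packet {i}\n",)) for i in range(5)]
for t in producers:
    t.start()
for t in producers:
    t.join()
mailbox.put(None)  # every store message is already queued, so the writer drains them first
writer.join()
```

One difference from Erlang remains: these threads share memory, so any of them could still touch `stored` directly. In Python the single-writer rule is a convention; Erlang's runtime enforces it.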

Message passing provides a transparent solution for (some kinds of) concurrency, and it's a breath of fresh air after dealing with standard threading implementations. Certainly, the cleanliness of the implementation depends heavily on the fact that Erlang's message passing is based on processes rather than threads. The distinction between the two is that processes cannot share data, whereas threads can¹.

If you haven't had many positive experiences with concurrency yet, I really do recommend giving Erlang a try, even if you don't intend to use it too seriously. It just feels good to see concurrency that works easily and well.

A drummer leaning backwards on top of a float.

A Simple Server Loop

Search the web for "Erlang socket tutorial", or perhaps "Erlang gen_tcp man", and you'll find several nearly identical examples of a pretty impressive server loop. (The one I found first, and from which I have taken the code below with slight modifications, all errors my own, is here.)

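-define(TCP_OPTIONS, [list, {packet, 0}, {active, false}]).

%% Opens a listening socket and hands it to the accept loop.
listen(Port) ->
    {ok, LSocket} = gen_tcp:listen(Port, ?TCP_OPTIONS),
    accept(LSocket).

%% Waits for a connection, spawns a handler process for it, and loops.
accept(LSocket) ->
    case gen_tcp:accept(LSocket) of
        {ok, S} -> spawn(fun() -> handle(S) end);
        {error, _Reason} -> ok
    end,
    accept(LSocket).

%% Reads one packet, forwards it to the data store, then closes the socket.
handle(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Packet} ->
            data_store_PID ! {store, Packet};
        {error, Error} ->
            io:format("error: ~w~n", [Error])
    end,
    gen_tcp:close(Socket).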

That's a pretty lean server, but an impressively functional one. It loops waiting for incoming connections and spawns a new process to handle each one. Since Erlang processes are much more lightweight than most threading implementations², this is actually a conceivable solution rather than a pipe dream (I say that crossing my fingers and without any factual basis).
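For a rough comparison, here is a thread-per-connection version of the same loop using Python's standard `socket` and `threading` modules. It is a sketch under stated assumptions: one read per connection mirrors `gen_tcp:recv(Socket, 0)`, and a `queue.Queue` named `data_store` stands in for `data_store_PID`. Each connection here costs an OS thread, which is generally far heavier than an Erlang process.

```python
import queue
import socket
import threading

# Hypothetical stand-in for the data_store_PID mailbox.
data_store = queue.Queue()

def handle(conn):
    """Read once from the connection, forward the data, then close it."""
    with conn:                            # closes the socket when done
        packet = conn.recv(4096)          # roughly gen_tcp:recv(Socket, 0)
        if packet:
            data_store.put(("store", packet.decode()))

def accept_loop(lsock):
    """Accept connections forever, starting one thread per connection."""
    while True:
        try:
            conn, _addr = lsock.accept()
        except OSError:                   # listening socket is gone; stop looping
            break
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

def listen(port=0):
    """Bind on localhost (port 0 picks a free port) and run the loop in the background."""
    lsock = socket.create_server(("127.0.0.1", port))
    threading.Thread(target=accept_loop, args=(lsock,), daemon=True).start()
    return lsock
```

To try it, call `listen()`, connect with `socket.create_connection(lsock.getsockname())`, and send some bytes. The handler thread then puts `("store", ...)` on the queue. The threads are daemons so they never block interpreter exit, much as the Erlang loop simply runs for the life of the node.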

A picture on the back of a float in Japan

Strings as Lists

Erlang doesn't have a dedicated string type; strings are simply lists of integer character codes. This means they can be manipulated with all of the available list functions (lists:prefix, lists:nthtail, lists:map, lists:foreach, lists:foldl; it's all here). Combine that with the fact that each character costs two machine words of memory (8 bytes on a 32-bit system, 16 bytes on a 64-bit one), and Erlang isn't particularly strong at string handling. But it is certainly possible.
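To put rough numbers on this, the Python sketch below models an Erlang string as a plain list of character codes. It estimates the memory cost using the Erlang Efficiency Guide's rule of thumb of one word plus two words (one list cell) per character. Both helper names are hypothetical, and the estimate is back-of-the-envelope only.

```python
def erlang_string(text):
    """Model an Erlang string: a plain list of character codes."""
    return [ord(ch) for ch in text]

def list_string_bytes(n_chars, word_bytes=8):
    """Rough heap cost of a list string: 1 word + 2 words per character."""
    return (1 + 2 * n_chars) * word_bytes

hello = erlang_string("hello")
print(hello)                       # [104, 101, 108, 108, 111]
print(hello[:2] == erlang_string("he"))    # True, like lists:prefix("he", "hello")
print(hello[2:] == erlang_string("llo"))   # True, like lists:nthtail(2, "hello")
print(list_string_bytes(1024))     # 16392 bytes for a 1 KB packet on a 64-bit system
print(list_string_bytes(1024, 4))  # 8196 bytes on a 32-bit system
```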

In my case, I needed my server to parse the data out of incoming packets. I must admit that the one networking course I took in college really helped mentally prepare me for this task, since we wrote finite state machines to parse incoming packets instead of using something simpler like regular expressions.
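To make the finite-state-machine idea concrete, here is a Python sketch that is not the parser from this post. It uses two explicit states to find the same double-newline terminator that the Erlang snippet below looks for. The function and state names are made up for the example, and it follows the post's convention of returning `("not_found", ...)` when there is no terminator.

```python
NORMAL, SAW_NEWLINE = "normal", "saw_newline"

def next_value_token_fsm(packet):
    """Split packet at the first double newline using a two-state machine.

    Returns (token, rest), or ("not_found", packet) if no terminator appears.
    """
    state = NORMAL
    for i, ch in enumerate(packet):
        if state == NORMAL:
            # A newline might be the first half of the terminator.
            state = SAW_NEWLINE if ch == "\n" else NORMAL
        elif state == SAW_NEWLINE:
            if ch == "\n":
                # packet[i - 1] and packet[i] form the terminator; drop both.
                return packet[:i - 1], packet[i + 1:]
            state = NORMAL
    return "not_found", packet
```

Each character either keeps the machine where it is or moves it between the two states, so the scan is a single linear pass with no backtracking.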

Here is a snippet of the parsing code, which builds a token that ends on a double newline: "\n\n".

-define(NEWLINE, 10). %% ASCII value for '\n'.

%% Returns {Token, Rest}, where Token is everything before the first
%% "\n\n", or {not_found, Packet} if there is no double newline.
next_value_token(Packet) ->
    next_value_token(Packet, []).

next_value_token([], Acc) ->
    {not_found, lists:reverse(Acc)};
next_value_token([First, Second|Rest], Acc) when First == Second ->
    case First of
        ?NEWLINE -> {lists:reverse(Acc), Rest};
        Other -> next_value_token([Second|Rest], [Other|Acc])
    end;
next_value_token([First|Rest], Acc) ->
    next_value_token(Rest, [First|Acc]).

Now, I'll hardly be the first one to call that code eminently readable, but it does show how Erlang's pattern matching lets you describe complex situations quite easily (I strongly suspect that a more legible solution exists than the one I have here; it's something of an initial mishmash).

Translating that code into Python (roughly, and ignoring the functional idiom of building the list backwards and then reversing it), we get something quite different.

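def next_value_token(packet, acc=None):
    if acc is None:
        acc = []  # fresh list per call; a [] default would be shared between calls
    if not packet:
        return ('not_found', acc)
    first = packet[0]
    second = packet[1] if len(packet) > 1 else None  # a one-character packet has no second
    if first == second == "\n":
        return (acc, packet[2:])
    else:
        acc.append(first)
        return next_value_token(packet[1:], acc)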

And by different, I mean easy to read. To be fair, I intentionally avoided declaring variables in my Erlang code (I imposed that constraint hoping it would lead to a clean solution; it just hasn't quite worked out yet). Then again, the way I would actually implement this in Python is simpler still. Among other things, Python's lack of tail call optimization means this recursive version will hit the interpreter's recursion limit on long packets, which makes it more of a bauble than a tool.

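class MalformedPacketException(Exception):
    """Raised when a packet has no double-newline terminator."""

def next_value_token(packet):
    index = packet.find("\n\n")
    if index == -1:
        raise MalformedPacketException(packet)
    return (packet[:index], packet[index + 2:])  # (token, rest), terminator dropped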

That is quite a bit simpler, and a solution using regular expressions would be perhaps a bit more verbose but similarly legible. I'll avoid driving this point home (by which I mean I won't post the Objective-C solution), because I am fairly certain that the ugliness of the Erlang sample here is my own fault, or at worst it is simply a mismatch with functional programming.
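For completeness, a regular-expression version might look like the sketch below. It uses Python's standard `re` module, and the exception class is redefined so the block stands alone. As with the `find()` version, it returns the token and the remainder with the terminator dropped.

```python
import re

class MalformedPacketException(Exception):
    """Raised when a packet has no double-newline terminator."""

# Non-greedy match up to the first blank line; DOTALL lets "." cross single newlines.
TOKEN_RE = re.compile(r"(.*?)\n\n", re.DOTALL)

def next_value_token(packet):
    match = TOKEN_RE.match(packet)
    if match is None:
        raise MalformedPacketException(packet)
    return match.group(1), packet[match.end():]
```

Without `re.DOTALL`, a token containing a single newline (such as `"a\nb\n\nc"`) would fail to match.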

Question: How does Haskell deal with strings?

Minor Quibbles

There are two things I am continually annoyed at when I learn new languages, and Erlang is just as guilty of them as most every other language out there: I find it a bit frustrating that most languages create their own string formatting mini-language, and also that they create their own mini-language for list comprehensions. I realize there is a valid argument that we lose something when we stop trying to find better solutions, but I ask the gentle reader to consider the case of Perl's regular expressions. They're great, aren't they? Isn't it even better that you only have to figure them out once and then have this powerful tool available in most every language you deal with?

Wouldn't it be even nicer to have the list comprehension and string formatting mini-languages standardized across languages as well, such that they too could supplement our cross-language toolkit of powerful concepts?³
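To make the complaint concrete, here are the same filter-and-map and the same formatted line in Erlang (shown in comments) and in Python. The variable names are made up for the example. The ideas are identical; only the spelling differs.

```python
xs = [1, 2, 3, 4]
name, count = "inbox", 3

# Erlang: [X * 2 || X <- Xs, X rem 2 =:= 0]
doubled_evens = [x * 2 for x in xs if x % 2 == 0]

# Erlang: io_lib:format("~s has ~w items~n", [Name, Count])
summary = "%s has %d items\n" % (name, count)

print(doubled_evens)  # [4, 8]
print(summary, end="")  # inbox has 3 items
```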

Moving Forward

Around a year ago I made a brief stumble into Erlang, but never got anywhere with it. The syntax was kind of abrasive, the documentation was bad, etc., etc. Nothing fundamental has changed with Erlang since then, but this second go-around has been much more pleasant for me. The syntax just kind of sank in without much thought, the functional programming isn't giving me pause, and the learning curve hasn't been particularly steep.

I suppose I might say that I was finally ready for Erlang, but that sounds a bit cheesy. What I will say is that Erlang is an interesting language to get to know, and for certain problem sets it allows for exquisite solutions. It won't change your world any more than spending a few weeks in Scheme, but perhaps that is all the changing our worlds need sometimes.

¹ Not having mutable shared data is a bit odd, but hasn't given me much trouble. I can't say that I have implemented any large-scale systems in a strictly functional programming style, but after getting used to the mindset I usually find that the same things can be accomplished, just in a different manner. Often the functional solution is more pleasant once discovered, but I imagine there are some situations where the limitations breed complexity rather than beauty.
² They are substantially lighter than Java threads, or threads in any programming language that implements them as native threads (SBCL, etc.). I am curious how they would compare against threads in Stackless Python or Ruby's green threads. As somewhat relevant trivia, Scala's message passing implementation, which mimics Erlang's in semantics, actually uses a fixed number of threads (I think four, but I may be off) to simulate being lightweight. Certainly something similar could be achieved in languages using native threading, as long as someone is willing to tackle writing the libraries. (My own foray into writing a Common Lisp actors library kind of fell by the wayside, to my occasional dismay.)
³ I see two factors standing in the way of this. The first is that there hasn't yet been a string formatting or list comprehension implementation that clearly blows the socks off the competition. Personally, I think that Python list comprehensions are quite good (and that I would drown in my own tears if we adopted something like Common Lisp's Loop mini-language). As for string formatting, none of them really strike me as great, but simply standardizing on one of them would lessen my pain considerably.