Lately I’ve been thinking more about kappa architectures,
and particularly how you’d design a moderately complicated web application
to rely exclusively on an immutable log, with log processors that manage all derived state.
Some of the interesting questions are:

- is there a good way to do this?
- can you do it without tight coupling of implementations between the processes that publish the events and those that process the events?
- is there a way for publishers to be unaware of write logic and still provide synchronous behavior?
This works well! Logic is contained in a single place, but we’re mutating state directly
and don’t have a log of events. It’s easy to add an event log:
publish(client, VIEW_TOPIC, path)
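
The publish helper isn’t shown here; as a minimal sketch of what it might look like with pykafka (assuming get_client() returns a pykafka.KafkaClient and VIEW_TOPIC is a bytes topic name):

def publish(client, topic_name, value):
    topic = client.topics[topic_name]
    # a sync producer blocks until the broker confirms the write,
    # keeping the request handler synchronous
    with topic.get_sync_producer() as producer:
        producer.produce(value.encode('utf-8'))

Spinning up a producer per call is wasteful; in a real process you’d create one producer and reuse it.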
At which point our diagram gets a little more interesting:
But still pretty simple, with synchronous operations.
Now we do have an event log, but state is not being generated from it. This means
we can’t recreate state from replaying the log, which defeats the entire
point. (Although at least it would get the data into your data ecosystem for other uses.)
And the write logic is moved into a Kafka consumer:

import time

import redis

cli = get_client()
topic = cli.topics[VIEW_TOPIC]
consumer = topic.get_simple_consumer(reset_offset_on_start=True)
# create the Redis connection once, outside the loop
r = redis.Redis(host='redis')

for message in consumer:
    path = message.value.decode('utf-8')
    ts = int(time.time())
    # redis-py 3.x signatures: zadd takes a {member: score} mapping,
    # and zincrby takes the amount before the member
    r.zadd(RECENT_KEY, {path: ts})
    r.zincrby(TOP_KEY, 1, path)
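
As a bonus, because reset_offset_on_start=True resets the consumer to its auto_offset_reset position (which, if I remember pykafka’s defaults correctly, is the earliest offset), restarting this process replays the whole log and rebuilds the Redis state from scratch.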
Not only have we lost control over timing, we also have a fairly awkward scenario
where read and write logic is split. We’ve picked up an exciting property
of an immutable event log that we can replay to generate state, but the application
no longer works properly and is more complicated to reason about.
Decoupled and synchronous
What we really want is the ability for the web process to behave
synchronously from the client’s perspective until downstream
processing has completed, without introducing any tight coupling.
Hmm, the rare case where we want action at a distance?
You can think of implementation-specific ways to do this, e.g. in this case you could
use Redis pubsub and subscribe to a channel that acks writes for a given
event. That’s an interesting step, but tightly couples implementation
between the web and consumer processes, discounting some of the interesting
advantages of the kappa architecture.
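
To make that concrete, here’s a rough sketch of the tightly coupled version using redis-py pubsub; the channel naming and function names are made up for illustration:

import redis

r = redis.Redis(host='redis')

# web process: block until the consumer acks this event (or we time out)
def wait_for_ack(event_id, timeout=1.0):
    p = r.pubsub(ignore_subscribe_messages=True)
    # note: you’d need to subscribe before publishing the event itself,
    # or the ack can race past you
    p.subscribe('acks:%s' % event_id)
    return p.get_message(timeout=timeout) is not None

# consumer process: after applying the write
def ack(event_id):
    r.publish('acks:%s' % event_id, 'done')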
I’m honestly not aware of tooling that does what I want, but you could imagine
a somewhat handwavy implementation along these lines:
1. web sends a message to a sync process.
2. sync pushes the message into Kafka, injecting a unique identifier into each message, and keeps the connection to web open.
3. Various consumer processes “ack” completion back to sync. sync would either have a manifest of consumers that should acknowledge (hopefully built from static analysis rather than being manually maintained), or would sample system behavior to learn the expected number of acks.
The idea of sampling feels a bit overdesigned, but in practice I think you’d want
p99 timings, which you’d use to fail fast and move forward with processing.
Actually, as an interesting hack for a lightweight implementation, I imagine
you could probably rely only on the p95 timings
and not implement the “ack” at all.
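
To make the shape of that concrete, here’s a toy, in-process model of the sync coordinator; every name here is invented for illustration, consumers would really ack over the network, and the deadline stands in for those sampled timings:

import queue
import time
import uuid

ACK_TIMEOUT = 0.5  # fail-fast deadline, e.g. derived from sampled p99 timings
ack_queue = queue.Queue()  # stand-in for acks arriving from consumer processes

def publish_and_wait(produce, message, expected_acks):
    # inject a unique identifier so consumers can ack this specific message
    correlation_id = str(uuid.uuid4())
    produce({'id': correlation_id, 'body': message})
    deadline = time.time() + ACK_TIMEOUT
    acked = set()
    while len(acked) < expected_acks:
        remaining = deadline - time.time()
        if remaining <= 0:
            return False  # fail fast and move forward with processing
        try:
            consumer_name, acked_id = ack_queue.get(timeout=remaining)
        except queue.Empty:
            return False
        if acked_id == correlation_id:
            acked.add(consumer_name)
    return True  # every expected consumer acked in time

The expected_acks count is exactly where the manifest of consumers, or the sampled system behavior, would plug in.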
The “ack” concept here can probably be viewed as a particular example of
the challenge of implementing back pressure in asynchronous processing.
For example, what you’re kind of hoping for is to synchronously block the
web process until all downstream consumers have processed every message
triggered, directly or indirectly, by messages published at or before yours.
(This kind of reminds me of watermarks.)
Altogether, I think this problem (strongly decoupling the implementations of publishers and consumers
while still allowing timing coordination) isn’t solved very well by any existing tools, but there aren’t any
obvious reasons it couldn’t be; it’s just a bit awkward today.
I imagine we’ll see some interesting experiments soon!