You can't reason about big balls of mud.

May 23, 2018. Filed under infrastructure, architecture.

One of the smartest engineers I worked with couldn't finish projects. Forward progress was always a struggle. They saw the problem. They saw a related problem. They uncovered surrounding problems. Eventually the task became insurmountable. I got frustrated. How could they be endlessly enveloped by small problems?

Lately, I've been trying to extend large existing systems, and their experience is resonating.

Cathedral systems have clear properties. Those properties let you reason about the system abstractly and layer new properties on top. Yet there are few large cathedral systems in the wild. They're particularly rare in young companies experiencing rapid growth. That ecosystem's apex pattern is the big ball of mud.

Big balls of mud appear to have properties, but they don't. All mutation occurs through an immutable log, you say. Oh, but there are a few caveats. A couple of performance wins. Two adjacent products made a minor exception to hit launch timelines. A system must hold its properties constant for them to support higher-level reasoning. A property may survive a rare exception. Persistent violations void the abstraction.

The failure to hold the line strikes like a millennium of meteors. Craters of technical debt caveat abstraction after abstraction. No properties remain, and abstract models of system behavior skew from actual behavior.

Often the first phase of modifying a system is to establish a nuanced mental model of the system. This works well for most problems, but not for complex, property-less systems. There, developing this nuanced model is a trap. You'll refine it into rich sophistication, and still be wrong.

Throw it away and start over? It's not so dire. You can extend large property-less systems. It's a matter of switching from abstract to empirical reasoning. Replace assertions of properties with observed behavior. Document what it does, not what it's intended to do.

Use observability tools for initial observations: logs, metrics, traces. Use targeted pilot projects to work the edge cases. Learn how the system responds to change. Using these approaches, you map the system's behaviors. That map steers modification of the big ball of mud. You've taken a different route, but now you're in familiar migration territory.
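
As a rough illustration of replacing an asserted property with observed behavior, here is a minimal sketch. It assumes write events have already been pulled out of logs or traces into records with hypothetical `entity` and `write_path` fields. The field names, path labels, and sample data are invented for the example. It tallies which paths actually mutate each entity, then lists the ones that bypass the supposedly universal log.

```python
from collections import Counter

# Hypothetical write events, e.g. parsed from logs or traces.
# Field names and path labels are assumptions for illustration only.
OBSERVED_WRITES = [
    {"entity": "order", "write_path": "log"},
    {"entity": "order", "write_path": "log"},
    {"entity": "invoice", "write_path": "direct_sql"},
    {"entity": "order", "write_path": "cache_writeback"},
    {"entity": "invoice", "write_path": "log"},
    {"entity": "user", "write_path": "log"},
]


def map_write_paths(events):
    """Count the observed write paths for each entity."""
    paths = {}
    for event in events:
        counts = paths.setdefault(event["entity"], Counter())
        counts[event["write_path"]] += 1
    return paths


def bypasses(paths, expected="log"):
    """Return, per entity, the observed write paths other than the expected one."""
    result = {}
    for entity, counts in paths.items():
        others = sorted(p for p in counts if p != expected)
        if others:
            result[entity] = others
    return result


if __name__ == "__main__":
    behavior = map_write_paths(OBSERVED_WRITES)
    for entity, paths in bypasses(behavior).items():
        print(f"{entity}: also written via {', '.join(paths)}")
```

The output is a description of what the system does, not what it was meant to do. In this made-up data, only `user` actually holds the "all mutation goes through the log" property.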

This is also how you return properties to a property-less system. Identify a property to reassert. Observe the related behavior. Do pilot projects to learn. Migrate into the new property. Done iteratively, each step gets easier than the last. An organic cathedral.
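
One possible way to make that iteration concrete is a ratchet: record the violations you have observed as a known-exceptions list, reject anything new, and shrink the list as migrations land. The sketch below is an assumption about how that could look, not a prescribed method. The `(entity, write_path)` pairs and the function name are hypothetical.

```python
def check_ratchet(observed, allowed):
    """Compare observed violations against the known-exceptions list.

    Both arguments are sets of (entity, write_path) pairs.
    Returns (new, retired): violations not yet on the list, and listed
    exceptions that were not observed this time and may be removable.
    """
    new = observed - allowed
    retired = allowed - observed
    return new, retired


if __name__ == "__main__":
    # Hypothetical exceptions recorded in an earlier iteration.
    allowed = {("order", "cache_writeback"), ("invoice", "direct_sql")}
    # Hypothetical observations after one migration step.
    observed = {("invoice", "direct_sql"), ("refund", "direct_sql")}

    new, retired = check_ratchet(observed, allowed)
    print("new violations to reject:", sorted(new))
    print("exceptions to retire:", sorted(retired))
```

An exception not being observed only suggests it has been migrated away, so confirm with a pilot before removing it from the list. When the list is empty and stays empty, the property holds again.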