Notes on A Philosophy of Software Design.

September 3, 2018. Filed under architecture, book, review

Jumping on the recent trend, I picked up a copy of A Philosophy of Software Design by John Ousterhout based on Cindy's recommendation. It's fairly concise at 160 pages, and I skimmed through it over the last few days, writing up some notes along the way.

Michael Krause was also kind enough to point out a great talk from John Ousterhout which covers the same content.

A Philosophy takes a look at complexity in software, and wants you "to use complexity to guide the design of software through its lifetime." The author ran an undergraduate course on software design, modeled after the approach to teaching essay writing (draft, write, critique, rewrite, critique, rewrite again), and used that experience, combined with a long career of developing many large systems, to develop categories of complexity and mitigations.

They particularly recommend the book as a useful tool during code reviews, providing a list of red flags along the lines of information leakage, shallow module, vague names, implementation documentation contaminates interfaces, conjoined methods, and general-specific mixture. (A full list of red flags is at the bottom.)

Now, a list of snippets I found particularly interesting.

Complexity is anything that makes software hard to understand or to modify.

Starting with a broad definition of complexity, although it gets more focused as the book progresses.

Isolating complexity in places that are rarely interacted with is
roughly equivalent to eliminating complexity.

This is, I suppose, fairly obvious, but struck me as insightful. We don't think enough about where we incur complexity, and if we do a better job we can quickly make our systems simpler.
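As a small illustration of that idea (my own sketch, not the book's), all of the messy fallback logic for parsing human-friendly size strings lives in one rarely-touched function, so callers never see the complexity; the `parse_size` name and the supported suffixes are hypothetical:

```python
def parse_size(text: str) -> int:
    """Return a byte count from a human-friendly size string like "10MB"."""
    # All the fiddly suffix handling is quarantined in this one function.
    units = {"KB": 1024, "K": 1024, "MB": 1024**2, "M": 1024**2,
             "GB": 1024**3, "G": 1024**3}
    cleaned = text.strip().upper()
    for suffix, multiplier in units.items():
        if cleaned.endswith(suffix):
            return int(float(cleaned[: -len(suffix)]) * multiplier)
    return int(cleaned)  # bare number of bytes
```

Callers just write `parse_size("10MB")` and stay simple; the complexity exists, but it has been isolated somewhere most people never read.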

Complexity is more apparent to readers than to writers.
If other people think a piece of code is complex, it is.

An old refrain, but a good one. It's surprising how resistant folks can be to this feedback, including myself.

The book picks three symptoms of complexity: change amplification, cognitive load, unknown unknowns. Change amplification is when making a local change requires many changes elsewhere, and is best prevented when you

reduce the amount of code that is affected by each design decision,
so design changes don't require very many code modifications.
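One way to sketch that reduction (a hypothetical example, not from the book): make a design decision like "how records are serialized" in exactly one place, so that changing the format later touches two functions rather than every call site.

```python
import json

# The serialization-format decision is made once, here. Every module
# that stores or loads records goes through this pair, so a format
# change affects only this file.

def encode_record(record: dict) -> bytes:
    return json.dumps(record, sort_keys=True).encode("utf-8")

def decode_record(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))
```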

Cognitive load asks us to shift our mindset away from counting lines of code, instead accepting that more lines of simple code are still simpler than fewer lines of complex code. (This is something I struggled with when I began writing more Go. Everything was simple but it was much longer than I was used to writing in Python.)
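As a toy illustration of that trade-off (mine, not the book's), these two functions are equivalent; the first packs the logic into one dense line, while the second spells each step out and, despite being longer, asks less of the reader:

```python
def top_scores_dense(results, n=3):
    # One line, but the reader must unpack a generator, a filter,
    # a sort, and a slice all at once.
    return sorted((r["score"] for r in results if r.get("valid")), reverse=True)[:n]

def top_scores_plain(results, n=3):
    # More lines, but each one does a single obvious thing.
    valid_scores = []
    for result in results:
        if result.get("valid"):
            valid_scores.append(result["score"])
    valid_scores.sort(reverse=True)
    return valid_scores[:n]
```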

Finally, unknown unknowns are things that you want to know but there is no reasonable way for you to learn from the code itself.

It then moves on to a definition of complexity:

Complexity is caused by obscurity and dependencies.

And definitions of complexity's subcomponents:

Dependency is when code can't be understood in isolation.
Obscurity is when important information is not obvious.
This can often be due to lack of documentation.

Why is complexity so challenging to manage? It's because

Complexity is incremental, the result of thousands of choices.
Which makes it hard to prevent and even harder to fix.

To fight against complexity sprawl, he recommends distinguishing between strategic programming and tactical programming.

Tactical mindset is focused on getting something working,
but makes it nearly impossible to produce good system design.

Conversely, strategic programming shifts the goal post.

Strategic programming is realizing that working code isn't enough.
The primary goal is a good design that also solves your problem,
not working code.

Interestingly, the proposal is not that you should do major upfront design phases, but instead that you should be doing lots of incremental design improvement work over time. This is slightly different from just "doing Agile", because Agile is too focused on features, whereas

The increments of development should be abstractions, not features.
Ideally, when you have finished with each change, the system will
have the structure it would have had if you had designed it from
the start with that change in mind.

Many folks would argue against this focus on abstractions as not obviously useful, in the spirit of You Aren't Gonna Need It, but he'd argue that the

payoff for good design comes quickly. It's unlikely that tactical approach
is faster even for the first version, let alone the second.

That section is specifically a refutation of the startup mentality of launching quickly and fixing things later as a false dichotomy.

The most important way to manage complexity is by shifting it out of interfaces and into implementations:

Modules are interface and implementation.
The best modules are where interface is much simpler than implementation.
It's more important for a module to have a simple interface
than a simple implementation.
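A minimal sketch of such a "deep" module, under my own assumptions rather than the book's: the interface is a single call taking a path, while the implementation hides encoding fallback and line-ending normalization that callers never need to think about.

```python
def read_text(path: str) -> str:
    """Return the file's contents as text, however it was encoded."""
    with open(path, "rb") as f:
        raw = f.read()
    # Try a few encodings in order; latin-1 always succeeds as a
    # last resort, so `text` is always assigned.
    for encoding in ("utf-8", "utf-16", "latin-1"):
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    # Normalize line endings so callers never handle \r\n vs \n.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

The interface is one argument and one return value; all the messiness lives below the waterline.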

You have to be careful when designing modules though, because

An abstraction is a simplified view of an entity that omits
unimportant details. Omitting details that are important leads
to obscurity, creating a false abstraction.

Done well this technique is known as information hiding:

Each module should encapsulate a few pieces of knowledge,
which represent design decisions.
This knowledge should not appear in its interface,
and hence is restricted to its implementation.
Simpler interfaces correlate with better information hiding.
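A small information-hiding sketch (an illustrative example of mine, not from the book): callers see only `add` and `mean`, while the decision to store running aggregates rather than every sample stays inside the class and can change freely without touching any caller.

```python
class RunningMean:
    """Tracks the mean of a stream of values."""

    def __init__(self):
        # Design decision hidden here: keep a running sum and count,
        # not the full list of samples. Swapping this for a different
        # representation would not change the interface.
        self._total = 0.0
        self._count = 0

    def add(self, value: float) -> None:
        self._total += value
        self._count += 1

    def mean(self) -> float:
        if self._count == 0:
            raise ValueError("no samples yet")
        return self._total / self._count
```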

The opposite of information hiding is information leakage:

When a design decision is used across multiple modules,
coupling them together.
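A hypothetical sketch of leakage and its repair: in the leaky version, both the writer and the reader independently know the "name,score" line format, so changing the format means changing both modules in lockstep. The repair is to let a single pair of functions own the format, with both sides calling into it.

```python
# Leaky: imagine these living in two different modules, each
# encoding its own knowledge of the row format.
def leaky_write(name: str, score: int) -> str:
    return f"{name},{score}"          # knows the format

def leaky_read(line: str):
    name, score = line.split(",", 1)  # also knows the format
    return name, int(score)

# Repaired: the format decision lives in one module; writers call
# encode_row, readers call decode_row, and neither knows the details.
def encode_row(name: str, score: int) -> str:
    return f"{name},{score}"

def decode_row(line: str):
    name, score = line.split(",", 1)
    return name, int(score)
```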

The book spends a while discussing exceptions and how their deviation from the normal flow of code leads them to cause more problems than their lines of code might suggest. This is because you

Have to recover by either trying to revert (hard) or trying to repair
and move forward (also hard). This leads to inconsistency in many cases.

The solution is to "define errors out of existence," which means designing interfaces such that errors are not possible. The example of unset versus delete is given, where the former ensures that something doesn't exist, as opposed to ensuring that something that previously existed is now gone. A second example of this technique involves taking slices from lists: it's easier to just return nothing for non-existent ranges than to throw errors about being out of bounds.
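Python happens to demonstrate both examples cheaply: slicing past the end of a list returns an empty list rather than raising, and an "unset"-style delete (here via `dict.pop` with a default) succeeds whether or not the key was ever present.

```python
items = [1, 2, 3]
# Out-of-range slice: an empty list, not an IndexError.
assert items[10:50] == []

settings = {"theme": "dark"}
# "Unset" semantics: ensure absence, no error if the key never existed.
settings.pop("font", None)
assert settings == {"theme": "dark"}
```

Compare `del settings["font"]`, which has "delete" semantics and raises `KeyError` when the key is missing; the interface choice determines whether the error can exist at all.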

The best designs are not your first design, but instead you should

Design it twice, taking radically different approaches.

There is an interesting aside on this topic, mentioning how very smart people have often had it drilled in by their early experiences that their first inclination is the right one, because it was good enough to get a good grade, and consequently they struggle to take advantage of this technique.

Most large software design problems are fundamentally different from school work in that they are not inherently designed to be solvable, and consequently they benefit from multiple different approaches.

Finally, a closing benediction to the strategic mindset:

If you're not making the design better, you are probably making it worse.

Altogether, this was a really good read, and I highly recommend it!

Red flags

The full list of red flags from the book is presented here, although you'll have to purchase the book to get definitions!

  • Shallow module
  • Information leakage
  • Temporal decomposition
  • Overexposure
  • Pass-through methods
  • Repetition of the same fragments of code
  • General-specific mixture
  • Conjoined methods
  • Nonobvious code
  • Hard to describe
  • Comment repeats code
  • Vague names
  • Hard to pick name
  • Implementation documentation contaminates interfaces