Late last year I had coffee with Keith Adams, and we ended up chatting a bunch about migrations in the context of making it easier to extend an unruly codebase. The discussion went in many directions, including a detour into Building Evolutionary Architecture. One idea that Keith mentioned in that discussion has particularly stuck with me: most changes happen in the same handful of files, and those files are the most effective place to invest in quality improvement.
I believe Keith attributed this idea to Adam Tornhill’s Software Design X-Rays, where Adam refers to it as hotspotting. The suggestion is that code quality should be approached the same way you’d approach a performance optimization problem: measure what matters and prioritize changes in that vicinity. This focus on files is particularly fantastic because it captures several large categories of changes that greatly impact development productivity but are easy to miss when you rely exclusively on tools that require a deeper understanding of the software: configuration changes and testing.
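As a rough illustration of "measure what matters," here is a minimal sketch that ranks files by how many commits touched them. It assumes a git repository and counts change frequency only, which is a simplification rather than the method described in the book. The function names `git_log_names` and `hotspots` are hypothetical, written just for this example.

```python
import subprocess
from collections import Counter


def git_log_names(repo_path: str = ".") -> str:
    """Return `git log --name-only` output for a repository.

    The empty --pretty format suppresses commit headers, so each commit
    contributes one line per changed path, separated by blank lines.
    """
    return subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout


def hotspots(log_text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Rank paths by the number of commits that changed them."""
    counts = Counter(
        line.strip() for line in log_text.splitlines() if line.strip()
    )
    # most_common sorts by descending count (ties keep first-seen order).
    return counts.most_common(top_n)


# Hypothetical log covering three commits; note that config and test
# files show up alongside application code.
sample = """
src/billing.py
config/settings.yaml

src/billing.py
tests/test_billing.py

src/billing.py
config/settings.yaml
"""

print(hotspots(sample, top_n=2))
```

On a real repository you would call `hotspots(git_log_names("path/to/repo"))`. Because the count works on file paths rather than parsed code, configuration and test files are ranked right alongside application code.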
This seemingly obvious idea actually conflicts a bit with what I’ve learned about making large software changes, which is that efforts meant to reduce complexity often increase complexity while they’re in progress. Worse, they permanently increase complexity if they are halted before completion without being fully reverted.
This raises the central question: when should we work the hotspots and when should we work towards completion?
You should prioritize hotspots when:
You can measure impact, as opposed to progress. Example: time to last byte
There is a direct, measurable impact to your users. Example: error rates, conversion rate
There’s no finish line, or the finish line is 2+ years out. Example: “no bugs in codebase”
You should focus on completion when:
There is a step-function reduction in overhead or cost when the work completes, often from fully deprecating the existing solution. Example: deprecating an expensive or slow data pipeline
Split approaches greatly increase complexity in teams’ mental models. Example: having three HTTP client libraries across the codebase with different failure semantics
The work is staffed by a transient project team rather than a long-term owner. Example: forming a “tiger team” to migrate off a failing approach to data storage
The big takeaway for me, though, is that rather than one strategy being dominant, it’s most helpful to approach each project with both options in mind and make a deliberate choice.