Irrational Exuberance!

Accelerate's definition of developer productivity.

June 27, 2018. Filed under infrastructure, devtools

Last week I read Accelerate by Nicole Forsgren, Jez Humble and Gene Kim. I was particularly struck by the discussion on measuring developer productivity. At pretty much every company I know, the question of how to measure developer productivity comes up, becomes a task force, and produces something unsatisfying.

Accelerate's definition is quite good! It boils down to four measures:

  1. Delivery lead time. How long does it take to translate a customer request into a complete, delivered thing? This is the hardest one to adapt, as it varies across different businesses, and is one place where I find the assembly-line thinking of DevOps overly constrained. One approach is measuring how long it takes to ship a customer feature request once you've decided to staff it (e.g. filtering out requests you don't plan to do). Tracking this requires a well-maintained issue tracker with appropriate tags.
  2. Deployment frequency. How frequently are folks deploying code, and how is this number moving as the number of engineers increases? This is an easy-to-measure proxy for "batch size", which is useful for reducing rework (doing faulty work such that you have to redo it) and miswork (doing work that isn't needed and isn't used).
  3. Time to restore service. When you have an incident, how long does it take to recover? For very small companies, you could rely on availability metrics to track this, but in the long run I suspect this really requires a metadata-rich incident management program to generate the required data around the various kinds of incidents, events and degradations you encounter. Such a program also greatly aids tracking the silent costs of incidents, particularly the cleanup operations that linger long after the initial problem is triaged.
  4. Change fail rate. How often do changes fail? With fully automated deploys, this would be the rollback rate. For less automated systems, you can build a proxy from redeploys of old versions as well as incidents. This one is particularly interesting because the quest for good measurement steers you towards full automation and a metadata-rich incident management program. How good measurement drags you towards better things is part of its magic.
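Once you're collecting the underlying events, computing the four metrics is straightforward. Here's a minimal sketch, assuming hypothetical record shapes for staffed feature requests, deploys, and incidents (your issue tracker, deploy pipeline, and incident tooling would be the real sources):

```python
from datetime import datetime
from statistics import median

# Hypothetical event records; field names are illustrative, not from any
# particular tool.
features = [  # staffed customer requests: decided to staff -> shipped
    {"staffed": datetime(2018, 5, 1), "shipped": datetime(2018, 5, 15)},
    {"staffed": datetime(2018, 5, 10), "shipped": datetime(2018, 6, 1)},
]
deploys = [  # each deploy, flagged if it failed (e.g. was rolled back)
    {"at": datetime(2018, 6, 1, 10), "failed": False},
    {"at": datetime(2018, 6, 1, 15), "failed": True},
    {"at": datetime(2018, 6, 2, 11), "failed": False},
]
incidents = [  # incident opened -> service restored
    {"opened": datetime(2018, 6, 1, 15), "restored": datetime(2018, 6, 1, 16)},
]

# 1. Delivery lead time: median days from staffing a request to shipping it.
lead_time_days = median((f["shipped"] - f["staffed"]).days for f in features)

# 2. Deployment frequency: deploys per day over the observed window.
window = max(d["at"] for d in deploys) - min(d["at"] for d in deploys)
deploys_per_day = len(deploys) / (window.days or 1)

# 3. Time to restore service: median hours from incident open to restore.
restore_hours = median(
    (i["restored"] - i["opened"]).total_seconds() / 3600 for i in incidents
)

# 4. Change fail rate: fraction of deploys that failed.
change_fail_rate = sum(d["failed"] for d in deploys) / len(deploys)

print(lead_time_days, deploys_per_day, restore_hours, change_fail_rate)
```

Each metric reduces to one aggregation over one event stream, which is part of why the definition is so workable: the hard part isn't the math, it's keeping the trackers and incident records well maintained enough to trust the inputs.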

If these ideas are new to you, I'd recommend reading The Phoenix Project or The Goal, which cover the ideas behind the constraint-oriented system optimization approach embodied in these metrics.

Overall, this collection of four metrics was well worth the price of admission for me, and is one of the best definitions of developer productivity I've encountered so far! (The rest of the book is quite solid, although the approach of focusing on what measurably works leads to fairly predictable content that won't be new to many readers.)