A case against top-down global optimization.

Published on June 23, 2018.

After writing Staying on the path to high-performance teams, quite a few folks asked the same follow-up question, “Once a team has repaid its technical debt, shouldn’t the now surplus team members move to other teams?”

This makes a lot of sense, because the team, with so little technical debt left, is now overstaffed relative to its global priority. Repeated across many teams, this could lead to an organization having far too many engineers allocated against last year’s problems, and too few against today’s.

This is an important problem to address!

First, let me explain why I’m skeptical of reallocating individuals to address global priority shifts, and then I’ll suggest a couple of alternative approaches to this conundrum.

Team first

Fundamentally, I believe that sustained productivity comes from high-performing teams, and that disassembling a high-performing team leads to a significant loss of productivity, even if the members are fully retained. In this world view, high-performing teams are sacred, and I’m quite hesitant to disassemble them.

Teams take a long time to gel. When a group has been working together for a few years, they understand each other, and know how to set each other up for success in a truly remarkable way. Shifting folks across teams can reset the clock on gelling, especially for teams in the early stages of gelling, or if there are significant differences in team culture. That’s not to say that you want teams to never change, since that leads to stagnation, but rather that preserving a team’s gelled state requires restrained growth.

Sometimes you will want to grow faster than a gelled team allows, and that’s ok! The lesson is that you have to account for re-gelling costs after periods of change, not that you should never change teams. This is part of why my proposed model recommends rapidly hiring into teams loaded down by technical debt rather than into innovating teams, which avoids incurring re-gelling costs on high-performing teams.

Fixed costs

Fixed cost of running smaller teams versus larger teams.

Another reason I lean away from moving folks off high-performing teams is that most teams have high fixed costs and relatively small variable costs: moving one person can shift an innovating team back into falling behind, and now neither team is doing particularly well. This is especially true on teams responsible for production products and services.

My rule of thumb is that it takes eight engineers on a team to support a two-tier on-call rotation, so I’m generally reluctant to move any team below that line, but fixed costs come in many other varieties: “keeping the lights on” work, precommitted contracts, support questions from other teams, etc.
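To see why the on-call line is sensitive to small staffing changes, here is a rough sketch of the arithmetic. It assumes (this is not stated above) that each tier needs exactly one engineer per week and that everyone rotates evenly through both tiers; `on_call_share` is a hypothetical helper name used only for illustration.

```python
from fractions import Fraction


def on_call_share(team_size: int, tiers: int = 2) -> Fraction:
    """Fraction of weeks each engineer is on call.

    Assumes one engineer per tier per week and an even rotation
    across the whole team (an illustrative simplification).
    """
    if team_size < tiers:
        raise ValueError("team is too small to staff every tier")
    return Fraction(tiers, team_size)


if __name__ == "__main__":
    for size in (8, 7, 6, 4):
        share = on_call_share(size)
        print(f"{size} engineers: each on call {float(share):.0%} of weeks")
```

Under these assumptions an eight-person team puts each engineer on call one week in four, and every person moved off the team raises that share for everyone who remains.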

There are some teams with very low fixed costs–a startup without any users, a team supporting a product that you’ve turned off entirely–and I suspect the rules for those teams are different. I also suspect such teams are quite uncommon in successful companies.

Slack

The premise of moving folks to optimize global efficiency also implies a deeper understanding of how productivity is generated than I’ve ever personally achieved. I’m a strong believer in not adding more resources to a team with visible slack, but am less convinced that the inverse applies.

The expected time to complete a new task approaches infinity as a team’s utilization approaches 100%, and most teams have many dependencies on other teams. Together, these mean you can often slow a team down by shifting resources to it, because doing so creates new upstream constraints.
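The utilization claim can be illustrated with the simplest textbook queueing model. The sketch below assumes a single-server M/M/1 queue (random arrivals, one team working tasks one at a time), which is an idealization chosen for illustration and not a model the post itself specifies. In that model the expected time a task spends in the system is the average service time divided by (1 - utilization). The function name and the one-day service time are hypothetical.

```python
def expected_time_in_system(service_time: float, utilization: float) -> float:
    """Expected time from task arrival to completion in an M/M/1 queue.

    service_time: average time to finish a task with no queue ahead of it.
    utilization: fraction of capacity in use, in the range [0, 1).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


if __name__ == "__main__":
    # Hypothetical team whose tasks take one day each when nothing is queued.
    for u in (0.5, 0.8, 0.9, 0.95, 0.99):
        days = expected_time_in_system(1.0, u)
        print(f"{u:.0%} utilized: about {days:.1f} days per task")
```

In this idealized model, going from 50% to 90% utilization multiplies the expected turnaround by five, and the last few points of utilization cost far more than the first few. Real teams are messier than a single queue, but the shape of the curve is the point.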

In further defense of slack, I find that teams put spare capacity to great use by improving areas within their aegis, in both incremental and novel ways. As a bonus, they tend to do these improvements with minimal coordination costs, such that the local productivity doesn’t introduce drag on the surrounding system.

Most importantly, as an organizational debugger, keeping slack-ful teams slack-ful means I don’t have to consider them when debugging the overall organizational throughput. I’ve found it much easier to work a couple of constraints at a time, solving forward without needing to revisit previous constraints.

The Goal and Thinking in Systems: A Primer are both phenomenal books on this topic.

Shift scope; rotate

Ok, so what does work? I’ve found it most fruitful to move scope between teams, preserving the teams themselves. If a team has significant slack, then incrementally move responsibility to it, at which point they’ll start locally optimizing over their expanded workload. It’s best to do this slowly to maintain slack in the team, but if it’s a choice of moving folks rapidly or scope rapidly, I’ve found the latter is more effective and less disruptive.

Shifting scope works better than moving people because it avoids re-gelling costs, and it preserves system behavior. Preserving behavior keeps your existing mental model intact, and if it doesn’t work out, you can always revert a workload change with less disruption than a staffing change.

The other approach that I’ve seen work well is to rotate folks for a fixed period into an area that needs help. The fixed nature allows them to retain their identity and membership in their current team, so they can give their full focus to helping out rather than splitting it between the work and finding membership in the new team. This is also a safe way to measure how much slack the team really has!

A coworker of mine suggested that some companies have very successfully moved towards the swarming model (at the organization level, not just at the team level), and I’d be fascinated to hear from folks who’ve successfully gone the other direction! One of the most exciting aspects of organizational design is that there are so many different approaches that work well.