Introducing SREs, TPMs and other specialized roles.

August 13, 2018. Filed under management.

Folks are sometimes surprised to learn that I started out working as a frontend engineer. I'd like to imagine it's because I'm so terribly knowledgeable about infrastructure, but I suspect it's mostly grounded in my unconscionably poor design aesthetic. Something that has stuck with me from that experience was feeling treated as a second-tier engineer: folks were unwilling to do any frontend work, but were careful to categorize it as trivial. The following decade has seen radical improvements in browser compatibility and JavaScript tooling, and today's frontend engineers occupy an esteemed position in the hive mind's subtle hierarchy of roles.

While nodes have swapped positions, the hierarchy of roles remains alive and well, which is at its clearest when folks propose creating a job description or career ladder for a new role. Most recently the question of whether to create a dedicated career ladder for site reliability engineers has been on my mind.

This particular question is dear to me, as I had the chance to design the initial iteration of Uber's SRE role, and while I think the design was reasonably good, there are also so many ways it could have gone more smoothly. Faced with the decision of whether to do it a second time, my first instinct was to freeze and think of the ways it didn't work.

Grappling with the problem for some time, I remained conflicted, and decided to get more systematic around making this decision, the results of which I've written up here. Altogether, there are four interesting questions to dig into:

  1. What are the pitfalls that these roles fall into?
  2. If we do decide to create one, how do we set them up for success?
  3. What are the benefits of specialized roles?
  4. Putting it all together, when should you make a new role?

At the end, creating a new role will still be a difficult decision, but we'll be armed with a framework to help make it.


Challenges

The major challenges I've encountered rolling out new roles are:

  • Class systems. Many times folks look at new roles as less important, often framing them as service roles to absorb work they're uninterested in. Sometimes roles are even explicitly designed this way, intended to reduce work for another role as opposed to having an empowering mission of their own.
  • Brittle organization. As you move away from generalized roles and towards specialists, an unexpected consequence is that your organization has far more single points of failure. Where everyone on a team was once able to perform all tasks fairly effectively, now if your project manager leaves, then you'll find that no one is able to fill the role very effectively. This brittleness is particularly acute in organizations with frequent structural changes.
  • Pattern matching. Designing a new role for your organization tends to involve dozens of important decisions to align it with your needs. Unfortunately, folks generally don't take much time to appreciate these distinctions, and instead pattern match on how they've seen the role done elsewhere. This is a powerful force. Some meaningfully large percentage of folks will both avoid taking any steps to learn how the role is intended to function (reading documentation, asking about the approach) and continue to express surprise that it doesn't work exactly the way they saw at a previous company.
  • Task offloading. When a new role is created, the role's designers have a very clear vision of how they want the new function to work. Many other folks are not particularly concerned with how you want the function to work, and will view it as an opportunity to offload tasks that they find difficult or uninteresting. This can lead to new roles being immediately underwater, which often feels like success to leaders attempting to grow the size of their organization, but can easily translate into an unlovable work experience for those performing the role.
  • Too "trivial" to value. Many roles start by taking on work that the role shedding that responsibility views as uninteresting, and consequently folks in the shedding role tend to view that work as trivial and unimportant. This often translates into the new roles struggling to have their impact recognized.
  • Too "trivial" to promote. Similar to the above, often you'll find the work done by new roles to be valued very highly in terms of impact, but not viewed as sufficiently "strategic" to merit promotion, particularly beyond career level. This can lead to folks being obligated to change roles if they want to attain higher tiers of achievement.
  • Headcount obstacles. Companies eventually develop a set of arcane rituals by which a series of emails, meetings and incantations is translated into an annual headcount plan. These systems are, quite reasonably, designed around supporting the needs of large, existing roles. Consequently, they tend to make it pretty challenging to get headcount for new roles, particularly if it's in tension with existing functions that need to expand.
  • Recruiting rare humans. For entirely great reasons, folks want the first hires they make into a new role to be strong role models for the entire function. This often leads to a piling on of requirements until it's impossible for any candidate to pass the bar.
  • Inability to evaluate. Almost the opposite of the above challenge, sometimes the existing organization has so little experience with the new function they wish to create that they simply don't have a usable means by which to evaluate candidates. This can lead to evaluations focused on qualities which are largely independent of what the candidates would be doing if they accept the job.

Facilitating success

If we do want to create a new role, it's important to take stock of what we'll need to do to make folks in the role successful. The ingredients that I've found necessary for a new role to bootstrap successfully are:

  • Executive sponsor. Not necessarily an executive, but you'll need to find an authorized, senior leader who is committed to the success of the new function. Especially for the first performance and headcount cycles, there will be a number of rough edges that will require significant organizational support to navigate. Finding a sponsor who'll be able to provide the necessary support is the most obvious constraint on creating new roles. If you can't find a sponsor, it's usually important feedback that leadership doesn't believe the new role will have a good return on invested energy.
  • Recruiting partner. A new role will require significant support from your recruiting organization in order for it to succeed. Every role being recruited against has a high fixed cost, and adding new roles can make it difficult for recruiters to hit the targets their performance is measured against. Make sure that your recruiting team is able to support a new role! If they aren't, the first step may be working with your executive sponsor to direct more headcount towards recruiting.
  • Self-sustaining mission. New roles are frequently described in terms of how they'll impact other functions, rather than in terms of what they'll accomplish. For example, you might describe Technical Program Managers (TPMs) as offloading project management responsibilities from engineering managers. This approach frames the role as an auxiliary, support function, which makes it difficult to recognize the work's impact. You must be able to frame the role's work without referencing other existing roles in order for it to succeed long-term.
  • Career ladder. In pretty much all cases, new roles should have a career ladder from the beginning. The career ladder is the foundation of a successful performance management system, and it's not possible for a role to be valued or evaluated coherently without a thoughtful career ladder. Sometimes folks rush ahead to hire before writing the ladder, but the work required to design an effective interview loop is roughly equivalent to writing a career ladder, so I've found that skipping this step is an act of false economy.
  • Role model. Who will be the external and internal role models for this role? A good role model is a human embodiment of the intent codified in your career ladder. You want to have a person you can point towards.
  • Dedicated calibrations. Most performance management systems rely on a calibration system to ensure that performance designations are being assigned in a consistent fashion across teams and roles. Sometimes folks try to perform calibration with mixed roles in a single session, which leads to smaller roles being treated as an afterthought. Often the designations will be approved without much thought, or they'll be pushed towards the center, and neither of those scenarios creates a useful feedback loop for folks in the new role. It's best to consider them separately in a dedicated calibration session for the one role. If that's not possible, second best is to consider all smaller roles together, where various forms of specialized contributions can be considered thoughtfully.


Advantages

If creating a new role was all costs and challenges, it would be easy to decide not to move forward, but there are pretty significant advantages to be had:

  • Efficiency. This is the flipside of brittle organization: folks in specialized roles are able to spend more time doing a smaller set of tasks, which leads to greater expertise in that area. For areas where existing folks are spending significant amounts of time, this specialized efficiency can translate into significantly more overall throughput without increasing headcount. I think this is the most important advantage, and is especially valuable for teams or companies where financial resources are the limit on growth (which is most teams at most companies).
  • Efficiently solve constraints. This is an extension of the efficiency point, but subtly different: with specialists you can add exactly the kind of capacity you are missing, which is very powerful for efficiently solving constraints. If your organization is low on project management bandwidth, you could add five new managers who can each take on a bit of it, or you might be able to add a single project manager who individually adds as much relevant bandwidth as the five managers combined.
  • Recognition. Simply creating a new role will absolutely not cause folks to suddenly start to value work that they previously dismissed, but it can be a useful component in that shift. In particular, it will provide additional structural mechanisms to support recognition, such as distinct career ladders, calibration sessions and even compensation structures.
  • Evaluating for strengths. It's often challenging to interview specialists effectively: you typically end up evaluating them based on how they'd perform in your generalist position, while missing out on their peculiar areas of excellence. Splitting into a new role makes it possible to target the interview process on the areas that are most important.
  • Increased hiring pool. You're now able to consider a new pool of candidates in your hiring funnel, which potentially expands the total number of candidates you can consider.
  • Specialized compensation. In some cases, the market compensation for specialists is significantly higher than that for generalists, and in that case it's often quite a challenge to hire specialists without specialized compensation bands.

What to do?

Once you're familiar with the challenges and the costs of provisioning a new role, all that is left is to consider the advantages and make a quick judgement call. Ah well, actually it's still a pretty hard decision to make!

Some questions to ask yourself to guide this decision are:

  • Is there a less extreme way to address the recognition gap? Maybe you could adjust the existing career ladder instead.
  • Do you have a plan for changing how the company values the work? Creating a new role won't inherently change how the company values this work, and you'll still need to do the hard work of expanding your company's values.
  • If you have a plan to change company values, could you trial run that plan before introducing the new role? This helps derisk the experiment for the folks you are trying to help, and is much easier to boot.
  • Does the function already exist in secret? Sometimes you'll find that roles have already split, and you're less making a new function than recognizing an existing one.
  • Will this increase short term recognition of performance but ultimately hurt the career growth of folks who change roles? Creating a new role is absolutely the kind of thing that can initially feel like progress but ultimately set folks back significantly, requiring them to transition roles later.
  • Is the number of folks impacted sufficiently large, and the recognition gap sufficiently wide, to cover the sizable costs of creating and nurturing a new role?
  • Who will pay the maintenance costs for the new role? If the answer is that you'll personally pay them, who will take up the torch if you leave?

As you think through those questions, hopefully the right approach for your situation gets a bit clearer. As a rule of thumb, I would always create a new role if it immediately covered twenty folks, would reluctantly create a new role if it would cover twenty folks within two years, and would be skeptical of creating a new role that couldn't meet one of those two conditions!