Performance & Compensation (for Eng Execs).
Uber’s original performance process was called “T3B3” and was remarkably simple: write the individual’s top 3 strengths and top 3 weaknesses, and share the feedback with them directly in person. There was a prolonged fight against even documenting the feedback, because documentation was viewed as discouraging honesty. On the other side of things, there are numerous stories of folks spending months crafting Google promotion packets that still don’t get their authors promoted. Among those who’ve worked within both Uber’s and Google’s promotion processes, there are advocates and detractors, and absolutely no consensus on what an ideal performance process looks like.
Compensation is a subtly different set of problems, but similarly there are no universally appreciated compensation processes out there. Highly structured, centrally orchestrated compensation systems often converge on most folks at a given level receiving similar compensation, even if their impact is quite different. More dynamic compensation systems disproportionately reward top performers, which introduces room for bias.
Because there’s no agreement on what performance or compensation process you should use, you’ll likely end up working within a variety of systems. This post digs into:
- The conflicting goals between those designing, operating, and participating in performance and compensation processes
- How to run performance processes, including calibrations, and their challenges
- How to participate in a compensation process effectively
- How often you should run performance and compensation cycles
- Why your goal should be an effective process rather than a perfect one
Every one of these systems is loaded with tradeoffs and traps that you’ll need to be wary of, and after finishing this post, you should be prepared to plot the right course for your organization through them.
This is an unedited chapter from O’Reilly’s The Engineering Executive’s Primer.
Conflicting goals
Going back to Uber’s T3B3 performance process–where you told someone their top and bottom three areas for a given half–what was most remarkable was its radical simplicity. It was focused exclusively on providing useful feedback to the recipient. To this day, I find that clarity of purpose striking, and genuinely rare.
Most performance and compensation systems have far less clarity of purpose, because they try to balance many priorities from many stakeholders. Your typical process at a given company is trying to balance all of these goals:
- Individuals: want to get useful feedback to grow, want to get promoted as soon as possible, want to maximize their compensation, and want all of that quickly.
- Managers: want to provide fair and useful feedback to their team, want to promote their team as appropriate (and in alignment with the various commitments they’ve made to their team), and want to provide appropriate compensation (once again, in alignment with the commitments they’ve made). Oh, and they want to do it quickly as well!
- People (or Human Resources) Team: want to ensure individuals receive valuable feedback, create a “floor” for the quality of feedback individuals receive, document feedback that can later be used in performance management and legal scenarios to support the company’s perspective, demonstrate a performance process for compliance purposes (e.g. SOC 2 requires annual performance reviews), and create structured input for calculating compensation.
- Executives: decide who to promote based on inconsistent evaluations across managers, optimize allocation of a fixed compensation budget to meet organizational objectives, and minimize the impact of inexperienced and misaligned managers on promotions and compensation.
I’ve never encountered, or heard of, a process that solves all these problems elegantly. My informed guess is that there simply isn’t a process that works across hundreds of people without being a bit laborious to operate within. There’s also no way to flawlessly balance the goal of objective, consistent outcomes against the goal of recognizing exceptional individuals.
There’s a lot of room for improvement in these processes, and they can always be improved, but the tension in them is inherent to the participants’ conflicting goals. These conflicting goals are real, fundamental, and unavoidable, and must be kept in mind as you make decisions about how your process works.
Performance & Promotions
I’ll start out talking about performance processes, including promotions. Your baseline performance process is each manager providing written feedback for each of their direct reports, including a decision on whether to promote them, but there are quite a few details and variations to consider.
The first variations to consider are whether to include peer and upward feedback. Upward feedback is a constrained problem, as each person should only have one manager. In the worst case, asking for upward feedback generates low-value feedback, often because the individual doesn’t want to criticize their manager, but it doesn’t take up too much time.
Peer feedback can take up a significant amount of time, particularly for highly connected individuals who may be asked to provide peer feedback on ten or more people. This is usually accompanied by the advice that you can decline peer feedback requests if you get too many, but many individuals find it difficult to decline, even when they know they should.
More importantly, my experience is that peer feedback is very inconsistent, and I’ve come to believe that each team’s beliefs about the value of peer feedback determine whether that feedback is actually useful. I’ve managed teams who felt peer feedback was too uncomfortable to give honestly, and those teams provided useless peer feedback: in those cases, it’s not worth collecting. I’ve also managed teams who believed fervently in the value of peer feedback, and those teams generated insightful, valuable feedback. As such, I lean towards empowering managers to decide whether to collect peer feedback for their team. Often this is a policy decision enacted for the overall company, and in that case it’s not a battle I’d pick.
Levels and leveling rubrics
Agreeing on performance ratings and who should be promoted is nearly impossible without written criteria that describe the dimensions of expected performance for each level. However, before we can talk about leveling rubrics, first we have to talk about levels.
Most companies have paired titles and levels, such as:
- Entry Level Engineer (Level 3)
- Software Engineer (Level 4)
- Senior Software Engineer (Level 5)
- Staff Software Engineer (Level 6)
The specific levels vary widely across companies (there are many sites that show how levels differ across companies), and what is a “Level 3” at some companies might be a “60” at another, and a “601” at a third. There is no consistent leveling standard across companies. It’s fairly common for Software Engineering levels to start at “Level 3”, as companies use levels across many functions, and often reserve “Level 1” for entry-level positions in functions with fewer entry requirements.
Titles vary even more widely across the industry, and there certainly isn’t a universal standard to adopt. If you are in the position of setting titles for your company, I recommend using the fairly typical progression of Entry-Level Software Engineer, Software Engineer, Senior Software Engineer, Staff Software Engineer, and Senior Staff Software Engineer. If you’re tempted to experiment with new titles, note the downsides: non-standard titles make your hiring process more complex, since you have to explain what the titles mean, and you will lose some candidates who worry that non-standard titles will harm their career trajectory.
Once you establish Engineering’s titles and levels, the next step is documenting the leveling rubrics that describe expectations for operating within each level (again, there are a variety of sites that collect publicly available leveling rubrics from many companies). This can be a very sizable endeavor, and I’d recommend skipping the hardest part by picking a reasonably good one that’s available online, creating a working group to tweak the details, and then refining it after every performance cycle to address issues that come up.
Additionally, I’d emphasize a few things that I’ve learned the hard way over time:
Prefer concise leveling rubrics over comprehensive ones: there’s a strong desire for leveling rubrics to represent the complete, clear criteria for being promoted. The challenge, of course, is that many folks are exceptionally good at gaming specific criteria. For example, Stripe’s promotion criteria included mentorship, and I encountered folks who claimed to mentor others because they had scheduled an unrequested meeting with someone and declared that it constituted mentorship.
Concise rubrics require more nuanced interpretation, but attempts to game rubrics mean that all options in practice require significant interpretation. You can respond to each attempt at gaming with even more comprehensive documentation, but your rubrics will quickly become confusing to use, more focused on preventing bad behavior than providing clear guidance for the well-intentioned.
Prefer broad job families over narrow job families: a classic executive decision is whether Site Reliability Engineers and Software Engineers should have different leveling criteria. Let’s say you decide that yes, separate criteria would be more fair. Great! Shouldn’t you also have separate rubrics for Data Engineers, Data Scientists, Frontend Engineers, and Quality Assurance Engineers?
Yes, each of those functions would be better served by having its own rubric, but maintaining rubrics is expensive, and tuning rubrics requires using them frequently to evaluate many people. Having more rubrics generally means making more poorly tuned promotion decisions, and creating the perception that certain functions have an easier path to promotion. I strongly recommend reusing and consolidating as much as possible, especially when it comes to maintaining custom rubrics for teams with fewer than ten people: you’ll end up exercising bespoke judgment when evaluating performance on narrow specializations whether or not you introduce a custom rubric, and it’s less expensive to use a shared process.
Capture the how (behavior) in addition to the what (outcomes): some rubrics are extremely focused on demonstrating certain capabilities, but don’t have a clear point of view on how those capabilities should be exercised within the company’s culture. I think that’s a miss, because it means you’ll promote folks who are capable but accomplish goals in ways that your company doesn’t want. Rubrics–and promotions–should provide a clear signal that someone is on the path to success at the company they work in, and that’s only possible if you maintain behavioral expectations.
My final topic on levels and leveling rubrics is that you should strive for them to be an honest representation of how things work. Many companies have stated leveling and promotion criteria–often designed around fairness, transparency and so on–which are supplemented by a significant invisible process underneath that governs how things actually work. Whenever possible, say the awkward part out loud, and let your organization engage with what’s real. If promotions are constrained by available budget and business need, it’s better to acknowledge that than to let the team spend their time inventing an imaginary sea of rules to explain unexpected outcomes.
Promotions and calibration
With leveling criteria, you can now have grounded discussions around which individuals have moved from one level to another. Most companies rely on managers to make a tentative promotion nomination, then rely on a calibration process to ratify that nomination. Calibration is generally a meeting of managers who talk through each person’s tentative rating and promotion decision, with the aim of making consistent decisions across the organization.
In an organization with several hundred engineers, a common calibration process looks like:
- Managers submit their tentative ratings and promotion decisions.
- Managers in a sub-organization (e.g. “Infrastructure Engineering”) meet together in a group of 5-8 managers, including the manager responsible for the entire sub-organization (e.g. the “Director of Infrastructure Engineering”, to whom the other managers report), to discuss each of the tentative decisions within their sub-organization.
- Managers reporting to the Engineering executive meet together with the Engineering executive, and re-review tentative decisions for the entire organization. In practice, this is too many folks to review in detail, so this round typically focuses on promotions, top performers, and bottom performers.
- The Engineering executive will review the final decisions with the People team, and then align with other executives to maintain some degree of consistency across organizations. They’ll also review how the proposed ratings and promotion decisions will impact the current company budget.
The above example has three rounds of calibration (sub-organization, organization, executives), and each round will generally take three to five hours from the involved managers. The decisions significantly impact your team’s careers, and the process is a major time investment.
The more calibrations I’ve done, the more I’ve come to believe that outcomes depend significantly on each manager’s comfort with the process. One way to reduce the impact of managers on their team’s ratings is to run calibration practice sessions for new managers and newly joined managers, giving them a trial run before their facility with the process dictates their team’s performance outcomes.
Another way is for you, as the functional executive, to have a strong point of view on good calibration hygiene. You will encounter managers who filibuster disagreement about their team, and you must push through that filibuster to get to the correct decisions despite their resistance. You will also find managers who are simply terrible at presenting their team’s work in calibration meetings, and you should try to limit the impact on their team’s ratings. In either case, your biggest contribution in any given calibration cycle is giving feedback to your managers to prepare them to do a better job in the subsequent cycle.
While most companies rely on the same group to calibrate performance ratings and decide on promotions, some companies rely on a separate promotion committee for the latter decision, particularly for senior roles. The advantage of this practice is that you can bring the folks with the most context into the decision, such that Staff-plus engineers can arbitrate promotions to Staff-plus levels, rather than relying exclusively on managers to do so. The downside is that it is a heavier process, and often generates a gap between the feedback delivered by the individual’s manager and the decision rendered by the promotion committee, which can make the process feel arbitrary.
Demotions
The flipside of promotions is demotions, often referred to via the somewhat opaque euphemism “down leveling.” Companies generally avoid talking about this concept, and will rarely acknowledge its existence in any formal documentation, but it is a real thing that does indeed happen.
There are three variants to consider:
- Demotion with compensation adjustment: for example, your level goes from Senior Engineering Manager (L6) to Engineering Manager (L5), and your compensation is adjusted to be appropriate for an Engineering Manager (L5). Equity grants are, of course, particularly messy to adjust in this way.
- Demotion without compensation adjustment: as above, your level goes from Senior Engineering Manager (L6) to Engineering Manager (L5), but your compensation is not adjusted down to match the new level. This is good for you, but in most compensation systems you will exceed (or be close to exceeding) the pay band for your new level, which means you will see very limited compensation adjustments going forward.
- Title demotion without level adjustment: your title goes from Senior Engineering Manager to Engineering Manager, while you maintain the same level (L6). Compensation will keep treating you the same way, but organizationally you’ll be treated as a member of the lower level, e.g. not publicly considered a Senior Engineering Manager, not included in forums for Senior Engineering Managers, and so on.
All of these approaches are a mix of fair and unfair, and come with heavy or light bureaucratic aftereffects to deal with going forward. These bureaucratic challenges are why most companies try to avoid demotions entirely. Further, the concept of “constructive dismissal” means that demotions need the same degree of documentation as dismissals, so demotion is certainly not a time-saving approach.
I avoided demotions entirely for a long time, but I have found them effective in some cases. The clearest scenario is mis-leveling a new hire: they might come in as a Staff Engineer (L6), but operate as a Senior Engineer (L5). In that scenario, your options are either to undermine your leveling for everyone by retaining an underperforming Staff Engineer–which will make every promotion discussion more challenging going forward–or to adjust their level down. I’ve done relatively few demotions, but few is not zero. I have demoted folks in my organizations, as well as folks I directly managed, and in every case where outright dismissal felt like the wrong solution, the outcomes were better than I expected.
Floor for Feedback
When you’re designing processes, I think it’s helpful to consider whether you’re trying to raise the floor of expected outcomes (“worst case, you get decent feedback once a year”) or trying to raise the ceiling (“best case, you get life-changing feedback”). Very few processes successfully do both, and performance processes generally focus on raising the floor of delivered feedback. This is highlighted by the awkward, but frequent, advice that feedback in a performance process should never be a surprise.
Because performance processes usually optimize for everyone receiving some feedback, it’s unwise to rely on them as the mechanism for giving feedback to your team. Instead, you should give feedback in real time, on an ongoing basis, without relying much on the performance process to help. If you’re already giving good feedback, the formal process simply won’t add much.
This is particularly true as your team gets more senior. If senior folks are getting performance feedback during the performance process, then something is going very wrong. They should be getting it much more frequently.
Managing other functions
One of the trickiest aspects of performance management is when you end up managing a function that you’ve never personally worked in. You may be well calibrated on managing software engineers’ performance, but feel entirely unprepared to grade Data Scientists or Quality Assurance Engineers. That’s tricky when you end up managing all three.
What I’ve found effective:
- Leave behind your functional biases (e.g. “QA is easy”) that you may have developed earlier in your career.
- Don’t be afraid to lead, even if you don’t know the area well. You are the functional area’s executive, and if you don’t push on performance within the function, no one else will.
- Learn the area’s fundamentals: watch them in their workflows, read the foundational texts, attend the tech talks, speak to domain experts in and outside of your company, and so on.
- Find high judgment individuals internally to lean on, validate ideas with, and consult for input. Be careful how you pick those individuals, as it can go wrong if you lean on individuals that the team doesn’t respect.
- Prioritize hiring a functional leader who can operate as the area’s quasi-executive. Ultimately, you will never have enough time to become an expert in each area you work in, and that problem will only compound as you move into more senior roles at larger companies.
This certainly is tricky, but don’t convince yourself that it can’t be done. Most executives in moderately large companies are responsible for functions that they never worked in directly.
Compensation
As an Engineering executive, you will generally be the consumer of a compensation process designed by your People team. In that case, your interaction may come down to reviewing the proposed changes, inspecting for odd decisions, collecting feedback from senior managers about the proposals for their team, and making spot changes to account for atypical circumstances.
That said, I have found it useful to have a bit more context on these systems, so I will walk through some of the key aspects of how they typically work:
Companies typically build compensation bands by looking at aggregated data acquired from compensation benchmarking companies. Many providers of this data rely on companies submitting their data, and try to build a reliable dataset despite each company relying on their own inconsistent leveling rubrics. You’ll often be pushed to accept compensation data as objective truth, but recognize that the underlying dataset is far from perfect, which means compensation decisions based on that dataset will be imperfect as well.
Compensation benchmarking is always done against a self-defined peer group. For example, you might benchmark against Series A companies headquartered in Silicon Valley, or against Series B companies headquartered outside of “Tier 1 markets” (“Tier 1” being, of course, also an ambiguous term). You can accomplish most compensation goals by shifting your peer group: if you want higher compensation, pick a more competitive peer group; if you want lower compensation, do the opposite. Picking peers is more an art than a science, but it’s another detail to pay attention to if you’re getting numbers that feel odd.
Once you have benchmarks, you’ll generally discuss compensation using the compa ratio, which expands to “comparative ratio.” Someone whose salary is 90% of the benchmark for their level has a 0.9 compa ratio, and someone whose salary is 110% of the benchmark has a 1.1 compa ratio.
Each company will have a number of compensation policies described using compa ratios. For example, most companies target a compa ratio of approximately 0.95 for new hires, and aim for newly promoted individuals to reach approximately 0.9 compa at their new level. Another common example is a maximum compa ratio of 1.1 for a given level: after reaching that ratio, your compensation would only increase if the market shifts the bands or if you were promoted.
Every company has a geographical adjustment component in their compensation bands. A simple, somewhat common, implementation in the United States is to have three tiers of regions–Tier 1, Tier 2 and Tier 3–with Tier 2 taking a 10% compensation reduction, and Tier 3 taking a 20% reduction. Tier 1 might be Silicon Valley and New York, Tier 2 might be Seattle and Boston, and Tier 3 might be everywhere else. Of course, some companies go far, far deeper into both benchmarking and geographic adjustment, but structurally it will be something along these lines.
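To make the arithmetic concrete, here’s a minimal sketch of the compa ratio and tier math in Python. The benchmark salary, tier multipliers, and policy thresholds are hypothetical numbers borrowed from the examples above, not industry constants, and a real system would include equity, bonus targets, and much more.

```python
# Hypothetical numbers throughout: the benchmark, tier multipliers, and
# policy thresholds are illustrative, not industry constants.

GEO_MULTIPLIER = {1: 1.00, 2: 0.90, 3: 0.80}  # Tier 2: -10%, Tier 3: -20%


def adjusted_benchmark(benchmark: float, geo_tier: int) -> float:
    """Apply the geographic tier reduction to the raw benchmark salary."""
    return benchmark * GEO_MULTIPLIER[geo_tier]


def compa_ratio(salary: float, benchmark: float, geo_tier: int) -> float:
    """Salary relative to the tier-adjusted benchmark for the level."""
    return salary / adjusted_benchmark(benchmark, geo_tier)


# A new hire in a Tier 2 market, against a $200k benchmark for their level.
ratio = compa_ratio(salary=171_000, benchmark=200_000, geo_tier=2)
print(f"compa ratio: {ratio:.2f}")  # 0.95, the new-hire target from above

# Policies like the examples above reduce to simple threshold rules.
MAX_COMPA = 1.10
if ratio >= MAX_COMPA:
    print("at band ceiling: increases only via promotion or band movement")
```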
Whatever the compensation system determines as the correct outcome, that output will have to be checked against the actual company budget. If the two numbers don’t align, then it’s almost always the compensation system that adjusts to meet the budget. Keep this in mind as you get deep into optimizing compensation results: no amount of tweaking will matter if the budget isn’t there to support it.
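Here’s a minimal sketch of that reconciliation step, assuming the simplest possible policy of scaling every proposed raise down proportionally; real processes often protect promotion-driven adjustments and trim merit increases instead. The names and numbers are hypothetical.

```python
def fit_to_budget(proposed: dict[str, float], budget: float) -> dict[str, float]:
    """Scale proposed raises down proportionally to fit a fixed budget.

    Deliberately naive: real processes often protect promotion-driven
    adjustments and trim merit increases instead.
    """
    total = sum(proposed.values())
    if total <= budget:
        return proposed  # under budget: no adjustment needed
    scale = budget / total
    return {name: amount * scale for name, amount in proposed.items()}


proposed_raises = {"alice": 12_000, "bob": 6_000, "carol": 2_000}
print(fit_to_budget(proposed_raises, budget=15_000))
# -> {'alice': 9000.0, 'bob': 4500.0, 'carol': 1500.0}
```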
Whatever the actual numbers end up being, remember that framing the numbers matters at least as much as the numbers themselves. A team that is used to 5-7% year over year increases will be very upset by a 3% increase, even if the market data shows that compensation bands went down that year. If you explain the details behind how numbers are calculated, you can give your team a framework to understand the numbers, which will help them come to terms with any surprises that you have to deliver.
How often should you run cycles?
Everyone has strong opinions about the frequency of their company’s performance cycles. If you run once a year, folks will be frustrated that a new hire joining just after the cycle might not get any formal feedback for their first year. If you run every quarter, the team will be upset about spending so much time on the process, even if the process is lightweight. This universal angst is liberating, because it means there’s no choice that will make folks happy, so you can do what you think will be most effective.
For most companies, I recommend a twice annual process. Some companies do performance twice a year, but only do promotions and compensation once a year, which reduces the overall time to orchestrate the process. There’s little evidence that doing more frequent performance reviews is worthwhile.
The only place I’ll take a particularly firm stand is against processes that anchor on each employee’s start date and subsequent anniversaries. For example, each employee gets a performance review on their anniversary of joining the company. This sort of process is very messy to orchestrate, makes it difficult to make process changes, and prevents inspecting an organization’s overall distributions of ratings, promotions or compensation. It’s an aesthetically pleasing process design, but it simply doesn’t work.
Avoid pursuing perfection
In The Engineering executive’s role in hiring, my advice is to pursue an effective rather than perfect hiring process, and that advice applies here as well. There is always another step that would make your performance or compensation process more complete, but good processes keep in mind the cost of implementing each additional step. Many companies with twenty employees provide too little feedback, but almost all companies with 1,000 employees spend time on performance artifacts that could instead be devoted to giving better feedback, or to the business’ underlying work itself rather than meta-commentary about that work.
As an executive, you are likely the only person positioned to make the tradeoff between useful and perfect, and I encourage you to take this obligation seriously. If you abdicate this responsibility, you will incrementally turn middle management into a bureaucratic paper-pushing function rather than a vibrant hub that translates corporate strategy into effective tactics. Each incremental change may seem small, but in aggregate they’ll have a significant impact.
If you want a quick check, just ask your team–particularly your managers of managers–how they feel about the current process, and you’ll get a sense of whether it’s serving them effectively. If they all describe it as slow and painful, especially those who’ve seen processes at multiple companies, then it’s worth considering whether you’ve landed in the wrong place.
Summary
This post has covered the core challenges you’ll encounter when operating and evolving the performance and compensation processes for your Engineering organization. With this background, you’ll be ready to resolve the first batch of challenges you’re likely to encounter, but remember that these are extremely deep topics, with much disagreement, and many best practices of a decade ago are considered bad practice today.