Irrational Exuberancehttps://lethain.com/Recent content on Irrational ExuberanceHugo -- gohugo.ioen-usWill LarsonThu, 27 Mar 2025 05:00:00 -0700Is this strategy any good?https://lethain.com/is-this-strategy-any-good/Thu, 27 Mar 2025 05:00:00 -0700https://lethain.com/is-this-strategy-any-good/<p>We&rsquo;ve read a lot of strategy at this point in the book. We can judge a strategy&rsquo;s format, and its construction: both are useful things. However, format is a predictor of quality, not quality itself. The remaining question is, how should we assess whether a strategy is any good?</p> <p><a href="https://lethain.com/uber-service-migration-strategy/">Uber&rsquo;s service migration strategy</a> unlocked the entire organization to make rapid progress. It also led to a sprawling architecture problem down the line. Was it a great strategy or a terrible one? Folks can reasonably disagree, but it&rsquo;s worthwhile developing a point of view on why we should prefer one interpretation or the other.</p> <p>This chapter will focus on:</p> <ul> <li>The various ways that are frequently suggested for evaluating strategies, such as input-only evaluation, output-only evaluation, and so on</li> <li>A rubric for evaluating strategies, and why a useful rubric has to recognize that strategies have to be evaluated in phases rather than as a unified construct</li> <li>Why ending a strategy is often a sign of a good strategist, and sometimes the natural reaction to a new phase in a strategy, rather than a judgment on prior phases</li> <li>How missing context is an unpierceable veil for evaluating other companies' strategies with high conviction, and why you&rsquo;ll end up attempting to evaluate them anyway</li> <li>Why you can learn just as much from bad strategies as from good ones, even in circumstances where you are missing much of the underlying context</li> </ul> <p>Time to refine our judgment about strategy quality a bit.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is 
an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="how-are-strategies-graded">How are strategies graded?</h2> <p>Before suggesting my own rubric, I want to explore how the industry appears to grade strategies in practice. That&rsquo;s not because I particularly agree with them&ndash;I generally find each approach is missing an important nuance&ndash;but because understanding their flaws is a foundation to build on.</p> <p>Grading strategy on its outputs is by far the most prevalent approach I&rsquo;ve found in industry. This is an appealing approach, because it does make sense that a strategy&rsquo;s results are more important than anything else. However, this line of thinking can go awry. We saw massive companies like Google move to service architectures, and we copied them because if it worked for Google, it would likely work for us. As discussed in the <a href="https://lethain.com/decompose-monolith-strategy/">monolith decomposition strategy</a>, it did not work particularly well for most adopters.</p> <p>The challenge with grading outputs is that it doesn&rsquo;t distinguish between &ldquo;alpha&rdquo;, how much better your results are because of your strategy, and &ldquo;beta&rdquo;, the expected outcome if you hadn&rsquo;t used the strategy. For example, the <a href="https://lethain.com/pos-acquisition-integration/">acquisition of Index</a> allowed Stripe to build a point-of-sale business line, but they were also on track to internally build that business. Looking <em>only</em> at outputs can&rsquo;t distinguish whether it would have been better to build the business via acquisition or internally. 
But one of those paths must have been the better strategy.</p> <p>Similarly, there are also strategies that succeed, but do so at unreasonably high costs. <a href="https://lethain.com/api-deprecation-strategy/">Stripe&rsquo;s API deprecation strategy</a> is a good example of a strategy that was <em>extremely</em> well worth the cost for the company&rsquo;s first decade, but eventually became too expensive to maintain as the evolving regulatory environment created more overhead. Fortunately, Stripe modified their strategy to allow some deprecations, but you can imagine an alternate scenario where they attempted to maintain their original strategy, which would have likely failed due to its accumulating costs.</p> <p>Confronting these problems with judging on outputs, it&rsquo;s compelling to switch to the opposite lens and evaluate strategy purely on its inputs. In that approach, as long as the sum of the strategy&rsquo;s parts makes sense, it&rsquo;s a good strategy, even if it didn&rsquo;t accomplish its goals. This approach is very appealing, because it appears to focus <em>purely</em> on the strategy&rsquo;s alpha.</p> <p>Unfortunately I find this view similarly deficient. For example, the <a href="https://lethain.com/llm-adoption-strategy/">strategy for adopting LLMs</a> offers a cautious approach to adopting LLMs. If that company is outcompeted by competitors in the incorporation of LLMs, to the loss of significant revenue, I would argue that strategy isn&rsquo;t a great one, even if it&rsquo;s rooted in a proper diagnosis and effective policies. Doing good strategy requires reconciling the theoretical with the practical, so we can&rsquo;t argue that inputs alone are enough to evaluate strategy work. If a strategy is conceptually sound, but struggling to make an impact, then its authors should continue to <a href="https://lethain.com/refining-eng-strategy/">refine it</a>. 
If its authors take a single pass and ignore subsequent information that it&rsquo;s not working, then it&rsquo;s a failed strategy, regardless of how thoughtful the first pass was.</p> <p>While I find these mechanisms to be incomplete, they&rsquo;re still instructive. By incorporating bits of each of these observations, we&rsquo;re surprisingly close to a rubric that avoids each of these particular downfalls.</p> <h2 id="rubric-for-strategy">Rubric for strategy</h2> <p>Balancing the strengths and flaws of the previous section&rsquo;s ideas, the rubric I&rsquo;ve found effective for evaluating strategy is:</p> <ol> <li><strong>How quickly is the strategy refined?</strong> If a strategy starts out bad, but improves quickly, that&rsquo;s a better strategy than a mostly right strategy that never evolves. Strategy thrives when its practitioners understand it is a living endeavour.</li> <li><strong>How expensive is the strategy&rsquo;s refinement for implementing and impacted teams?</strong> Just as culture eats strategy for breakfast, good policy loses to poor operational mechanisms every time. Especially early on, good strategy is validated cheaply. Expensive strategies are discarded before they can be validated, let alone improved.</li> <li><strong>How well does the current iteration solve its diagnosis?</strong> Ultimately, strategy does have to address the diagnosis it starts from. Even if you&rsquo;re learning quickly and at a low cost, at some point you do have to actually get to impact. Strategy must eventually be graded on its impact.</li> </ol> <p>With this rubric in hand, we can finally assess <a href="https://lethain.com/uber-service-migration-strategy/">Uber&rsquo;s service migration strategy</a>. It refined rapidly as we improved our tooling, minimized costs because we had to rely on voluntary adoption, and solved its diagnosis extremely well. 
So this was a great strategy, but how do we think about the fact that its diagnosis missed out on the consequences of a widespread service architecture on developer productivity?</p> <p>This brings me to the final component of the strategy quality rubric: the recognition that strategy exists across multiple phases. Each phase is defined by new information&ndash;whether or not this information is known by the strategy&rsquo;s authors&ndash;that renders the diagnosis incomplete.</p> <p>The Uber strategy can be thought of as existing across two phases:</p> <ul> <li>Phase 1 used service provisioning to address developer productivity challenges in the monolith.</li> <li>Phase 2 engaged with the consequences of a sprawling service architecture.</li> </ul> <p>All the good grades I gave the strategy are appropriate to the first phase. However, the second phase was ushered in by the negative impacts to developer productivity exposed by the initial rollout. The second phase&rsquo;s grades on the rate of iteration, the cost, and the outcomes were reasonable, but a bit lower than the first phase&rsquo;s. In the subsequent years, the second phase was succeeded by a third phase that aimed to address the second&rsquo;s challenges.</p> <h2 id="does-stopping-mean-a-strategys-bad">Does stopping mean a strategy&rsquo;s bad?</h2> <p>Now that we have a rubric, we can use it to evaluate one of the important questions of strategy: does giving up on a strategy mean that the strategy is a bad one?</p> <p>The vocabulary of strategy phases helps us here, and I think it&rsquo;s uncontroversial to say that a new phase&rsquo;s evolution of your prior diagnosis might make it appropriate to abandon a strategy. For example, Digg owned its own servers in 2010, but would certainly <em>not</em> buy its own servers if it started ten years later. Circumstances change.</p> <p>Sometimes I also think that aborting a strategy in its first phase is a good sign. 
That&rsquo;s generally true when the rate of learning is outpaced by the cost of learning. I recently sponsored a developer productivity strategy that had some impact, but less than we&rsquo;d intended. We immortalized a few of the smaller pieces, and returned further exploration to a <a href="https://lethain.com/when-write-down-engineering-strategy/">lower altitude strategy</a> owned by the teams rather than the high altitude strategy that I owned as an executive.</p> <p>Essentially all strategies are competing with strategies at other altitudes, so I think giving up on strategies, especially high altitude strategies, is almost always a good idea.</p> <h2 id="the-unpierceable-veil">The unpierceable veil</h2> <p>Working within our industry, we are often called upon to evaluate strategies from afar. As other companies rolled out LLMs in their products or microservices for their architectures, our companies pushed us on why we weren&rsquo;t making these changes as well. The <a href="https://lethain.com/exploring-for-strategy/">exploration step</a> of strategy helps determine where a strategy might be useful for you, but even that doesn&rsquo;t really help you evaluate either the strategy or the strategists.</p> <p>There are simply too many dimensions of the rubric that you cannot evaluate when you&rsquo;re far away. For example, how many phases occurred before the idea that became the external representation of the strategy came into existence? How much did those early stages cost to implement? Is the <em>real</em> mastery in the operational mechanisms that are never reported on? Did the external representation of the strategy ever happen at all, or is it the logical next phase that solves the reality of the internal implementation?</p> <p>With all that in mind, I find that it&rsquo;s generally impossible to accurately evaluate strategies happening in other companies with much conviction. Even if you want to, the missing context is an impenetrable veil. 
That&rsquo;s not to say that you shouldn&rsquo;t try to evaluate their strategies; that&rsquo;s something that you&rsquo;ll be forced to do in your own strategy work. Instead, it&rsquo;s a reminder to keep a low confidence score in those appraisals: you&rsquo;re guaranteed to be missing something.</p> <h2 id="learning-despite-quality-issues">Learning despite quality issues</h2> <p>Although I believe it&rsquo;s quite valuable for us to judge the quality of strategies, I want to caution against going a step further and concluding that you can&rsquo;t learn from poor strategies. As long as you are aware of a strategy&rsquo;s quality, I believe you can learn just as much from failed strategies as from great ones.</p> <p>Part of this is because often even failed strategies have early phases that work extremely well. Another part is because strategies tend to fail for interesting reasons. I learned just as much from Stripe&rsquo;s failed rollout of agile, which struggled due to missing operational mechanisms, as I did from Calm&rsquo;s successful transition to focus primarily on product engineering. Without a clear point of view on which of these worked, you&rsquo;d be at risk of learning the wrong lessons, but with forewarning you don&rsquo;t run that risk.</p> <p>Once you&rsquo;ve determined a strategy was unsuccessful, I find it particularly valuable to determine the strategy&rsquo;s phases and understand which phase and where in the <a href="https://lethain.com/components-of-eng-strategy/">strategy steps</a> things went wrong. Was it a lack of operational mechanisms? Was the policy itself a poor match for the diagnosis? Was the diagnosis willfully ignoring a truculent executive? 
Answering these questions will teach you more about strategy than only studying successful strategies, because you&rsquo;ll develop an intuition for which parts truly matter.</p> <h2 id="summary">Summary</h2> <p>Finishing this chapter, you now have a structured rubric for evaluating a strategy, moving beyond &ldquo;good strategy&rdquo; and &ldquo;bad strategy&rdquo; to a nuanced assessment. This assessment is not just useful for grading strategy, but makes it possible to specifically improve your strategy work.</p> <p>Maybe your approach is sound, but your operational mechanisms are too costly for the rate of learning they facilitate. Maybe you&rsquo;ve treated strategy as a single iteration exercise, rather than recognizing that even excellent strategy goes stale over time. Keep those ideas in mind as we head into the final chapter on <a href="https://lethain.com/how-to-get-better-at-strategy/">how you personally can get better at strategy work</a>.</p>Steps to build an engineering strategy.https://lethain.com/components-of-eng-strategy/Thu, 27 Mar 2025 04:00:00 -0700https://lethain.com/components-of-eng-strategy/<p>Often you&rsquo;ll see a disorganized collection of ideas labeled as a &ldquo;strategy.&rdquo; Even when they&rsquo;re dense with ideas, these can be hard to parse, and are a major reason why most engineers will claim their company doesn&rsquo;t have a clear strategy even though my experience is that <em>all</em> companies follow some strategy, even if it&rsquo;s undocumented.</p> <p>This chapter lays out a repeatable, structured approach to drafting strategy. It introduces each step of that approach, which are then detailed further in their respective chapters. 
Here we&rsquo;ll cover:</p> <ul> <li>How these five steps fit together to facilitate creating strategy, especially by preventing practitioners from skipping steps that feel awkward or challenging.</li> <li>Step 1: Exploring the wider industry&rsquo;s ideas and practices around the strategy you&rsquo;re working on. Exploration is understanding what recent research might change your approach, and how the state of the art has changed since you last tackled a similar problem.</li> <li>Step 2: Diagnosing the details of your problem. It&rsquo;s hard to slow down to understand your problem clearly before attempting to solve it, but it&rsquo;s even more difficult to solve anything well without a clear diagnosis.</li> <li>Step 3: Refinement is taking a raw, unproven set of ideas and testing them against reality. Three techniques are introduced to support this validation process: strategy testing, systems modeling, and Wardley mapping.</li> <li>Step 4: Policy makes the tradeoffs and decisions to solve your diagnosis. These can range from specifying how software is architected, to how pull requests are reviewed, to how headcount is allocated within an organization.</li> <li>Step 5: Operations are the concrete mechanisms that translate policy into an active force within your organization. 
These can be nudges that remind you about code changes without associated tests, or weekly meetings where you study progress on a migration.</li> <li>Whether these steps are sacred or are open to adaptation and experimentation, including when you personally should persevere in attempting steps that don&rsquo;t feel effective.</li> </ul> <p>From this chapter&rsquo;s launching point, you&rsquo;ll have the high-level summaries of each step in strategy creation, and can decide where you want to read further.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="how-the-steps-become-strategy">How the steps become strategy</h2> <p>Creating effective strategy is not rote incantation of a formula. You can’t merely follow these steps to guarantee that you&rsquo;ll create a great strategy. However, what I&rsquo;ve found over and over is that strategies fail more due to avoidable errors than from fundamentally unsound thinking. Busy people skip steps. Especially steps they dislike or have failed at before.</p> <p>These steps are the scaffolding to avoid those errors. By practicing routinely, you&rsquo;ll build powerful habits and intuition around which approach is most appropriate for the current strategy you&rsquo;re working on. They also help turn strategy into a community practice that you, your colleagues, and the wider engineering ecosystem can participate in together.</p> <p>Each step is an input that flows into the next step. Your exploration is the foundation of a solid diagnosis. Your diagnosis helps you search the infinite space of policy for what you need now. 
Operational mechanisms help you turn policy into an active force supporting your strategy rather than an abstract treatise.</p> <p>If you&rsquo;re skeptical of the steps, you should certainly maintain your skepticism, but do give them a few tries before discarding them entirely. You may also appreciate the discussion in the chapter on <a href="https://lethain.com/bridging-eng-strategy-theory-and-practice/">bridging between theory and practice when doing strategy</a>.</p> <h2 id="explore">Explore</h2> <p>Exploration is the deliberate practice of searching through a strategy’s problem and solution spaces before allowing yourself to commit to a given approach. It&rsquo;s understanding how other companies and teams have approached similar questions, and whether their approaches might also work well for you. It&rsquo;s also learning why what brought you so much success at your former employer isn&rsquo;t necessarily the best solution for your current organization.</p> <p>The <a href="https://lethain.com/uber-service-migration-strategy/">Uber service migration strategy</a> used exploration to understand the service ecosystem by reading industry literature:</p> <blockquote> <p>As a starting point, we find it valuable to read <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf">Large-scale cluster management at Google with Borg</a> which informed some elements of the approach to Kubernetes, and <a href="https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf">Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center</a> which describes the Mesos/Aurora approach.</p></blockquote> <p>It also used a <a href="https://lethain.com/wardley-mapping/">Wardley map</a> to explore the cloud compute ecosystem.</p> <p><img src="https://lethain.com/static/blog/strategy/wardley-compute-v2.png" alt="Wardley map of evolution of service orchestration in 2014"></p> <p>For more detail, read the <a 
href="https://lethain.com/exploring-for-strategy/">Exploration chapter</a>.</p> <h2 id="diagnose">Diagnose</h2> <p>Diagnosis is your attempt to correctly recognize the context that the strategy needs to solve before deciding on the policies to address that context. Starting from your exploration&rsquo;s learnings, and your understanding of your current circumstances, building a diagnosis forces you to delay thinking about solutions until you fully understand your problem&rsquo;s nuances.</p> <p>A diagnosis can be largely data driven, such as the <a href="https://lethain.com/private-equity-strategy/">navigating a Private Equity ownership transition strategy</a>:</p> <blockquote> <p>Our Engineering headcount costs have grown by 15% YoY this year, and 18% YoY the prior year. Headcount grew 7% and 9% respectively, with the difference between headcount and headcount costs explained by salary band adjustments (4%), a focus on hiring senior roles (3%), and increased hiring in higher cost geographic regions (1%).</p></blockquote> <p>It can also be less data driven, instead aiming to summarize a problem, such as the <a href="https://lethain.com/pos-acquisition-integration/">Index acquisition strategy</a>&rsquo;s summary of the known and unknown elements of the technical integration prior to the acquisition closing:</p> <blockquote> <p>We will need to rapidly integrate the acquired startup to meet this timeline. We only know a small number of details about what this will entail. We do know that point-of-sale devices directly operate on payment details (e.g. the point-of-sale device knows the credit card details of the card it reads).</p> <p>Our compliance obligations restrict such activity to our “tokenization environment”, a highly secured and isolated environment with direct access to payment details. 
This environment converts payment details into a unique token that other environments can utilize to operate against payment details without the compliance overhead of having direct access to the underlying payment details.</p></blockquote> <p>The approach, and challenges, of developing a diagnosis are detailed in the <a href="https://lethain.com/diagnosis-for-strategy/">Diagnosis chapter</a>.</p> <h2 id="refine-test-map--model">Refine (Test, Map &amp; Model)</h2> <p>Strategy refinement is a toolkit of methods to identify which parts of your diagnosis are most important, and verify that your approach to solving the diagnosis actually works. This chapter delves into the details of using three methods in particular: <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a>, <a href="https://lethain.com/strategy-systems-modeling/">systems modeling</a>, and <a href="https://lethain.com/wardley-mapping/">Wardley mapping</a>.</p> <p><img src="https://lethain.com/static/blog/strategy/QualityMentalModels.png" alt="Systems model of requests succeeding and failing between a user, load balancer, and server."></p> <p class="tc"><em>An example of a systems modeling diagram.</em></p> <p>These techniques are also demonstrated in the strategy case studies, such as the <a href="https://lethain.com/wardley-llm-ecosystem/">Wardley map of the LLM ecosystem</a>, or the <a href="https://lethain.com/engineering-cost-model/">systems model of backfilling roles without downleveling them</a>.</p> <p>For more detail, read the <a href="https://lethain.com/refining-eng-strategy/">Refinement chapter</a>.</p> <div class="bg-light-gray br4 ph3 pv1"> <h3 id="why-isnt-refinement-earlier-or-later">Why isn&rsquo;t refinement earlier (or later)?</h3> <p>A frequent point of disagreement is that refinement should occur before the diagnosis. 
Another is that mapping and modeling are two distinct steps, and mapping should occur before diagnosis, and modeling should occur after policy. A third is that refinement ought to be the final step of strategy, turning the steps into a looping cycle. These are all reasonable observations, so let me unpack my rationale for this structure.</p> <p>By <em>far</em> the biggest risk for most strategies is not that you model too early or map too late, but instead that you simply skip both steps entirely. My foremost concern is minimizing the required investment into mapping and modeling such that more folks do these steps at all. Refining after exploring and diagnosing allows you to concentrate your efforts on a smaller number of load-bearing areas.</p> <p>That said, it&rsquo;s common to refine many places in your strategy creation. You&rsquo;re just as likely to have three small refinement steps as one bigger one.</p> </div> <h2 id="policy">Policy</h2> <p>Policy is interpreting your diagnosis into a concrete plan. This plan also needs to work, which requires careful study of what&rsquo;s worked within your company, and what new ideas you&rsquo;ve discovered while exploring the current problem.</p> <p>Policies can range from providing directional guidance, such as the <a href="https://lethain.com/user-data-access-strategy/">user data controls strategy</a>&rsquo;s guidance:</p> <blockquote> <p><strong>Good security discussions don’t frame decisions as a compromise between security and usability.</strong> We will pursue multi-dimensional tradeoffs to simultaneously improve security and efficiency. Whenever we frame a discussion on trading off between security and utility, it’s a sign that we are having the wrong discussion, and that we should rethink our approach.</p> <p>We will prioritize mechanisms that can both automatically authorize and automatically document the rationale for accesses to customer data. 
The most obvious example of this is automatically granting access to a customer support agent for users who have an open support ticket assigned to that agent. (And removing that access when that ticket is reassigned or resolved.)</p></blockquote> <p>To committing not to make a decision until later, as practiced in the <a href="https://lethain.com/pos-acquisition-integration/">Index acquisition strategy</a>:</p> <blockquote> <p>Defer making a decision regarding the introduction of Java to a later date: the introduction of Java is incompatible with our existing engineering strategy, but at this point we’ve also been unable to align stakeholders on how to address this decision. Further, we see attempting to address this issue as a distraction from our timely goal of launching a joint product within six months.</p> <p>We will take up this discussion after launching the initial release.</p></blockquote> <p>This chapter further goes into evaluating policies, overcoming ambiguous circumstances that make it difficult to decide on an approach, and developing novel policies.</p> <p>For full detail, read the <a href="https://lethain.com/policy-for-strategy/">Policy chapter</a>.</p> <h2 id="operations">Operations</h2> <p>Even the best policies have to be interpreted. There will be new circumstances their authors never imagined, and the policies may be in effect long after their authors have left the organization. Operational mechanisms are the concrete implementation of your policy.</p> <p>The simplest mechanisms are an explicit escalation path, as shown in <a href="https://lethain.com/calm-product-eng-company/">Calm&rsquo;s product engineering strategy</a>:</p> <blockquote> <p>Exceptions are granted by the CTO, and must be in writing. The above policies are deliberately restrictive. Sometimes they may be wrong, and we will make exceptions to them. 
However, each exception should be deliberate and grounded in concrete problems we are aligned both on solving and how we solve them. If we all scatter towards our preferred solution, then we’ll create negative leverage for Calm rather than serving as the engine that advances our product.</p></blockquote> <p>From that starting point, the mechanisms can get far more complex. This chapter works through evaluating mechanisms, composing an operational plan, and the most common sorts of operational mechanisms that I&rsquo;ve seen across strategies.</p> <p>For more detail, read the <a href="https://lethain.com/operations-for-strategy/">Operations chapter</a>.</p> <h2 id="is-the-structure-sacrosanct">Is the structure sacrosanct?</h2> <p>When someone&rsquo;s struggling to write a strategy document, one of the first tools someone will often recommend is a strategy template. Templates are great: they reduce the ambiguity of an already broad project into something more tractable. If you&rsquo;re wondering if you should use a template to craft strategy: sure, go ahead!</p> <p>However, I find that well-meaning, thoughtful templates often turn into lumbering, callous documents that serve no one well. The secret to a good template is that someone has to own it, and that person has to care about the template writer first and foremost, rather than the various constituencies that want to insert requirements into the strategy creation process. The security, compliance and cost of your plans matter a lot, but many organizations start to layer in more and more requirements into these sorts of documents until the idea of writing them becomes prohibitively painful.</p> <p>The best advice I can give someone attempting to write strategy is that you should discard every element of strategy that gets in your way <em>as long as</em> you can explain what that element was intended to accomplish. 
For example, if you&rsquo;re drafting a strategy and you don&rsquo;t find any operational mechanisms that fit, that&rsquo;s fine: discard that section. Ultimately, the structure is not sacrosanct; it&rsquo;s the thinking behind the sections that really matters.</p> <p>This topic is explored in more detail in the chapter on <a href="https://lethain.com/readable-engineering-strategy-documents/">Making engineering strategies more readable</a>.</p> <h2 id="summary">Summary</h2> <p>Now you know the foundational steps to conducting strategy. From here, you can dive into the details with the strategy case studies like <a href="https://lethain.com/llm-adoption-strategy/">How should you adopt LLMs?</a> or you can maintain a high altitude starting with how <a href="https://lethain.com/exploring-for-strategy/">exploration creates the foundation for an effective strategy</a>.</p> <p>Whichever you start with, I encourage you to eventually work through both to get the full perspective.</p>Operational mechanisms for strategy.https://lethain.com/operations-for-strategy/Thu, 20 Mar 2025 04:00:00 -0700https://lethain.com/operations-for-strategy/<p>Even the best policies fail if they aren&rsquo;t adopted by the teams they&rsquo;re intended to serve. Can we persistently change our company&rsquo;s behaviors with a one-time announcement? No, probably not.</p> <p>I refer to the art of making policies work as &ldquo;operations&rdquo; or &ldquo;strategy operations.&rdquo; The good news is that effectively operating a policy is two-thirds avoiding common practices that simply don&rsquo;t work. 
The other one-third takes some practice, but can be practiced in any engineering role: there&rsquo;s no need to wait until you&rsquo;re an executive to start building mastery.</p> <p>This chapter will dig into those mechanisms, with particular focus on:</p> <ul> <li>How policies are supported by operations, and how operations are composed of mechanisms that ensure they work well</li> <li>Evaluating operational mechanisms to select between different options, and determine which mechanisms are unlikely to be an effective choice</li> <li>Composing an operational plan for the specific set of policies that you are looking to support</li> <li>Common varieties of effective mechanisms such as approval forums, inspection mechanisms, nudges, and so on. We&rsquo;ll also explore the sorts of mechanisms that tend to work poorly</li> <li>How to adjust your approach to operations if you are in an engineering role rather than an executive role</li> <li>How cargo-culting remains the largest threat to effective strategy operations</li> </ul> <p>Let&rsquo;s unpack the details of turning your <em>potentially</em> good policy into an impactful policy.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="what-are-operational-mechanisms">What are operational mechanisms?</h2> <p>Operations are how a policy is implemented and reinforced. Effective operations ensure that your policies actually accomplish something. 
They can range from a recurring weekly meeting, to an alert that notifies the team when a threshold is exceeded, to a promotion rubric requiring a certain behavior to be promoted.</p> <p>In the strategy for <a href="https://lethain.com/private-equity-strategy/">working with new private equity ownership</a>, we introduce a policy to backfill hires at a lower level, and also limit the maximum number of principal engineers:</p> <blockquote> <p><strong>We will move to an “N-1” backfill policy</strong>, where departures are backfilled with a less senior level. We will also institute a strict maximum of one Principal Engineer per business unit, with any exceptions approved in writing by the CTO–this applies for both promotions and external hires.</p></blockquote> <p>That introduces an explicit operational mechanism of escalations going to the CTO, but it also introduces an implicit and undefined mechanism: how do we ensure the backfills are actually down-leveled as the policy instructs? It might be a group chat with engineering recruiting where the CTO approves the level of backfilled roles. Instead, it might be the responsibility of recruiting to enforce that downleveling. In a third approach, it might be taken on trust that hiring managers will do the right thing. Each of those three scenarios is a potential operational solution to implementing this policy. Operations is picking the right one for your circumstances, and then tweaking it as you learn from running it.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><strong>Operations in government</strong></p> <p>For another interesting take on how critical operations are, <em><a href="https://www.recodingamerica.us/">Recoding America</a></em> by Jennifer Pahlka is well worth the read. 
It explores how well-intended government legislation often isn&rsquo;t implementable, which results in policies that require massive IT investments but provide little benefit to constituents.</p> </div> <h2 id="how-to-evaluate-mechanisms">How to evaluate mechanisms</h2> <p>In order to determine the most effective operational mechanisms for the problems you&rsquo;re working on, it&rsquo;s useful to have a standardized rubric for evaluating mechanisms. While this rubric isn&rsquo;t perfectly universal&ndash;customize it for your needs&ndash;having any rubric will make it easier to evaluate your options consistently.</p> <p>The rubric I use to evaluate whether an operational mechanism will be effective is:</p> <ol> <li><strong>Measurability</strong>: Can you measure both leading and lagging indicators to <a href="https://lethain.com/inspection/">inspect</a> the mechanism&rsquo;s impact? If you have to choose between the two, measuring leading indicators allows much quicker evaluation and iteration on your mechanisms.</li> <li><strong>Adoption cost</strong>: How much work will <a href="https://lethain.com/migrations/">migrating</a> to this mechanism require? Can this work be done incrementally or does it require a major, coordinated shift?</li> <li><strong>User ease (or burden)</strong>: After adopting this policy, how much easier (or harder) will it be for users to perform their work? If things will be harder, are those users able to tolerate the additional time?</li> <li><strong>Provider ease (or burden)</strong>: How much additional ongoing maintenance will this mechanism require from the centralized or platform team providing it? For example, if every new architecture proposal requires a thorough review by your Security team, does the Security team have the actual ability to support those reviews?</li> <li><strong>Reliance on authority</strong>: How much does this mechanism depend on a top-down authority&rsquo;s active support? 
If the sponsoring executive departs, will this mechanism remain effective? Is that an effective tradeoff in this case?</li> <li><strong>Culturally aligned</strong>: Is this something that your organization is going to do, or something that they are going to fight against each step? Is there a way you can adjust the framing to make it more acceptable to your organization&rsquo;s culture?</li> </ol> <p>Generally, I find folks are good at evaluating mechanisms against these criteria, but somewhat worse at accepting the consequences of their evaluation. For example, falling in love with a particular mechanism and then trying to force the organization to accept a mechanism whose adoption cost is unbearably high, or introducing a mechanism that creates significant user burden onto a team that is already struggling with tight efficiency goals like a customer support team.</p> <p>Self-awareness helps here, but so does consulting others to point out the errors in your reasoning, which is a core part of how I&rsquo;ve found success in adopting operational mechanisms.</p> <h2 id="composing-an-operational-plan">Composing an operational plan</h2> <p>Your operational plan is the sum of the mechanisms used to support your policies. While evaluating each individual mechanism in isolation is part of creating an operations plan, it&rsquo;s also valuable to consider how the mechanisms will work together:</p> <ol> <li> <p><strong>Review the policies you&rsquo;ve developed.</strong> What sort of mechanisms seem most likely to support these policies? 
How might these mechanisms be pooled together to avoid redundancy?</p> </li> <li> <p><strong>Review the operational mechanisms that have worked in your organization.</strong> What mechanisms have been used to best effect, and which have left a sufficiently bad taste in the organization&rsquo;s collective memory that they&rsquo;ll be hard to reuse effectively?</p> </li> <li> <p><strong>Which new mechanisms showed up in your <a href="https://lethain.com/exploring-for-strategy/">exploration</a>?</strong> In your exploration phase, you&rsquo;ll frequently encounter mechanisms that your organization hasn&rsquo;t previously tried. If any of them seem particularly well-suited to the policies you&rsquo;re considering, and none of your organization&rsquo;s frequently used mechanisms are good fits, then consider testing a new one.</p> </li> <li> <p><strong>Evaluate mechanisms against the evaluation rubric.</strong> For each of the mechanisms you&rsquo;re considering using, apply the rubric from the above <em>How to evaluate mechanisms</em> to validate they&rsquo;re good fits.</p> </li> <li> <p><strong>Consolidate into an operational plan.</strong> Now that you&rsquo;ve determined the mechanisms you want to consider, work on fitting the full set of mechanisms into one coherent plan. Be particularly mindful of the ease, or burden, the integrated plan creates for both your users and platform providers.</p> </li> <li> <p><strong>Validate plan with users and providers.</strong> Many plans make sense from afar, but fail due to imposing an unreasonable burden. Or the burden might be acceptable, but the actual workflow simply won&rsquo;t work at all.</p> </li> <li> <p><strong>Consider validating via <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a>.</strong> If you run the above process, and can&rsquo;t come to an agreement with stakeholders on your proposed plan, then simply commit to running a strategy testing process including the plan. 
This will create space for everyone to build confidence in the approach before they feel forced to make a commitment to following it long-term.</p> <p>Even if you don&rsquo;t use strategy testing for your plan, at least commit to scheduling a review in three months reflecting on how things have worked out.</p> </li> </ol> <p>Your operational plan is the vehicle that delivers your policies to your organization. It&rsquo;s extremely tempting to skip refining the details here, but it&rsquo;s a relatively quick step and will completely change your strategy&rsquo;s outcomes.</p> <h2 id="common-mechanisms">Common mechanisms</h2> <p>Most companies have a handful of frequently used operational mechanisms. Some of those mechanisms are company specific, such as <a href="https://forum.commoncog.com/t/the-amazon-weekly-business-review-commoncog/1958">Amazon&rsquo;s weekly business review</a>, and others repeat across companies like requiring executive approval. Across the many mechanisms you&rsquo;ll encounter, you can generally cluster them into recurring categories. This section covers the mechanisms I&rsquo;ve found consistently effective.</p> <h3 id="approval-and-advice-forums">Approval and advice forums</h3> <p>At a high level, new policies are obvious, simple and apply cleanly to the problem they are intended to solve. However, when you apply those policies to detailed, complex circumstances, it&rsquo;s often ambiguous how to stay loyal to a policy&rsquo;s intentions. Approval and advice forums are a common solution to that problem.</p> <p><a href="https://lethain.com/calm-product-eng-company/">Calm&rsquo;s product engineering strategy</a> shows what the simplest, and most common, approval forum looks like in practice:</p> <blockquote> <p><strong>Exceptions are granted by the CTO, and must be in writing.</strong> The above policies are deliberately restrictive. Sometimes they may be wrong, and we will make exceptions to them. 
However, each exception should be deliberate and grounded in concrete problems we are aligned both on solving and how we solve them. If we all scatter towards our preferred solution, then we’ll create negative leverage for Calm rather than serving as the engine that advances our product.</p> <p>All exceptions must be written. If they are not written, then you should operate as if it has not been granted. Our goal is to avoid ambiguity around whether an exception has, or has not, been approved. If there’s no written record that the CTO approved it, then it’s not approved.</p></blockquote> <p>This example also has several weaknesses that happen in many approval forums. Most importantly, it doesn&rsquo;t make it clear how to get approvals. It would be stronger if it explicitly explained how to get an approval (perhaps go ask in <code>#cto-approvals</code>), and where to find prior approvals to help someone considering requesting an exception to calibrate their request.</p> <p>Approvals don&rsquo;t necessarily need to come from senior leadership. Instead, the senior leadership can loan their authority on a topic to another group. The <a href="https://lethain.com/llm-adoption-strategy/">LLM adoption strategy</a> provides a good example of this:</p> <blockquote> <p>Start with Anthropic. We use Anthropic models, which are available through our existing cloud provider via AWS Bedrock. To avoid maintaining multiple implementations, where we view the underlying foundational model quality to be somewhat undifferentiated, we are not looking to adopt a broad set of LLMs at this point. This is anchored in our Wardley map of the LLM ecosystem.</p> <p>Exceptions will be reviewed by the Machine Learning Review in #ml-review</p></blockquote> <p>In a more community-minded organization, the approval forums might not require senior leadership involvement at all. 
Instead, the culture might create an environment where the forums&rsquo; feedback is taken seriously on its own merits.</p> <p>Every company does approval forums a bit differently, ranging from our experiments at <a href="https://lethain.com/navigators/">Carta with Navigators</a>, granting executive authority for technical decisions to named engineers in each area, to Andrew Harmel-Law&rsquo;s discussion of this topic in <em><a href="https://www.amazon.com/Facilitating-Software-Architecture-Empowering-Architectural-ebook/dp/B0DMHGWCPN/">Facilitating Software Architecture</a></em>. You can spend a lot of time arguing the details here, but my experience is that having the right participants and a good executive sponsor matter a lot, and the other pieces matter a lot less.</p> <h3 id="inspection">Inspection</h3> <p>While even the best policies can fail, the more common scenario is that a policy will sort-of work, and need some modest adjustments to make it more successful. An <a href="https://lethain.com/inspection/">inspect</a> mechanism allows you to evaluate whether your policy is succeeding and if you need to make adjustments.</p> <p>The <a href="https://lethain.com/user-data-access-strategy/">user-data access strategy</a> provides an example:</p> <blockquote> <p><strong>Measure progress on percentage of customer data access requests justified by a user-comprehensible, automated rationale.</strong> This will anchor our approach on simultaneously improving the security of user data and the usability of our colleagues’ internal tools. If we only expand requirements for accessing customer data, we won’t view this as progress because it’s not automated (and consequently is likely to encourage workarounds as teams try to solve problems quickly). 
Similarly, if we only improve usability, charts won’t represent this as progress, because we won’t have increased the number of supported requests.</p> <p>As part of this effort, we will create a private channel where the security and compliance team has visibility into all manual rationales for user-data access, and will directly message the manager of any individual who relies on a manual justification for accessing user data.</p></blockquote> <p>This example is a good start, but fully realizing an inspection mechanism requires concretely specifying where and how the data will be tracked. A better version of this would include a link to the dashboard you&rsquo;ll look at, and a commitment to reviewing the data on a certain frequency.</p> <p>For a recent inspection mechanism, I created a recurring invite with a link to the relevant data dashboard, and a specific chat channel for discussion, and invited the working group who had agreed to review the data on that cadence. This wasn&rsquo;t a synchronous meeting, but rather a commitment to independently review, and discuss anything that felt surprising.</p> <p>Your particular mechanisms could be threshold-triggered alerts, something you fold into an existing metrics review meeting, a script you commit to running and reviewing periodically, or something else. The most important thing is that it cannot silently fail.</p> <h3 id="nudges">Nudges</h3> <p>While it&rsquo;s common to hear complaints about how a team isn&rsquo;t following a new policy, as if it were a deliberate choice they&rsquo;d made, I find it more common that people want to do things the new way, but rarely take time to learn how to do it. Nudges are providing individuals with context to inform them about a better way they might do something, and they are an exceptionally effective mechanism.</p> <p>Grounding this in an example, at Stripe we had a policy of allowing teams to self-authorize introducing new cloud hosting costs. 
This worked well almost all the time. However, sometimes teams would accidentally introduce large cost increases without realizing it, and teams that introduced those spikes almost never had any awareness that they had caused the problem. Even if we&rsquo;d told them they must not introduce unapproved spending spikes, they simply didn&rsquo;t perceive they&rsquo;d done it.</p> <p>We had the choice between preventing all teams from introducing new spend, or we could try using a nudge. The nudge we added informed teams when their cloud spend accelerated month over month, directed them to charts that explained the acceleration, and told them where to go to ask questions. Nudges pair well with inspections, and there was also a monthly review by the Efficiency Engineering team to review any spikes and reach out where necessary.</p> <p>Maybe we could have forced all teams to review new spend, but this nudge approach didn&rsquo;t require an authoritative mandate to implement. It also meant we only spent time advising teams that <em>actually</em> spent too much, instead of having to discuss with every team that <em>might</em> spend too much.</p> <p>As another example making that point, a working group at Carta added a nudge to inform managers of untested pull requests merged by their team. Some managers had previously said they simply didn&rsquo;t know when and why their team had merged untested pull requests, and this nudge made it easy to detect. The nudge also respected their attention by not sending a notification at all if there wasn&rsquo;t a new, untested pull request.</p> <p>With poor ergonomics, nudges can be an overwhelming assault on your colleagues&rsquo; attention, but done well, I continue to believe they are the most effective operational mechanism.</p> <h3 id="documentation">Documentation</h3> <p>Policies can&rsquo;t be enforced by people who don&rsquo;t know they exist, or by people who don&rsquo;t know how to follow those policies. 
In my experience, nudges are the most effective way of solving both of those problems, because nudges bring information to people at exactly the moment that information would be useful. At most companies, well-done nudges are relatively uncommon, and the far more common solution to lack of information is documentation and training.</p> <p>There are so many approaches to both of these topics, and I&rsquo;ve not found my own approaches here particularly effective. Consequently, I am hesitant to give much advice on what will work best for you. The best I can offer is that following standard practices for your company, even if the outcomes seem imperfect, is probably your best bet. Internal knowledge bases tend to rot quickly, and introducing yet another knowledge base is almost always the illusion of progress rather than real progress. Even when you really don&rsquo;t like the current one.</p> <p>Finally, remember that success for documentation and training is not necessarily that everyone in the company knows how a new policy works. Instead, as discussed in <a href="https://lethain.com/is-engineering-strategy-useful/">the chapter on whether strategy is useful</a>, a more useful goal is informational herd immunity: as long as someone on each team understands your policy, the team will generally be capable of following it.</p> <h3 id="automation">Automation</h3> <p>Relying on humans to respond is slow, and the quality of human response is highly varied. In many cases, automation provides the most effective and most scalable mechanism to support your policies&rsquo; rollout.</p> <p>Automation was key in the <a href="https://lethain.com/uber-service-migration-strategy/">Uber service migration strategy</a>, moving us out of a manual, slow process that was taking up a great deal of user and provider time:</p> <blockquote> <p>Move to structured requests, and out of tickets. Missing or incorrect information in provisioning requests create significant delays in provisioning. 
Further, collecting this information is the first step of moving to a self-service process. As such, we can get paid twice by reducing errors in manual provisioning while also creating the interface for self-service workflows.</p></blockquote> <p>In that case, better automation allowed us to eliminate a series of back-and-forth negotiations to collect data, and to instead get the necessary information in a single step. Occasionally we still ran into users who couldn&rsquo;t fill in the form, but now we could focus on providing a good manual experience for those rare exceptions.</p> <p>As you use automation as a core strategy mechanism, it&rsquo;s important to recognize that designing an effective user experience is a prerequisite to automation having a positive impact. If you view the user experience of your automation as a secondary concern, then you are unlikely to make much impact with automation.</p> <h3 id="deferment-to-future-work">Deferment to future work</h3> <p>Sometimes there&rsquo;s something you really want a policy to do, but you also know that you have no reasonable mechanism to do it. In that case, you may find explicitly deferring action on the topic useful.</p> <p>The strategy for <a href="https://lethain.com/pos-acquisition-integration/">integration of the Index acquisition at Stripe</a> uses this mechanism:</p> <blockquote> <p>Defer making a decision regarding the introduction of Java to a later date: the introduction of Java is incompatible with our existing engineering strategy, but at this point we’ve also been unable to align stakeholders on how to address this decision. 
Further, we see attempting to address this issue as a distraction from our timely goal of launching a joint product within six months.</p> <p>We will take up this discussion after launching the initial release.</p></blockquote> <p>As did the strategy for <a href="https://lethain.com/private-equity-strategy/">working with a private equity acquirer</a>:</p> <blockquote> <p>We believe there are significant opportunities to reduce R&amp;D maintenance investments, but we don’t have conviction about which particular efforts we should prioritize. We will kickoff a working group to identify the features with the highest support load.</p></blockquote> <p>There&rsquo;s no shame in deferral. As much as you want to make progress on a certain area, it&rsquo;s better to explicitly acknowledge that you can&rsquo;t make progress on it&ndash;and clarify when you will be able to&ndash;than to allow the organization to churn on an intractable problem.</p> <h3 id="meetings">Meetings</h3> <p>Meetings are the final mechanism, and you can fit any and all of the above mechanisms into a meeting. They are a universal mechanism, although frequently overused because they can do an adequate job of operating almost any policy.</p> <p>The most common mechanism is a reporting meeting, such as reporting progress in the Executive Weekly Meeting as <a href="https://lethain.com/llm-adoption-strategy/">suggested in the LLM adoption strategy</a>:</p> <blockquote> <p><strong>Develop an LLM-backed process for reactivating departed and suspended drivers in mature markets.</strong> Through modeling our driver lifecycle, we determined that improving onboarding time will have little impact on the total number of active drivers. 
Instead, we are focusing on mechanisms to reactivate departed and suspended drivers, which is the only opportunity to meaningfully impact active drivers.</p> <p>Report on progress monthly in Exec Weekly Meeting, coordinated in #exec-weekly</p></blockquote> <p>The other common meeting archetype is the <a href="https://lethain.com/testing-strategy-iterative-refinement/">weekly working meeting</a> introduced in the chapter on strategy testing. Meetings are almost always the most expensive mechanism you can find to solve a problem, but they are easy to suggest, run, and iterate on.</p> <p>If you can&rsquo;t find any other mechanism you believe in, then a meeting is a decent starting point. Just don&rsquo;t get too fond of them, and try to iterate your way to canceling every meeting that you start.</p> <h2 id="anti-patterns">Anti-patterns</h2> <p>In addition to the effective operational methods discussed above, there are a number of additional mechanisms that are frequently used, but which I consider anti-patterns. They can provide some value, but there&rsquo;s almost always a better alternative.</p> <ol> <li> <p><strong>Top-down pronouncements</strong>: Sometimes a policy will be operationalized by simply declaring it must be followed. It&rsquo;s common to see a leader declare that a policy is now in effect, assuming that the announcement is a useful way to implement the new policy.</p> <p>For example, some &ldquo;return to office&rdquo; policies dictate that the team must work from their office, but driving a real change requires motivating those individuals to actually return.</p> </li> <li> <p><strong>Education-as-announcements rollouts</strong>: The default way that many companies roll out policies is through one-time &ldquo;education,&rdquo; often as an all-company announcement for existing employees. They might follow up by updating training for onboarding new-hires. 
Education sounds great, but a couple of trainings will never change organizational behavior.</p> <p>Changing behavior requires ongoing reminders, visible role models, inspection to understand why some teams are <em>not</em> adopting the behavior, and so on. Education can be a good component of operationalizing a policy, but it cannot stand on its own.</p> </li> <li> <p><strong>Mandatory recurring trainings:</strong> These are a staple of compliance driven policies, generally because of laws which require providing a certain number of hours of relevant training each year.</p> <p>There are two deep challenges with mandatory trainings. First, because attendance is <em>required</em>, people tend to make little effort to make the content good. Second, many folks don&rsquo;t pay attention because they expect the content to be low quality. It&rsquo;s not uncommon to hear people say that they&rsquo;ve never heard of a policy that they&rsquo;ve performed annual training on for multiple years.</p> <p>It&rsquo;s possible to overcome these barriers, but in a situation where you&rsquo;re accountable for changing outcomes, as opposed to shifting legal obligations away from the company, these tend to work poorly.</p> </li> <li> <p><strong>Just change the culture.</strong> Some leaders frame most problems as cultural problems, which is a reasonable frame: most things can be usefully viewed as a cultural problem. Unfortunately, it&rsquo;s common for those who rely heavily on the cultural frame to also have a simplistic view about how culture is changed.</p> <p>Changing an organization&rsquo;s culture is tricky, and requires a combination of many techniques to create visible leaders role modeling the new behavior, and reinforcement mechanisms to ensure pockets of dissent are weeded out. 
Anyone who frames culture change as a simple or instant change is living in an imaginary world.</p> </li> </ol> <p>If you&rsquo;re using one of these approaches, it isn&rsquo;t necessarily a bad choice. Instead, you should just make sure you can explain why you&rsquo;re using it, and then you need to also make sure you believe that explanation. If you don&rsquo;t, look for a mechanism from the earlier <em>Common mechanisms</em> section instead.</p> <h2 id="what-if-youre-not-an-executive">What if you&rsquo;re not an executive?</h2> <p>It&rsquo;s easy to get discouraged when you think about which operational mechanisms are available to you as a non-executive. So many of the frequently seen mechanisms, like running mandatory recurring meetings or a binding architecture review process, are not accessible to you.</p> <p>That is true: they&rsquo;re not accessible to you. However, there&rsquo;s always a related mechanism that can be implemented with less authority. The binding architecture process can be replaced with an architectural advice process. The mandatory review of pull requests can be replaced with a nudge.</p> <p>Although it may be more common to see the authoritative mechanisms in the companies you work in, my experience working as an executive is that these authoritative mechanisms don&rsquo;t work particularly well. They do a great job of technically shifting accountability to the wider organization, but they often don&rsquo;t change behavior at all. So, instead of getting frustrated by what you can&rsquo;t do, focus instead on the mechanisms that are available to you today. Add nudges, focus on the real dynamics of how colleagues do work in your organization, and build a real dataset.</p> <p>It&rsquo;s very hard to get an executive to support your initiative before the mechanisms and data exist to support it, and very easy to get their support once they do. 
Once you&rsquo;ve done what you can without authority to build confidence, if you really do need more authority, then you&rsquo;re in a good place to escalate to get an executive to support your policies.</p> <h2 id="beware-cargo-culting">Beware cargo-culting</h2> <p>The longer that I am in the industry, the more I am surprised by how few strategists seem to care if their approach actually works. Instead, they seem focused on doing something that <em>might</em> work, offloading accountability to either the organization or some team, and then moving off to the next problem.</p> <p>Perhaps this is driven by an unfortunate reality that leaders are often evaluated by how they appear, rather than by what they accomplish. Whether or not that&rsquo;s the underlying reason for why it happens, it does make it surprisingly difficult to know which patterns to borrow from strategy rollouts and implementations.</p> <p>The best advice, unfortunately, is to remain skeptically optimistic. Collect ideas widely, but force the ideas to prove their merit.</p> <h2 id="summary">Summary</h2> <p>Now that you&rsquo;ve finished this chapter, you&rsquo;re significantly more qualified to write a complete, useful strategy than I was a decade into my career. Often skipped, the operations behind your strategy are at least as essential as any other step, and any strategy without them will fade quietly into your organization&rsquo;s history.</p> <p>In addition to being able to roll out a strategy of your own, this chapter also provides a useful rescue toolkit you can use to put an existing, floundering strategy back on track. 
If you don&rsquo;t see an opportunity to write new strategy within your organization, then there&rsquo;s still probably room to flex your operational skill.</p>Career advice in 2025.https://lethain.com/career-advice-2025/Sat, 15 Mar 2025 04:00:00 -0700https://lethain.com/career-advice-2025/<p>Yesterday, the tj-actions repository, a popular tool used with GitHub Actions, was compromised (for more background read <a href="https://www.stepsecurity.io/blog/harden-runner-detection-tj-actions-changed-files-action-is-compromised">one</a> of these <a href="https://semgrep.dev/blog/2025/popular-github-action-tj-actionschanged-files-is-compromised/">two</a> articles). Watching the infrastructure and security engineering teams at Carta respond highlighted to me just how much LLMs can’t meaningfully replace many essential roles of software professionals. However, I’m also reading Jennifer Pahlka’s <a href="https://www.recodingamerica.us/">Recoding America</a>, which makes an important point: decision-makers can remain irrational longer than you can remain solvent. (Or, in this context, remain employed.)</p> <p>I’ve been thinking about this a lot lately, as I’ve ended up having more “2025 is not much fun”-themed career discussions with prior colleagues navigating the current job market. I’ve tried to pull together my points from those conversations here:</p> <ol> <li> <p>Many people who first entered senior roles in 2010-2020 are finding current roles a lot less fun. There are a number of reasons for this. First, managers were generally evaluated in that period based on their ability to hire, retain and motivate teams. 
The current market doesn’t value those skills particularly highly, but instead prioritizes a different set of skills: working in the details, pushing pace, and navigating the technology transition to foundational models / LLMs.</p> <p>This means many members of the current crop of senior leaders are either worse at the skills they currently need to succeed, or are less motivated by those activities. Either way, they’re having less fun.</p> <p>Similarly, the would-be senior leaders from the 2010-2020 era who excelled at working in the details, pushing pace and so on, are viewed as stagnant in their careers, so they are still finding it difficult to move into senior roles. This means that many folks feel like the current market has left them behind. This is, of course, not universal. It is a <em>general</em> experience that <em>many</em> people are having. Many people are not having this experience.</p> </li> <li> <p>The technology transition to foundational models / LLMs as a core product and development tool is causing many senior leaders’ hard-earned playbooks to be invalidated. Many companies that were stable, durable market leaders are now in tenuous positions because foundational models threaten to erode their advantage. Whether or not their advantage is truly eroded is uncertain, but it is clear that usefully adopting foundational models into a product requires more than simply shoving an OpenAI/Anthropic API call in somewhere.</p> <p>Instead, you have to figure out how to design with progressive validation, with critical data validated via human-in-the-loop techniques before it is used in a critical workflow. It also requires designing for a rapidly improving toolkit: many workflows that were laughably bad in 2023 work surprisingly well with the latest reasoning models. Effective product design requires architecting for both massive improvement, and no improvement at all, of models in 2026-2027.</p> <p>This is equally true of writing software itself. 
There’s so much noise about how to write software, and much of it’s clearly propaganda–this post’s opening anecdote regarding the tj-actions repository proves that expertise remains essential–but parts of it aren’t. I spent a <a href="https://lethain.com/our-own-agents-our-own-tools/">few weeks in the evenings working on a new side project via Cursor in January</a>, and I was surprised at how much my workflow changed even though Cursor itself was far from perfect. Even since then, Claude has advanced from 3.5 to 3.7 with extended thinking. Again, initial application development might easily be radically different in 2027, or it might be largely unchanged after the scaffolding step in complex codebases. (I’m also curious to see if context window limitations drive another flight from monolithic architectures.)</p> <p>Sitting out this transition, when we are relearning how to develop software, feels like a high-risk proposition. Your well-honed skills in team development are already devalued today relative to three years ago, and now your other skills are at risk of being devalued as well.</p> </li> <li> <p>Valuations and funding are relatively less accessible to non-AI companies than they were three years ago. Certainly elite companies are doing alright, whether or not they have a clear AI angle, but the cutoff for remaining elite has risen. Simultaneously, the public markets are challenged, which means less willingness for both individuals and companies to purchase products, which slows revenue growth, further challenging valuations and funding.</p> <p>The consequence of this, if you’re at a private, non-AI company, is that you’re likely to hire less, promote less, see less movement in pay bands, and experience a less predictable path to liquidity. 
It also means fewer open roles at other companies, so there’s more competition when attempting to trade up into a larger, better-compensated role at another company.</p> <p>The major exception to this is joining an AI company, but generally those companies are in extremely competitive markets and are priced more appropriately for investors managing a basket of investments than for employees trying to deliver a predictable return. If you join one of these companies today, you’re probably joining a bit late to experience a big pop, your equity might go to zero, and you’ll be working extremely hard for the next five to seven years. This is the classic startup contract, but not necessarily the contract that folks have expected over the past decade as maximum compensation has generally come from joining a later-stage company or a member of the Magnificent Seven.</p> </li> <li> <p>As companies respond to the reduced valuations and funding, they are pushing their teams harder to find growth with their existing team. In the right environment, this can be motivating, but people may have opted into a more relaxed experience that has become markedly less relaxed without their consent.</p> </li> </ol> <p>If you pull all those things together, you’re essentially in a market where <a href="https://lethain.com/forty-year-career/">profit and pace are fixed</a>, and you have to figure out how you personally want to optimize between people, prestige and learning. A few years ago, I think these variables were much more decoupled. That is not what I hear from folks today, even from those whose jobs were quite cozy a few years ago.</p> <p>Going a bit further, I know folks who are good at their jobs, and have been struggling to find something meaningful for six-plus months. I know folks who are exceptionally strong candidates, who can find reasonably good jobs, but even they are finding that the sorts of jobs they want simply don’t exist right now. 
I know folks who are strong candidates but with some oddities in their profile, maybe too many short stints, who are now being filtered out because hiring managers need some way to filter through the higher volume of candidates.</p> <p>I can’t give advice on what <em>you</em> should do, but if you’re finding this job market difficult, it’s certainly not personal. My sense is that’s basically the experience that everyone is having when searching for new roles right now. If you are in a role today that’s frustrating you, my advice is to try harder than usual to find a way to make it a rewarding experience, even if it’s not perfect. I also wouldn’t personally try to sit this cycle out unless you’re comfortable with a small risk that reentry is quite difficult: I think it’s more likely that the ecosystem is meaningfully different in five years than that it’s largely unchanged.</p> <p>Altogether, this hasn&rsquo;t really been the advice that anyone wanted when they chatted with me, but it seems to generally have resonated with them as a realistic appraisal of the current markets. Hopefully there&rsquo;s something useful for you in here as well.</p>Setting policy for strategy.https://lethain.com/policy-for-strategy/Thu, 13 Mar 2025 04:00:00 -0700https://lethain.com/policy-for-strategy/<p>This book&rsquo;s introduction started by defining strategy as &ldquo;making decisions.&rdquo; Then we dug into <a href="https://lethain.com/exploring-for-strategy/">exploration</a>, <a href="https://lethain.com/diagnosis-for-strategy">diagnosis</a>, and <a href="https://lethain.com/refining-eng-strategy/">refinement</a>: three chapters where you could argue that we didn&rsquo;t decide anything at all. Clarifying the problem to be solved is the prerequisite of effective decision making, but eventually decisions do have to be made. 
Here in this chapter on policy, and the <a href="https://lethain.com/operations-for-strategy/">following chapter on operations</a>, we finally start to actually make some decisions.</p> <p>In this chapter, we&rsquo;ll dig into:</p> <ul> <li>How we define policy, and how setting policy differs from operating policy as discussed in the next chapter</li> <li>The structured steps for setting policy</li> <li>How many policies should you set? Is it preferable to have one policy, many policies, or does it not matter much either way?</li> <li>Recurring kinds of policies that appear frequently in strategies</li> <li>Why it&rsquo;s valuable to be intentional about your strategy&rsquo;s altitude, and how engineers and executives generally maintain different altitudes in their strategies</li> <li>Criteria to use for evaluating whether your policies are likely to be impactful</li> <li>How to develop novel policies, and why it&rsquo;s rare</li> <li>Why having multiple bundles of alternative policies is generally a phase in strategy development that indicates a gap in your diagnosis</li> <li>How policies that ignore constraints sound inspirational, but accomplish little</li> <li>Dealing with ambiguity and uncertainty created by missing strategies from cross-functional stakeholders</li> </ul> <p>By the end, you&rsquo;ll be ready to evaluate why an existing strategy&rsquo;s policies are struggling to make an impact, and to start iterating on policies for strategy of your own.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="what-is-policy">What is policy?</h2> <p>Policy is interpreting your <a href="https://lethain.com/diagnosis-for-strategy/">diagnosis</a> 
into a concrete plan. That plan will be a collection of decisions, tradeoffs, and approaches. They&rsquo;ll range from coding practices, to hiring mandates, to architectural decisions, to guidance about how choices are made within your organization.</p> <p>An effective policy solves the entirety of the strategy&rsquo;s diagnosis, although the diagnosis itself is encouraged to specify which aspects can be ignored. For example, the <a href="https://lethain.com/private-equity-strategy/">strategy for working with private equity ownership</a> acknowledges in its diagnosis that they don&rsquo;t have clear guidance on what kind of reduction to expect:</p> <blockquote> <p>Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&amp;D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction.</p></blockquote> <p>Faced with that uncertainty, the policy simply acknowledges the ambiguity and commits to reconsider when more information becomes available:</p> <blockquote> <p>We believe our new ownership will provide a specific target for Research and Development (R&amp;D) operating expenses during the upcoming financial year planning. We will revise these policies again once we have explicit targets, and will delay planning around reductions until we have those numbers to avoid running two overlapping processes.</p></blockquote> <p>There are two frequent points of confusion when creating policies that are worth addressing directly:</p> <ol> <li> <p>Policy is a subset of strategy, rather than the entirety of strategy, because policy is only meaningful in the context of the strategy&rsquo;s diagnosis. 
For example, the <a href="https://lethain.com/engineering-cost-model/">&ldquo;N-1 backfill policy&rdquo;</a> makes sense in the context of <a href="https://lethain.com/private-equity-strategy/">new, private equity ownership</a>. The policy wouldn&rsquo;t work well in a rapidly expanding organization.</p> <p>Any strategy without a policy is useless, but you&rsquo;ll also find policies without context aren&rsquo;t worth much either. This is particularly unfortunate, because so often strategies are communicated without those critical sections.</p> </li> <li> <p>Policy describes how tradeoffs should be made, but it doesn&rsquo;t verify how the tradeoffs are actually being made in practice. The next chapter on operations covers how to inspect an organization&rsquo;s behavior to ensure policies are followed.</p> <p>When reworking a strategy <a href="https://lethain.com/readable-engineering-strategy-documents/">to be more readable</a>, it often makes sense to merge policy and operation sections together. However, when drafting strategy it&rsquo;s valuable to keep them separate. Yes, you <em>might</em> use a weekly meeting to review whether the policy is being followed, but whether it&rsquo;s an effective policy is independent of having such a meeting, and what operational mechanisms you use will vary depending on the number of policies you intend to implement.</p> </li> </ol> <p>With this definition in mind, now we can move onto the more interesting discussion of how to set policy.</p> <h2 id="how-to-set-policy">How to set policy</h2> <p>Every part of writing a strategy feels hard when you&rsquo;re doing it, but I personally find that writing policy either feels uncomfortably easy or painfully challenging. It&rsquo;s never a happy medium. 
Fortunately, the exploration and diagnosis usually come together to make writing your policy simple: although sometimes that simple conclusion may be a difficult one to swallow.</p> <p>The steps I follow to write a strategy&rsquo;s policy are:</p> <ol> <li> <p><strong>Review diagnosis</strong> to ensure it captures the most important themes. It doesn&rsquo;t need to be perfect, but it shouldn&rsquo;t have omissions so obvious that you can immediately identify them.</p> </li> <li> <p><strong>Select policies</strong> that address the diagnosis. Explicitly match each policy to one or more diagnoses that it addresses. Continue adding policies until every diagnosis is covered.</p> <p>This is a broad instruction, but it&rsquo;s simpler than it sounds because you&rsquo;ll typically select from policies <a href="https://lethain.com/exploring-for-strategy/">identified during your exploration phase</a>. However, there certainly is space to tweak those policies, and to reapply familiar policies to new circumstances.</p> <p>If you do find yourself developing a novel policy, there&rsquo;s a later section in this chapter, <em>Developing novel policies</em>, that addresses that topic in more detail.</p> </li> <li> <p><strong>Consolidate policies</strong> in cases where they overlap or adjoin. For example, two policies about specific teams might be generalized into a policy about all teams in the engineering organization.</p> </li> <li> <p><strong>Backtest policy</strong> against recent decisions you&rsquo;ve made. This is particularly effective if you maintain a <a href="https://infraeng.dev/decision-log/">decision log</a> in your organization.</p> </li> <li> <p><strong>Mine for conflict</strong> once again, much as you did in developing your diagnosis. Emphasize feedback from teams and individuals with a different perspective than your own, but don&rsquo;t wholly eliminate those that you agree with. 
Just as it&rsquo;s easy to crowd out opposing views in diagnosis if you don&rsquo;t solicit their input, it&rsquo;s possible to accidentally crowd out your own perspective if you anchor too much on others&rsquo; perspectives.</p> </li> <li> <p><strong>Consider refinement</strong> if you finish writing, and you just aren&rsquo;t sure your approach works &ndash; that&rsquo;s fine! Return to the refinement phase by deploying <a href="https://lethain.com/refining-eng-strategy/">one of the refinement techniques</a> to increase your conviction. Remember that we <em>talk</em> about strategy like it&rsquo;s done in one pass, but almost all real strategy takes many refinement passes.</p> </li> </ol> <p>The steps of writing policy are relatively pedestrian, largely because you&rsquo;ve done so much of the work already in the exploration, diagnosis, and refinement steps. If you skip those phases, you&rsquo;d likely follow the above steps for writing policy, but the expected quality of the policy itself would be far lower.</p> <h2 id="how-many-policies">How many policies?</h2> <p>Addressing the entirety of the diagnosis is often complex, which is why most strategies feature a set of policies rather than just one. The <a href="https://lethain.com/decompose-monolith-strategy/">strategy for decomposing a monolithic application</a> is not one policy deciding not to decompose, but a series of four policies:</p> <ol> <li>Business units should always operate in their own code repository and monolith.</li> <li>New integrations across business unit monoliths should be done using gRPC.</li> <li>Except for new business unit monoliths, we don’t allow new services.</li> <li>Merge existing services into business-unit monoliths where you can.</li> </ol> <p>Four isn&rsquo;t universally the right number either. It&rsquo;s simply the number that was required to solve that strategy&rsquo;s diagnosis. With an excellent diagnosis, your policies will often feel inevitable, and perhaps even boring. 
That&rsquo;s great: what makes a policy good is that it&rsquo;s effective, not that it&rsquo;s novel or inspiring.</p> <h2 id="kinds-of-policies">Kinds of policies</h2> <p>While there are <em>so many</em> policies you can write, I&rsquo;ve found they generally fall into one of four major categories: approvals, allocations, direction, and guidance. This section introduces those categories.</p> <p><strong>Approvals</strong> define the process for making a recurring decision. This might require invoking an architecture advice process, or it might require involving an authority figure like an executive.</p> <p>In the <a href="https://lethain.com/pos-acquisition-integration/">Index post-acquisition integration strategy</a>, there were a number of complex decisions to be made, and the approval mechanism was:</p> <blockquote> <p>Escalations come to paired leads: given our limited shared context across teams, all escalations must come to both Stripe’s Head of Traffic Engineering and Index’s Head of Engineering.</p></blockquote> <p>This allowed the acquired and acquiring teams to start building trust between each other by ensuring both were consulted before any decision was finalized. On the other hand, the <a href="https://lethain.com/user-data-access-strategy/">user data access strategy</a>&rsquo;s approval strategy was more focused on managing corporate risk:</p> <blockquote> <p><strong>Exceptions must be granted in writing by CISO.</strong> While our overarching Engineering Strategy states that we follow an advisory architecture process as described in <em>Facilitating Software Architecture</em>, the customer data access policy is an exception and must be explicitly approved, with documentation, by the CISO. Start that process in the #ciso channel.</p></blockquote> <p>These two different approval processes had different goals, so they made tradeoffs differently. 
There are so many ways to tweak approval, allowing for many different tradeoffs between safety, productivity, and trust.</p> <p><strong>Allocations</strong> describe how resources are split across multiple potential investments. Allocations are the most concrete statement of organizational priority, and also articulate the organization&rsquo;s belief about how productivity happens in teams. Some companies believe you go fast by swarming more people onto critical problems. Other companies believe you go fast by forcing teams to solve problems without additional headcount. Both can work, and teach you something important about the company&rsquo;s beliefs.</p> <p>The strategy on <a href="https://lethain.com/uber-service-migration-strategy/">Uber&rsquo;s service migration</a> has two concrete examples of allocation policies. The first describes the Infrastructure engineering team&rsquo;s allocation between manual provision tasks and investing into creating a self-service provisioning platform:</p> <blockquote> <p><strong>Constrain manual provisioning allocation to maximize investment in self-service provisioning.</strong> The service provisioning team will maintain a fixed allocation of one full time engineer on manual service provisioning tasks. We will move the remaining engineers to work on automation to speed up future service provisioning. This will degrade manual provisioning in the short term, but the alternative is permanently degrading provisioning by the influx of new service requests from newly hired product engineers.</p></blockquote> <p>The second allocation policy is implicitly noted in this strategy&rsquo;s diagnosis, where it describes the allocation policy in the Engineering organization&rsquo;s higher altitude strategy:</p> <blockquote> <p>Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. 
While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing.</p></blockquote> <p>Allocation policies often create a surprising amount of clarity for the team, and I include them in almost every policy I write either explicitly, or implicitly in a higher altitude strategy.</p> <p><strong>Direction</strong> provides explicit instruction on how a decision <em>must</em> be made. This is the right tool when you know where you want to go, and exactly the way that you want to get there. Direction is appropriate for problems you understand clearly, and you value consistency more than empowering individual judgment.</p> <p>Direction works well when you need an unambiguous policy that doesn&rsquo;t leave room for interpretation. For example, <a href="https://lethain.com/calm-product-eng-company/">Calm&rsquo;s policy for working in the monolith</a>:</p> <blockquote> <p>We write all code in the monolith. It has been ambiguous if new code (especially new application code) should be written in our JavaScript monolith, or if all new code must be written in a new service outside of the monolith. This is no longer ambiguous: all new code must be written in the monolith.</p> <p>In the rare case that there is a functional requirement that makes writing in the monolith implausible, then you should seek an exception as described below.</p></blockquote> <p>In that case, the team couldn&rsquo;t agree on what should go into the monolith. Individuals would often make incompatible decisions, so creating consistency required removing personal judgment from the equation.</p> <p>Sometimes judgment is the issue, and sometimes consistency is difficult due to misaligned incentives. 
A good example of this comes in <a href="https://lethain.com/private-equity-strategy/">strategy on working with new Private Equity ownership</a>:</p> <blockquote> <p>We will move to an “N-1” backfill policy, where departures are backfilled with a less senior level. We will also institute a strict maximum of one Principal Engineer per business unit.</p></blockquote> <p>It&rsquo;s likely that hiring managers would simply ignore this backfill policy if it was stated more softly, although sometimes less forceful policies are useful.</p> <p><strong>Guidance</strong> provides a recommendation about how a decision <em>should</em> be made. Guidance is useful when there&rsquo;s enough nuance, <a href="https://lethain.com/navigating-ambiguity/">ambiguity</a>, or complexity that you <em>can</em> explain the desired destination, but you <em>can&rsquo;t</em> mandate the path to reaching it.</p> <p>One example of guidance comes from the <a href="https://lethain.com/pos-acquisition-integration/">Index acquisition integration strategy</a>:</p> <blockquote> <p><strong>Minimize changes to tokenization environment</strong>: because point-of-sale devices directly work with customer payment details, the API that directly supports the point-of-sale device must live within our secured environment where payment details are stored.</p> <p>However, any other functionality must not be added to our tokenization environment.</p></blockquote> <p>This might read like direction, but it&rsquo;s clarifying the desired outcome of avoiding unnecessary complexity in the tokenization environment. 
However, it&rsquo;s not able to articulate what complexity is necessary, so ultimately it&rsquo;s guidance because it requires significant judgment to interpret.</p> <p>A second example of guidance comes in the <a href="https://lethain.com/decompose-monolith-strategy/">strategy on decomposing a monolithic codebase</a>:</p> <blockquote> <p><strong>Merge existing services into business-unit monoliths where you can.</strong> We believe that each choice to move existing services back into a monolith should be made “in the details” rather than from a top-down strategy perspective. Consequently, we generally encourage teams to wind down their existing services outside of their business unit’s monolith, but defer to teams to make the right decision for their local context.</p></blockquote> <p>This is another case of knowing the desired outcome, but encountering too much uncertainty to direct the team on how to get there. If you ask five engineers about whether it&rsquo;s possible to merge a given service back into a monolithic codebase, they&rsquo;ll probably disagree. That&rsquo;s fine, and highlights the value of guidance: it makes it possible to make incremental progress in areas where more concrete direction would cause confusion.</p> <p>When you&rsquo;re working on a strategy&rsquo;s policy section, it&rsquo;s important to consider all of these categories. Which feel most natural to use will vary depending on your team and role, but they&rsquo;re all usable:</p> <ul> <li>If you&rsquo;re a developer productivity team, you might have to lean heavily on guidance in your policies and increased support for that guidance within the details of your platform.</li> <li>If you&rsquo;re an executive, you might lean heavily on direction. 
Indeed, you might lean <em>too</em> heavily on direction, where guidance often works better for areas where you understand the direction but not the path.</li> <li>If you&rsquo;re a product engineering organization, you might have to narrow the scope of your direction to the engineers within that organization to deal with the realities of complex cross-organization dynamics.</li> </ul> <p>Finally, if you have a clear approach you want to take that doesn&rsquo;t fit cleanly into any of these categories, then don&rsquo;t let this framework dissuade you. Give it a try, and adapt if it doesn&rsquo;t initially work out.</p> <h2 id="maintaining-strategy-altitude">Maintaining strategy altitude</h2> <p>The chapter on <a href="https://lethain.com/when-write-down-engineering-strategy/">when to write engineering strategy</a> introduced the concept of strategy altitude, which is being deliberate about where certain kinds of policies are created within your organization.</p> <p>Without repeating that section in its entirety, it&rsquo;s particularly relevant when you set policy to consider how your new policies eliminate flexibility within your organization. Consider these two somewhat opposing strategies:</p> <ul> <li><a href="https://lethain.com/stripe-sorbet/">Stripe&rsquo;s Sorbet strategy</a> only worked in an organization that enforced the use of a single programming language across (essentially) all teams</li> <li><a href="https://lethain.com/uber-service-migration-strategy/">Uber&rsquo;s service migration strategy</a> worked well in an organization that was unwilling to enforce consistent programming language adoption across teams</li> </ul> <p>Stripe&rsquo;s organization-altitude policy took away the freedom of individual teams to select their preferred technology stack. In return, they unlocked the ability to centralize investment in a powerful way. 
Uber went the opposite way, unlocking the ability of teams to pick their preferred technology stack, while significantly reducing their centralized teams&rsquo; leverage.</p> <p>Both altitudes make sense. Both have consequences.</p> <h2 id="criteria-for-effective-policies">Criteria for effective policies</h2> <p>In <em><a href="https://www.amazon.com/Engineering-Executives-Primer-Impactful-Leadership/dp/1098149483/">The Engineering Executive&rsquo;s Primer</a></em>&rsquo;s chapter on <a href="https://lethain.com/eng-strategies/">engineering strategy</a>, I introduced three criteria for evaluating policies. They ought to be applicable, enforced, and create leverage. Defining those a bit:</p> <ol> <li><strong>Applicable</strong>: it can be used to navigate complex, real scenarios, particularly when making tradeoffs.</li> <li><strong>Enforced</strong>: teams will be held accountable for following the guiding policy.</li> <li><strong>Create Leverage</strong>: create compounding or multiplicative impact.</li> </ol> <p>The last of these three, create leverage, made sense in the context of a book about engineering executives, but probably doesn&rsquo;t make as much sense here. Some policies certainly should create leverage (e.g. <a href="https://lethain.com/decompose-monolith-strategy/">empower developer experience team by restricting new services</a>), but others might not (e.g. <a href="https://lethain.com/private-equity-strategy/">moving to an N-1 backfill policy</a>). Outside the executive context, what&rsquo;s important isn&rsquo;t necessarily creating leverage, but that a policy solves for part of the diagnosis.</p> <p>That leaves the other two&ndash;being applicable and enforced&ndash;both of which are necessary for a policy to actually address the diagnosis. Any policy which you can&rsquo;t determine how to apply, or aren&rsquo;t willing to enforce, simply won&rsquo;t be useful.</p> <p>Let&rsquo;s apply these criteria to a handful of potential policies. 
First let&rsquo;s think about policies we might write to improve the talent density of our engineering team:</p> <ul> <li><strong>&ldquo;We only hire world-class engineers.&rdquo;</strong> This isn&rsquo;t applicable, because it&rsquo;s unclear what a world-class engineer means. Because there&rsquo;s no mutually agreeable definition in this policy, it&rsquo;s also not consistently enforceable.</li> <li><strong>&ldquo;We only hire engineers that get at least one &lsquo;strong yes&rsquo; in scorecards.&rdquo;</strong> This is applicable, because there&rsquo;s a clear definition. This is enforceable, depending on the willingness of the organization to reject seemingly good candidates who don&rsquo;t happen to get a strong yes.</li> </ul> <p>Next, let&rsquo;s think about a policy regarding code reuse within a codebase:</p> <ul> <li> <p><strong>&ldquo;We follow a strict Don&rsquo;t Repeat Yourself policy in our codebase.&rdquo;</strong> There&rsquo;s room for debate within a team about whether two pieces of code are truly duplicative, but this is generally applicable. Because there&rsquo;s room for debate, it&rsquo;s a very context specific determination to decide how to enforce a decision.</p> </li> <li> <p><strong>&ldquo;Code authors are responsible for determining if their contributions violate Don&rsquo;t Repeat Yourself, and rewriting them if they do.&rdquo;</strong> This is much more applicable, because now there&rsquo;s only a single person&rsquo;s judgment to assess the potential repetition. In some ways, this policy is also more enforceable, because there&rsquo;s no longer any ambiguity around who is deciding whether a piece of code is a repetition.</p> <p>The challenge is that enforceability now depends on one individual, and making this policy effective will require holding individuals accountable for the quality of their judgement. An organization that&rsquo;s unwilling to distinguish between good and bad judgment won&rsquo;t get any value out of the policy. 
This is a good example of how a good policy in one organization might become a poor policy in another.</p> </li> </ul> <p>If you ever find yourself wanting to include a policy that for some reason either can&rsquo;t be applied or can&rsquo;t be enforced, stop to ask yourself what you&rsquo;re trying to accomplish and ponder if there&rsquo;s a different policy that might be better suited to that goal.</p> <h2 id="developing-novel-policies">Developing novel policies</h2> <p>My experience is that there are vanishingly few truly novel policies to write. Almost always, someone else has already done something similar to your intended approach. <a href="https://lethain.com/calm-product-eng-company/">Calm&rsquo;s engineering strategy</a> is such a case: the details are particular to the company, but the general approach is common across the industry.</p> <p>The most likely place to find truly novel policies is during the adoption phase of a new widespread technology, such as the rise of ubiquitous mobile phones, cloud computing, or large language models. Even then, as explored in <a href="https://lethain.com/llm-adoption-strategy/">the strategy for adopting large language models</a>, the new technology can be engaged with as a generic technology:</p> <blockquote> <p><strong>Develop an LLM-backed process for reactivating departed and suspended drivers in mature markets.</strong> Through modeling our driver lifecycle, we determined that improving onboarding time will have little impact on the total number of active drivers. Instead, we are focusing on mechanisms to reactivate departed and suspended drivers, which is the only opportunity to meaningfully impact active drivers.</p></blockquote> <p>You could simply replace &ldquo;LLM&rdquo; with &ldquo;data-driven&rdquo; and it would be equally readable. In this way, policy can generally sidestep areas of uncertainty by being a bit abstract. 
This avoids being overly specific about topics you simply don&rsquo;t know much about.</p> <p>However, even if your policy isn&rsquo;t novel to the industry, it might still be novel to you or your organization. The steps that I&rsquo;ve found useful to debug novel policies are the same steps as running a condensed version of the strategy process, with a focus on exploration and refinement:</p> <ol> <li>Collect a number of <em>similar</em> policies, with a focus on how those policies differ from the policy you are creating</li> <li>Create a <a href="https://lethain.com/strategy-systems-modeling/">systems model</a> to articulate how this policy will work, and also how it will differ from the similar policies you&rsquo;re considering</li> <li>Run a <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a> cycle for your proto-policy to discover any unknown-unknowns about how it works in practice</li> </ol> <p>Whether you run into this scenario is largely a function of the extent of your, and your organization&rsquo;s, experience. Early in my career, I found myself doing novel (for me) strategy work very frequently, and these days I rarely find myself doing novel work, instead focusing on adaptation of well-known policies to new circumstances.</p> <h2 id="are-competing-policy-proposals-an-anti-pattern">Are competing policy proposals an anti-pattern?</h2> <p>When creating policy, you&rsquo;ll often have to engage with the question of whether you should develop one preferred policy or a series of potential strategies to pick from. 
Developing these is a useful stage of setting policy, but rather than helping you refine your policy, I&rsquo;d encourage you to think of this as exposing gaps in your diagnosis.</p> <p>For example, <a href="https://lethain.com/stripe-sorbet/">when Stripe developed the Sorbet Ruby-typing tooling</a>, there was debate between two policies:</p> <ol> <li>Should we build a Ruby-typing tool to allow a centralized team to gradually migrate the company to a typed codebase?</li> <li>Should we migrate the codebase to a preexisting strongly typed language like Golang or Java?</li> </ol> <p>These were, initially, equally valid hypotheses. Only by clarifying our diagnosis around resourcing did it become clear that incurring the bulk of costs in a centralized team was preferable to spreading the costs across many teams. Specifically, we recognized that we wanted to prioritize short-term product engineering velocity, even if it led to a longer migration overall.</p> <p>If you do develop multiple policy options, I encourage you to move the alternatives into an appendix rather than <a href="https://lethain.com/readable-engineering-strategy-documents/">including them in the core of your strategy document</a>. This will make it easier for readers of your final version to understand how to follow your policies, and they are the most important long-term users of your written strategy.</p> <h2 id="recognizing-constraints">Recognizing constraints</h2> <p>A similar problem to competing solutions is developing a policy that you cannot possibly fund. 
It&rsquo;s easy to get enamored with policies that you can&rsquo;t meaningfully enforce, but that&rsquo;s bad policy, even if it would work in an alternate universe where it was possible to enforce or resource it.</p> <p>To consider a few examples:</p> <ul> <li>The <a href="https://lethain.com/user-data-access-strategy/">strategy for controlling access to user data</a> might have proposed requiring manual approval by a second party of every access to customer data. However, that would have gone nowhere.</li> <li>Our <a href="https://lethain.com/uber-service-migration-strategy/">approach to Uber&rsquo;s service migration</a> might have required more staffing for the infrastructure engineering team, but we knew that wasn&rsquo;t going to happen, so it was a meaningless policy proposal to make.</li> <li>The strategy for <a href="https://lethain.com/private-equity-strategy/">navigating private equity ownership</a> might have argued that new ownership should not hold engineering accountable to a new standard on spending. But they would have just invalidated that strategy in the next financial planning period.</li> </ul> <p>If you find a policy that contemplates an impractical approach, it doesn&rsquo;t <em>only</em> indicate that the policy is a poor one, it also suggests your policy is missing an important pillar. 
Rather than debating the policy options, the fastest path to resolution is to align on the diagnosis that would invalidate potential paths forward.</p> <p>In cases where aligning on the diagnosis isn&rsquo;t possible, for example because you simply don&rsquo;t understand the possibilities of a new technology as encountered in the <a href="https://lethain.com/llm-adoption-strategy/">strategy for adopting LLMs</a>, then you&rsquo;ve typically found a valuable opportunity to use <a href="https://lethain.com/refining-eng-strategy/">strategy refinement</a> to build alignment.</p> <h2 id="dealing-with-missing-strategies">Dealing with missing strategies</h2> <p>At a recent company offsite, we were debating which policies we might adopt to deal with annual plans that kept getting derailed after less than a month. Someone remarked that this would be much easier if we could get the executive team to commit to a clearer, written strategy about which business units we were prioritizing.</p> <p>They were, of course, right. It would be much easier. Unfortunately, it goes back to the problem we discussed in the <a href="https://lethain.com/diagnosis-for-strategy/">diagnosis chapter</a> about reframing blockers into diagnosis. If a strategy from the company or a peer function is missing, the empowering thing to do is to include the absence in your diagnosis and move forward.</p> <p>Sometimes, even when you do this, it&rsquo;s easy to fall back into the belief that you cannot set a policy because a peer function might set a conflicting policy in the future. Whether you&rsquo;re an executive or an engineer, you&rsquo;ll never have the details you want to make the ideal policy. 
Meaningful leadership requires taking meaningful risks, which is never something that gets comfortable.</p> <h2 id="summary">Summary</h2> <p>After working through this chapter, you know how to develop policy, how to assemble policies to solve your diagnosis, and how to avoid a number of the frequent challenges that policy writers encounter. At this point, there&rsquo;s only one phase of strategy left to dig into, <a href="https://lethain.com/operations-for-strategy/">operating the policies you&rsquo;ve created</a>.</p>Who gets to do strategy?https://lethain.com/who-gets-to-do-strategy/Thu, 06 Mar 2025 04:00:00 -0700https://lethain.com/who-gets-to-do-strategy/<p>If you talk to enough aspiring leaders, you&rsquo;ll become familiar with the prevalent idea that they need to be promoted before they can work on strategy. It&rsquo;s a truism, but I&rsquo;ve also found this idea perfectly wrong: you can work on strategy from anywhere in an organization, it just requires different tactics to do so.</p> <p>Both <em>Staff Engineer</em> and <em>The Engineering Executive&rsquo;s Primer</em> have chapters on strategy. While the chapters&rsquo; contents are quite different, both present a practical path to advancing your organization&rsquo;s thinking about complex topics. 
This chapter explains my belief that <em>anyone</em> within an organization can make meaningful progress on strategy, particularly if you are honest about the tools accessible to you, and thoughtful about how to use them.</p> <p>The themes we&rsquo;ll dig into are:</p> <ul> <li>How to do strategy as an engineer, particularly an engineer who hasn&rsquo;t been given explicit authority to do strategy</li> <li>Doing strategy as an engineering executive who is responsible for your organization&rsquo;s decision-making</li> <li>How you can do engineering strategy even when you depend on an absent strategy, cannot acknowledge parts of the diagnosis because addressing certain problems is politically sensitive, or struggle with pockets of misaligned incentives</li> <li>If this book&rsquo;s argument is that everyone should do strategy, is there anyone who, nonetheless, really should not do strategy?</li> </ul> <p>By the end, you&rsquo;ll hopefully agree that engineering strategy is accessible to everyone, even though you&rsquo;re always operating within constraints.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="doing-strategy-as-an-engineer">Doing strategy as an engineer</h2> <p>It&rsquo;s easy to get so distracted by executives&rsquo; top-down approach to strategy that you convince yourself that there aren&rsquo;t other approachable mechanisms for doing strategy. 
There are!</p> <p><em>Staff Engineer</em> introduces an approach I call <a href="https://staffeng.com/guides/engineering-strategy/">Take five, then synthesize</a>, which does strategy by:</p> <ol> <li>Documenting how five current and historical related decisions have been made in your organization. This is an extended exploration phase</li> <li>Synthesizing those five documents into a diagnosis and policy. You are naming the implicit strategy, so it&rsquo;s impossible for someone to reasonably argue you&rsquo;re not empowered to do strategy: you&rsquo;re just describing what&rsquo;s already happening</li> </ol> <p>At that point, either the organization feels comfortable with what you&rsquo;ve written&ndash;which is their current strategy&ndash;or it doesn&rsquo;t in which case you&rsquo;ve forced a conversation about how to revise the approach. Creating awareness is often enough to drive strategic change, and doesn&rsquo;t require any explicit authorization from an executive to do.</p> <p>When awareness is insufficient, the other pattern I&rsquo;ve found highly effective in low-authority scenarios is an approach I wrote about in <em>An Elegant Puzzle</em>, and call <a href="https://lethain.com/model-document-share/">model, document, and share</a>:</p> <ol> <li>Model the approach you want others to adopt. Make it easy for them to observe how you&rsquo;ve changed the way you&rsquo;re doing things.</li> <li>Document the approach, the thinking behind it, and how to adopt it.</li> <li>Share the document around. If people see you succeeding with the approach, then they&rsquo;re likely to copy it from you.</li> </ol> <p>You might be skeptical because this is an influence-based approach. 
However, as we&rsquo;ll discuss in the next section, even executive-driven strategy is highly dependent on influence.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><strong>Strategy archaeology</strong></p> <p>Vernor Vinge&rsquo;s <em><a href="https://en.wikipedia.org/wiki/A_Deepness_in_the_Sky">A Deepness in the Sky</a></em>, published in 1999, introduced the term software archaeologists, folks who created functionality by cobbling together millennia of scraps of existing software.</p> <p>Although it&rsquo;s a somewhat different usage, I sometimes think of the &ldquo;take five, then synthesize&rdquo; approach as performing strategy archaeology. Simply by recording what has happened in the past, we make it easier to understand the present, and influence the future.</p> </div> <h2 id="doing-strategy-as-an-executive">Doing strategy as an executive</h2> <p>The biggest misconception about executive roles, frequently held by non-executives and new executives who are about to make a series of regrettable mistakes, is that executives operate without constraints. That is false: executives have an extremely high number of constraints that they operate under. Executives have budgets, CEO visions, peers to satisfy, and a team to motivate. They can disappoint any of these temporarily, but long-term have to satisfy all of them.</p> <p>Nonetheless, it is true that executives have more latitude to mandate and cajole participation in the strategies that they sponsor. <em>The Engineering Executive&rsquo;s Primer</em>&rsquo;s <a href="https://lethain.com/eng-strategies/">chapter on strategy</a> is a brief summary of this entire book, but it doesn&rsquo;t say much about how executive strategy differs from non-executive strategy.</p> <p>How the executive&rsquo;s approach to strategy differs from the engineer&rsquo;s can be boiled down to:</p> <ol> <li> <p>Executives can mandate following of their strategy, which empowers their policy options. 
An engineer can&rsquo;t prevent the promotion of someone who refused to follow their policy, but an executive can.</p> <p>Mandates only matter if there are consequences. If an executive is unwilling to enforce consequences for non-compliance with a mandate, the ability to issue a mandate isn&rsquo;t meaningful.</p> <p>This is also true if they <em>can&rsquo;t</em> enforce a mandate because of lack of support from their peer executives.</p> </li> <li> <p>Even if an executive is unwilling to use mandates, they have significant visibility and access to their organization to advocate for their preferred strategy.</p> </li> <li> <p>Neither access nor mandates improve an executive&rsquo;s ability to diagnose problems. However, both often create the appearance of progress. This is why executive strategies can fail so spectacularly and endure so long despite failure.</p> </li> </ol> <p>As a result, my experience is that executives have an easier time doing strategy, but a much harder time learning how to do strategy well, and fewer protections to avoid serious mistakes. Further, the consequences of an executive&rsquo;s poor strategy tend to be much further reaching than an engineer&rsquo;s. Waiting to do strategy until you are an executive is a recipe for disaster, even if it looks easier from a distance.</p> <h2 id="doing-strategy-in-other-roles">Doing strategy in other roles</h2> <p>Even if you&rsquo;re neither an engineer nor an engineering executive, you can still do engineering strategy. It&rsquo;ll just require an even more influence-driven approach.</p> <p>The engineering organization is generally right to believe that they know the most about engineering, but that&rsquo;s not always true. Sometimes a product manager used to be an engineer and has significant relevant experience. 
Other times, such as the <a href="https://lethain.com/llm-adoption-strategy/">early adoption of large language models</a>, engineers don&rsquo;t know much either, and benefit from outside perspectives.</p> <h2 id="doing-strategy-in-challenging-environments">Doing strategy in challenging environments</h2> <p>Good strategies succeed by accurately diagnosing circumstances and picking policies that address those circumstances. You are likely to spend time in organizations where both of those are challenging due to internal limitations, so it&rsquo;s worth acknowledging that and discussing how to navigate those challenges.</p> <h3 id="low-trust-environment">Low-trust environment</h3> <p>Sometimes the struggle to diagnose problems is a skill issue. Being bad at strategy is in some ways the easy problem to solve: just do more strategy work to build expertise. In other cases, you may see what the problems are fairly clearly, but not know how to acknowledge the problems because your organization&rsquo;s culture would frown on it. The latter is a diagnosis problem rooted in low-trust, and does make things more difficult.</p> <p>The chapter on <a href="https://lethain.com/diagnosis-for-strategy/">Diagnosis</a> recognizes this problem, and admits that sometimes you have to whisper the controversial parts of a strategy:</p> <blockquote> <p>When you’re writing a strategy, you’ll often find yourself trying to choose between two awkward options: say something awkward or uncomfortable about your company or someone working within it, or omit a critical piece of your diagnosis that’s necessary to understand the wider thinking. Whenever you encounter this sort of debate, my advice is to find a way to include the diagnosis, but to reframe it into a palatable statement that avoids casting blame too narrowly.</p></blockquote> <p>In short, the solution to low-trust is to translate difficult messages into softer, less direct versions that are acceptable to state. 
If your goal is to hold people accountable, this can feel dishonest or like an ethical compromise, but the goal of strategy is to make better decisions, which is an entirely different concern than holding folks accountable for the past.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><strong>Karpman Drama Triangle</strong></p> <p>Sometimes when the diagnosis seems particularly obvious, and people don&rsquo;t agree with you, it&rsquo;s because you are wrong. When I&rsquo;ve been obviously wrong about things I understand well, it&rsquo;s usually because I&rsquo;ve fallen into viewing a situation through the <a href="https://en.wikipedia.org/wiki/Karpman_drama_triangle">Karpman Drama Triangle</a>, where all parties are mapped as the persecutor, the rescuer, or the victim.</p> </div> <h2 id="poor-judgment-environment">Poor-judgment environment</h2> <p>Even when you do an excellent job diagnosing challenges, it can be difficult to drive agreement within the organization about how to address them. Sometimes this is due to genuinely complex tradeoffs, for example in <a href="https://lethain.com/pos-acquisition-integration/">Stripe&rsquo;s acquisition of Index</a>, there was debate about how to deal with Index&rsquo;s Java-based technology stack, which culminated in a compromise that didn&rsquo;t make anyone particularly happy:</p> <blockquote> <p>Defer making a decision regarding the introduction of Java to a later date: the introduction of Java is incompatible with our existing engineering strategy, but at this point we’ve also been unable to align stakeholders on how to address this decision. 
Further, we see attempting to address this issue as a distraction from our timely goal of launching a joint product within six months.</p> <p>We will take up this discussion after launching the initial release.</p></blockquote> <p>That compromise is a good example of a difficult tradeoff: although parties disagreed with the approach, everyone understood the conflicting priorities that had to be addressed.</p> <p>In other cases, though, there are policy choices that simply don&rsquo;t make much sense, generally driven by poor judgment in your organization. Sometimes it&rsquo;s not poor technical judgment, but poor judgment in choosing to prioritize one&rsquo;s personal interests at the expense of the company&rsquo;s needs. Calm&rsquo;s strategy to <a href="https://lethain.com/calm-product-eng-company/">focus on being a product-engineering organization</a> dealt with some aspects of that, acknowledged in its diagnosis:</p> <blockquote> <p>We’re arguing a particularly large amount about adopting new technologies and rewrites. Most of our disagreements stem around adopting new technologies or rewriting existing components into new technology stacks. For example, can we extend this feature or do we have to migrate it to a service before extending it? Can we add this to our database or should we move it into a new Redis cache instead? Is JavaScript a sufficient programming language, or do we need to rewrite this functionality in Go?</p></blockquote> <p>In that situation, your strategy is an attempt to educate your colleagues about the tradeoffs they are making, but ultimately sometimes folks will disagree with your strategy. In that case, remember that most interesting problems require iterative solutions. Writing your strategy and sharing it will start to change the organization&rsquo;s mind. 
Don’t get discouraged even if that change is initially slow.</p> <h3 id="dealing-with-missing-strategies">Dealing with missing strategies</h3> <p>The strategy for <a href="https://lethain.com/private-equity-strategy/">dealing with new private equity ownership</a> introduces a common problem: lack of clarity about what other parts of your own company want. In that case, it seems likely there will be a layoff, but it&rsquo;s unclear how large that layoff will be:</p> <blockquote> <p>Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&amp;D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction.</p></blockquote> <p>Many leaders encounter that sort of ambiguity and decide that they cannot move forward with a strategy of their own until that decision is made. While it&rsquo;s true that it&rsquo;s inconvenient not to know the details, getting blocked by ambiguity is <em>always</em> the wrong decision.</p> <p>Instead you should do what the private equity strategy does: accept that ambiguity as a fact to be worked around. Rather than giving up, it adopts a series of new policies to start reducing cost growth by changing their <a href="https://lethain.com/engineering-cost-model/">organization&rsquo;s seniority mix</a>, and recognizes that once there is clarity on reduction targets that there will be additional actions to be taken.</p> <p>Whenever you&rsquo;re doing something challenging, there are an infinite number of reasonable rationales for why you shouldn&rsquo;t or can&rsquo;t make progress. Leadership is finding a way to move forward despite those issues. 
A missing strategy is always part of your diagnosis, but never a reason that you can&rsquo;t do strategy.</p> <h2 id="who-shouldnt-do-strategy">Who shouldn&rsquo;t do strategy</h2> <p>In my experience, there&rsquo;s almost never a reason why <em>you</em> cannot do strategy, but there are two particular scenarios where doing strategy probably doesn&rsquo;t make sense. The first is not a who, but a <a href="https://lethain.com/when-write-down-engineering-strategy/">when problem</a>: sometimes there is so much strategy already happening, that doing more is a distraction. If another part of your organization is already working on the same problem, do your best to work with them directly rather than generating competing work.</p> <p>The other time to avoid strategy is when you&rsquo;re trying to satisfy an emotional need to make a direct, immediate impact. Sharing a thoughtful strategy always makes progress, but it&rsquo;s often the slow, incremental progress of changing your organization&rsquo;s beliefs. Even definitive, top-down strategies from executives are often ignored in pockets of an organization, and bottom-up strategies spread slowly as they are modeled, documented, and shared. Embarking on strategy work requires a tolerance for winning in the long run, even when there&rsquo;s little progress this week or this quarter.</p> <h2 id="summary">Summary</h2> <p>As you finish reading this chapter, my hope is that you also believe that you can work on strategy in your organization, whether you&rsquo;re an engineer or an executive. I also hope that you appreciate that the tools you use vary greatly depending on who you are within your organization and the culture in which you work. Whether you need to model or can mandate, there&rsquo;s a mechanism that will work for you.</p>How to integrate Stripe's acquisition of Index? 
(2018)https://lethain.com/pos-acquisition-integration/Thu, 27 Feb 2025 06:00:00 -0700https://lethain.com/pos-acquisition-integration/<p>While discussions around acquisitions often focus on <a href="https://lethain.com/engineering-in-mergers-and-acquisition/">technical diligence</a> and deciding whether to make the acquisition, the integration that follows afterwards can be even more complex. There are few irreversible trapdoor decisions in engineering, but decisions made early in an integration tend to be surprisingly durable.</p> <p>This engineering strategy explores Stripe&rsquo;s approach to integrating <a href="https://www.pymnts.com/news/partnerships-acquisitions/2018/stripe-pos-software-startup-index-acquisition/">their 2018 acquisition of Index</a>. While a business book would focus on the rationale for the acquisition itself, here that rationale is merely part of the diagnosis that defines the integration tradeoffs. The integration itself is the area of focus.</p> <p>Like most acquisitions, the team responsible for the integration has only learned about the project after the deal closed, which means early efforts are a scramble to apply <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a> to distinguish between optimistic dates and technical realities.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="reading-this-document">Reading this document</h2> <p>To apply this strategy, start at the top with <em>Policy &amp; Operation</em>. 
To understand the thinking behind this strategy, read sections in reverse order, starting with <em>Explore</em>.</p> <p>More detail on this structure in <a href="https://lethain.com/readable-engineering-strategy-documents">Making a readable Engineering Strategy document</a>.</p> <h2 id="policy--operation">Policy &amp; Operation</h2> <p>We&rsquo;re starting with little shared context between the acquired and acquiring engineering teams, and have a six month timeline to launch a joint product. So our starting policy is a mix of a commitment to joint refinement and several provisional architectural policies:</p> <ol> <li> <p><strong>Meet at least weekly until the initial release is complete</strong>: the involved leadership from Stripe and Index will hold a weekly sync meeting to refine our approach until we fulfill our initial release timeline.</p> <p>This meeting is jointly owned by Stripe&rsquo;s Head of Traffic Engineering and Index&rsquo;s Head of Engineering.</p> </li> <li> <p><strong>Minimize changes to tokenization environment</strong>: because point-of-sale devices directly work with customer payment details, the API that directly supports the point-of-sale device must live within our secured environment where payment details are stored.</p> <p>However, any other functionality <em>must not</em> be added to our tokenization environment.</p> </li> <li> <p><strong>All other functionality must exist in standard environments</strong>: except for the minimum necessary functionality moving into the tokenization environment, everything else must be operated in our standard, non-tokenization environments. 
In particular, any software that requires frequent changes, or introduces complex external dependencies, should exist in the standard environments.</p> </li> <li> <p><strong>Defer making a decision regarding the introduction of Java to a later date</strong>: the introduction of Java is incompatible with our existing engineering strategy, but at this point we&rsquo;ve also been unable to align stakeholders on how to address this decision. Further, we see attempting to address this issue as a distraction from our timely goal of launching a joint product within six months.</p> <p>We will take up this discussion after launching the initial release.</p> </li> <li> <p><strong>Escalations come to paired leads</strong>: given our limited shared context across teams, all escalations must come to both Stripe&rsquo;s Head of Traffic Engineering and Index&rsquo;s Head of Engineering.</p> </li> <li> <p><strong>Security review of changes impacting tokenization environment</strong>: we need to move quickly to launch the combined point-of-sale and payments product, but we <em>must not</em> cut corners on security to launch faster. 
Security must be included and explicitly sign off on any integration decisions that involve our tokenization environment.</p> </li> </ol> <h2 id="diagnose">Diagnose</h2> <p>There are generally four categories of acquisitions: talent acquisitions to bring on a talented team, business acquisitions to buy a company&rsquo;s revenue and product, technology acquisitions to add a differentiated capability that would be challenging to develop internally, and time-to-market acquisitions where you could develop the capability internally but can develop it meaningfully faster by acquiring a company.</p> <p>While most acquisitions have a flavor of several of these dimensions, this acquisition is primarily a time-to-market acquisition aimed at addressing these constraints:</p> <ul> <li> <p>Several of our largest customers are pushing for us to provide a point-of-sale device integrated with our API-driven payments ecosystem. At least one has implied that we either provide this functionality on a committed timeline or they may churn to a competitor.</p> </li> <li> <p>We currently have no homegrown expertise in developing or integrating with hardware such as point-of-sale devices. Based on other zero-to-one efforts internally, we believe it would take about a year to hire the team, develop and launch a minimum-viable product for a point-of-sale device integrated into our platform.</p> </li> <li> <p>Where we&rsquo;ve taken a horizontal approach to supporting web payments via an API, at least one of our competitors, Square, has taken a vertically integrated approach. 
While their API ecosystem is less developed than ours, they are a plausible destination for customers threatening to churn.</p> </li> <li> <p>We believe that at least one of our enterprise customers will churn if our best commitment is launching a point-of-sale solution 12 months from now.</p> </li> <li> <p>We&rsquo;ve decided to acquire a small point-of-sale startup, which we will use to commit to a six month timeframe for supporting an integrated point-of-sale device with our API ecosystem.</p> </li> <li> <p>We will need to rapidly integrate the acquired startup to meet this timeline. We only know a small number of details about what this will entail. We <em>do</em> know that point-of-sale devices directly operate on payment details (e.g. the point-of-sale device knows the credit card details of the card it reads).</p> <p>Our compliance obligations restrict such activity to our &ldquo;tokenization environment&rdquo;, a highly secured and isolated environment with direct access to payment details. This environment converts payment details into a unique token that other environments can utilize to operate against payment details without the compliance overhead of having direct access to the underlying payment details.</p> </li> <li> <p>Going into this technical integration, we have few details about the acquired company&rsquo;s technology stack. We do know that they are primarily a Java shop running on AWS, where we are primarily a Ruby (with some Go) shop running on AWS.</p> </li> </ul> <h2 id="explore">Explore</h2> <p>Prior to this acquisition, we have done several small acquisitions. None of those acquisitions had a meaningful product to integrate with ours, so we don&rsquo;t have much of an internal playbook to anchor our approach in.</p> <p>We do have limited experience in integrating technical acquisitions from prior companies we&rsquo;ve worked in, along with talking to peers at other companies to mine their experience. 
Synthesizing those experiences, the recurring patterns are:</p> <ol> <li> <p>Usually deal teams have made certain commitments, or the acquired team has understood certain commitments, that will be challenging to facilitate. This is doubly true when you are unaware of what those commitments might be.</p> <p>If folks seem to be behaving oddly, it might be one such misunderstanding, and it&rsquo;s worth engaging directly to debug the confusion.</p> </li> <li> <p>There should be an executive sponsor for the acquisition, and the sponsor is typically the best person to ask about the company&rsquo;s intentions. If you can&rsquo;t find the executive sponsor, or they are not engaged, try to recruit a new executive sponsor rather than trying to make things work without one.</p> </li> <li> <p>Close the culture gap quickly where there&rsquo;s little friction, and cautiously where there&rsquo;s little trust.</p> <p>We do need to bring the acquired company into our culture, but we have years to do that. The most successful stories of doing this leaned on a mix of moving folks into and out of the acquired team rather than applying force.</p> </li> <li> <p>The long-term cost of supporting a new technology stack is high, and in conflict with our technology strategy of consolidating on as few programming languages as possible.</p> <p>This is not the place to be flexible, as each additional feature in the new stack will take you further from your desired outcome.</p> </li> <li> <p>Finally, find a way to derisk key departures. Things can go wrong quickly. One of the easiest starting points is consolidating infrastructure immediately, even if the product or software takes longer.</p> </li> </ol> <p>Altogether, this was not the most reassuring exploration: it was a bit abstract, and much of our research returned strongly-held, conflicting perspectives. 
Perhaps acquisitions, like starting a new company, are one of those places where there&rsquo;s simply no right way to do things well.</p>Diagnosis in engineering strategy.https://lethain.com/diagnosis-for-strategy/Sat, 22 Feb 2025 04:00:00 -0700https://lethain.com/diagnosis-for-strategy/<p>Once you&rsquo;ve written your <a href="https://lethain.com/exploring-for-strategy/">strategy&rsquo;s exploration</a>, the next step is working on its diagnosis. Diagnosis is understanding the constraints and challenges your strategy needs to address. In particular, it&rsquo;s about doing that understanding while slowing yourself down from deciding how to <em>solve</em> the problem at hand before you know the problem&rsquo;s nuances and constraints.</p> <p>If you ever find yourself wanting to skip the diagnosis phase&ndash;let&rsquo;s get to the solution already!&ndash;then maybe it&rsquo;s worth acknowledging that every strategy that I&rsquo;ve seen fail did so due to a lazy or inaccurate diagnosis. It&rsquo;s very challenging to fail with a proper diagnosis, and almost impossible to succeed without one.</p> <p>The topics this chapter will cover are:</p> <ul> <li>Why diagnosis is the foundation of effective strategy, on which effective policy depends. 
Conversely, how skipping the diagnosis phase consistently ruins strategies</li> <li>A step-by-step approach to diagnosing your strategy&rsquo;s circumstances</li> <li>How to incorporate data into your diagnosis effectively, and where to focus on adding data</li> <li>Dealing with controversial elements of your diagnosis, such as pointing out that your own executive is one of the challenges to solve</li> <li>Why it&rsquo;s more effective to view difficulties as part of the problem to be solved, rather than a blocking issue that prevents making forward progress</li> <li>The near impossibility of an effective diagnosis if you don&rsquo;t bring humility and self-awareness to the process</li> </ul> <p>Into the details we go!</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="diagnosis-is-strategys-foundation">Diagnosis is strategy&rsquo;s foundation</h2> <p>One of the challenges in evaluating strategy is that, after the fact, many effective strategies are so obvious that they&rsquo;re pretty boring. Similarly, most ineffective strategies are so clearly flawed that their authors look lazy. That&rsquo;s because, as a strategy is operated, the reality around it becomes clear. When you&rsquo;re writing your strategy, you don&rsquo;t know if you can convince your colleagues to adopt a new approach to specifying APIs, but a year later you know very definitively whether it&rsquo;s possible.</p> <p>Building your strategy&rsquo;s diagnosis is your attempt to correctly recognize the context that the strategy needs to solve before deciding on the policies to address that context. 
Done well, the subsequent steps of writing strategy often feel like an afterthought, which is why I think of diagnosis as strategy&rsquo;s foundation.</p> <p>Where <a href="https://lethain.com/exploring-for-strategy/">exploration</a> was an evaluation-free activity, diagnosis is all about evaluation. How do teams feel today? Why did that project fail? Why did the last strategy go poorly? What will be the distractions to overcome to make this new strategy successful?</p> <p>That said, not all evaluation is equal. If you state your judgment directly, it&rsquo;s easy to dispute. An effective diagnosis is hard to argue against, because it&rsquo;s a web of interconnected observations, facts, and data. Even for folks who dislike your conclusions, the weight of evidence should be hard to shift.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><a href="https://lethain.com/testing-strategy-iterative-refinement/">Strategy testing</a>, explored in the Refinement section, takes advantage of the reality that it&rsquo;s easier to diagnose by doing than by speculating. It proposes a recursive diagnosis process until you have real-world evidence that the strategy is working.</p> </div> <h2 id="how-to-develop-your-diagnosis">How to develop your diagnosis</h2> <p>Your strategy is almost certain to fail unless you start from an effective diagnosis, but how to build a diagnosis is often left unspecified. That&rsquo;s because, for most folks, building the diagnosis is indeed a dark art: unspecified, undiscussed, and uncontrollable. I&rsquo;ve been guilty of this as well, with <em>The Engineering Executive&rsquo;s Primer</em>&rsquo;s <a href="https://lethain.com/eng-strategies/">chapter on strategy</a> staying silent on the details of how to diagnose for your strategy.</p> <p>So, yes, there is some truth to the idea that forming your diagnosis is an emergent, organic process rather than a structured, mechanical one. 
However, over time I&rsquo;ve come to adopt a fairly structured approach:</p> <ol> <li> <p><strong>Braindump</strong>, starting from a blank sheet of paper, write down your best understanding of the circumstances that inform your current strategy. Then set that piece of paper aside for the moment.</p> </li> <li> <p><strong>Summarize exploration</strong> on a new piece of paper, review the contents of your <a href="https://lethain.com/exploring-for-strategy/">exploration</a>. Pull in every piece of diagnosis from similar situations that resonates with you. This is true for both internal and external works! For each diagnosis, tag whether it fits perfectly, or needs to be adjusted for your current circumstances. Then, once again, set the piece of paper aside.</p> </li> <li> <p><strong>Mine for distinct perspectives</strong> on yet another blank page, talking to different stakeholders and colleagues who you know are likely to disagree with your early thinking. Your goal is not to agree with this feedback. Instead, it&rsquo;s to understand their view.</p> <p><em><a href="https://www.amazon.com/Crux-How-Leaders-Become-Strategists-ebook/dp/B09G2QXXWX">The Crux</a></em> by Richard Rumelt anchors diagnosis in this approach, emphasizing the importance of &ldquo;testing, adjusting, and changing the frame, or point of view.&rdquo;</p> </li> <li> <p><strong>Synthesize views into one internally consistent perspective.</strong> Sometimes the different perspectives you&rsquo;ve gathered don&rsquo;t mesh well. They might well explicitly differ in what they believe the underlying problem is, as is typical in tension between platform and product engineering teams. The goal is to competently represent each of these perspectives in the diagnosis, even the ones you disagree with, so that later on you can evaluate your proposed approach against each of them.</p> <p>When synthesizing feedback goes poorly, it tends to fail in one of two ways. 
First, the author&rsquo;s opinion shines through so strongly that it renders the author suspect. Your goal is never to <em>agree</em> with every team&rsquo;s perspective, just as your diagnosis should typically avoid crowning any perspective as correct: a reader should generally be apprised of the details and unaware of the author.</p> <p>The second common issue is when a group tries to jointly own the synthesis, but creates a fractured perspective rather than a unified one. I generally find that having one author who is accountable for representing all views works best to address both of these issues.</p> </li> <li> <p><strong>Test drafts across perspectives.</strong> Once you&rsquo;ve written your initial diagnosis, you want to sit down with the people who you expect to disagree most fervently. Iterate with them until they agree that you&rsquo;ve accurately captured their perspective.</p> <p>It might be that they disagree with some other viewpoints, but they should be able to agree that others hold those views. They might argue that the data you&rsquo;ve included doesn&rsquo;t capture their full reality, in which case you can caveat the data by saying that their team disagrees that it&rsquo;s a comprehensive lens.</p> </li> <li> <p><strong>Don&rsquo;t worry about getting the details perfectly right in your initial diagnosis.</strong> You&rsquo;re trying to get the right crumbs to feed into the next phase, <a href="https://lethain.com/refining-eng-strategy/">strategy refinement</a>. Allowing yourself to be directionally correct, rather than perfectly correct, makes it possible to cover a broad territory quickly. Getting caught up in perfecting details is an easy way to anchor yourself into one perspective prematurely.</p> </li> </ol> <p>At this point, I hope you&rsquo;re starting to predict how I&rsquo;ll conclude any recipe for strategy creation: if these steps feel overly mechanical to you, adjust them to something that feels more natural and authentic. 
There&rsquo;s no perfect way to understand complex problems. That said, if you feel uncertain, or are skeptical of your own track record, I do encourage you to start with the above approach as a launching point.</p> <h2 id="incorporating-data-into-your-diagnosis">Incorporating data into your diagnosis</h2> <p>The strategy for <a href="https://lethain.com/private-equity-strategy/">Navigating Private Equity ownership</a>&rsquo;s diagnosis includes a number of details to help readers understand the status quo. For example, the section on headcount growth explains how much headcount has grown, how that compares to the prior year, and provides a mental model for readers to translate engineering headcount into engineering headcount costs:</p> <blockquote> <p>Our Engineering headcount costs have grown by 15% YoY this year, and 18% YoY the prior year. Headcount grew 7% and 9% respectively, with the difference between headcount and headcount costs explained by salary band adjustments (4%), a focus on hiring senior roles (3%), and increased hiring in higher cost geographic regions (1%).</p></blockquote> <p>If everyone evaluating a strategy shares the same foundational data, then evaluating the strategy becomes vastly simpler. Data is also your mechanism for supporting or critiquing the various views that you&rsquo;ve gathered when drafting your diagnosis; to an impartial reader, data will speak louder than passion. If you&rsquo;re confident that a perspective is true, then include a data narrative that supports it. If you believe another perspective is overstated, then include data that the reader will require to come to the same conclusion.</p> <p>Do your best to include data analysis with a link out to the full data, rather than requiring readers to interpret the data themselves while they are reading. 
As your strategy document travels further, there will be inevitable requests for different cuts of data to help readers understand your thinking, and this is somewhat preventable by linking to your original sources.</p> <p>If much of the data you want doesn&rsquo;t exist today, that&rsquo;s a fairly common scenario for strategy work: if the data to make the decision easy already existed, you probably would have already made a decision rather than needing to run a structured thinking process. The next chapter <a href="https://lethain.com/refining-eng-strategy/">on refining strategy</a> covers a number of tools that are useful for building confidence in low-data environments.</p> <h2 id="whisper-the-controversial-parts">Whisper the controversial parts</h2> <p>At one time, the company I worked at rolled out a bar raiser program styled after Amazon&rsquo;s, where there was an interviewer from outside the team that had to approve every hire. I spent some time arguing against adding this additional step as I didn&rsquo;t understand what we were solving for, and I was surprised at how uninterested management was in knowing if the new process actually improved outcomes.</p> <p>What I didn&rsquo;t realize until much later was that most of the senior leadership distrusted one of their peers, and had rolled out the bar raiser program solely to create a mechanism to control that manager&rsquo;s hiring bar when the CTO was uninterested in holding that leader accountable. (I also learned that these leaders didn&rsquo;t care much about implementing this policy, resulting in bar raiser rejections being frequently ignored, but that&rsquo;s a discussion for the <a href="https://lethain.com/operations-for-strategy/">Operations for strategy chapter</a>.)</p> <p>This is a good example of a strategy that <em>does</em> make sense with the full diagnosis, but makes little sense without it, and where stating part of the diagnosis out loud is nearly impossible. 
Even senior leaders are not generally allowed to write a document that says, &ldquo;The Director of Product Engineering is a bad hiring manager.&rdquo;</p> <p>When you&rsquo;re writing a strategy, you&rsquo;ll often find yourself trying to choose between two awkward options:</p> <ol> <li>Say something awkward or uncomfortable about your company or someone working within it</li> <li>Omit a critical piece of your diagnosis that&rsquo;s necessary to understand the wider thinking</li> </ol> <p>Whenever you encounter this sort of debate, my advice is to find a way to include the diagnosis, but to reframe it into a palatable statement that avoids casting blame too narrowly. I think it&rsquo;s helpful to discuss a few concrete examples of this, starting with the strategy for <a href="https://lethain.com/private-equity-strategy/">navigating private equity</a>, whose diagnosis includes:</p> <blockquote> <p>Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&amp;D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction.</p></blockquote> <p>There are many things the authors of this strategy likely feel about their state of reality. First, they are probably upset about the fact that their new private equity ownership is likely to eliminate colleagues. Second, they are likely upset that there is no clear plan around what they need to do, so they are stuck preparing for a wide range of potential outcomes. However they feel, they don&rsquo;t say any of that; they stick to precise, factual statements.</p> <p>For a second example, we can look to the <a href="https://lethain.com/uber-service-migration-strategy/">Uber service migration strategy</a>:</p> <blockquote> <p>Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. 
While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing.</p></blockquote> <p>The team didn&rsquo;t <em>agree</em> that their headcount should not be growing, but it was the reality they were operating in. They acknowledged their reality as a factual statement, without any additional commentary about that statement.</p> <p>In both of these examples, they found a professional, non-judgmental way to acknowledge the circumstances they were solving. The authors would have preferred that the leaders behind those decisions take explicit accountability for them, but it would have undermined the strategy work had they attempted to do it within their strategy writeup.</p> <p>Excluding critical parts of your diagnosis makes your strategies particularly hard to evaluate, copy or recreate. Find a way to say things politely to make the strategy effective. As always, strategies are much more about realities than ideals.</p> <h2 id="reframe-blockers-as-part-of-diagnosis">Reframe blockers as part of diagnosis</h2> <p>When I work on strategy with early-career leaders, an idea that comes up a lot is that an identified problem means that strategy is not possible. For example, they might argue that doing strategy work is impossible at their current company because the executive team changes their mind too often.</p> <p>That core insight is almost certainly true, but it&rsquo;s much more powerful to reframe that as a diagnosis: if we don&rsquo;t find a way to show concrete progress quickly, and use that to excite the executive team, our strategy is likely to fail. 
This transforms the thing preventing your strategy into a condition your strategy needs to address.</p> <p>Whenever you run into a reason why your strategy seems unlikely to work, or why strategy overall seems difficult, you&rsquo;ve found an important piece of your diagnosis to include. There are never reasons why strategy simply cannot succeed, only diagnoses you&rsquo;ve failed to recognize.</p> <p>For example, we knew in our work on <a href="https://lethain.com/uber-service-migration-strategy/">Uber&rsquo;s service provisioning strategy</a> that we weren&rsquo;t getting more headcount for the team, the product engineering team was going to continue growing rapidly, and that engineering leadership was unwilling to constrain how product engineering worked. Rather than preventing us from implementing a strategy, those components clarified what sort of approach could actually succeed.</p> <h2 id="the-role-of-self-awareness">The role of self-awareness</h2> <p>Every problem of today is partially rooted in the decisions of yesterday. If you&rsquo;ve been with your organization for any duration at all, this means that <em>you</em> are directly or indirectly responsible for a portion of the problems that your diagnosis ought to recognize.</p> <p>This means that recognizing the impact of your prior actions in your diagnosis is a powerful demonstration of self-awareness. It also suggests that your next strategy&rsquo;s success is rooted in your self-awareness about your prior choices. Don&rsquo;t be afraid to recognize the failures in your past work. While changing your mind <em>without</em> new data is a sign of chaotic leadership, changing your mind <em>with</em> new data is a sign of thoughtful leadership.</p> <h2 id="summary">Summary</h2> <p>Because diagnosis is the foundation of effective strategy, I&rsquo;ve always found it the most intimidating phase of strategy work. 
While I think that&rsquo;s a somewhat unavoidable reality, my hope is that this chapter has somewhat prepared you for that challenge.</p> <p>The four most important things to remember are simply:</p> <ol> <li>form your diagnosis before deciding how to solve it,</li> <li>try especially hard to capture perspectives you initially disagree with,</li> <li>supplement intuition with data where you can, and</li> <li>accept that sometimes you&rsquo;re missing the data you need to fully understand.</li> </ol> <p>The last piece, in particular, is why many good strategies never get shared, and the topic we&rsquo;ll address in the next chapter on <a href="https://lethain.com/refining-eng-strategy/">strategy refinement</a>.</p>Exploring for strategy.https://lethain.com/exploring-for-strategy/Thu, 13 Feb 2025 04:00:00 -0700https://lethain.com/exploring-for-strategy/<p>A surprising number of strategies are doomed from inception because their authors get attached to one particular approach without considering alternatives that would work better for their current circumstances. This happens when engineers want to pick tools solely because they are trending, and when executives insist on adopting the tech stack from their prior organization where they felt comfortable.</p> <p>Exploration is the antidote to early anchoring, forcing you to consider the problem widely <em>before</em> evaluating any of the paths forward. Exploration is about updating your priors before assuming the industry hasn&rsquo;t evolved since you last worked on a given problem. Exploration is continuing to believe that things can get better when you&rsquo;re not watching.</p> <p>This chapter covers:</p> <ul> <li>The goals of the exploration phase of strategy creation</li> <li>When to explore (always first!) 
and when it makes sense to stop exploring</li> <li>How to explore a topic, including discussion of the most common mechanisms: mining for internal precedent, reading industry papers and books, and leveraging your external network</li> <li>Why avoiding judgment is an essential part of exploration</li> </ul> <p>By the end of this chapter, you&rsquo;ll be able to conduct an exploration for the current or next strategy that you work on.</p> <h2 id="what-is-exploration">What is exploration?</h2> <p>One of the frequent senior leadership anti-patterns I&rsquo;ve encountered in my career is <a href="https://lethain.com/grand-migration/">The Grand Migration</a>, where a new leader declares that a massive migration to a new technology stack&ndash;typically the stack used by their former employer&ndash;will solve every pressing problem. What&rsquo;s distinguishing about the Grand Migration is not the initially bad selection, but the single-minded ferocity with which the senior leader pushes for their approach, even when it becomes abundantly clear to others that it doesn&rsquo;t solve the problem at hand.</p> <p>These senior leaders are very intelligent, but have allowed themselves to be framed in by their initial thinking from prior experiences. Accepting those early thoughts as the foundation of their strategy, they build the entire strategy on top of those ideas, and eventually there is so much weight standing on those early assumptions that it becomes impossible to acknowledge the errors.</p> <p>Exploration is the deliberate practice of searching through a strategy&rsquo;s problem and solution spaces before allowing yourself to commit to a given approach. It&rsquo;s understanding how others have approached the same problem recently and in the past. 
It&rsquo;s doing this both in trendy companies you admire, and in practical companies that actually resemble yours.</p> <p>Most exploration will be external to your team, but depending on your company, much of your exploration might be internal to the company. If you&rsquo;re in a massive engineering organization of 100,000, there are likely existing internal solutions to your problem that you&rsquo;ve never heard of. Conversely, if you&rsquo;re in an organization of 50 engineers, it&rsquo;s likely that much of your exploration will be external.</p> <h2 id="when-to-explore">When to explore</h2> <p>Exploration is the first step of good strategy work. Even when you want to skip it, you will always regret skipping it, because you&rsquo;ll inadvertently frame yourself into whatever approach you focus on first. Especially when it comes to problems that you&rsquo;ve solved previously, exploration is the only thing preventing you from over-indexing on your prior experiences.</p> <p>Try to continue exploration until you know how three similar teams within your company and three similar companies have recently solved the same problem. Further, make sure you are able to explain the thinking behind those decisions. At that point, you should be ready to stop exploring and move on to the <a href="https://lethain.com/diagnosis-for-strategy/">diagnosis step</a> of strategy creation.</p> <p>Exploration should always come with a minimum and maximum timeframe: less than a few hours is very suspicious, and more than a week is generally questionable as well.</p> <h2 id="how-to-explore">How to explore</h2> <p>While the details of each exploration will differ a bit, the overarching approach tends to be pretty similar across strategies. 
After I open up the draft strategy document I&rsquo;m working on, my general approach to exploration is:</p> <ol> <li> <p>Start throwing in every resource I can think of related to that problem.</p> <p>For example, in the <a href="https://lethain.com/uber-service-migration-strategy/">Uber service provisioning strategy</a>, I started by collecting recent papers on Mesos, Kubernetes, and Aurora to understand the state of the industry on orchestration.</p> </li> <li> <p>Do some web searching, foundational model prompting, and checking with a few current and prior colleagues about what topics and resources I might be missing.</p> <p>For example, for the <a href="https://lethain.com/calm-product-eng-company/">Calm engineering strategy</a>, I focused on talking with industry peers on tools they&rsquo;d used to focus a team with diffuse goals.</p> </li> <li> <p>Summarize the list of resources I&rsquo;ve gathered, organizing them by which I want to explore, and which I won&rsquo;t spend time on but are worth mentioning.</p> <p>For example, the <a href="https://lethain.com/llm-adoption-strategy/">Large Language Model adoption strategy</a>&rsquo;s exploration section documents the variety of resources the team explored before completing it.</p> </li> <li> <p>Work through the list one by one, continuing to collect notes in the strategy document. When you&rsquo;re done, synthesize those into a concise, readable summary of what you&rsquo;ve learned.</p> <p>For example, the <a href="https://lethain.com/decompose-monolith-strategy/">monolith decomposition strategy</a> synthesizes the exploration of a broad topic into four paragraphs, with links out to references.</p> </li> <li> <p>Stop once I generally understand how a handful of similar internal and external teams have recently approached this problem.</p> </li> </ol> <p>Of all the steps in strategy creation, exploration is inherently open-ended, and you may find a different approach works better for you. 
If you&rsquo;re not sure what to do, try following the above steps closely. If you have a different approach that you&rsquo;re confident in&ndash;as long as it&rsquo;s not skipping exploration!&ndash;then go ahead and try that instead.</p> <div class="bg-light-gray br4 ph3 pv1"> <p>While not discussed in this chapter, you can also use some techniques like <a href="wardley-mapping/">Wardley mapping</a>, covered in the <a href="https://lethain.com/refining-eng-strategy/">Refinement chapter</a>, to support your exploration phase. Wardley mapping is a strategy tool designed within a different strategy tradition, and consequently categorizing it as either solely an exploration tool or a refinement tool ignores some of its potential uses.</p> <p>There&rsquo;s no perfect way to do strategy: take what works for you and use it.</p> </div> <h2 id="mine-internal-precedent">Mine internal precedent</h2> <p>One of the most powerful forms of strategy is simply documenting how similar decisions have been made internally: often this is enough to steer how similar future decisions are made within your organization. This approach, documented in <em>Staff Engineer</em>&rsquo;s <a href="https://staffeng.com/guides/engineering-strategy/">Write five, then synthesize</a>, is also the most valuable step of exploration for those working in established companies.</p> <p>If you are a tenured engineer within your organization, then it&rsquo;s somewhat safe to assume that you are aware of the typical internal approaches. Even then, it&rsquo;s worth poking around to see if there are any related skunkworks projects happening internally. This is doubly true if you&rsquo;ve joined the organization recently, or are distant from the codebase itself. 
In that case, it&rsquo;s almost always worth poking around to see what already exists.</p> <p>Sometimes the internal approach isn&rsquo;t ideal, but it&rsquo;s still superior because it&rsquo;s already been implemented and there&rsquo;s someone else maintaining it. In the long-run, your strategy can ride along as someone else addresses the issues that aren&rsquo;t perfect fits.</p> <h2 id="using-your-network">Using your network</h2> <p><a href="https://lethain.com/user-data-access-strategy/">How should we control access to user data</a>&rsquo;s exploration section begins with:</p> <blockquote> <p>Our experience is that best practices around managing internal access to user data are widely available through our networks, and otherwise hard to find. The exact rationale for this is hard to determine,</p></blockquote> <p>While there are many topics with significant public writing out there, my experience is that there are many topics where there&rsquo;s very little you can learn without talking directly to practitioners. This is especially true for security, compliance, operating at truly large scale, and competitive processes like optimizing advertising spend.</p> <p>Further, it&rsquo;s surprisingly common to find that how people publicly describe solving a problem and how they actually approach the problem are largely divorced.</p> <p>This is why having a broad personal network is exceptionally powerful, and makes it possible to quickly understand the breadth of possible solutions. It also provides access to the practical downsides to various approaches, which are often omitted when talking to public proponents.</p> <p>In a recent strategy session, a proposal came up that seemed off to me, and I was able to text&ndash;and get answers to those texts&ndash;industry peers before the meeting ended, which invalidated the room&rsquo;s assumptions about what was and was not possible. 
A disagreement that might have taken weeks to resolve was instead resolved in a few minutes, and we were able to figure out next steps in that meeting rather than waiting a week for the next meeting when we&rsquo;d realized our mistake.</p> <p>Of course, it&rsquo;s <em>also</em> important to hold information from your network with skepticism. I&rsquo;ve certainly had my network be wrong, and your network never knows how your current circumstances differ from theirs, so blindly accepting guidance from your network is never the right decision either.</p> <div class="bg-light-gray br4 ph3 pv1"> <p>If you&rsquo;re looking for more detailed coverage on building your network, this topic has also come up in <em>Staff Engineer</em>&rsquo;s chapter on <a href="https://staffeng.com/guides/network-of-peers/">Build a network of peers</a>, and <em>The Engineering Executive&rsquo;s Primer</em>&rsquo;s chapter on <a href="https://lethain.com/building-exec-network/">Building your executive network</a>. It feels silly to cover the same topic a third time, but it&rsquo;s a foundational technique for effective decision making.</p> </div> <h2 id="read-widely-read-narrowly">Read widely; read narrowly</h2> <p>Reading has always been an important part of my strategy work. There are two distinct motions to this approach: read widely on an ongoing basis to broaden your thinking, and read narrowly on the specific topic you&rsquo;re working on.</p> <p>Starting with reading widely, I make an effort each year to read ten to twenty industry-relevant works. These are not necessarily new releases, but are new releases <em>for me</em>. Importantly, I try to read things that I don&rsquo;t know much about or that I initially disagree with. 
Some of my recent reads were <em><a href="https://www.amazon.com/Chip-War-Worlds-Critical-Technology/dp/1982172002">Chip War</a></em>, <em><a href="https://www.amazon.com/Building-Green-Software-Sustainable-Development/dp/1098150627">Building Green Software</a></em>, <em><a href="https://learning.oreilly.com/library/view/tidy-first/9781098151232/">Tidy First?</a></em>, and <em><a href="https://www.amazon.com/How-Big-Things-Get-Done-ebook/dp/B0B3HS4C98/">How Big Things Get Done</a></em>. From each of these books, I learned something, and over time they&rsquo;ve built a series of bookmarks in my head about ideas that might apply to new problems.</p> <p>On the other end of things is reading narrowly. When I recently started working on an AI agents strategy, the first thing I did was read through Chip Huyen&rsquo;s <em><a href="https://www.amazon.com/AI-Engineering-Building-Applications-Foundation/dp/1098166302">AI Engineering</a></em>, which was an exceptionally helpful survey. Similarly, when we started thinking about <a href="https://lethain.com/uber-service-migration-strategy/">Uber&rsquo;s service migration</a>, we read a number of industry papers, including <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf">Large-scale cluster management at Google with Borg</a> and <a href="https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf">Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center</a>.</p> <p>None of these readings had all the answers to the problems I was working on, but they did an excellent job at helping me understand the range of options, as well as identifying other references to consult in my exploration.</p> <p>I&rsquo;ll mention two nuances that will help a lot here. First, I highly encourage getting comfortable with skimming books. 
Even tightly edited books will have a lot of content that isn&rsquo;t particularly relevant to your current goals, and you should skip that content liberally. Second, what you read doesn&rsquo;t have to be books. It can be blog posts, essays, or interview transcripts, although it can certainly be books too.</p> <div class="bg-light-gray br4 ph3 pv1"> <p>In this context, &ldquo;reading&rdquo; doesn&rsquo;t even have to actually be reading. There are conference talks that contain just as much as a blog post, and conferences that cover as much breadth as a book. There are also conference talks without a written equivalent, such as Dan Na&rsquo;s excellent <a href="https://blog.danielna.com/talks/pushing-through-friction">Pushing Through Friction</a>.</p> </div> <h2 id="each-job-is-an-education">Each job is an education</h2> <p>Experience is frequently disregarded in the technology industry, and there are ways to misuse it by too liberally copying solutions that worked in different circumstances. Still, the most effective, and the slowest, mechanism for exploring is continuing to work in the details of meaningful problems.</p> <p>You probably won&rsquo;t <a href="https://lethain.com/forty-year-career/">choose every job to optimize for learning</a>, but the experience you accumulate, which lets you instantly explore more complex problems over time&ndash;recognizing that a bit of your data will have become stale each time&ndash;is uniquely valuable.</p> <h2 id="save-judgment-for-later">Save judgment for later</h2> <p>As I&rsquo;ve mentioned several times, the point of exploration is to go broad with the goal of understanding approaches you might not have considered, and invalidating things you initially think are true. 
Both of those things are only possible if you save judgment for later: if you&rsquo;re passing judgment about whether approaches are &ldquo;good&rdquo; or &ldquo;bad&rdquo;, then your exploration is probably going astray.</p> <p>As a soft rule, I&rsquo;d argue that if no one involved in a strategy has changed their mind about something they believed when you started the exploration step, then you&rsquo;re not done exploring. This is <em>especially</em> true when it comes to strategy work by senior leaders. Their beliefs are often well-justified by years of experience, but it&rsquo;s unclear which parts of their experience have become stale over time.</p> <h2 id="summary">Summary</h2> <p>At this point, I hope you feel comfortable exploring as the first step of your strategy work, and understand the likely consequences of skipping this step. It&rsquo;s not an overstatement to say that every one of the worst strategic failures I&rsquo;ve encountered would have been prevented by its primary author taking a few days to explore the space before anchoring on a particular approach.</p> <p>A few days of feeling slow are always worth avoiding years of misguided efforts.</p>How should we control access to user data?https://lethain.com/user-data-access-strategy/Fri, 07 Feb 2025 06:00:00 -0700https://lethain.com/user-data-access-strategy/<p>At some point in a startup&rsquo;s lifecycle, they decide that they need to be ready to go public in 18 months, and a flurry of IPO-readiness activity kicks off. This strategy focuses on a company working on IPO readiness, which has identified a gap in their internal controls for managing access to their users&rsquo; data. 
It&rsquo;s a company that <em>wants</em> to meaningfully improve their security posture around user data access, but which has had a number of failed security initiatives over the years.</p> <p>Most of those initiatives have failed because they significantly degraded internal workflows for teams like customer support, such that the initial progress was reverted and subverted over time, to little long-term effect. This strategy represents the Chief Information Security Officer&rsquo;s (CISO) attempt to acknowledge and overcome those historical challenges while meeting their IPO readiness obligations, and&ndash;most importantly&ndash;doing right by their users.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="reading-this-document">Reading this document</h2> <p>To apply this strategy, start at the top with <em>Policy</em>. To understand the thinking behind this strategy, read sections in reverse order, starting with <em>Explore</em>, then <em>Diagnose</em> and so on. 
Relative to the default structure, this document has been refactored in two ways to improve readability: first, <em>Operation</em> has been folded into <em>Policy</em>; second, <em>Refine</em> has been embedded in <em>Diagnose</em>.</p> <p>More detail on this structure in <a href="https://lethain.com/readable-engineering-strategy-documents">Making a readable Engineering Strategy document</a>.</p> <h2 id="policy--operations">Policy &amp; Operations</h2> <p>Our new policies, and the mechanisms to operate them are:</p> <ul> <li> <p><strong>Controls for accessing user data must be significantly stronger prior to our IPO.</strong> Senior leadership, legal, compliance and security have decided that we are not comfortable accepting the status quo of our user data access controls as a public company, and must meaningfully improve the quality of resource-level access controls as part of our pre-IPO readiness efforts.</p> <p>Our Security team is accountable for the exact mechanisms and approach to addressing this risk.</p> </li> <li> <p><strong>We will continue to prioritize a hybrid solution to resource-access controls.</strong> This has been our approach thus far, and the fastest available option.</p> </li> <li> <p><strong>Directly expose the log of our resource-level accesses to our users.</strong> We will build towards a user-accessible log of all company accesses of user data, and ensure we are comfortable explaining each and every access. In addition, it means that each rationale for access must be comprehensible and reasonable from a user perspective.</p> <p>This is important because it aligns our approach with our users&rsquo; perspectives. 
They will be able to evaluate how we access their data, and make decisions about continuing to use our product based on whether they agree with our use.</p> </li> <li> <p><strong>Good security discussions don&rsquo;t frame decisions as a compromise between security and usability.</strong> We will pursue <a href="https://lethain.com/multi-dimensional-tradeoffs/">multi-dimensional tradeoffs</a> to simultaneously improve security and efficiency. Whenever we frame a discussion on trading off between security and utility, it&rsquo;s a sign that we are having the wrong discussion, and that we should rethink our approach.</p> <p>We will prioritize mechanisms that can both automatically authorize <em>and</em> automatically document the rationale for accesses to customer data. The most obvious example of this is automatically granting access to a customer support agent for users who have an open support ticket assigned to that agent. (And removing that access when that ticket is reassigned or resolved.)</p> </li> <li> <p><strong>Measure progress on percentage of customer data access requests justified by a user-comprehensible, automated rationale.</strong> This will anchor our approach on simultaneously improving the security of user data and the usability of our colleagues&rsquo; internal tools. If we only expand requirements for accessing customer data, we won&rsquo;t view this as progress because it&rsquo;s not automated (and consequently is likely to encourage workarounds as teams try to solve problems quickly). 
Similarly, if we only improve usability, charts won&rsquo;t represent this as progress, because we won&rsquo;t have increased the number of supported requests.</p> <p>As part of this effort, we will create a private channel where the security and compliance team has visibility into all manual rationales for user-data access, and will directly message the manager of any individual who relies on a manual justification for accessing user data.</p> </li> <li> <p><strong>Expire unused roles to move towards principle of least privilege.</strong> Today we have a number of roles granted in our role-based access control (RBAC) system to users who do not use the granted permissions. To address that issue, we will automatically remove roles from colleagues after 90 days of not using the role&rsquo;s permissions.</p> <p>Engineers in an active on-call rotation are the exception to this automated permission pruning.</p> </li> <li> <p><strong>Weekly reviews until we see progress; monthly access reviews in perpetuity.</strong> Starting now, there will be a weekly sync between the security engineering team, teams working on customer data access initiatives, and the CISO. This meeting will focus on rapid iteration and problem solving.</p> <p>This is explicitly a forum for ongoing <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a>, with CISO serving as the meeting&rsquo;s sponsor, and their Principal Security Engineer serving as the meeting&rsquo;s guide. It will continue until we have clarity on the path to 100% coverage of user-comprehensible, automated rationales for access to customer data.</p> <p>Separately, we are also starting a monthly review of sampled accesses to customer data to ensure the proper usage and function of the rationale-creation mechanisms we build. 
This meeting&rsquo;s goal is to review access rationales for quality and appropriateness, both by reviewing sampled rationales in the short-term, and identifying more automated mechanisms for identifying high-risk accesses to review in the future.</p> </li> <li> <p><strong>Exceptions must be granted in writing by CISO.</strong> While our overarching Engineering Strategy states that we follow an advisory architecture process as described in <em><a href="https://www.amazon.com/Facilitating-Software-Architecture-Empowering-Architectural-ebook/dp/B0DMHGWCPN/">Facilitating Software Architecture</a></em>, the customer data access policy is an exception and must be explicitly approved, with documentation, by the CISO. Start that process in the <code>#ciso</code> channel.</p> </li> </ul> <h2 id="diagnose">Diagnose</h2> <ul> <li> <p>We have a strong baseline of role-based access controls (RBAC) and audit logging. However, we have limited mechanisms for ensuring assigned roles follow the <a href="https://en.wikipedia.org/wiki/Principle_of_least_privilege">principle of least privilege</a>. This is particularly true in cases where individuals change teams or roles over the course of their tenure at the company: some individuals have collected numerous unused roles over five-plus years at the company.</p> <p>Similarly, our audit logs are durable and pervasive, but we have limited proactive mechanisms for identifying anomalous usage. Instead they are typically used to understand what occurred after an incident is identified by other mechanisms.</p> </li> <li> <p>For resource-level access controls, we rely on a hybrid approach between a 3rd-party platform for incoming user requests, and approval mechanisms within our own product. 
Providing a rationale for access across these two systems requires manual work, and those rationales are later manually reviewed for appropriateness in a batch fashion.</p> <p>There are two major ongoing problems with our current approach to resource-level access controls. First, the teams making requests view them as a burdensome obligation without much benefit to them or on behalf of the user. Second, because the rationale review steps are manual, there is no verifiable evidence of the quality of the review.</p> </li> <li> <p>We&rsquo;ve found no evidence of misuse of user data. When colleagues do access user data, we have uniformly and consistently found that there is a clear, and reasonable rationale for that access. For example, a ticket in the user support system where the user has raised an issue.</p> <p>However, the quality of our documented rationales is consistently low because it depends on busy people manually copying over significant information many times a day. Because the rationales are of low quality, the verification of these rationales is somewhat arbitrary. From a literal compliance perspective, we do provide rationales and auditing of these rationales, but it&rsquo;s unclear if the majority of these audits increase the security of our users&rsquo; data.</p> </li> <li> <p>Historically, we&rsquo;ve made significant security investments that caused temporary spikes in our security posture. However, looking at those initiatives a year later, in many cases we see a pattern of increased scrutiny, followed by a gradual repeal or avoidance of the new mechanisms.</p> <p>We have found that most of them involved increased friction for essential work performed by other internal teams. In the natural order of performing work, those teams would subtly subvert the improvements because it interfered with their immediate goals (e.g. 
supporting customer requests).</p> </li> <li> <p>As such, we have high conviction from our track record that our historical approach can create optical wins internally. We have limited conviction that it can create long-term improvements outside of significant, unlikely internal changes (e.g. colleagues are markedly less busy a year from now than they are today). It seems likely we need a new approach to meaningfully shift our stance on these kinds of problems.</p> </li> </ul> <h2 id="explore">Explore</h2> <p>Our experience is that best practices around managing internal access to user data are <a href="https://lethain.com/exploring-for-strategy/">widely available through our networks</a>, and otherwise hard to find. The exact rationale for this is hard to determine, but it seems possible that it&rsquo;s a topic that folks are generally uncomfortable discussing in public on account of potential future liability and compliance issues.</p> <p>In our exploration, we found two standardized dimensions (role-based access controls, audit logs), and one highly divergent dimension (resource-specific access controls):</p> <ul> <li> <p><strong>Role-based access controls</strong> (RBAC) are a highly standardized approach at this point. The core premise is that users are mapped to one or more roles, and each role is granted a certain set of permissions. For example, a role representing the customer support agent might be granted permission to deactivate an account, whereas a role representing the sales engineer might be able to configure a new account.</p> </li> <li> <p><strong>Audit logs</strong> are similarly standardized. All access and mutation of resources should be tied in a durable log to the human who performed the action. 
These logs should be accumulated in a centralized, queryable solution.</p> <p>One of the core challenges is determining how to utilize these logs proactively to detect issues rather than reactively when an issue has already been flagged.</p> </li> <li> <p><strong>Resource-level access controls</strong> are significantly less standardized than RBAC or audit logs. We found three distinct patterns adopted by companies, with little consistency across companies on which is adopted.</p> </li> </ul> <p>Those three patterns for resource-level access control were:</p> <ol> <li> <p><strong>3rd-party enrichment</strong> where access to resources is managed in a 3rd-party system such as Zendesk. This requires enriching objects within those systems with data and metadata from the product(s) where those objects live. It also requires implementing actions on the platform, such as archiving or configuration, allowing them to live entirely in that platform&rsquo;s permission structure.</p> <p>The downside of this approach is tight coupling with the platform vendor, any limitations inherent to that platform, and the overhead of maintaining engineering teams familiar with both your internal technology stack and the platform vendor&rsquo;s technology stack.</p> </li> <li> <p><strong>1st-party tool implementation</strong> where all activity, including creation and management of user issues, is managed within the core product itself. This pattern is most common in earlier stage companies or companies whose customer support leadership &ldquo;grew up&rdquo; within the organization without much exposure to the approach taken by peer companies.</p> <p>The advantage of this approach is that there is a single, tightly integrated and infinitely extensible platform for managing interactions. 
The downside is that you have to build and maintain all of that work internally rather than pushing it to a vendor that ought to be able to invest more heavily into their tooling.</p> </li> <li> <p><strong>Hybrid solutions</strong> where a 3rd-party platform is used for most actions, and is further used to permit resource-level access within the 1st-party system. For example, you might be able to access a user&rsquo;s data only while there is an open ticket created by that user, and assigned to you, in the 3rd-party platform.</p> <p>The advantage of this approach is that it allows supporting complex workflows that don&rsquo;t fit within the platform&rsquo;s limitations, and allows you to avoid complex coupling between your product and the vendor platform.</p> </li> </ol> <p>Generally, our experience is that all companies implement RBAC, audit logs, and one of the resource-level access control mechanisms. Most companies pursue either 3rd-party enrichment with a sizable, long-standing team owning the platform implementation, or rely on a hybrid solution where they are able to avoid a long-standing dedicated team by lumping that work into existing teams.</p>Our own agents with their own tools.https://lethain.com/our-own-agents-our-own-tools/Tue, 04 Feb 2025 04:00:00 -0700https://lethain.com/our-own-agents-our-own-tools/<p>Entering 2025, I decided to spend some time exploring the topic of agents. I started reading Anthropic&rsquo;s <a href="https://www.anthropic.com/research/building-effective-agents">Building effective agents</a>, followed by Chip Huyen&rsquo;s <em><a href="https://www.amazon.com/AI-Engineering-Building-Applications-Foundation/dp/1098166302">AI Engineering</a></em>. I kicked off a major workstream at work on using agents, and I also decided to do a personal experiment of sorts. 
This is a general commentary on building that project.</p> <p>What I wanted to build was a simple chat interface where I could write prompts, select models, and have the model use tools as appropriate. My side goal was to build this using Cursor and avoid writing code directly as much as possible, but I found that slower than writing code in emacs while relying on <code>4o-mini</code> to provide working examples to pull from.</p> <p>Similarly, while I initially envisioned building this in fullstack TypeScript via Cursor, I ultimately bailed into a stack that I&rsquo;m more comfortable with, and ended up using Python3, FastAPI, PostgreSQL, and SQLAlchemy with the async psycopg3 driver. It&rsquo;s been a&hellip; while&hellip; since I started a brand new Python project, and I used this project as an opportunity to get comfortable with Python3&rsquo;s async/await mechanisms and Python3&rsquo;s typing with <a href="https://mypy.readthedocs.io/">mypy</a>. Finally, I also wanted to experiment with <a href="https://tailwindcss.com/">Tailwind</a>, and ended up using <a href="https://tailwindui.com/components">TailwindUI&rsquo;s components</a> to build the site.</p> <p>The working version supports everything I wanted: creating chats with models, and allowing those models to use function calling to use tools that I provide. The models are allowed to call any number of tools in pursuit of the problem they are solving. The tool usage is the most interesting part here for sure. The simplest tool I created was a <code>get_current_weather</code> tool that provided a fake temperature for your location. 
This allowed me to ask questions like &ldquo;What should I wear tomorrow in San Francisco, CA?&rdquo; and get a useful response.</p> <p><img src="https://lethain.com/static/blog/2025/agent-temp.png" alt="Example of an agent responding to query about weather."></p> <p>The code to add this function to my project was pretty straightforward, just three lines of Python and 25 lines of metadata to pass to the OpenAI API.</p> <pre class="prettyprint">import random

def tool_get_current_weather(location: str|None=None, format: str|None=None) -> str:
    "Simple proof of concept tool."
    # Fake data: pick a plausible random value for the requested unit.
    temp = random.randint(40, 90) if format == 'fahrenheit' else random.randint(10, 25)
    return f"It's going to be {temp} degrees {format} tomorrow."

FUNCTION_REGISTRY['get_current_weather'] = tool_get_current_weather
TOOL_USAGE_REGISTRY['get_current_weather'] = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the user's location.",
                },
            },
            "required": ["location", "format"],
        },
    }
}</pre> <p>After getting this tool working, the next tool I added was a simple URL retriever tool, which allowed the agent to grab a URL and use the content of that URL in its prompt.</p> <p><img src="https://lethain.com/static/blog/2025/agent-url.png" alt="An agent using a tool to retrieve the contents of a URL."></p> <p>The implementation for this tool was similarly quite simple.</p> <pre class="prettyprint">import requests
from bs4 import BeautifulSoup
from markdownify import markdownify

def tool_get_url(url: str|None=None) -> str:
    if url is None:
        return ''
    url = str(url)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # Prefer the main content area, falling back to the whole body.
    content = soup.find('main') or soup.find('article') or soup.body
    if not content:
        return str(response.content)
    markdown = markdownify(str(content), heading_style="ATX").strip()
    return str(markdown)

FUNCTION_REGISTRY['get_url'] = tool_get_url
TOOL_USAGE_REGISTRY['get_url'] = {
    "type": "function",
    "function": {
        "name": "get_url",
        "description": "Retrieve the contents of a website via its URL.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The complete URL, including protocol to retrieve. For example: \"https://lethain.com\"",
                }
            },
            "required": ["url"],
        },
    }
}</pre> <p>What&rsquo;s pretty amazing is how much power you can add to your agent by adding such a trivial tool as retrieving a URL. 
You can similarly imagine adding tools for retrieving and commenting on GitHub pull requests and so on, which could allow a very simple agent tool like this to become quite useful.</p> <p>Working on this project gave me a moderately compelling view of a near-term future where most engineers have a simple application like this running that they can pipe events into from various systems (email, text, GitHub pull requests, calendars, etc), create triggers that map events to templates that feed into prompts, and execute those prompts with tool-aware agents.</p> <p>Combine that with the ability for other agents to register themselves with you and expose the tools that they have access to (e.g. schedule an event with the tool&rsquo;s owner), and a bunch of interesting things become very accessible with a very modest amount of effort:</p> <ul> <li>Schedule events between two busy people&rsquo;s calendars, as if both of them had an assistant managing their calendar</li> <li>Reply to your own pull requests that add new blog posts, providing feedback on typos and grammatical issues</li> <li>Crawl websites you care about and identify posts you might be interested in</li> <li>Ask the model to generate a system model using <a href="https://github.com/lethain/systems">lethain:systems</a>, run that model, then chart the responses</li> <li>Add a &ldquo;planning tool&rdquo; which allows the model to generate a plan to guide subsequent steps in a complex task (e.g. getting my calendar, getting a friend&rsquo;s calendar, suggesting a time we could meet)</li> </ul> <p>None of these are exactly lifesaving, but each is somewhat useful, and I imagine there are many more fairly obvious ideas that become easy once you have the necessary scaffolding in place.</p> <p>Altogether, I am convinced at this point that agents, using current foundational models, are going to create a number of very interesting experiences that improve our day to day lives in small ways that are, in aggregate, pretty transformational. I&rsquo;m less convinced that this is the way <em>all software</em> should work going forward though, but more thoughts on that over time. (A bunch of fun experiments happening at work, but early days on those.)</p>