Irrational Exuberancehttps://lethain.com/Recent content on Irrational ExuberanceHugo -- gohugo.ioen-usWill LarsonThu, 16 Jan 2025 04:00:00 -0700Bridging theory and practice in engineering strategy.https://lethain.com/bridging-eng-strategy-theory-and-practice/Thu, 16 Jan 2025 04:00:00 -0700https://lethain.com/bridging-eng-strategy-theory-and-practice/<p>Some people I&rsquo;ve worked with have lost hope that engineering strategy actually exists within <em>any</em> engineering organization. I imagine that they, reading through the <a href="https://lethain.com/components-of-eng-strategy/">steps to build engineering strategy</a>, or the <a href="https://lethain.com/private-equity-strategy/">strategy for navigating private equity ownership</a>, are not impressed. Instead, these ideas probably come across as theoretical at best. In less polite company, they might describe these ideas as fake constructs.</p> <p>Let&rsquo;s talk about it! Because they&rsquo;re right. In fact, they&rsquo;re right in two different ways. First, this book is focused on explaining how to create clean, refined, and definitive strategy documents, whereas most real strategy artifacts initially look rather messy. Second, applying these techniques in practice can require a fair amount of creativity. 
It might sound easy, but it&rsquo;s quite difficult in practice.</p> <p>This chapter will cover:</p> <ul> <li>Why strategy documents need to be clear and definitive, especially when strategy development has been messy</li> <li>How to iterate on strategy when there are demands for unrealistic timelines</li> <li>Using strategy as non-executives, where others might override your strategy</li> <li>Handling dynamic, quickly changing environments where diagnosis can change frequently</li> <li>Working with indecisive stakeholders who don&rsquo;t provide clarity on approach</li> <li>Surviving other people&rsquo;s bad strategy work</li> </ul> <p>Alright, let&rsquo;s dive into the many ways that praxis doesn&rsquo;t quite line up with theory.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="clear-and-definitive-documents">Clear and definitive documents</h2> <p>As explored in <a href="https://lethain.com/readable-engineering-strategy-documents/">Making engineering strategies more readable</a>, documents that feel intuitive to write are often fairly difficult to read. That&rsquo;s because thinking tends to be a linear-ish journey from a problem to a solution. Most readers, on the other hand, usually just want to know the solution and then to move on. That&rsquo;s because good strategies are read for direction (e.g. when a team wants to understand how they&rsquo;re supposed to solve a specific issue at hand) far more frequently than they&rsquo;re read to build agreement (e.g. 
building stakeholder alignment during the initial development of the strategy).</p> <p>However, many organizations only produce writer-oriented strategy documents, and may not have any reader-oriented documents at all. If you&rsquo;ve predominantly worked in those sorts of organizations, then the first reader-oriented documents you encounter will seem artificial.</p> <p>There are also organizations that have many reader-oriented documents, but omit the rationale behind those documents. Those documents feel prescriptive and heavy-handed, because the infrequent reader who <em>does</em> want to understand the thinking can&rsquo;t find it. Further, when they want to propose an alternative, they have to do so without the rationale behind the current policies: the absence of that context often transforms what was a collaborative problem-solving opportunity into a political match.</p> <p>With that in mind, I&rsquo;d encourage you to see the frequent absence of these documents as a major opportunity to drive strategy within your organization, rather than evidence that these documents don&rsquo;t work. My experience is that they do.</p> <h2 id="doing-strategy-despite-unrealistic-timelines">Doing strategy despite unrealistic timelines</h2> <p>The most frequent failure mode I see for strategy is when it&rsquo;s rushed, and its authors accept that thinking must stop when the artificial deadline is reached. Taking annual planning at Stripe as an example, <a href="https://www.amazon.com/Scaling-People-Tactics-Management-Building/dp/1953953212/">Claire Hughes Johnson</a> argued that planning expands to fit any timeline, and consequently set a short planning timeline of several weeks. 
Some teams accepted that as a fixed timeline and <em>stopped planning</em> when the timeline ended, whereas effective teams never stopped planning before or after the planning window.</p> <p>When strategy work is given an artificial or unrealistic timeline, you should deliver the best draft you can. Afterwards, rather than being finished, you should view yourself as <a href="https://lethain.com/refining-eng-strategy/">starting the refinement process</a>. An open strategy secret is that many strategies never leave the refinement phase, and continue to be tweaked throughout their lifespan. Why should a strategy with an early deadline be any different?</p> <p>Well, there is one important problem to acknowledge: I&rsquo;ve often found that the executive who initially provided the unrealistic timeline intended it as a forcing function to inspire action and quick thinking. If you have a discussion with them directly, they&rsquo;re usually quite open to adjusting the approach. However, the intermediate layers of leadership between that executive and you often calcify around a particular approach, which they claim the executive insists on following precisely.</p> <p>Sometimes having the conversation with the responsible executive is quite difficult. In that case, you do have to work with individuals taking the strategy as literal and unalterable until either you can have the conversation or something goes wrong enough that the executive starts paying attention again. Usually, though, you can find someone who has a communication path, as long as you can articulate the issue clearly.</p> <h2 id="using-strategy-as-non-executives">Using strategy as non-executives</h2> <p>Some engineers will argue that the only valid <a href="https://lethain.com/when-write-down-engineering-strategy/">strategy altitude</a> is the highest one defined by executives, because any other strategy can be invalidated by a new, higher altitude strategy. 
They would claim that teams simply <em>cannot</em> do strategy, because executives might invalidate it. Some engineering executives would argue the same thing, instead claiming that they can&rsquo;t work on an engineering strategy because the missing product strategy or business strategy might introduce new constraints.</p> <p>I don&rsquo;t agree with this line of thinking at all. To do strategy at any altitude, you have to come to terms with the certainty that new information will show up, and you&rsquo;ll need to revise your strategy to deal with that.</p> <p><a href="https://lethain.com/uber-service-migration-strategy/">Uber&rsquo;s service provisioning strategy</a> is a good counterexample to the idea that you have to wait for someone else to set the strategy table. We were able to find a durable diagnosis despite being a relatively small team within a much larger organization that was relatively indifferent to helping us succeed. When it comes to using strategy, effective diagnosis trumps authority. In my experience, at least as many executives&rsquo; strategies are ravaged by reality&rsquo;s pervasive details as are overridden by higher altitude strategies. The only way to be certain your strategy will fail is to wait until you&rsquo;re certain that no new information might show up and require it to change.</p> <h2 id="doing-strategy-in-chaotic-environments">Doing strategy in chaotic environments</h2> <p><a href="https://lethain.com/llm-adoption-strategy/">How should you adopt LLMs?</a> discusses how a company should plot a path through the rapidly evolving LLM ecosystem. Periods of rapid technology evolution are one reason why your strategy might encounter a pocket of chaos, but there are many others. Pockets of rapid hiring, as well as layoffs, create chaos. The departure of load-bearing senior leaders can change a company quickly. 
Slowing revenue in a company&rsquo;s core business can also initiate chaotic actions in pursuit of a new business.</p> <p>Strategies don&rsquo;t require stable environments. Instead, strategies require awareness of the environment that they&rsquo;re operating in. In a stable period, a strategy might expect to run for several years and expect relatively little deviation from the initial approach. In a dynamic period, the strategy might know you can only protect capacity in two-week chunks before a new critical initiative pops up. It&rsquo;s possible to do good strategy in either scenario, but it&rsquo;s impossible to do good strategy if you don&rsquo;t diagnose the context effectively.</p> <h2 id="unreliable-information">Unreliable information</h2> <p>Oftentimes, the strategy forward would be obvious if a few key decisions were made: you know who is supposed to make those decisions, but you simply cannot get them to decide. My most visceral experience of this was conducting a layoff where the CEO wouldn&rsquo;t define a target cost reduction or a thesis of how much various functions (e.g. engineering, marketing, sales) should contribute to those reductions. With those two decisions, engineering&rsquo;s approach would be obvious, and without that clarity things felt impossible.</p> <p>Although I was frustrated at the time, I&rsquo;ve since come to appreciate that missing decisions are the norm rather than the exception. The strategy on <a href="https://lethain.com/private-equity-strategy/">Navigating Private Equity ownership</a> deals with this problem by acknowledging a missing decision, and expressly blocking one part of its execution on that decision being made. Other parts of its plan, like changing how roles are backfilled, went ahead to address the broader cost problem.</p> <p>Rather than blocking on missing information, your strategy should acknowledge what&rsquo;s missing, and move forward where you can. 
Sometimes that&rsquo;s moving forward by taking risk, sometimes that&rsquo;s delaying for clarity, but it&rsquo;s never accepting yourself as stuck without options other than pointing a finger.</p> <h2 id="surviving-other-peoples-bad-strategy-work">Surviving other people&rsquo;s bad strategy work</h2> <p>Sometimes you will be told to follow something which is described as a strategy, but is really just a policy without any strategic thinking behind it. This is an unavoidable element of working in organizations and happens for all sorts of reasons. Sometimes, your organization&rsquo;s leader doesn&rsquo;t believe it&rsquo;s valuable to explain their thinking to others, because they see themselves as the one important decision maker.</p> <p>Other times, your leader doesn&rsquo;t agree with a policy they&rsquo;ve been instructed to roll out. Adoption of &ldquo;high hype&rdquo; technologies like blockchain technologies during the crypto boom was often top-down direction from company leadership that engineering disagreed with, but was obligated to align with. In this case, your leader is finding that it&rsquo;s hard to explain a strategy that they themselves don&rsquo;t understand either.</p> <p>This is a frustrating situation. What I&rsquo;ve found most effective is writing a strategy of my own, one that acknowledges the broader strategy I disagree with in its diagnosis as a static, unavoidable truth. From there, I&rsquo;ve been able to make practical decisions that recognize the context, even if it&rsquo;s not a context I&rsquo;d have selected for myself.</p> <h2 id="summary">Summary</h2> <p>I started this chapter by acknowledging that the <a href="https://lethain.com/components-of-eng-strategy/">steps to building engineering strategy</a> are a theory of strategy, and one that can get quite messy in practice. 
Now you know why strategy documents often come across as overly pristine&ndash;because they&rsquo;re trying to communicate clearly about a complex topic.</p> <p>You also know how to navigate the many ways reality pulls you away from perfect strategy, such as unrealistic timelines, higher altitude strategies invalidating your own strategy work, working in a chaotic environment, and dealing with stakeholders who refuse to align with your strategy. Finally, we acknowledged that sometimes strategy work done by others is not what we&rsquo;d consider strategy; it&rsquo;s often unsupported policy with neither a diagnosis nor an approach to operating the policy.</p> <p>That&rsquo;s all stuff you&rsquo;re going to run into, and it&rsquo;s all stuff you&rsquo;re going to overcome on the path to doing good strategy work.</p>Uber's service migration strategy circa 2014.https://lethain.com/uber-service-migration-strategy/Thu, 09 Jan 2025 06:00:00 -0700https://lethain.com/uber-service-migration-strategy/<p>In early 2014, I joined as an engineering manager for Uber&rsquo;s Infrastructure team. We were responsible for a wide range of things, including provisioning new services. While the overall team I led grew significantly over time, the subset working on service provisioning never grew beyond four engineers.</p> <p>Those four engineers successfully migrated 1,000+ services onto a new, future-proofed service platform. More importantly, they did it while absorbing the majority, although certainly not the entirety, of the migration workload onto that small team rather than spreading it across the 2,000+ engineers working at Uber at the time. 
Their strategy serves as an interesting case study of how a team can drive strategy, even without any executive sponsor, by focusing on solving a pressing user problem, and providing effective ergonomics while doing so.</p> <div class="bg-light-gray br4 ph3 pv1"> <p>Note that after this introductory section, the remainder of this strategy will be written from the perspective of 2014, when it was originally developed.</p> </div> <p>More than a decade after this strategy was implemented, we have an interesting perspective to evaluate its impact. It&rsquo;s fair to say that it had some meaningful, negative consequences by allowing the widespread proliferation of new services within Uber. Those services contributed to a messy architecture that had to go through cycles of internal cleanup over the following years.</p> <p>As the principal author of this strategy, I&rsquo;ve learned a lot from meditating on the fact that this strategy was wildly successful, that I think Uber is better off for having followed it, and that it also meaningfully degraded Uber&rsquo;s developer experience over time. There&rsquo;s both good and bad here; with a wide enough lens, all evaluations get complicated.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="reading-this-document">Reading this document</h2> <p>To apply this strategy, start at the top with <em>Policy</em>. To understand the thinking behind this strategy, read sections in reverse order, starting with <em>Explore</em>, then <em>Diagnose</em> and so on. 
Relative to the default structure, this document makes one tweak, folding the <em>Operation</em> section in with <em>Policy</em>.</p> <p>More detail on this structure is in <a href="https://lethain.com/readable-engineering-strategy-documents">Making a readable Engineering Strategy document</a>.</p> <h2 id="policy--operation">Policy &amp; Operation</h2> <p>We&rsquo;ve adopted these guiding principles for extending Uber&rsquo;s service platform:</p> <ul> <li> <p><strong>Constrain manual provisioning allocation to maximize investment in self-service provisioning.</strong> The service provisioning team will maintain a fixed allocation of one full-time engineer on manual service provisioning tasks. We will move the remaining engineers to work on automation to speed up future service provisioning. This will degrade manual provisioning in the short term, but the alternative is permanently degrading provisioning by the influx of new service requests from newly hired product engineers.</p> </li> <li> <p><strong>Self-service must be safely usable by a new hire without Uber context.</strong> It is possible today to make a Puppet or Clusto change while provisioning a new service that negatively impacts the production environment. This must not be true in any self-service solution.</p> </li> <li> <p><strong>Move to structured requests, and out of tickets.</strong> Missing or incorrect information in provisioning requests creates significant delays in provisioning. Further, collecting this information is the first step of moving to a self-service process. As such, we can get paid twice by reducing errors in manual provisioning while also creating the interface for self-service workflows.</p> </li> <li> <p><strong>Prefer initializing new services with good defaults rather than requiring user input.</strong> Most new services are provisioned for new projects with strong timeline pressure but little certainty on their long-term requirements. 
These users cannot accurately predict their future needs, and expecting them to do so creates significant friction.</p> <p>Instead, the provisioning framework should suggest good defaults, and make it easy to change the settings later when users have more clarity. The gate from development environment to production environment is a particularly effective one for ensuring settings are refreshed.</p> </li> </ul> <p>We are materializing those principles into this sequenced set of tasks:</p> <ol> <li> <p>Create an internal tool that coordinates service provisioning, replacing the process where teams request new services via Phabricator tickets. This new tool will maintain a schema of required fields that must be supplied, with the aim of eliminating the majority of back and forth between teams during service provisioning.</p> <p>In addition to capturing necessary data, this will also serve as our interface for automating various steps in provisioning without requiring future changes in the workflow to request service provisioning.</p> </li> <li> <p>Extend the internal tool to generate Puppet scaffolding for new services, reducing the potential for errors in two ways. First, the data supplied in the service provisioning request can be directly included into the rendered template. Second, this will eliminate most human tweaking of templates where typos can create issues.</p> </li> <li> <p>Port allocation is a particularly high-risk element of provisioning, as reusing a port can break routing to an existing production service. 
As such, this will be the first area we fully automate, with the provisioning service supplying the allocated port rather than requiring requesting teams to provide an already allocated port.</p> <p>Doing this will require moving the port registry out of a Phabricator wiki page and into a database, which will allow us to guard access with a variety of checks.</p> </li> <li> <p>Manual assignment of new services to servers often leads to new services being allocated to already heavily utilized servers. We will replace the manual assignment with an automated system, and do so with the intention of migrating to the Mesos/Aurora cluster once it is available for production workloads.</p> </li> </ol> <p>Each week, we&rsquo;ll review the size of the service provisioning queue, along with the service provisioning time to assess whether the strategy is working or needs to be revised.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><strong>Prolonged strategy testing</strong></p> <p>Although I didn&rsquo;t have a name for this practice in 2014 when we created and implemented this strategy, the preceding paragraph captures an important truth of team-led bottom-up strategy: the entire strategy was implemented in a prolonged <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a> phase.</p> <p>This is an important truth of all low-altitude, bottom-up strategy, because you don&rsquo;t have the authority to mandate compliance. An executive&rsquo;s high-altitude strategy can be enforced despite not working due to their organizational authority, but a team&rsquo;s strategy will only endure while it remains effective.</p> </div> <h2 id="refine">Refine</h2> <p>In order to refine our diagnosis, we&rsquo;ve <a href="https://lethain.com/uber-service-onboarding-model/">created a systems model for service onboarding</a>. 
This will allow us to simulate a variety of different approaches to our problem, and determine which approach, or combination of approaches, will be most effective.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-provis-model-errors.png" alt="A systems model of provisioning services at Uber circa 2014."></p> <p>As we exercised the model, it became clear that:</p> <ol> <li>we are increasingly falling behind,</li> <li>hiring onto the service provisioning team is not a viable solution, and</li> <li>moving to a self-service approach is our only option.</li> </ol> <p>While the model writeup justifies each of those statements in more detail, we&rsquo;ll include two charts here. The first chart shows the status quo, where new service provisioning requests, labeled as <code>Initial RequestedServices</code>, quickly accumulate into a backlog.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-model-diag-1.png" alt="Initial diagram of Uber service provisioning model without error states."></p> <p>Second, we have a chart comparing the outcomes between the current status quo and a self-service approach.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-model-chart-self-service.png" alt="Chart showing impact of self-service provisioning on provisioning rate."></p> <p>In that chart, you can see that the service provisioning backlog in the self-service model remains steady, as represented by the <code>SelfService RequestedServices</code> line. Of the various attempts to find a solution, none of the others showed promise, including eliminating all errors in provisioning and increasing the team&rsquo;s capacity by 500%.</p> <h2 id="diagnose">Diagnose</h2> <p>We&rsquo;ve diagnosed the current state of service provisioning at Uber as:</p> <ul> <li> <p>Many product engineering teams are aiming to leave the centralized monolith, which is generating two to three service provisioning requests each week. 
We expect this rate to increase roughly linearly with the size of the product engineering organization.</p> <p>Even if we disagree with this shift to additional services, there&rsquo;s no team responsible for maintaining the extensibility of the monolith, and working in the monolith is the number one source of developer frustration, so we don&rsquo;t have a practical counterproposal to offer engineers other than provisioning a new service.</p> </li> <li> <p>The engineering organization is doubling every six months. Consequently, a year from now, we expect eight to twelve service provisioning requests every week.</p> </li> <li> <p>Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing.</p> <p>Some additional headcount is being allocated to Service Reliability Engineers (SREs) who can take on the most nuanced, complicated service provisioning work. However, their bandwidth is already heavily constrained across many tasks, so relying on SREs is an insufficient solution.</p> </li> <li> <p>The queue for service provisioning is already increasing in size as things are today. Barring some change, many services will not be provisioned in a timely fashion.</p> </li> <li> <p>Today, provisioning a new service takes about a week, with numerous round trips between the requesting team and the provisioning team. 
Missing and incorrect information between teams is the largest source of delay in provisioning services.</p> <p>If the provisioning team has all the necessary information, and it&rsquo;s accurate, then a new service can be provisioned in about three to four hours of work across configuration in Puppet, metadata in Clusto, allocating ports, assigning the service to servers, and so on.</p> </li> <li> <p>There are few safeguards on port allocation, server assignment and so on. It is easy to inadvertently cause a production outage during service provisioning unless done with attention to detail.</p> <p>Given our rate of hiring, training the engineering organization to use this unsafe toolchain is an impractical solution: even if we train the entire organization perfectly today, there will be just as many untrained individuals in six months. Further, product engineering leadership has no interest in their team being diverted to service provisioning training.</p> </li> <li> <p>It&rsquo;s widely agreed across the infrastructure engineering team that essentially every component of service provisioning should be replaced as soon as possible, but there is no concrete plan to replace any of the core components. Further, there is no team accountable for replacing these components, which means the service provisioning team will either need to work around the current tooling or replace that tooling ourselves.</p> </li> <li> <p>It&rsquo;s urgent to unblock development of new services, but moving those new services to production is rarely urgent, and occurs after a long internal development period. Evidence of this is that requests to provision a new service generally come with significant urgency and internal escalations to management. 
After the service is provisioned for development, there are relatively few urgent escalations other than one-off requests for increased production capacity during incidents.</p> </li> <li> <p>Another team within infrastructure is actively exploring adoption of Mesos and Aurora, but there&rsquo;s no concrete timeline for when this might be available for our usage. Until they commit to supporting our workloads, we&rsquo;ll need to find an alternative solution.</p> </li> </ul> <h2 id="explore">Explore</h2> <p>Uber&rsquo;s server and service infrastructure today is composed of a handful of pieces. First, we run servers on-prem within a handful of colocations. Second, we describe each server in Puppet manifests to support repeatable provisioning of servers. Finally, we manage fleet and server metadata in a tool named Clusto, originally created by Digg, which allows us to populate Puppet manifests with server and cluster appropriate metadata during provisioning. In general, we agree that our current infrastructure is nearing its end of lifespan, but it&rsquo;s less obvious what the appropriate replacements are for each piece.</p> <p>There&rsquo;s significant internal opposition to running in the cloud, up to and including our CEO, so we don&rsquo;t believe that will change in the foreseeable future. 
We do however believe there&rsquo;s opportunity to change our service definitions from Puppet to something along the lines of Docker, and to change our metadata mechanism towards a more purpose-built solution like Mesos/Aurora or Kubernetes.</p> <p>As a starting point, we find it valuable to read <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf">Large-scale cluster management at Google with Borg</a> which informed some elements of the approach to Kubernetes, and <a href="https://people.eecs.berkeley.edu/~alig/papers/mesos.pdf">Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center</a> which describes the Mesos/Aurora approach.</p> <div class="bg-light-gray br4 ph3 pv1"> <p>If you&rsquo;re wondering why there&rsquo;s no mention of <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44843.pdf">Borg, Omega, and Kubernetes</a>, it&rsquo;s because it wasn&rsquo;t published until 2016, a year after this strategy was developed.</p> </div> <p>Within Uber, we have a number of ex-Twitter engineers who can speak with confidence to their experience operating with Mesos/Aurora at Twitter. We have been unable to find anyone to speak with who has production Kubernetes experience operating a comparably large fleet of 10,000+ servers, although presumably someone is operating&ndash;or close to operating&ndash;Kubernetes at that scale.</p> <p>Our general belief about the evolution of the ecosystem at the time is <a href="https://lethain.com/wardley-compute-ecosystem/">described in this Wardley mapping exercise on service orchestration (2014)</a>.</p> <p><img src="https://lethain.com/static/blog/strategy/wardley-compute-v2.png" alt="Wardley map of evolution of service orchestration in 2014"></p> <p>One of the unknowns today is how the evolution of Mesos/Aurora and Kubernetes will look in the future. 
Kubernetes seems promising with Google&rsquo;s backing, but there are few if any meaningful production deployments today. Mesos/Aurora has more community support and more production deployments, but the absolute number of deployments remains quite small, and there is no large-scale industry backer outside of Twitter.</p> <p>Even further out, there&rsquo;s considerable excitement around &ldquo;serverless&rdquo; frameworks, which seem like a likely future evolution, but canvassing the industry and our networks we&rsquo;ve simply been unable to find enough real-world usage to make an active push towards this destination today.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><a href="https://lethain.com/wardley-mapping/">Wardley mapping</a> is introduced as one of the techniques for <a href="https://lethain.com/refining-eng-strategy/">strategy refinement</a>, but it can also be a useful technique for exploring a dynamic ecosystem like service orchestration in 2014.</p> <p>Assembling each strategy requires exercising judgment on how to compile the pieces together most usefully, and in this case I found that the map fits most naturally with the rest of exploration rather than in the more operationally-focused refinement section.</p> </div>Service onboarding model for Uber (2014).https://lethain.com/uber-service-onboarding-model/Thu, 09 Jan 2025 05:00:00 -0700https://lethain.com/uber-service-onboarding-model/<p>At the core of <a href="https://lethain.com/uber-service-migration-strategy/">Uber&rsquo;s service migration strategy (2014)</a> is understanding the service onboarding process, and identifying the levers to speed up that process. 
Here we&rsquo;ll develop a <a href="https://lethain.com/strategy-systems-modeling/">system model</a> representing that onboarding process, and exercise the model to test a number of hypotheses about how to best speed up provisioning.</p> <p>In this chapter, we&rsquo;ll cover:</p> <ol> <li>Where the model of service onboarding suggested we focus our efforts</li> <li>Developing a system model using the <a href="https://github.com/lethain/systems">lethain/systems</a> package on GitHub. That model <a href="https://github.com/lethain/eng-strategy-models/blob/main/UberServiceOnboarding.ipynb">is available in the lethain/eng-strategy-models</a> repository</li> <li>Exercising that model to learn from it</li> </ol> <p>Let&rsquo;s figure out what this model can teach us.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in</em> <em><a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="learnings">Learnings</h2> <p>Even if we model this problem with a 100% success rate (e.g. no errors at all), then the backlog of requested new services continues to increase over time. This clarifies that the problem to be solved is not the quality of service the service provisioning team is providing, but rather that the fundamental approach is not working.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-model-diag-1.png" alt="Initial diagram of Uber service provisioning model without error states."></p> <p>Although hiring is tempting as a solution, our model suggests it is not a particularly valuable approach in this scenario. 
Even increasing the Service Provisioning team&rsquo;s staff allocated to manually provisioning services by 500% doesn&rsquo;t solve the backlog of incoming requests.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-model-chart-infra-hiring.png" alt="Chart showing impact of increased infrastructure engineering hiring on service provisioning."></p> <p>If reducing errors doesn&rsquo;t solve the problem, and increased hiring for the team doesn&rsquo;t solve the problem, then we have to find a way to eliminate manual service provisioning entirely. The most promising candidate is moving to a self-service provisioning model, which our model shows solves the backlog problem effectively.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-model-chart-self-service.png" alt="Chart showing impact of self-service provisioning on provisioning rate."></p> <p>Refining our earlier statement, additional hiring may benefit the team if we are able to focus those hires on building self-service provisioning, and are able to <a href="https://lethain.com/productivity-in-the-age-of-hypergrowth/">ramp their productivity</a> faster than the increase in incoming service provisioning requests.</p> <h2 id="sketch">Sketch</h2> <p>Our initial sketch of service provisioning is a simple pipeline starting with <code>requested services</code> and moving step by step through to <code>server capacity allocated</code>. Some of these steps are likely much slower than others, but it gives a sense of the stages and where things might go wrong. 
It also gives us a sense of what we can measure to evaluate if our approach to provisioning is working well.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-provis-model.png" alt="A systems model of provisioning services at Uber circa 2014."></p> <p>One element worth mentioning is the dotted lines from <code>hiring rate</code> to <code>product engineers</code> and from <code>product engineers</code> to <code>requested services</code>. These are called <em>links</em>, which represent a stock that influences another stock without flowing directly into it.</p> <div class="bg-light-gray br4 ph3 pv1"> <p>A purist would correctly note that links should connect to flows rather than stocks. That is true! However, as we&rsquo;ll encounter when we convert this sketch into a model, there are actually several counterintuitive elements here that are necessary to model this system but make the sketch less readable. As a modeler, you&rsquo;ll frequently encounter these sorts of tradeoffs, and you&rsquo;ll have to decide what choices serve your needs best in the moment.</p> </div> <p>The biggest element the initial model is missing is error flows, where things can sometimes go wrong in addition to sometimes going right. There are many ways things can go wrong, but we&rsquo;re going to focus on modeling three error flows in particular:</p> <ol> <li> <p><code>Missing/incorrect information</code> occurs twice in this model, and throws a provisioning request back into the initial provisioning phase where information is collected.</p> <p>When this occurs during port assignment, this is a relatively small trip backwards. However, when it occurs in Puppet configuration, this is a significantly larger step backwards.</p> </li> <li> <p><code>Puppet error</code> occurs in the second to final stock, <code>Puppet configuration tested &amp; merged</code>. 
This sends requests back one step in the provisioning flow.</p> </li> </ol> <p>Updating our sketch to reflect these flows, we get a fairly complete, and somewhat nuanced, view of the service provisioning flow.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-provis-model-errors.png" alt="A systems model of provisioning services at Uber circa 2014, with error transitions"></p> <p>Note that the combination of these two flows introduces the possibility of a service being almost fully provisioned, but then traveling from Puppet testing back to Puppet configuration due to <code>Puppet error</code>, and then backwards again to the initial step due to <code>Missing/incorrect information</code>. This means it&rsquo;s possible to lose almost all provisioning progress if everything goes wrong.</p> <p>There are more nuances we could introduce here, but there&rsquo;s already enough complexity here for us to learn quite a bit from this model.</p> <h2 id="reason">Reason</h2> <p>Studying our sketches, a few things stand out:</p> <ol> <li> <p>The hiring of product engineers is going to drive up service provisioning requests over time, but there&rsquo;s no counterbalancing hiring of infrastructure engineers to work on service provisioning. This means there&rsquo;s an implicit, but very real, deadline to scale this process independently of the size of the infrastructure engineering team.</p> <p>Even without building the full model, it&rsquo;s clear that we have to either stop hiring product engineers, turn this into a self-service solution, or find a new mechanism to discourage service provisioning.</p> </li> <li> <p>Error rates are going to influence results a great deal, particularly those for <code>Missing/incorrect information</code>. 
This is probably the most valuable place to start looking for efficiency improvements.</p> </li> <li> <p>Missing information errors are more expensive than the model implies, because they require coordination across teams to resolve. Conversely, Puppet testing errors are probably cheaper than the model implies, because they should be solvable within the same team and consequently benefit from a quick iteration loop.</p> </li> </ol> <p>Now we need to build a model that helps guide our inquiry into those questions.</p> <h2 id="model">Model</h2> <p>You can find the <a href="https://github.com/lethain/eng-strategy-models/blob/main/UberServiceOnboarding.ipynb">full implementation of this model on GitHub</a> if you want to see the entirety rather than these emphasized snippets.</p> <p>First, let&rsquo;s get the success states working:</p> <pre><code>HiringRate(10)
ProductEngineers(1000)
[PotentialHires] &gt; ProductEngineers @ HiringRate
[PotentialServices] &gt; RequestedServices(10) @ ProductEngineers / 10
RequestedServices &gt; InflightServices(0, 10) @ Leak(1.0)
InflightServices &gt; PortNameAssigned @ Leak(1.0)
PortNameAssigned &gt; PuppetGenerated @ Leak(1.0)
PuppetGenerated &gt; PuppetConfigMerged @ Leak(1.0)
PuppetConfigMerged &gt; ServerCapacityAllocated @ Leak(1.0)
</code></pre> <p>As we run this model, we can see that the number of requested services grows significantly over time. 
This makes sense, as we&rsquo;re only able to provision a maximum of ten services per round.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-model-diag-1.png" alt="Initial diagram of Uber service provisioning model without error states."></p> <p>However, it&rsquo;s also the best case, because we&rsquo;re not capturing the three error states:</p> <ol> <li>Unique port and name assignment can fail because of missing or incorrect information.</li> <li>Puppet configuration can also fail due to missing or incorrect information.</li> <li>Puppet configurations can have errors in them, requiring rework.</li> </ol> <p>Let&rsquo;s update the model to include these failure modes, starting with unique port and name assignment. The error-free version looks like this:</p> <pre><code>InflightServices &gt; PortNameAssigned @ Leak(1.0)
</code></pre> <p>Now let&rsquo;s add in an error rate, where 20% of requests are missing information and return to the requested services stock.</p> <pre><code>PortNameAssigned &gt; PuppetGenerated @ Leak(0.8)
PortNameAssigned &gt; RequestedServices @ Leak(0.2)
</code></pre> <p>Then let&rsquo;s do the same thing for Puppet configuration errors:</p> <pre><code># original version
PuppetGenerated &gt; PuppetConfigMerged @ Leak(1.0)
# updated version with errors
PuppetGenerated &gt; PuppetConfigMerged @ Leak(0.8)
PuppetGenerated &gt; InflightServices @ Leak(0.2)
</code></pre> <p>Finally, we&rsquo;ll make a similar change to represent errors made in the Puppet templates themselves:</p> <pre><code># original version
PuppetConfigMerged &gt; ServerCapacityAllocated @ Leak(1.0)
# updated version with errors
PuppetConfigMerged &gt; ServerCapacityAllocated @ Leak(0.8)
PuppetConfigMerged &gt; PuppetGenerated @ Leak(0.2)
</code></pre> <p>Even with relatively low error rates, we can see that the throughput of the system overall has been meaningfully impacted by introducing these errors.</p> <p><img 
src="https://lethain.com/static/blog/strategy/uber-model-diag-2.png" alt="Updated diagram of Uber service provisioning model with error states."></p> <p>Now that we have the foundation of the model built, it&rsquo;s time to start exercising the model to understand the problem space a bit better.</p> <h2 id="exercise">Exercise</h2> <p>We already know the errors are impacting throughput, but let&rsquo;s start by narrowing down which errors matter most by increasing the error rate for each of them independently and comparing the impact.</p> <p>To model this, we&rsquo;ll create three new specifications, each of which increases one error from a 20% error rate to a 50% error rate, and see how the overall throughput of the system is impacted:</p> <pre><code># test 1: port assignment errors increased
PortNameAssigned &gt; PuppetGenerated @ Leak(0.5)
PortNameAssigned &gt; RequestedServices @ Leak(0.5)
# test 2: puppet generated errors increased
PuppetGenerated &gt; PuppetConfigMerged @ Leak(0.5)
PuppetGenerated &gt; InflightServices @ Leak(0.5)
# test 3: puppet merged errors increased
PuppetConfigMerged &gt; ServerCapacityAllocated @ Leak(0.5)
PuppetConfigMerged &gt; PuppetGenerated @ Leak(0.5)
</code></pre> <p>Comparing the impact of increasing the error rates from 20% to 50% in each of the three error loops, we can get a sense of the model&rsquo;s sensitivity to each error.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-model-chart-diff-errors.png" alt="Chart showing impact of increased error rates in different stages of provisioning."></p> <p>This chart captures why exercising is so impactful: we&rsquo;d assumed during sketching that errors in Puppet generation would matter the most because they caused a long trip backwards, but it turns out a very high error rate early in the process matters even more because there are still multiple other potential errors later on that compound on its increase.</p> <p>Next we can get a sense of the impact of hiring 
more people onto the service provisioning team to manually provision more services, which we can model by increasing the maximum size of the inflight services stock from <code>10</code> to <code>50</code>.</p> <pre><code># initial model
RequestedServices &gt; InflightServices(0, 10) @ Leak(1.0)
# with 5x capacity!
RequestedServices &gt; InflightServices(0, 50) @ Leak(1.0)
</code></pre> <p>Unfortunately, we can see that even increasing the team&rsquo;s capacity by 500% doesn&rsquo;t solve the backlog of requested services.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-model-chart-infra-hiring.png" alt="Chart showing impact of increased infrastructure engineering hiring on service provisioning."></p> <p>There&rsquo;s some impact, but not that much, and the backlog of requested services remains extremely high. We can conclude that more infrastructure hiring isn&rsquo;t the solution we need, but let&rsquo;s see if moving to self-service is a plausible solution.</p> <p>We can simulate the impact of moving to self-service by removing the maximum size from inflight services entirely:</p> <pre><code># initial model
RequestedServices &gt; InflightServices(0, 10) @ Leak(1.0)
# simulating self-service
RequestedServices &gt; InflightServices(0) @ Leak(1.0)
</code></pre> <p>We can see this finally solves the backlog.</p> <p><img src="https://lethain.com/static/blog/strategy/uber-model-chart-self-service.png" alt="Chart showing impact of self-service provisioning on provisioning rate."></p> <p>At this point, we&rsquo;ve exercised the model a fair amount and have a good sense of what it wants to tell us. 
We know which errors matter the most to invest in early, and we also know that we need to make the move to a self-service platform sometime soon.</p>Refining strategy with Wardley Mapping.https://lethain.com/wardley-mapping/Thu, 02 Jan 2025 06:00:00 -0700https://lethain.com/wardley-mapping/<p>The first time I heard about Wardley Mapping was from Charity Majors discussing it on Twitter. Of the three core <a href="https://lethain.com/refining-eng-strategy/">strategy refinement techniques</a>, this is the technique that I&rsquo;ve personally used the least. Despite that, I decided to include it in this book because it highlights how many different techniques can be used for refining strategy, and also because it&rsquo;s particularly effective at looking at the broadest ecosystems your organization exists in.</p> <p>Where the other techniques like <a href="https://lethain.com/strategy-systems-modeling/">systems thinking</a> and <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a> often zoom in, Wardley mapping is remarkably effective at zooming out.</p> <p>In this chapter, we&rsquo;ll cover:</p> <ul> <li>A ten-minute primer on Wardley mapping</li> <li>Recommendations for tools to create Wardley maps</li> <li>When Wardley maps are an ideal strategy refinement tool, and when they&rsquo;re not</li> <li>The process I use to map, as well as integrate a Wardley map into strategy creation</li> <li>Breadcrumbs to specific Wardley maps that provide examples</li> <li>Documenting a Wardley map in the context of a strategy writeup</li> <li>Why I limited focus on two elements of Wardley&rsquo;s work: doctrines and gameplay</li> </ul> <p>After working through this chapter, and digging into some of this book&rsquo;s examples of Wardley Maps, you&rsquo;ll have a good background to start your own mapping practice.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that 
I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="ten-minute-primer">Ten minute primer</h2> <p>Wardley maps are a technique created by Simon Wardley to ensure your strategy is grounded in reality. Or, as mapping practitioners would say, it&rsquo;s a tool for creating situational awareness. If you have a few days, you might want to start your dive into Wardley mapping by reading Simon Wardley&rsquo;s book on the topic, <em><a href="https://medium.com/wardleymaps/on-being-lost-2ef5f05eb1ec">Wardley Maps</a></em>. If you only have ten minutes, then this section should be enough to get you up to speed on reading Wardley maps.</p> <p>Picking an example to work through, we&rsquo;re going to create a Wardley map that aims to understand a knowledge base management product, along the lines of a wiki like Confluence or Notion.</p> <p><img src="https://lethain.com/static/blog/strategy/intro-wardley-init.png" alt="Diagram showing a basic Wardley map for a knowledge base management application."></p> <p>You need to know three foundational concepts to read a Wardley map:</p> <ol> <li> <p>Maps are populated with three kinds of components: users, needs, and capabilities. Users exist at the top, and represent a cohort of users who will use your product. Each kind of user has a specific set of needs, generally tasks that they need to accomplish. Each need depends on certain capabilities to fulfill it.</p> <p>Any box connecting directly to a user is a need. Any box connecting to a need is a capability. A capability can be connected to any number of needs, but can never connect directly to a user; they connect to users only indirectly via a need.</p> </li> <li> <p>The x-axis is divided into four segments, representing how commoditized a capability is. 
On the far left is genesis, which represents a brand-new capability that hasn&rsquo;t existed before. On the far right is commoditized, something so standard and expected that it&rsquo;s unremarkable, like turning on a switch causing electricity to flow. In between are custom and product, the two categories where most items fall on the map. Custom represents something that requires specialized expertise and operation to function, such as a web application that requires software engineers to build and maintain. Product represents something that can generally be bought.</p> <p>In this map, document reading is commoditized: it&rsquo;s unremarkable if your application allows its users to read content. On the other hand, document editing is somewhere on the border of product and custom. You might integrate an existing vendor for document editing needs, or you might build it yourself, but in either case document editing is less commoditized than document reading.</p> </li> <li> <p>The y-axis represents visibility to the user. In this map, reading documents is something that is extremely visible to the user. On the other hand, users depend on something indexing new documents for search, but your users will generally have no visibility into the indexing process or even that you have a search index to begin with.</p> </li> </ol> <p>Although maps can get quite complex, those three concepts are generally sufficient to allow you to decode an arbitrarily complex map.</p> <p>In addition to mapping the current state, Wardley maps are also excellent at exploring how circumstances might change over time. 
To illustrate that, let&rsquo;s look at a second iteration of our map, paying particular attention to the red arrows indicating capabilities that we expect to change in the future.</p> <p><img src="https://lethain.com/static/blog/strategy/intro-wardley-future.png" alt="Diagram showing a basic Wardley map for a knowledge base management application."></p> <p>In particular, the map now indicates that the current document creation experience will be superseded by an AI-enhanced editing process. Critically, the map also predicts that the AI-enhanced process will be more commoditized than its current authoring experience, perhaps because the AI-enhancement will be driven by commoditized foundational models from providers like Anthropic and OpenAI. Building on that, the only place left in the map for meaningful differentiation is in search indexing. Either the knowledge base company needs to accept the implication that they will increasingly be a search company, or they need to expand the user needs they service to find a new avenue for differentiation.</p> <p>Some maps will show evolution of a given capability using a &ldquo;pipeline&rdquo;, a box that describes a series of expected improvements in a capability over time.</p> <p><img src="https://lethain.com/static/blog/strategy/intro-wardley-future-pipeline.png" alt="Diagram showing a basic Wardley map for a knowledge base management application."></p> <p>Now instead of simply indicating that the authoring experience may be replaced by an AI-enhanced capability over time, we&rsquo;re able to express a sequence of steps. 
From the starting place of a typical editing experience, the next expected step is AI-assisted creation, and then finally we expect AI-led creation where the author only provides high-level direction to a machine learning-powered agent.</p> <p>For completeness, it&rsquo;s also worth mentioning that some Wardley maps will have an overlay, which is a box to group capabilities or requirements together by some common denominator. This happens most frequently to indicate the responsible team for various capabilities, but it&rsquo;s a technique that can be used to emphasize any interesting element of a map&rsquo;s topology.</p> <p><img src="https://lethain.com/static/blog/strategy/intro-wardley-team-overlay.png" alt="Diagram showing a basic Wardley map for a knowledge base management application, with an overlay to show which teams own which capabilities."></p> <p>At this point, you have the foundation to read a Wardley map, or get started creating your own. Maps you encounter in the wild might appear significantly more complex than these initial examples, but they&rsquo;ll be composed of the same fundamental elements.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><strong>More Wardley Mapping resources</strong></p> <p><em><a href="https://itrevolution.com/product/the-value-flywheel-effect/">The Value Flywheel Effect</a></em> by David Anderson</p> <p><em><a href="https://medium.com/wardleymaps/on-being-lost-2ef5f05eb1ec">Wardley Maps</a></em> by Simon Wardley on Medium, also <a href="https://learnwardleymapping.com/book/">available as PDF</a></p> <p><a href="https://learnwardleymapping.com/">Learn Wardley Mapping</a> by Ben Mosior</p> <p><a href="https://list.wardleymaps.com/">wardleymaps.com&rsquo;s resources</a> and <a href="https://www.youtube.com/wardleymaps">@WardleyMaps on YouTube</a></p> </div> <h2 id="tools-for-wardley-mapping">Tools for Wardley Mapping</h2> <p>Systems modeling has a serious tooling problem, which often prevents would-be adopters from developing 
their systems modeling practice. Fortunately, Wardley Mapping doesn&rsquo;t suffer from that problem. You can simply print out a Wardley Map and draw on it by hand. You can also use OmniGraffle, Miro, Figma or whatever diagramming tool you&rsquo;re already familiar with.</p> <p>There are more focused tools as well, with Ben Mosior pulling together an excellent writeup on <a href="https://learnwardleymapping.com/2024/06/24/top-5-wardley-mapping-tools-for-2024/">Wardley Mapping Tools as of 2024</a>. Of those, I&rsquo;d strongly encourage starting with <a href="https://mapkeep.com/">Mapkeep</a> as a simple, free, and intuitive tool for your initial mapping needs.</p> <p>After you&rsquo;ve gotten some practice, you may well want to move back into your most familiar diagramming tool to make it easier to collaborate with colleagues, but initially prioritize the simplest tool you can to avoid losing learning momentum on configuration, setup and so on.</p> <h2 id="when-are-wardley-maps-useful">When are Wardley Maps useful?</h2> <p>All successful strategy begins with understanding the constraints and circumstances that the strategy needs to work within. Wardley mapping labels that understanding as situational awareness, and creating situational awareness is the foremost goal of mapping.</p> <p>Situational awareness is always useful, but it&rsquo;s particularly essential in highly dynamic environments where the industry around you, competitors you&rsquo;re selling against, or the capabilities powering your product are shifting rapidly. In the past several decades, there have been a number of these dynamic contexts, including the rise of web applications, the proliferation of mobile devices, and the expansion of machine learning techniques.</p> <p>When you&rsquo;re in those environments, it&rsquo;s obvious that the world is changing rapidly. 
What&rsquo;s sometimes easy to miss is that any strategy that needs to last longer than a year or two is built on an evolving foundation, even if things seem very stable at the time. For example, in the early 2010s, startups like Facebook, Uber and Digg were all operating in physical datacenters with their own hardware. Over a five-year period, having a presence in a physical datacenter went from the default approach for startups to a relatively unconventional solution, as cloud-based infrastructure rapidly expanded. Any strategy written in 2010 that imagined the world of hosting was static was destined to be invalidated.</p> <p>No tool is universally effective, and that&rsquo;s true here as well. While Wardley maps are extremely helpful at understanding broad change, my experience is that they&rsquo;re less helpful in the details. If you&rsquo;re looking to optimize your onboarding funnel, then something like <a href="https://lethain.com/strategy-systems-modeling/">systems modeling</a> or <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a> is likely going to serve you better.</p> <h2 id="how-to-wardley-map">How to Wardley Map</h2> <p>Learning Wardley mapping is a mix of reading others&rsquo; maps and writing your own. A variety of maps for reading are collected in the following breadcrumbs section, and I&rsquo;d recommend skimming all of them. In this section are the concrete steps I&rsquo;d encourage you to follow for creating the first map of your own:</p> <ol> <li> <p><strong>Commit to starting small and iterating.</strong> Simple maps are the foundation of complex maps. Even the smallest Wardley map will have enough detail to reveal something interesting about the environment you&rsquo;re operating in.</p> <p>Conversely, by starting complex, it&rsquo;s easy to get caught up in all of your early map&rsquo;s imperfections. At worst, this will cause you to lose momentum in creating the map. 
At best, it will accidentally steer your attention rather than facilitating discovery of which details are important to focus on.</p> </li> <li> <p><strong>List users, needs and capabilities.</strong> Identify the first one or two users for your product. Going back to the knowledge management example from the primer, your two initial users might be an author and a reader. From there, identify those users&rsquo; needs, such as authoring content, finding content, and providing feedback on which content is helpful. Finally, write down the underlying technical capabilities necessary to support those needs, which might range from indexing content in a search index to a customer support process to deal with frustrated users.</p> <p>Remember to start small! On your first pass, it&rsquo;s fine to focus on a single user. As you iterate on your map, bring in more users, needs and capabilities until the map conveys something useful.</p> <p>Tooling for this can be a piece of paper or wherever you keep notes.</p> </li> <li> <p><strong>Establish value chains.</strong> Take your list and then connect each of the components into chains. For example, the reader in the above knowledge base example would then be connected to needing to discover content. Discovering content would be linked to indexing in the search index. That sequence from reader to discovering content to search index represents one value chain.</p> <p>Convergence across chains is a good thing. As your chains get more comprehensive, it&rsquo;s expected that a given capability would be referenced by multiple different needs. Similarly, it&rsquo;s expected that multiple users might have a shared need.</p> </li> <li> <p><strong>Plot value chains</strong> on a Wardley Map. 
You can do this using any of the tools discussed in the Tools for Wardley mapping section, including a piece of paper.</p> <p>Because you already have the value chains created, what you&rsquo;re focused on in this step is placing each component relative to its visibility to users (higher up is more visible to the user, lower down is less visible), and how mature the solutions are (leftward represents more custom solutions, rightward represents more commoditized solutions).</p> </li> <li> <p><strong>Study current state</strong> of the map. With the value chains plotted on your map, it will begin to reveal where your organization&rsquo;s attention should be focused, and what complexity you can delegate to vendors. Jot down any realizations you have from this topology.</p> </li> <li> <p><strong>Predict</strong> evolution of the map, and create a second version of your map that includes these changes. (Keep the previous version so you can better see the evolution of your thinking!)</p> <p>It can be helpful to create multiple maps that contemplate different scenarios. Thinking about the running knowledge base example, you might contemplate a future where AI-powered tools become the dominant mechanism for authors creating content. Then you could explore another future where such tools are regulated out of most tools, and imagine how that would shape your approach differently.</p> <p>The right timeframe for these changes will vary with the environment you&rsquo;re mapping. Always prefer a timeframe that makes it easy to believe changes will happen: maybe that&rsquo;s five years, or maybe it&rsquo;s 12 months. 
If you&rsquo;re caught up wondering whether change might take longer than a certain timeframe, then simply extend your timeframe to sidestep that issue.</p> </li> <li> <p><strong>Study future state</strong> of the map, now that you&rsquo;ve predicted the future. Once again, write down any unexpected implications of this evolution, and how you may need to adjust your approach as a result.</p> </li> <li> <p><strong>Share with others</strong> for feedback. It&rsquo;s impossible for anyone to know everything, which is why the best maps tend to be a communal creation. That&rsquo;s not to suggest that you should perform every step in a broad community, or that your map should be the consensus of a working group. Instead, you should test your map against others, see what they find insightful and what they find artificial in the map, and include that in your map&rsquo;s topology.</p> </li> <li> <p><strong>Document</strong> what you&rsquo;ve learned as discussed below in the section on documentation. You should also connect that Wardley map writeup with your overall strategy document, typically in the <a href="https://lethain.com/components-of-eng-strategy/">Refine or Explore sections</a>.</p> </li> </ol> <p>One downside of presenting steps to do something is that the sequence can become a fixed recipe. These are the steps that I&rsquo;ve found most useful, and I&rsquo;d encourage you to try them if mapping is a new tool in your toolkit, but this is far from the canonical way. 
Start here, then experiment with other approaches until you find the best approach for you and the strategies that you&rsquo;re working on.</p> <h2 id="breadcrumbs-for-wardley-map-examples">Breadcrumbs for Wardley Map examples</h2> <div class="bg-light-gray br4 ph3 pv1"> <p><em>I&rsquo;ll update these examples as I continue writing more strategies for this book.</em> <em>Until then, I admit that some of these examples are &ldquo;what I have laying around&rdquo; moreso than the &ldquo;ideal forms of Wardley maps.&rdquo;</em></p> </div> <p>With the foundation in place, the best way to build on Wardley mapping is writing your own maps. The second best way is to read existing maps that others have made, a number of which exist within this book:</p> <ul> <li><a href="wardley-llm-ecosystem">LLM evolution</a> studies the evolution of the Large Language Model ecosystem, and how that will impact product engineering organizations attempting to validate and deploy new paradigms like agentic workflows and retrieval augmented generation</li> <li><a href="https://lethain.com/wardley-gitlab-strategy/">GitLab strategy</a> shows a broad Wardley Map, looking at the developer tooling industry&rsquo;s evolution over time, and how GitLab&rsquo;s approach implies they believe commoditization will drive organizations to prefer bundled solutions over integrating best-in-breed offerings</li> <li><a href="https://lethain.com/measuring-developer-experience-benchmarks-theory-of-improvement/">Evolution of developer experience tooling space</a> explores how Wardley mapping has helped me refine my understanding of how the developer experience ecosystem will evolve over time</li> </ul> <p>In addition to the maps within this book, I also label maps that I create on my blog using the <a href="https://lethain.com/tags/wardley/">wardley category</a>.</p> <h2 id="how-to-document-a-wardley-map">How to document a Wardley Map</h2> <p>As explored in <a 
href="https://lethain.com/readable-engineering-strategy-documents/">how to create readable strategy documents</a>, the default temptation is to structure documents around the creation process. However, it&rsquo;s essentially always better to write in two steps: develop a writing-optimized version that&rsquo;s focused on facilitating thinking, and then rework it into a reading-optimized version that supports both readers who are, and are not, interested in the details.</p> <p>The writing-optimized version is what we discussed in &ldquo;How to Wardley Map&rdquo; above. For a reading-optimized version, I recommend:</p> <ol> <li> <p><strong>How things work today</strong> shares a map of the current environment, explains any interesting rationales or controversies behind placements on the map, and highlights the most interesting parts of the map</p> </li> <li> <p><strong>Transition to future state</strong> starts with a second map, this one showing the transition from the current state to a projected future state. It&rsquo;s very reasonable to have multiple distinct maps, each of which considers one potential evolution, or one step of a longer evolution.</p> </li> <li> <p><strong>Users and Value chains</strong> are the first place you start creating a Wardley map, but generally the least interesting part of explaining a map&rsquo;s implications. This isn&rsquo;t because the value chains are unimportant, rather it&rsquo;s because the map itself tends to implicitly explain the value chain enough that you can move directly to focusing on the map&rsquo;s most interesting implications.</p> <p>In a sufficiently complex map, it&rsquo;s very reasonable to split this into two sections, but generally I find it eliminates redundancy to cover users and value chains in one joint section rather than separately. 
This is a good example of the difference between reading and writing: splitting these two topics helps clarify thinking, but muddles reading.</p> </li> </ol> <p>This ordering may seem too brief or a bit counter-intuitive for you, as the person who has the full set of details, but my experience is that it will be simpler to read for most readers. That&rsquo;s because most readers read until they agree with the conclusion, then stop reading, and are only interested in the details if they disagree with the conclusion.</p> <p>This format is also fairly different than the format I recommend for documenting systems models. That is because systems model diagrams exclude much of the relevant detail, showing the relationship between stocks but not showing the magnitude of the flows. You can only fully understand a system model by seeing both the diagram and a chart showing the model&rsquo;s output. Wardley maps, on the other hand, tend to be more self-explanatory, and often can stand on their own with relatively less written description.</p> <h2 id="what-about-doctrines-and-gameplay">What about doctrines and gameplay?</h2> <p>This book&rsquo;s <a href="https://lethain.com/components-of-eng-strategy/">components of strategy</a> are most heavily influenced by Richard Rumelt&rsquo;s approach. Simon Wardley&rsquo;s approach to strategy built around Wardley Mapping could be viewed as a competing lens. For each problem that Rumelt&rsquo;s system solves, there is a Wardley solution as well, and it&rsquo;s worth mentioning some of the components I&rsquo;ve not included, and why I didn&rsquo;t.</p> <p>The two most important components I&rsquo;ve not discussed thus far are Wardley&rsquo;s ideas of <a href="https://learnwardleymapping.com/2020/08/17/principles-first/">doctrine</a> and <a href="https://www.wardleymaps.com/gameplay">gameplay</a>. 
Wardley&rsquo;s doctrines are universally applicable practices like knowing your users, biasing towards data, and designing for constant evolution. Gameplay is similar to doctrine, but is context-dependent rather than universal. Some examples of gameplay are talent raid (hiring from a knowledgeable competitor), bundling (selling products together rather than separately), and exploiting network effects.</p> <p>I decided not to spend much time on doctrine and gameplay because I find them lightly specialized on the needs of business strategy, and consequently a bit messy to apply to the sorts of problems that this book is most interested in solving: the problems of engineering strategy.</p> <p>To be explicit, I don&rsquo;t personally view Rumelt&rsquo;s approach and Wardley&rsquo;s approaches as competing efforts. What&rsquo;s most valuable is to have a broad toolkit, and pull in the pieces of that toolkit that feel most applicable to the problems at hand. I find Wardley Maps exceptionally valuable at enhancing exploration, diagnosis, and refinement in some problems. In other problems, typically shorter duration or more internally-oriented, I find the Rumelt playbook more applicable. In all problems, I find the combination more valuable than anchoring in one camp&rsquo;s perspective.</p> <h2 id="summary">Summary</h2> <p>No refinement technique will let you reliably predict the future, but Wardley mapping is very effective at helping you plot out the various potential futures your strategy might need to operate in. With those futures in mind, you can tune your strategy to excel in the most likely, and to weather the less desirable.</p> <p>It took me years to dive into Wardley mapping. Once I finally did, it was simpler than I&rsquo;d feared, and now I find myself creating Wardley maps somewhat frequently. 
When you&rsquo;re working on your next strategy that&rsquo;s impacted by the ecosystem&rsquo;s evolution around it, try your hand at mapping, and soon you&rsquo;ll <a href="https://lethain.com/tags/wardley/">start to build your own collection of maps</a>.</p>How to effectively refine engineering strategy.https://lethain.com/refining-eng-strategy/Sat, 28 Dec 2024 04:00:00 -0700https://lethain.com/refining-eng-strategy/<p>In Jim Collins&rsquo; <em><a href="https://www.amazon.com/Great-Choice-Uncertainty-Thrive-Despite/dp/1847940889/">Great by Choice</a></em>, he develops the concept of <a href="https://www.jimcollins.com/concepts/fire-bullets-then-cannonballs.html">Fire Bullets, Then Cannonballs</a>. His premise is that you should cheaply test new ideas before fully committing to them. Your organization can only afford firing a small number of cannonballs, but it can bankroll far more bullets. Why not use bullets to derisk your cannonballs&rsquo; trajectories?</p> <p>This chapter presents a series of concrete techniques that I have personally used to effectively refine strategies before reaching the cannonball stage. We&rsquo;ll work through:</p> <ul> <li>An introduction to the practice of strategy refinement</li> <li>Why strategy refinement is the highest impact step of strategy creation</li> <li>How mixed incentives often cause refinement to be skipped, even though skipping leads to worse organizational outcomes</li> <li>Building your personal toolkit for refining strategy by picking from various refinement techniques like <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a>, <a href="https://lethain.com/strategy-systems-modeling/">systems modeling</a>, and <a href="https://lethain.com/wardley-mapping/">Wardley mapping</a></li> <li>Brief introductions to each of those refinement techniques. 
These provide enough context to pick which ones might be useful for the strategy that you&rsquo;re working on</li> <li>Survey of anti-patterns that skip refinement or manufacture consent to create the illusion of refinement without providing the benefits</li> </ul> <p>Each of the refinement techniques, such as systems modeling, is covered in greater detail&ndash;including concrete applications to specific engineering strategies&ndash;in the refinement section of this book.</p> <div class="bg-light-gray br4 ph3 pv1"> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> </div> <h2 id="what-is-strategy-refinement">What is strategy refinement?</h2> <p>Most strategies succeed because they properly address narrow problems within a broader strategy. While it&rsquo;s possible to implement the entire strategy to validate the approach, this is both inefficient and slow. Worse, it&rsquo;s easy to get so distracted by miscellaneous details that you lose sight of the levers that will make your strategy impactful.</p> <p>Strategy refinement is a toolkit of methods to identify those narrow problems that matter most, and validate that your solutions to those problems will be effective. The right tool within the toolkit will vary depending on the strategy you&rsquo;re working on. It might be using Wardley mapping to understand how the ecosystem&rsquo;s evolution will impact your approach. Or it might be systems modeling to determine which part of a migration is the most valuable lever. 
In other cases, it&rsquo;s slowing down committing to your strategy until you&rsquo;ve done a narrow test drive to derisk the pieces you don&rsquo;t quite have conviction in yet.</p> <p>Whatever tools you&rsquo;ve relied on to refine strategy thus far in your work, there are always new refinement tools to pick up. This book presents a workable introduction to several tools that I find reliably useful, while providing a broader foundation for deploying other techniques that you develop towards strategy refinement.</p> <h2 id="does-refinement-matter">Does refinement matter?</h2> <p>At Stripe, the head of engineering one-shot rolled out agile techniques as the required method for engineering development. This change was aimed at our difficulties with planning in periods longer than a month, which was becoming an increasing challenge as we started working with enterprise businesses who wanted us to commit to specific functionality as part of signing their contracts. On the other hand, the approach worked poorly, because it assumed that the issue was engineering managers being generally unfamiliar with agile techniques. The challenge of adoption wasn&rsquo;t awareness, but rather the difficulty of prioritizing asks from numerous stakeholders in an environment where saying no was frowned upon.</p> <p>In this agile rollout, the lack of a shared planning paradigm was a real, apt problem. However, the solution solved the easiest part of the problem, without addressing the messier parts, and consequently failed to make meaningful progress. This happens a surprising amount, and can be largely avoided with a small dose of refinement.</p> <p>On the opposite end, we created Uber&rsquo;s service adoption strategy exclusively through refinement, because the infrastructure engineering team didn&rsquo;t have any authority to mandate wider changes. Instead, we relied on two different kinds of refinement to focus our iterative efforts. 
First, we used systems modeling to understand what parts of adoption we needed to focus on. Second, we used strategy testing to learn by migrating individual product engineering teams over to the new platform.</p> <p>In the agile adoption example, failure to refine turned a moderately challenging problem into a strategy failure. In the service migration example, focus on refinement translated an extremely difficult problem into a success. Refinement is, in my experience, the kernel of effective strategy.</p> <h2 id="if-it-matters-why-is-it-skipped">If it matters, why is it skipped?</h2> <p>When a small team creates a strategy, a so-called <a href="https://lethain.com/when-write-down-engineering-strategy/">low-altitude strategy</a>, they almost always spend a great deal of time refining their strategy. This isn&rsquo;t because most teams believe in refinement. Rather it&rsquo;s because most teams lack the authority to force others to align with their strategy. This lack of authority means they must incrementally prove out their approach until other teams or executives believe it&rsquo;s worth aligning with.</p> <p>High-altitude strategy is typically the domain of executives, who generally have the ability to mandate adoption, and routinely skip the refinement stage, even when it&rsquo;s inexpensive and is almost guaranteed to make them more successful. Why is that? When <a href="https://lethain.com/first-ninety-days-cto-vpe/">executives start a new role</a>, they know making an early impression matters. They also, unfortunately, know that sounding ambitious often resonates more loudly than doing good work. 
So, while they do hope to eventually be effective, early on they kick off a few aspirational initiatives <a href="https://lethain.com/grand-migration/">like a massive overhaul of the codebase</a>, believing it&rsquo;ll establish their reputation as an effective leader at the company.</p> <p>This isn&rsquo;t uniquely an executive failure, it also happens frequently in <a href="https://lethain.com/when-write-down-engineering-strategy/">permissive strategy organizations</a> that require <a href="https://staffeng.com/guides/staff-projects/">an ambitious, high-leverage project to get promoted into senior engineering roles</a>. For example, you might see a novel approach to networking or authorization implemented in a company, whose adoption fails after solving some easier proof points, and trace its heritage back to promotion criteria. In many cases, the promotion will come before the rollout stalls out, disincentivizing the would-be promoted engineer from worrying too deeply about whether this was net-positive for the organization. The executive responsible for the promotion rubric will eventually recognize the flaw, but it&rsquo;s not the easiest tradeoff for them to pick between an organization that innovates too much while empowering individuals or an organization with little waste but restricted room for creativity.</p> <p>Another reason refinement can get skipped is that sometimes you&rsquo;re forced to urgently create and commit to a strategy, usually because your boss tells you to. This doesn&rsquo;t actually prevent refinement&ndash;just say you&rsquo;re committed and refine anyway&ndash;but often this interaction turns off the strategist&rsquo;s mind, tricking them into intellectually thinking they can&rsquo;t change their approach because they&rsquo;ve already committed to it. 
This is never true, all decisions are up for review with proper evidence, but it takes a certain courage to refine when those around you are asking for weekly updates on completing the project.</p> <p>There&rsquo;s one other important reason that strategy refinement gets skipped: many people haven&rsquo;t built out a toolkit to perform strategy refinement, and haven&rsquo;t worked with someone who has a toolkit.</p> <h2 id="building-your-toolkit">Building your toolkit</h2> <p>I&rsquo;m eternally grateful to my father, a professor of economics, who brought me to a systems modeling workshop in Boston one summer when I was in high school. This opened my eyes to the wide world of techniques for reasoning about problems, and systems modeling became the first tool in my toolkit for strategy refinement.</p> <p>The section on refinement will go into three refinement techniques in significant detail: <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a>, <a href="https://lethain.com/strategy-systems-modeling/">systems modeling</a>, and <a href="https://lethain.com/wardley-mapping/">Wardley mapping</a>, as well as surveying a handful of other techniques more common to strategy consultants. Systems modeling I adopted early, whereas Wardley mapping I only learned while working on this book. Few individuals are proficient users of many refinement tools, but it&rsquo;s extraordinarily powerful to unlock your first tool, and worthwhile to slowly expand your experience with other tools over time. All tools are flawed, and each is best at illuminating certain types of problems.</p> <p>If all of these are unfamiliar, then skim over all of them and pick one that seems most applicable to a current problem you&rsquo;re working on. 
You&rsquo;ll build expertise by trying a tool against many different problems, and talking through the results with engaged peers.</p> <p>As you practice, remember that the important thing to share is the learning from these techniques, and try to avoid getting too caught up in sharing the techniques themselves. I&rsquo;ve seen these techniques meaningfully change strategies, but I&rsquo;ve never seen those changes successfully justified through the inherent insight of the refinement techniques themselves.</p> <h2 id="strategy-testing">Strategy testing</h2> <p>Sometimes you&rsquo;ll need a strategy to solve an ambiguous problem, or a problem where the issues blocking progress are poorly understood. At Carta, one strategy problem we worked on was improving code quality, which is a good example of both of those. It&rsquo;s difficult to agree on what code quality is, and it&rsquo;s equally difficult to agree on appropriate, concrete steps to improve it.</p> <p>To navigate that ambiguity, we spent relatively little time thinking about the right initial solution, and a great deal of our time deploying the <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a> technique:</p> <ol> <li>Identify the narrowest, deepest available slice of your strategy. 
Iterate on applying that slice until you see some evidence it&rsquo;s working.</li> <li>As you iterate, identify metrics that help you verify the approach is working.</li> <li>Operate from the belief that people are well-meaning, and strategy failures are due to excess friction and poor ergonomics.</li> <li>Keep refining until you have conviction that your strategy’s details work in practice, or that the strategy needs to be approached from a new direction.</li> </ol> <p>In this case, we achieved some small wins, funded a handful of specific bets that we believed would improve the problem long-term, and ended the initiative early without making a large organizational commitment. 
You could argue that&rsquo;s a failure, but my experience is quite different: having a problem doesn&rsquo;t mean you have an elegant solution, and strategy testing helps you validate if the solution&rsquo;s efficiency and ergonomics are viable.</p> <p>If you&rsquo;re dealing with a deeply ambiguous problem and there&rsquo;s no agreement on the nature of the reality you&rsquo;re operating in, strategy testing is a great technique to start with.</p> <h2 id="systems-modeling">Systems modeling</h2> <p>When you&rsquo;re unsure where leverage points might be in a complex system, <a href="https://lethain.com/strategy-systems-modeling/">systems modeling</a> is an effective technique to cheaply determine which levers might be effective. For example, the systems model for <a href="https://lethain.com/driver-onboarding-model/">onboarding drivers in a ride-share app</a> shows that reengaging drivers who&rsquo;ve left the platform matters more than bringing on new drivers in a mature market.</p> <p>Similarly, in the Uber service migration example, systems modeling helped us focus on eliminating upfront steps during service onboarding, shifting to reasonable defaults and away from forcing teams to learn the new service platform before it had done anything useful for them.</p> <p><img src="https://lethain.com/static/blog/strategy/QualityMentalModels.png" alt="Diagram of a quality systems model"></p> <p>While you can certainly reach these insights without modeling, modeling tends to make the insights immediately visible. 
In cases where your model doesn&rsquo;t immediately illuminate what matters most, studying how your model&rsquo;s projections conflict with real-world data will guide you to understand where your assumptions are contorting your understanding of the problem.</p> <p>If you generally understand a problem, but need to determine where to focus efforts to make the largest impact, then systems modeling is a valuable technique to deploy.</p> <h2 id="wardley-mapping">Wardley mapping</h2> <p>Many engineering strategies implicitly make the assumption that the ecosystem we&rsquo;re operating within is static. However, that&rsquo;s certainly false. Many experienced engineers and engineering leaders have great judgment, and great intuition, but nonetheless deploy flawed strategy because they&rsquo;ve anchored on their memory of how things work rather than noticing how things have changed over time.</p> <p>If, rather than being hit over the head by them, you want to incorporate these changes into your strategy, <a href="https://lethain.com/wardley-mapping/">Wardley mapping</a> is a great tool to add to your kit.</p> <p>Wardley maps allow you to plot users, their needs, and then study how the solutions to those needs will shift over time. 
For example, today there is a proliferation of narrow platforms built on recent advances in large language models, but <a href="https://lethain.com/wardley-llm-ecosystem/">studying a Wardley map of the LLM ecosystem</a> suggests that it&rsquo;s likely that this ecosystem will consolidate to fewer, broader platforms rather than remaining so widely scattered across distinct vendors.</p> <p><img src="https://lethain.com/static/blog/strategy/llm-wardley-1.png" alt="Wardley map of Large Language Model ecosystem"></p> <p>If your strategy involves adopting a highly dynamic technology such as observability in the 2010s, or if your strategy is intended to span five-plus years, then Wardley mapping will help surface how industry evolution will impact your approach.</p> <h2 id="anti-patterns-in-refinement">Anti-patterns in refinement</h2> <p>We&rsquo;ve already discussed why <strong>refinement is often skipped</strong>, which is the most frequent and most damning refinement anti-pattern. At Calm, we cargo-culted adoption of decomposing our monolithic codebase into microservices; we had no reason to believe this was improving developer productivity, but we continued to pursue this strategy for a year before recognizing that we were suffering from skipping refinement.</p> <p>The second most common anti-pattern is creating the impression of strategy refinement through <strong>manufactured consent</strong>. A new senior leader joined Uber and mandated a complete technical re-architecture, justifying this in part through the evidence that a number of internal leaders had adopted the same techniques successfully on their teams. 
Speaking with those internal leaders, they themselves were skeptical that the proposal made sense, despite the fact that their surface-level agreement was being used to convince the wider organization that they believed in the new approach.</p> <p>Finally, refinement often occurs, but counter-evidence is discarded because the refining team is <strong>optimizing for a side-goal</strong> of some sort. My first team at Yahoo adopted Erlang for a key component of <a href="https://lethain.com/datahub/">Yahoo! Build Your Own Search Service</a>, which proved to be an excellent solution to our problem of wanting to use Erlang, but a questionable solution to the core problem at hand. Only three of the engineers on our fifteen person team were willing to touch the Erlang codebase, but that counter-evidence was ignored because it was in conflict with the side-goal.</p> <h2 id="summary">Summary</h2> <p>This chapter has introduced the concept of strategy refinement, surveyed three common refinement techniques&ndash;strategy testing, systems modeling, and Wardley mapping&ndash;and provided a framework for building your personal toolkit for refinement. When you&rsquo;re ready to get into more detail, further in the book there&rsquo;s a section dedicated to the details of applying these techniques, starting with <a href="https://lethain.com/testing-strategy-iterative-refinement/">strategy testing</a>.</p>Wardley mapping the LLM ecosystem.https://lethain.com/wardley-llm-ecosystem/Tue, 24 Dec 2024 04:00:00 -0700https://lethain.com/wardley-llm-ecosystem/<p>In <a href="https://lethain.com/llm-adoption-strategy/">How should you adopt LLMs?</a>, we explore how a theoretical ride sharing company, Theoretical Ride Sharing, should adopt Large Language Models (LLMs). 
Part of that strategy&rsquo;s diagnosis depends on understanding the expected evolution of the LLM ecosystem, which we&rsquo;ve built a <a href="https://lethain.com/wardley-mapping/">Wardley map</a> to better explore.</p> <p>This map of the LLM space is interested in how product companies should address the proliferation of model providers such as Anthropic, Google and OpenAI, as well as the proliferation of LLM product patterns like agentic workflows, Retrieval Augmented Generation (RAG), and running <a href="https://github.com/openai/evals">evals to maintain performance as models change</a>.</p> <hr> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> <h2 id="reading-this-document">Reading this document</h2> <p>To quickly understand the analysis within this Wardley Map, read from top to bottom. If you want to understand how this map was <em>written</em>, then you should read section by section from the bottom up, starting with Users, then Value Chains, and so on.</p> <p>More detail on this structure in <a href="https://lethain.com/wardley-mapping/">Refining strategy with Wardley Mapping</a>.</p> <h2 id="how-things-work-today">How things work today</h2> <p>If Retrieval Augmented Generation (RAG) was the trending LLM pattern of 2023, and you could reasonably argue that agents&ndash;or agentic workflows&ndash;are the pattern of 2024, then it&rsquo;s hard to guess what the patterns of tomorrow will be, but it&rsquo;s a safe guess that there are more, new patterns coming our way. LLMs are a proven platform today, and now are being applied widely to discover new patterns. 
It&rsquo;s a safe bet that validating these patterns will continue to drive product companies to support additional infrastructure components (e.g. search indexes to support RAG).</p> <p><img src="https://lethain.com/static/blog/strategy/llm-wardley-1.png" alt="Current state of LLM ecosystem."></p> <p>This proliferation of patterns has created a significant cost for these product companies, a problem which market forces are likely to address as offerings evolve.</p> <h2 id="transition-to-future-state">Transition to future state</h2> <p>Looking at the evolution of the LLM ecosystem, there are two questions that I believe will define the evolution of the space:</p> <ol> <li>Will LLM framework platforms for agents, RAG, and so on, remain bundled with model providers such as OpenAI and Anthropic? Or will they, instead, split with models and platforms being offered separately?</li> <li>Which elements of LLM frameworks will be productizable in the short-term? For example, running evals seems like a straightforward opportunity for bundling, as would providing <em>some</em> degree of agent support. Conversely, bundling RAG might seem straightforward but most production use cases would require real-time updates, incurring the full complexity of operating scaled search clusters.</li> </ol> <p>Depending on the answers to those questions, you might draw a very different map. This map answers the first question by imagining that LLM platforms will decouple from model providers, while also allowing you to license with that platform for model access rather than needing to individually negotiate with each model provider. It answers the second question by imagining that most non-RAG functionality will move into a bundled platform provider. 
Given the richness of investment in the current space, it seems safe to believe that every plausible combination will exist to some degree until the ecosystem eventually stabilizes in one dominant configuration.</p> <p><img src="https://lethain.com/static/blog/strategy/llm-wardley-2.png" alt="Current state of LLM ecosystem."></p> <p>The key driver of this configuration is that the LLM ecosystem is inventing new patterns every year, and companies are spinning up haphazard interim internal solutions to validate those patterns, but ultimately few product companies are able to effectively fund these sorts of internal solutions in the long run.</p> <p>If this map is correct, then it means eventual headwinds for both model providers (who are inherently limited to providing their own subset of models) as well as narrow LLM platform providers (who can only service a subset of LLM patterns). The likely best bet for a product company in this future is adopting the broadest LLM pattern platforms today, and to explicitly decouple pattern platform from model provider.</p> <h2 id="user--value-chains">User &amp; Value Chains</h2> <p>The LLM landscape is evolving rapidly, with some techniques getting introduced and reaching wide-spread adoption within a single calendar year. 
Sometimes those widely adopted techniques are <em>actually</em> being adopted, and other times it&rsquo;s closer to &ldquo;conference-talk driven development&rdquo; where folks with broad platforms inflate the maturity of industry adoption.</p> <p>The three primary users attempting to navigate that dynamism are:</p> <ol> <li><strong>Product Engineers</strong> are looking for faster, easier solutions to deploying LLMs across the many, evolving parameters: new models, support for agents, solutions to offload the search dimensions of retrieval-augmented-generation (RAG), and so on.</li> <li><strong>Machine Learning Infrastructure</strong> team is responsible for the effective usage of the mechanisms, and steering product developers towards effective adoption of these tools. They are also, in tandem with other infrastructure engineering teams, responsible for supporting common elements for LLM solutions, such as search indexes to power RAG implementations.</li> <li><strong>Security and Compliance</strong> &ndash; how to ensure models are hosted safely and securely, and that we&rsquo;re only sending approved information? how do we stay in alignment with rapidly evolving AI risks and requirements?</li> </ol> <p>To keep the map focused on evolution rather than organizational dynamics, I&rsquo;ve consolidated a number of teams in slightly artificial ways, and omitted a few teams that are certainly worth considering. Finance needs to understand the cost and usage of LLMs. Security and Compliance are really different teams, with both overlapping and distinct requirements between them. 
Machine Learning Infrastructure could be split into two distinct teams with somewhat conflicting perspectives on who should own things like search infrastructure.</p> <p>Depending on what <em>you</em> want to learn from the map, you might prefer to combine, split and introduce a different set of combinations than I&rsquo;ve selected here.</p>Wardley mapping of Gitlab Strategy.https://lethain.com/wardley-gitlab-strategy/Mon, 23 Dec 2024 04:00:00 -0700https://lethain.com/wardley-gitlab-strategy/<p>Gitlab is an integrated developer productivity, infrastructure operations, and security platform. This <a href="https://lethain.com/wardley-mapping/">Wardley map</a> explores the evolution of Gitlab&rsquo;s users&rsquo; needs, as one component in understanding the company&rsquo;s strategy. In particular, we look at how Gitlab&rsquo;s strategy of a bundled, all-in-one platform anchors on the belief that build and security tooling is moving from customization to commodity.</p> <hr> <p><em>This is an exploratory, draft chapter for a book on engineering strategy that I&rsquo;m brainstorming in <a href="https://lethain.com/tags/eng-strategy-book/">#eng-strategy-book</a>.</em> <em>As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.</em></p> <h2 id="reading-this-document">Reading this document</h2> <p>To quickly understand the analysis within this Wardley Map, read from top to bottom to understand this analysis. If you want to understand how this map was <em>written</em>, then you should read section by section from the bottom up, starting with Users, then Value Chains, and so on.</p> <p>More detail on this structure in <a href="https://lethain.com/wardley-mapping/">Refining strategy with Wardley Mapping</a>.</p> <h2 id="how-things-work-today">How things work today</h2> <p>Today, managing build, deploys and security are somewhat custom endeavors. 
The kind of work that even small <a href="https://lethain.com/tech-company/">technology companies</a> dedicate teams to operating smoothly.</p> <p><img src="https://lethain.com/static/blog/strategy/gitlab-ward-2.png" alt="Wardley map of developer productivity space."></p> <p>The value chains across users are highly coupled: there is no value chain that doesn&rsquo;t overlap across users. For example, debugging a failed build is important to both the developers and to the developer experience team. Similarly, understanding attribution of costs is essential to both the developer experience team and to the finance team.</p> <p>Because of that coupling, teams that buy best-of-breed solutions rather than a bundled stack spend significant time integrating them to work together properly. It&rsquo;s not uncommon for teams to spend a day a month on just the finance and developer experience integration. This sort of customization is unique to each company, but is rarely the company&rsquo;s special sauce. Rather, it&rsquo;s the result of poor interoperability between many tools in the people systems and developer systems space.</p> <h2 id="transition-to-future-state">Transition to future state</h2> <p>It&rsquo;s fairly clear that more and more components of this map are shifting from custom to product. Gitlab has a clear point of view that this ecosystem is standardizing, evolving up from custom implementations and toward products and commoditization.</p> <p><img src="https://lethain.com/static/blog/strategy/gitlab-ward-1.png" alt="Wardley map of developer productivity space."></p> <p>These shifts will bring an increasingly large number of companies into Gitlab&rsquo;s addressable market, including annoying but low-value problems like storing build and deploy logs for future access. 
Most markets vacillate between pursuing &ldquo;best of breed&rdquo; (you buy a number of specialized vendors) and &ldquo;all-in-one&rdquo; (you buy one, comprehensive and highly integrated solution).</p> <p>Gitlab has placed a clear bet on being an all-in-one solution by solving for both the traditional developer and developer experience users as well as the security user. This appears to reflect a belief that security tooling is quickly moving towards becoming a commodity solution, an interesting view, and one whose validity we&rsquo;ll see negotiated in the market as Gitlab competes with companies like Wiz and Snyk for market share.</p> <h2 id="user--value-chains">User &amp; Value Chains</h2> <p>Gitlab describes itself as the &ldquo;most comprehensive AI-powered DevSecOps platform.&rdquo; This is a broad ambition, and consequently there are quite a few users for the platform. For this mapping exercise, we are going to focus on four users:</p> <ol> <li> <p><strong>Developers</strong> at the company. The product and infrastructure engineers who are using the Gitlab platform as a tool within their workflows. These are the developers responsible for creating and running the company&rsquo;s product.</p> <p>The value chains they&rsquo;re focused on are deploying software, debugging failed deploys, and optimizing the speed at which builds and deploys occur. Underneath those needs are a number of infrastructure components performing the actual deploy, collecting logs for debugging, and so on.</p> </li> <li> <p><strong>Developer Experience</strong> who are responsible for selecting, onboarding and operating the deployment infrastructure in the company. More broadly, this team is responsible for the overall productivity of the company&rsquo;s developers.</p> <p>They don&rsquo;t have any value chain that is unique to them, but they are tightly involved in every other user&rsquo;s value chains. This creates a uniquely broad view of the map. 
Further, the developer experience team is generally the expert on each value chain, having the deepest view.</p> </li> <li> <p><strong>Security &amp; Compliance</strong> who maintain the security infrastructure and compliance postures for your company. They require vulnerability scanning to detect supply chain security attacks, as well as to identify common issues in developed software such as the <a href="https://owasp.org/www-project-top-ten/">OWASP Top Ten</a>.</p> <p>The value chain they&rsquo;re focused on is software vulnerability scanning, which in turn depends on a database of package vulnerabilities and a scanner for detecting those packages and other common vulnerabilities.</p> </li> <li> <p><strong>Finance</strong> who monitor the cost and usage of your platform. They&rsquo;re most focused on the projection and attribution of costs represented by the platform. For example, they would want to model the infrastructure costs of hiring an additional 50 product engineers in terms of the additional builds, deploys, and so on they would consume.</p> <p>The value chain they&rsquo;re focused on is understanding attribution and usage, which in turn relies on an ownership graph mapping each piece of software (and each build, and each test run, and each security issue, etc.) to a concrete team within the company.</p> </li> </ol> <p>There are more users we could dig into, but these are the four most important customers in evaluating Gitlab&rsquo;s strategic approach.</p>2024 in review.https://lethain.com/2024-in-review/Sat, 14 Dec 2024 05:00:00 -0700https://lethain.com/2024-in-review/<p>A lot happened for me this year. I continued learning the details of fund accounting at Carta, which is likely the most complex product domain I&rsquo;ve worked in. My third book was published, and I did a small speaking tour to support it. We started the unironically daunting San Francisco kindergarten application process. 
I was diagnosed with skin cancer and had successful surgery to remove it. All things considered, it was a much messier year than I intended, but with many good pockets mixed in with the mess.</p> <p>(I love to read other folks&rsquo; year-in-review writeups – if you write one, please send it my way!)</p> <hr> <p><em>Previously: <a href="https://lethain.com/2023-in-review/">2023</a>, <a href="https://lethain.com/2022-in-review/">2022</a>, <a href="https://lethain.com/2021-in-review/">2021</a>, <a href="https://lethain.com/2020-in-review/">2020</a>, <a href="https://lethain.com/2019-in-review">2019</a>, <a href="https://lethain.com/2018-in-review/">2018</a>, <a href="https://lethain.com/things-learned-in-2017/">2017</a></em></p> <h2 id="goals">Goals</h2> <p>Evaluating my goals for this year and decade:</p> <ul> <li> <p><strong>[Completed]</strong> <em>Write at least four good blog posts each year.</em></p> <p><a href="https://lethain.com/quality/">How to create software quality</a>, <a href="https://lethain.com/layers-of-context/">Layers of context</a>, <a href="https://lethain.com/multi-dimensional-tradeoffs/">Useful tradeoffs are multi-dimensional</a>, <a href="https://lethain.com/mental-model-for-how-to-use-llms-in-products/">Notes on how to use LLMs in your product</a>, <a href="https://lethain.com/engineering-cost-model/">Eng org seniority-mix model</a>.</p> </li> <li> <p><strong>[Completed]</strong> <em>Write another book about engineering or leadership.</em></p> <p>I did this in either 2023 or 2024, as I released <em><a href="https://lethain.com/eng-execs-primer/">The Engineering Executive&rsquo;s Primer</a></em> this year, and finished writing it late last year. Either way, it&rsquo;s complete.</p> </li> <li> <p><strong>[New]</strong> <em>Write three books about engineering or leadership in the 2020s.</em></p> <p>I&rsquo;ve already written two this decade, so committing to writing one more feels pretty attainable. 
At that point, I&rsquo;ll have written four total including <em>An Elegant Puzzle</em> in 2019, and I&rsquo;m pretty sure I will be &ldquo;booked out,&rdquo; perhaps permanently, but I think I have one more good one left in me on the topic of engineering strategy.</p> </li> <li> <p><strong>[Completed]</strong> <em>Do something substantial and new every year that provides new perspective or deeper practice.</em></p> <p>Started working with a personal trainer for the first time, significantly helping me improve on a number of my lifts. On a totally different dimension, I finally <a href="https://lethain.com/learning-wardley-mapping/">spent time to learn Wardley mapping</a> which is something I&rsquo;ve been intending to do, but not doing, for at least five years.</p> </li> <li> <p><strong>[In progress]</strong> <em>20+ folks who I’ve managed or meaningfully supported move into VPE or CTO roles at 50+ person or $100M+ valuation companies.</em></p> <p>This is a decade goal ending in 2029. I previously increased the goal in 2022 from <code>3-5</code> to <code>20</code>. A strict count is at <code>10</code> today, so I think I&rsquo;m on track.</p> </li> </ul> <p>For backstory on these goals: I originally <a href="https://lethain.com/2019-in-review/">set them in 2019</a>, and then <a href="https://lethain.com/2022-in-review/">revised them in 2022</a>. I&rsquo;ve generally come to the point of view that I should be revising these every year, but I&rsquo;m also not sure it&rsquo;s worth doing. Maybe one day!</p> <h2 id="published-_engineering-executives-primer_">Published <em>Engineering Executive&rsquo;s Primer</em></h2> <p>My third book was published in March, <em><a href="https://lethain.com/eng-execs-primer/">The Engineering Executive&rsquo;s Primer</a></em>. I wrote up <a href="https://lethain.com/publishing-eng-execs-primer/">notes on publishing it with O&rsquo;Reilly</a>, and I&rsquo;m altogether very happy to have written and published it. 
I&rsquo;m also glad that work is in the past, rather than the present, as finishing a book is a fair amount of work to juggle with a full-time job and parenting.</p> <p>Relative to my last two books, this is a bit more of a niche topic, so my mental model for sales was that they&rsquo;d likely be a bit lower, which seems to be accurate so far, with a bit over 10,000 copies sold through October. The real question for a book like this is whether it maintains sales after the first six-month bump, which I&rsquo;ll be watching with curiosity. It may also be true that book sales are not a particularly effective way to evaluate a book like this, which aims to <a href="https://lethain.com/advancing-the-industry/">advance the state of the industry</a> by reaching decision makers within the industry. That said, it&rsquo;s hard not to evaluate your book that way, even when you know that you shouldn&rsquo;t.</p> <p>As an aside, determining accurate numbers for O&rsquo;Reilly titles is a bit tricky. There are things like the online version of the book, where you&rsquo;re paid by usage (similar to Amazon Kindle Unlimited) but don&rsquo;t have a non-dollar indicator of usage. I&rsquo;m similarly unclear how audiobook usage is calculated, as it doesn&rsquo;t show up in my royalties. To deal with all that, I&rsquo;ve just focused on sold physical books and ebooks.</p> <h2 id="other-book-updates">Other book updates</h2> <p>A few other book related updates:</p> <ul> <li>I broke through 1,000 ratings on Amazon for <em><a href="https://staffeng.com/">Staff Engineer</a></em> this year, which is a cool milestone to pass for two distinct books. (<em>AEP</em> passed 1,000 a year or two ago.)</li> <li>Both earlier books continued to sell fairly well. I believe <em>An Elegant Puzzle</em> passed 100k copies sold either late 2023 or early 2024 (the sales data I get is not particularly high granularity), which is a nice number. 
<em>Staff Engineer</em> is also doing well, passing 89k copies sold as of today. I think it&rsquo;ll pass 100k sometime in 2026, assuming it can mostly maintain its velocity from the last couple of years.</li> <li>I wrote the <a href="https://lethain.com/high-context-triad/">High-Context Triad</a> which I view as an additional set of chapters for a second edition of <em>Staff Engineer</em> whenever I get to that project, probably my second side project in the queue after finishing the engineering strategy book. Of those chapters, I think <a href="https://lethain.com/multi-dimensional-tradeoffs/">Useful tradeoffs are multi-dimensional</a> and <a href="https://lethain.com/layers-of-context/">Layers of context</a> are particularly useful.</li> <li>I&rsquo;ve been pulling together notes and writing on <a href="https://lethain.com/tags/eng-strategy-book/">engineering strategy</a>, which has a real chance of turning into my next book, but there&rsquo;s no guarantee on these things, e.g. <em><a href="https://infraeng.dev/">Infrastructure Engineering</a></em> certainly went off the rails.</li> </ul> <h2 id="public-speaking-etc">Public speaking, etc</h2> <p>Generally, I am not very focused on public speaking, but my stance on this topic changes in the years when I publish a new book, and I did quite a bit of speaking, writing, podcast attendee-ing, and so on. 
I was featured on Gergely Orosz&rsquo;s <a href="https://newsletter.pragmaticengineer.com/p/getting-an-engineering-executive">Pragmatic Engineer mailing list</a>, <a href="https://review.firstround.com/unexpected-anti-patterns-for-engineering-leaders-lessons-from-stripe-uber-carta/">First Round Review</a>, and Lenny Rachitsky&rsquo;s <a href="https://www.lennysnewsletter.com/p/the-engineering-mindset-will-larson">Lenny&rsquo;s Newsletter</a>.</p> <p>I also gave talks at:</p> <ul> <li><a href="https://events.sapphireventures.com/hypergrowthengineeringsummit24/">2024 Hypergrowth Engineering Summit</a> (<a href="https://lethain.com/video-mental-model-for-how-to-use-llms-in-products/">recording of practice run</a>)</li> <li><a href="https://leaddev.com/leadingeng-new-york">LeadingEng New York, 2024</a> (<a href="https://lethain.com/video-developing-leadership-styles/">recording of practice run</a>)</li> <li><a href="https://qconsf.com/presentation/nov2024/ambiguous-roles-and-ambiguous-problems-navigating-life-principal-engineer">QCon SF November, 2024</a> (<a href="https://lethain.com/qcon-sf-2024-talk-video/">recording of practice run</a>)</li> </ul> <p>If you&rsquo;re wondering why I do more speaking in the years I publish new books, it&rsquo;s pretty straightforward: it helps sell more copies of the books, and books live on momentum. A good start in the first year contributes <em>forever</em> to its sales, as long as it&rsquo;s a timeless book (&ldquo;engineering leadership&rdquo;) rather than a timely one (&ldquo;details about the current version of a database&rdquo;).</p> <h2 id="videos">Videos</h2> <p>I&rsquo;ve been playing around with <a href="https://www.youtube.com/channel/UC6kz0ObZuo6UnEhpCK7EqZQ">creating YouTube videos</a> for my conference talks. My approach is pretty basic: whenever I give a talk, I record a practice run ahead of time, and then post it online after I&rsquo;ve given the talk at the conference. 
I don&rsquo;t ever expect my YouTube channel to go anywhere&ndash;hence the low production values&ndash;but I do think it&rsquo;s worthwhile to keep recordings as conference videos often disappear. Worst case, this means I have <em>some</em> recording of my talks since late 2023.</p> <h2 id="health">Health</h2> <p>This has been a pretty challenging year for me from a health perspective. I had two major vacations planned this year, and both got derailed by health concerns. For the first, I was recuperating with stitches from successful skin cancer surgery, and for the second, I was in bed with a fever for five days straight until my doctors were convinced my sickness was bacterial.</p> <p>The skin cancer diagnosis was unexpected, and it was a pretty unsettling several months going from detection through surgery. As I write this, I am treating myself with topical chemotherapy to prevent another swath of skin requiring surgery in the future, which is far less serious than it sounds, but unsettling nonetheless. The good news is that a few months out, I am <em>likely</em> on the other side of this batch of skin cancer risk, and one large scar on my neck is a small price to pay for good health.</p> <p>Probably the most challenging thing with the surgery and recovery, sickness and recovery, and so on, is that it just throws off the larger health routine. For the first time in my life, I&rsquo;m working with a personal trainer, which has been quite nice. I&rsquo;ve been an off and on lifter for the past several decades, and feel comfortable with most lifts, but there have been a handful where I&rsquo;ve just never quite gotten the form right, and I&rsquo;m <em>finally</em> getting it now after working with my trainer. Similarly, there are some mobility issues preventing e.g. 
deep squatting, which I&rsquo;ve certainly not fixed, but I&rsquo;m able to better understand and make progress on.</p> <p>I&rsquo;m still running 1-2 times a week, although I&rsquo;ve dropped mileage down to 4-mile runs, and playing basketball for a couple hours each weekend. I&rsquo;m hopeful I&rsquo;ll be able to both extend mileage and incorporate a speed session, but I&rsquo;m honestly struggling to find the time (or perhaps, energy) to make those happen. Fingers crossed for 2025.</p> <h2 id="kindergarten-search-in-san-francisco">Kindergarten search in San Francisco</h2> <p>Without saying too much, I can say that I&rsquo;ve never experienced anything like the San Francisco process of applying to public and private kindergartens. That said, the schools themselves are just remarkable. Looking back at the schools I went to in North Carolina&hellip; comparing them to these schools is almost impossible. I&rsquo;m excited to see my son start next year, <em>and</em> will be excited to have this process behind us in a few months.</p> <h2 id="tracking-family-cashflow">Tracking family cashflow</h2> <p>It&rsquo;s slightly embarrassing to admit, but I&rsquo;ve had a relatively bad grasp of the details of my financial situation over the past few years. Specifically, I was very comfortable <a href="https://lethain.com/personal-finances/">with how we were invested</a>, but was having a hard time with the details of cash management. How much <em>were</em> we cash flowing? How much were our high spend months the result of obvious annual expenses (e.g. property tax) versus one-off purchases?</p> <p>Our income streams are sufficiently complex, and I shuffle money between several bank accounts (one for bills, one for higher interest), that answering these questions had generally gotten beyond my ability to answer with conviction. 
This is a problem, because I&rsquo;m the family CFO, and I became increasingly uncomfortable that I didn&rsquo;t understand the financial consequences of various theoretical scenarios. For example, if I lost my job (or died, I suppose), where would that leave my family?</p> <p>I wanted to reclaim my confidence in our finances this year, and did a fair amount of research to determine a tool that would make this possible. Ultimately, I went with Monarch Money (but do your own research before picking one!), and was pretty impressed with how quickly I was able to correctly represent our cashflows. The most important part was marking the various flows to and from bank and investment accounts as transfers rather than inflows or outflows of cash.</p> <p>Altogether, I probably spent four or five hours over a week to get it fully configured, and now I can see my family&rsquo;s cashflow for the first time in my life. This is pretty powerful, and now I have a pretty clear understanding of how much we&rsquo;re spending (a bit more than I realized), and how much we would need to have invested to fully cover that spend. This has also made it much easier to sit down once or twice a year as a family and update our mental models on how things are going.</p> <h2 id="reading">Reading</h2> <p>The profession-adjacent reading I did this year (by which I mean, most of my reading is fiction that I don&rsquo;t include here):</p> <ol> <li><em><a href="https://www.amazon.com/dp/1119594596">Financial Accounting, 11th Edition</a></em> by Weygandt, Kimmel, and Kieso</li> <li><em><a href="https://www.amazon.com/Hardcore-Software-Inside-Rise-Revolution-ebook/dp/B0CYBS9PFY/">Hardcore Software</a></em> by Steven Sinofsky</li> <li><em><a href="https://press.stripe.com/the-big-score">The Big Score</a></em> by Michael S. 
Malone</li> <li><em><a href="https://www.amazon.com/How-Asia-Works-Joe-Studwell/dp/0802121322">How Asia Works</a></em> by Joe Studwell</li> <li><em><a href="https://www.amazon.com/Loonshots-Nurture-Diseases-Transform-Industries/dp/1250185963">Loonshots</a></em> by Safi Bahcall</li> <li><em><a href="https://www.amazon.com/Chip-War-Worlds-Critical-Technology/dp/1982172002">Chip War</a></em> by Chris Miller</li> <li><em><a href="https://www.amazon.com/Practical-TLA-Planning-Driven-Development/dp/1484238281">Practical TLA+: Planning Driven Development</a></em> by Hillel Wayne (I thought that I&rsquo;d read this previously, but I&rsquo;m pretty sure I&rsquo;d just bought the Kindle version and struggled to get into it in that format. I actually read it this time as a paperback book.)</li> <li><em><a href="https://www.manning.com/books/architecture-modernization">Architecture Modernization</a></em> by Nick Tune with Jean-Georges Perrin</li> <li><em><a href="https://www.amazon.com/Superforecasting-Science-Prediction-Philip-Tetlock/dp/0804136718">Superforecasting</a></em> by Philip Tetlock and Dan Gardner</li> <li><em><a href="https://www.amazon.com/Accounting-Made-Simple-Explained-Pages/dp/0981454224">Accounting Made Simple</a></em> by Mike Piper</li> <li><em><a href="https://www.udemy.com/course/partnership-accounting/">Partnership Accounting - Financial Accounting</a></em> by Robert Steele (This is a Udemy course, not a book, but very much professional content.)</li> <li><em><a href="https://www.amazon.com/Team-Topologies-Organizing-Business-Technology/dp/1942788819">Team Topologies</a></em> by Matthew Skelton and Manuel Pais</li> <li><em><a href="https://www.amazon.com/Super-Founders-Reveals-Billion-Dollar-Startups/dp/1541768426">Super Founders</a></em> by Ali Tamaseb</li> <li><em><a href="https://www.amazon.com/Zero-Peter-Thiel-Blake-Masters/dp/0753555204/">Zero to One</a></em> by Peter Thiel</li> <li><em><a href="https://www.stripe.press/poor-charlies-almanack">Poor 
Charlie&rsquo;s Almanack</a></em> by Charles T. Munger</li> <li><em><a href="https://www.amazon.com/Domain-Driven-Design-Distilled-Vaughn-Vernon/dp/0134434420">Domain-Driven Design Distilled</a></em> by Vaughn Vernon</li> <li><em><a href="https://www.amazon.com/How-Will-Measure-Your-Life/dp/0062102419">How Will You Measure Your Life?</a></em> by Christensen, Allworth, and Dillon</li> <li><em><a href="https://writeusefulbooks.com/">Write Useful Books</a></em> by Rob Fitzpatrick</li> <li><em><a href="https://www.amazon.com/Building-Green-Software-Sustainable-Development/dp/1098150627">Building Green Software</a></em> by Anne Currie, Sarah Hsu, and Sara Bergman</li> <li><em><a href="https://www.amazon.com/Unit-Pentagon-Silicon-Valley-Transforming/dp/1668031388">Unit X</a></em> by Raj Shah and Christopher Kirchhoff</li> <li><em><a href="https://www.amazon.com/What-You-Do-Who-Are/dp/0062871331/">What You Do Is Who You Are</a></em> by Ben Horowitz</li> <li><em><a href="https://readwriteown.com/">Read Write Own</a></em> by Chris Dixon</li> <li><em><a href="https://www.amazon.com/gp/product/059371671X">Co-Intelligence: Living and Working with AI</a></em> by Ethan Mollick</li> <li><em><a href="https://www.amazon.com/Cold-Start-Problem-Andrew-Chen-ebook/dp/B08HZ5XY7X/">The Cold Start Problem</a></em> by Andrew Chen</li> <li><em><a href="https://www.hachettebookgroup.com/titles/rick-wartzman/the-end-of-loyalty/9781586489151/">The End of Loyalty</a></em> by Rick Wartzman</li> <li><em><a href="https://abookapart.com/products/you-deserve-a-tech-union">You Deserve a Tech Union</a></em> by Ethan Marcotte</li> <li><em><a href="https://www.amazon.com/Build-Unorthodox-Guide-Making-Things/dp/0063046067">Build</a></em> by Tony Fadell</li> </ol>