November 5, 2019.
Imagine you woke up one day and found yourself responsible for a Site Reliability Engineering team. By 10AM, you’ve downloaded a free copy of the SRE book, and are starting to get the hang of things. Then an incident strikes: oh no! Folks rally to mitigate user impact and later diagnose and remediate the underlying cause, but a bunch of your users have a very bad day. Your shoulders are a bit heavier than just a few hours ago. You sit down with your team and declare your bold leader-y goal: next quarter we’ll have _zero_ _incidents_.
November 3, 2019.
About a year ago I started sending weekly updates to a relevant public (within the company) mailing list. I've found the practice useful enough to write a few words on the how and why. This practice is sometimes called a 5-15 report, reflecting the goal of spending fifteen minutes a week writing a report that can be read in five minutes.
October 31, 2019.
A few weeks ago I got the chance to speak at SRECon EMEA 2019, and the videos are up! This is the video of my talk, Investing in technical infrastructure.
October 27, 2019.
A couple days ago at Stripe's weekly incident review, we started a discussion on a topic that is always surprisingly controversial: healthchecks. I've been thinking about them since and have written up some related thoughts.
October 23, 2019.
An Elegant Puzzle was released on May 20th, 2019. In June I summarized what I learned writing the book, and that summary covers what I have to say about creating it. Instead of retreading that material, I wanted to recap An Elegant Puzzle by the numbers.