How to practice backend engineering.
On a recent call, I chatted with someone about backend roles in software engineering and what folks actually do in those roles. Beyond what the work looks like, how would you practice for this kind of role or prepare for interviews?
Comparing the work that backend engineers are asked to take on against the work that any engineer might be asked to take on, four categories of tasks stand out to me as being both frequent and practicable:
- Modeling and remodeling data - how do you design an effective data model for your application, and then evolve that data model as requirements shift over time?
- Designing and evolving interfaces - how do other components integrate with your service?
- Integrating with APIs - how do you integrate your application with 3rd party APIs like Twilio, Stripe, and so on?
- Scaling capacity - how do you evolve your architecture to support more load over time?
At the bottom of this post I’ve collected some books and blog posts for each of those that may be helpful if that’s how you learn, but I also wanted to put together a project that folks could use to practice these.
Preamble on learning projects
Before you get started on the project, a few notes on what I’ve found makes projects like this effective:
- Not the only way to learn - I want to start by caveating that these sorts of learning projects have always worked well for me, but there are many different ways to learn these sorts of things, and this one is particularly time intensive
- Narrow your focus - don’t try to learn a ton of new things at once. For example, if you’re focused on learning about integrating with APIs, then use a programming language you’re already comfortable with. This is particularly important for backend and infrastructure-style projects because you can spend your entire time trying to get Dockerfiles or Vagrant configurations to work and never get to the actual learning you care about
- Use a source code repository - use a tool like GitHub to store your code so that you have examples to go back to over time. Working code examples that you understand are an amazing debugging and refresher tool
- Use an ephemeral environment - it’s totally fine to work on your laptop, but if you’re able to use a cheap service like DigitalOcean Droplets ($5/month) or Amazon Lightsail ($4/month), you can avoid spending time fixing your local environment and you can just delete everything and start over if something goes particularly wrong. Glitch is also a great option, although you wouldn’t want to use it for the scalability practice
Project definition
For the project itself, I’ll outline a series of steps to take along with the intended learning from each step. The steps are intentionally a bit vague so that you have room to play around.
- Scaffolding - getting the pieces ready
  - Create a repository on GitHub (or your code hosting of choice) for this new project, and set up an HTTP server using the framework of your choice. If you’re using Python, that might be Flask. Add an endpoint at “/”
  - Within your repository, create another directory named “client” which holds an HTTP client to call your service. If you’re using Python, you might use the requests library. It should be able to call your “/” endpoint and print out the response
  - Add a database of your choice to the HTTP server, add a table to it, and start writing every request to “/” to that table. You might use SQLite, MySQL, or PostgreSQL
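As one hedged illustration of the scaffolding steps above, here is a minimal sketch that uses only Python's standard library (`http.server` and `sqlite3`) in place of Flask, so it runs without installing anything. The table name `request_log`, the helper names, and the response text are all assumptions made for this example; a Flask version would follow the same shape.

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def open_db(path="requests.db"):
    """Open the SQLite database and create the (hypothetical) log table."""
    # check_same_thread=False lets a server thread reuse this connection.
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS request_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "method TEXT NOT NULL, path TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def log_request(conn, method, path):
    """Record one request and return how many have been logged so far."""
    conn.execute(
        "INSERT INTO request_log (method, path) VALUES (?, ?)", (method, path)
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM request_log").fetchone()[0]


def make_handler(conn):
    """Build a request handler class bound to the given database connection."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/":
                self.send_error(404)
                return
            count = log_request(conn, "GET", self.path)
            body = f"hello, you are request #{count}\n".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence the default per-request stderr logging

    return Handler


def make_server(conn, port=8000):
    """Create (but do not start) an HTTP server listening on localhost."""
    return HTTPServer(("127.0.0.1", port), make_handler(conn))
```

To try it, call `make_server(open_db()).serve_forever()` and, from the client directory, fetch `http://127.0.0.1:8000/` with `requests.get` or `urllib.request.urlopen` and print the response body.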
- Evolving data model and interfaces - evolving an existing application
  - Update your server to offer two endpoints, one to send a message and another to retrieve recently sent messages. The API should support specifying the number of recent messages to retrieve
  - Update your client to use those two endpoints. The client should be able to send and retrieve messages
  - Update your server to return messages sent after a point in time. For example, “all messages since 10 AM this morning”. This will require adding a new column to your data model to store the time created for new messages. You’ll also have to figure out a plan to migrate the existing messages forward. Do you default to the current time for existing messages? Try not to just drop the existing table; migrating the data is an important part of this
  - [Bonus task] Add an API that allows you to respond to an existing message, and support returning all replies to a message along with that message. Update your client to render replies differently so you can tell which messages are replies and what message they are a reply to
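One possible shape for the timestamp migration described above, sketched with `sqlite3`. It assumes a hypothetical `messages` table with `id` and `body` columns, stores `created_at` as a Unix timestamp, and backfills existing rows with a timestamp the caller picks (for instance, the time the migration runs). That backfill policy is only one of several defensible answers to the question above, and the function names are illustrative.

```python
import sqlite3
import time


def migrate_add_created_at(conn, backfill_ts):
    """Add messages.created_at if missing and backfill old rows.

    Returns True if the migration ran, False if it was already applied,
    so it is safe to call on every server start.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(messages)")]
    if "created_at" in columns:
        return False
    conn.execute("ALTER TABLE messages ADD COLUMN created_at REAL")
    # Assumption: existing messages all get the same caller-chosen timestamp.
    conn.execute(
        "UPDATE messages SET created_at = ? WHERE created_at IS NULL",
        (backfill_ts,),
    )
    conn.commit()
    return True


def add_message(conn, body, now=None):
    """Insert a message stamped with the current time (or an injected one)."""
    ts = time.time() if now is None else now
    conn.execute(
        "INSERT INTO messages (body, created_at) VALUES (?, ?)", (body, ts)
    )
    conn.commit()


def messages_since(conn, since_ts, limit=20):
    """Return up to `limit` (body, created_at) pairs newer than since_ts."""
    cursor = conn.execute(
        "SELECT body, created_at FROM messages "
        "WHERE created_at > ? ORDER BY created_at DESC, id DESC LIMIT ?",
        (since_ts, limit),
    )
    return cursor.fetchall()
```

Checking for the column before altering the table keeps the migration idempotent, which matters once the same code runs against both fresh and long-lived databases.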
- Integrating with external APIs - add another API
  - Add a Twilio integration which allows you to text a message and have it added to your service, the same as if you had used the client to create it
  - [Bonus task] Create a Slack app which allows you to send and retrieve messages to your server using Slack. I wrote up some notes last year on doing something similar
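For the Twilio step above, a rough sketch of the receiving side. Twilio delivers an inbound SMS to your webhook as a form-encoded POST whose fields include `From` and `Body`; the function name, the returned dictionary shape, and the `source` field are hypothetical choices for this example. Request signature validation (the `X-Twilio-Signature` header) is deliberately left out here and would be needed before exposing the endpoint publicly.

```python
from urllib.parse import parse_qs

# An empty TwiML document: acknowledges the webhook without sending a reply.
EMPTY_TWIML = '<?xml version="1.0" encoding="UTF-8"?><Response></Response>'


def message_from_twilio_webhook(raw_body):
    """Turn a form-encoded webhook body (bytes) into a message dict.

    Returns None when the sender or the text is missing, so the caller
    can skip storing an empty message.
    """
    form = parse_qs(raw_body.decode("utf-8"), keep_blank_values=True)
    sender = form.get("From", [""])[0]
    text = form.get("Body", [""])[0].strip()
    if not sender or not text:
        return None
    # Assumption: the service stores messages as dicts with these keys.
    return {"sender": sender, "body": text, "source": "sms"}
```

The webhook endpoint would pass the resulting dictionary to the same code path the client uses to create a message, then respond with `EMPTY_TWIML` and a `text/xml` content type.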
- Scaling capacity - how can you evolve your server to support more load?
  - Download Locust and set it up to create load against your server. Set up three different load tests: one that does only reads, one that does only writes, and one that does a mix of 50% reads and 50% writes
  - Run the “reads only” load test against your server. How much scale can it tolerate? How can you figure out where you’re spending the most time (hint: try searching for “performance profiling”)? How can you modify your server to support more load (hint: one simple initial strategy might be an in-memory cache, but make sure to think about cache invalidation)?
  - Run the “writes only” load test against your server. How much scale can it tolerate? How can you protect the overall stability of the service against too many writes (hint: try searching for “rate limiting”)?
  - Run the “writes and reads” load test against your server. How much scale can it tolerate? What other techniques could you deploy to scale it up? Is it slow in the server or in the database? How do you know? Write a list of things you’d use to identify which is the case and how you could address it. (Actually making these sorts of fixes might lead you down the path of spending more money on hosting than you want to, so it’s fine if you don’t implement them!)
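To make the in-memory cache hint above concrete, here is a minimal sketch of a read cache that is invalidated on every write. The class name, the `messages` table layout, and the `db_reads` counter are assumptions for illustration. It also assumes a single server process: with several worker processes, each would hold its own cache and could serve stale results, which is exactly the kind of invalidation problem worth thinking through.

```python
import sqlite3


class RecentMessagesCache:
    """Serve recent-message reads from memory, clearing the cache on writes."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}   # maps a limit to the list of bodies last read for it
        self.db_reads = 0  # counts cache misses, handy when comparing load tests

    def recent(self, limit=20):
        """Return the newest `limit` message bodies, newest first."""
        if limit not in self._cache:
            self.db_reads += 1
            rows = self.conn.execute(
                "SELECT body FROM messages ORDER BY id DESC LIMIT ?", (limit,)
            ).fetchall()
            self._cache[limit] = [row[0] for row in rows]
        # Return a copy so callers cannot mutate the cached list.
        return list(self._cache[limit])

    def add(self, body):
        """Store a message, then drop every cached read (the invalidation)."""
        self.conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
        self.conn.commit()
        self._cache.clear()
```

Clearing the whole cache on each write is the bluntest possible policy. It stays correct within one process, but under the mixed read and write load test it will show how quickly frequent writes erode the benefit of caching.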
Completing all of these steps ought to give you a fairly representative look into the lifecycle of creating, evolving and maintaining an application from a backend engineering perspective. If this feels too easy, try introducing new elements like a new database, more kinds of load, more complex new requirements for your API to support, and so on.
Resources
Beyond this sort of practice, some resources that might be helpful:
- Designing Data-Intensive Applications - this book is all the rage lately and comes highly recommended by many folks as an introduction to designing and scaling applications.
- Introduction to architecting systems for scale - a blog post I wrote to provide a summary of web scalability techniques.
- Acing Your Architecture Interview - another blog post I wrote discussing strategies to use in architecture interviews.
- Building Scalable Websites - this book is fairly dated at this point, but it was my first entry point to web scalability and I found it very approachable if you’re looking for a quicker read than Designing Data-Intensive Applications.
- Web Scalability for Startup Engineers - I haven’t read it, nor do I know anyone who has, but from my quick research and the reviews it seems like an updated version of Building Scalable Websites, which might be worth checking out.