YouTube Scalability
The Tech Talk, available here, was part of the Seattle Conference on Scalability, and was a presentation on how YouTube dealt with scalability issues. This is a topic I have been pretty interested in recently. (Trying to get this site to scale to 5 users instead of 1.) I recently ordered a book by Cal Henderson (tech lead for Flickr) on scalability in web applications, and I am looking forward to it arriving tomorrow (along with a non-PDF copy of SICP; dead-tree books are still king in my world).
It was a passable talk, but I definitely would have preferred it to be a bit more concrete, and to focus more on their good solutions than their failed ones. (Also, statements like "we use Bigtable for that now" just make me want to cry in a corner. Thanks, man, where are we supposed to get that? Oh, and what about the hundreds of thousands of servers to support it? Some seedy guy on a street corner?) Anyway, on with the notes.
- Scaling is constant iteration of finding and fixing bottlenecks
- Python is "fast enough" most of the time
- Psyco to JIT compile Python (see the sketch after this list)
- Write C extensions for computation-heavy bits (encryption)
- Scale by adding more machines & load balancing
- Lighttpd to serve static content, Apache to serve dynamic content
- Each video is hosted by a mini-cluster (redundancy in case of failure, load spike)
- Keep hardware simple & cheap (cheaper support, cheaper service, easier to find existing resources for help)
- Always start simple and build on that
- Lighttpd is single process by default (leads to poor performance in some situations)
- MySQL Database Replication scales poorly for write intensive systems, writes crowd out reads
- Local Memory > Anything Else, Disk Reads < Anything Else
- vmstat for profiling
- many layers of caching built on top of each other (toy sketch after this list)
- database partitioning is the best long-term solution
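To make the Psyco bullet a little more concrete, here is a minimal sketch of binding the JIT to a hot function. The checksum function is a made-up stand-in for a computation-heavy bit, not anything from the talk.

```python
# Minimal sketch of using Psyco to JIT-compile a hot Python function.
# checksum() is an illustrative placeholder, not code from the talk.

try:
    import psyco
except ImportError:
    psyco = None  # fall back to plain CPython if Psyco isn't installed

def checksum(data):
    # Tight pure-Python loop over a string: the kind of code Psyco helps with.
    total = 0
    for char in data:
        total = (total * 31 + ord(char)) & 0xFFFFFFFF
    return total

if psyco is not None:
    # Bind just the hot function instead of psyco.full(), so the compiler's
    # memory overhead stays confined to code that actually matters.
    psyco.bind(checksum)
```

Past the point where that helps, their answer (per the notes above) is to rewrite the hot path as a C extension.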
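And for the layered-caching bullet, a toy sketch of the idea: check the cheapest layer first, fall through to slower ones, and fill the faster layers on the way back. The layer names and the loader are placeholders of my own, not details from the talk.

```python
# Toy sketch of layered caching: fastest layer first, slowest last,
# with a loader (e.g. a database query) behind everything.

class LayeredCache(object):
    def __init__(self, layers, loader):
        self.layers = layers  # ordered fastest -> slowest, dict-like objects
        self.loader = loader  # called only when every layer misses

    def get(self, key):
        missed = []
        for layer in self.layers:
            value = layer.get(key)
            if value is not None:
                break
            missed.append(layer)
        else:
            value = self.loader(key)  # total miss: hit the backing store
        for layer in missed:
            layer[key] = value        # populate the faster layers on the way back
        return value

# Usage: local process memory in front of a shared cache in front of the DB.
local_memory = {}
shared_cache = {}  # stand-in for something like memcached
cache = LayeredCache([local_memory, shared_cache],
                     loader=lambda key: "row-for-%s" % key)
print(cache.get("video:42"))
```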
To elaborate briefly on database partitioning: the idea is to have lots of small databases (so there are no replication-delay issues), along with a hashing mechanism for determining which database holds a particular piece of data. This improves cache locality (and thus cache hit rates), and the smaller databases are also more portable, fit in memory more easily, and are cheaper to duplicate (for backups, etc.).
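Here is a rough sketch of what that lookup mechanism could look like, assuming a plain hash-then-modulo scheme; the DSNs and the key format are made up for illustration, and the talk did not say how YouTube actually maps keys to shards.

```python
# Rough sketch of hash-based partitioning: hash the key, then map it
# to one of N small databases. DSNs and key format are illustrative only.

import zlib

SHARD_DSNS = [
    "mysql://db0.internal/videos",
    "mysql://db1.internal/videos",
    "mysql://db2.internal/videos",
    "mysql://db3.internal/videos",
]

def shard_for(key):
    # Stable hash of the key, reduced modulo the number of shards.
    digest = zlib.crc32(str(key).encode("utf-8")) & 0xFFFFFFFF
    return SHARD_DSNS[digest % len(SHARD_DSNS)]

# All data for a given key lands on the same small database, so its
# working set can fit in that machine's memory and caches stay warm.
print(shard_for("user:1234"))  # e.g. "mysql://db2.internal/videos"
```

The obvious catch is that changing the number of shards reshuffles most keys, so a real mapping needs to be more stable than a bare modulo.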
As you can see, I didn't pick up on many details. Ah well, c'est la vie. Oh, there are a bunch of other TechTalks available from other presenters at the Seattle Conference on Scalability. Some of them may be a bit more concrete.