YouTube Scalability

July 8, 2007. Filed under scaling

The Tech Talk, available here, was part of the Seattle Conference on Scalability, and was a presentation on how YouTube dealt with scalability issues. This is a topic I have been pretty interested in recently. (Trying to get this site to scale to 5 users instead of 1.) I recently ordered a book by Cal Henderson (tech lead for Flickr) on scalability in web applications, and I am looking forward to it getting here tomorrow (along with a non-PDF copy of SICP; dead-tree books are still king in my world).

It was a passable talk, but I definitely would have preferred for it to be a bit more concrete, and to focus more on their good solutions than their failed ones. (Also, statements like "we use BigTable for that now" just make me want to cry in a corner. Thanks, man, where are we supposed to get that? Oh, and what about the hundreds of thousands of servers to support it? Some seedy guy on a street corner?) Anyway, on with the notes.

  • Scaling is constant iteration of finding and fixing bottlenecks
  • Python is "fast enough" most of the time
  • Use Psyco to JIT compile Python
  • Write C extensions for computation heavy bits (encryption)
  • Scale by adding more machines & load balancing
  • Lighttpd to serve static content, Apache to serve dynamic content
  • Each video is hosted by a mini-cluster (redundancy in case of failure, load spike)
  • Keep hardware simple & cheap (cheaper support, cheaper service, easier to find existing resources for help)
  • Always start simple and build on that
  • Lighttpd is single process by default (leads to poor performance in some situations)
  • MySQL Database Replication scales poorly for write intensive systems, writes crowd out reads
  • Local Memory > Anything Else, Disk Reads < Anything Else
  • vmstat for profiling
  • many layers of caching built on top of each other
  • database partitioning is the best long-term solution
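The "many layers of caching" bullet can be sketched in a few lines of Python. This is a hypothetical illustration of the general pattern (check the fastest layer first, fall through to slower ones, and promote values back up), not YouTube's actual code; the `LayeredCache` and `fake_db` names are my own inventions, and the shared layer here is just a dict standing in for something like memcached.

```python
# Hypothetical sketch of layered caching: local process memory in front of a
# shared cache in front of the database. Only the pattern is real; the names
# and the dict-based "shared cache" stand-in are illustrative assumptions.

class LayeredCache:
    def __init__(self, backing_store):
        self.local = {}               # layer 1: local process memory (fastest)
        self.shared = {}              # layer 2: stand-in for a shared cache (e.g. memcached)
        self.backing = backing_store  # layer 3: the database (slowest)

    def get(self, key):
        if key in self.local:            # cheapest lookup first
            return self.local[key]
        if key in self.shared:
            value = self.shared[key]
            self.local[key] = value      # promote into the faster layer
            return value
        value = self.backing(key)        # fall through to the database
        self.shared[key] = value
        self.local[key] = value
        return value

db_calls = []
def fake_db(key):
    db_calls.append(key)
    return "video-metadata-for-" + key

cache = LayeredCache(fake_db)
cache.get("abc123")
cache.get("abc123")   # second call is served from local memory
```

The point of the layering is the "Local Memory > Anything Else" bullet above: each layer exists only to keep requests from reaching the slower layer beneath it, so the database is hit once per key rather than once per request.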

To elaborate on database partitioning briefly, the idea is to have lots of small databases (thus there are no replication delay issues), along with a hashing mechanism for determining which database to look in for a particular piece of data. This increases locality of caches (improving their quality), and also means that these smaller databases are more portable, can be stored in memory more easily, and the small size also helps with duplication (for backups, etc).
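The hashing mechanism described above can be sketched in a few lines. This is a minimal illustration under my own assumptions (the shard names and the choice of `zlib.crc32` as the hash are mine, not anything from the talk):

```python
# Minimal sketch of hash-based database partitioning: route each key to one
# of N small databases. The shard names and the crc32 hash are illustrative
# assumptions, not YouTube's actual scheme.
import zlib

SHARDS = ["db0", "db1", "db2", "db3"]  # four small databases

def shard_for(key):
    # Hash the key, then take it modulo the shard count so every key
    # deterministically maps to exactly one database.
    return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]

# The same key always lands on the same shard, so reads and writes for a
# given user or video stay together -- which is where the cache-locality
# benefit comes from:
assert shard_for("user:42") == shard_for("user:42")
```

One caveat worth noting: plain modulo hashing remaps most keys when you change the shard count, which is why schemes like consistent hashing exist; for a fixed set of partitions, though, the simple version above is the whole idea.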

As you can see, I didn't pick up on many details. Ah well, c'est la vie. Oh, there are a bunch of other Tech Talks available from other presenters at the Seattle Conference on Scalability. Some of them may be a bit more concrete.