August 25, 2012.
My previous coworker Can Duruk recently posted an excellent article covering some web development tips he picked up working at Digg. One of his comments about pagination provoked some confusion on Hacker News. How could, one wonders from their comfy armchair, someone who works at a so-called "internet company" be so incompetent as to have trouble with pagination?
First, let's take a look at Can's comment:
Counts are hard. Pagination is harder. Therefore, don’t show counts on pagination. Flawless logic.
Fundamentally, there are two straightforward conceptual ways to describe a starting position in a list of content (this is an abstract concept of list, but are probably rows in a database ):
offset which is an index in a list
cursor (probably a string, but maybe a timestamp or some such), which can uniquely
describe an item to start at (or after), probably a primary key or identifier or some kind.
Both approaches would also use an integer
count which describes the quantity of things to return.
offset approach has a lot of user experience benefits going for it, as it allows jumping into the list
at any position, as opposed to forcing the user to scroll (perhaps, endlessly!) through the content to
get to a given page. It's also incrediably simple to pagination correctly: the next page always starts at
offset + count, the previous page starts
offset - count.
cursor approach does not have much going for it. It's clunky because you can't jump into arbitrary
positions in the list, and if you use a different cursor than the thing's unique identifier then you can confuse
the hell out of yourself, especially if they're both formatted timestamps prepended to uuids
(not that we did that). (We did.)
The only positive thing you can say for the
cursor approach is that within some constraints, it's the only thing that works.
In our case we were storing most lists of content within Cassandra,
where the datastructures available are ColumnFamilies filled with Rows, filled with columns.
In normal Cassandra configurations, Rows are randomly ordered to support hashing content across a ring of nodes, and Columns within a given Row are ordered by their key. In that sense, you can think of a ColumnFamily as an unordered dictionary with strings as keys and Rows as values, and you can think of Rows as an ordered dictionary with strings as keys and strings as values. (Ok, they're probably all bytes instead of strings, hang me later.)
Within the constraints of storing stories within Cassandra, you're generally going to end up using cursors instead of offsets if you're dealing with frequent writes, which generally we were. If you're dealing with a low number of writes, you can do any number of asinine things to work around the limitation. With large number of writes, any strategy I've been able to think up for index based access against Cassandra runs into the tombstone implementation. (Who knows, maybe there are new APIs which make this stuff possible.)
In the end, we were able to move away from using cursors and standardize on
pagination, which we accomplished by moving our dynamicly calculated storylists into Redis sorted sets,
which are a great complement to Cassandra for handling dynamicly sorted content (using unique identifiers for the key) and storing
themselves in Cassandra, accessible via the unique identifiers retrieved from the Redis sorted set.
The one storylist we kept in Cassandra was the Top News storylist, which was only recalculated periodically. We prematerialized each combination of filters (type of content, topic, etc) and stored each one as a freaking JSON blob stored within a Cassandra column. As far as implementations go, all you can really say about that one was "it worked".
Anyway, that's why pagination was hard.
Another aspect of pagination which is awkward is when you're ranking content by a dynamic score, such that a given piece of content can go from position three to position ten, and then in a later surge of popularity from position ten to position two.
In this case offsets are liable to show the same content multiple times on different pages (because the content moved up or down between loading the first and second pages), and cursors are liable to lose position entirely (if the cursor drops from tenth to fifteenth position, when you load the next page you'll skip positions ten through fifteen entirely).
We generally didn't try to solve this problem, but for the third iteration of MyNews the storylist was changing so aggresively, that we did implement a solution for consistent pagination. That version of MyNews took content your friends were Digging, submitting or commenting on, plus content from Newsrooms you participated in most, plus trending content across Digg itself, threw it all together and ranked it according to a variety of factors. In the initial internal implementation, that meant that content could jump around quite a bit from one page to the other.
Our solution was:
offsetwas zero, then generate the first 225 items in the MyNews storylist, store that ordered list of itemids as a JSON blob inside Redis, and return the first
offsetwas not zero and the MyNews JSON blob existed in Redis, retrieve the blob and slice the
countitem_ids from within.
ossetwas not zero, and the MyNews JSON blob did not exist, then generate it, store it, and retrieve the
countitem_ids. (This was an edgecase for people refreshing old pages or if the blob got evicted from the cache, which in general shouldn't happen in a short period of time, but could happen.)
This implementation has the drawback of content becoming stale on non-first pages, but our experience with the user behavior was that that wasn't important, as people looking for newer content would go to the first page rather than hiding around on later pages.
Can also alludes to another problem we experienced with our storylist implementations which relied on Cassandra when he says
Therefore, don't show counts on pagination.
This one was really entirely our fault, but as we materialized more and more lists of stories, we often lost track of some of them, such that when a story was deleted (because the submitter closed their account, etc) or hidden because it was identified as a spam, then it would still be returned to the frontend when it queried for stories. The frontend would then hide that story, but would only have, say, twelve valid stories to display when it had asked the backend for fifteen.
As the number of materialized views grew to include half a dozen per Newsroom, six per user's MyNews, and so on, we ended up with an awkward compromise, which is we would repair storylists on read (removing any invalid, expired, spam and such). In most cases this was a backup mechanism rather than the primary mechanism for keeping the views consistent, but meant some reads would return less than the desired quantity of items.
Like a lot of our solutions at Digg, all you can say is that it worked, and worked at a moderately high scale. Maintaining a registry of all the storylists a given article was in and then cleaning them up on modification would have been much prettier, but if a story was in several hundred thousand storylists, many of which will never actually be read, then the repair-on-read mechanism may be preferable (it also self-repairs in the case of failures, which is a bonus when dealing with distributed systems with non-zero error rates).