July 2, 2013

Russian Doll Caching in Rails

Russian dolls: not just a knickknack you bring home from Moscow or a short-lived Lifetime reality TV show. They also lend their name to a powerful caching technique that we used to make the new RogerEbert.com fly. Russian doll caching is the default in the new Rails 4, so it’s best to start getting used to it.

Traditional page and action caching are powerful and speedy, but the control they offer is often coarse and unwieldy. Say you want to update some text in the shared site header. You now have to invalidate every single page on the site. All those pages will have to be fully rendered the next time someone visits them.

Expiring expire_fragment

If you’re using traditional fragment caching, you’ll find yourself face-to-face with the cache often. Any time you add a new fragment, you’d better also add an expire_fragment call everywhere that fragment might be invalidated by an update. In a cache store like Redis or FileStore you can use regular expressions to delete a series of keys, but that process is very slow, since behind the scenes the store compares every single key against the regex. In Memcached you don’t even have that option, so you need to delete every key individually.
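To make that concrete, here’s a sketch of what manual expiration tends to look like (the controller, fields, and fragment keys are hypothetical, not RogerEbert.com’s actual code):

    class ContentsController < ApplicationController
      def update
        @content = Content.find(params[:id])
        @content.update_attributes(title: params[:content][:title])

        # Every fragment that might include this content has to be expired by hand.
        expire_fragment("listing/#{@content.id}")
        expire_fragment("channel/#{@content.primary_channel_id}/page/1")
        # ...and so on, for every other page this content can appear on.

        redirect_to @content
      end
    end

Miss a single expire_fragment and you’re serving stale HTML until someone notices.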

Enter the obligatory Phil Karlton quote: “There are only two hard problems in Computer Science: cache invalidation and naming things.” In this new caching paradigm, we build on fragment caching, but flip cache expiration inside out. The beautiful secret to Russian doll caching is you never explicitly invalidate anything. Never again will you need to expire_fragment. Instead, we just use a brand new key.

We’ll get into the details of this soon, but like a proper Matryoshka doll, let’s start from the outside in.

An Example View

Here is an example from RogerEbert.com. I’m simplifying the code for the sake of brevity, but all the salient ideas will still be there. For reference, we are using Mongoid as our ORM on top of MongoHQ and the Dalli memcached store over Amazon’s ElastiCache. If you’re using ActiveRecord, these techniques still apply, although the syntax might be slightly different.

The basic organization of the site is pieces of content grouped within channels. A piece of content belongs to a primary channel.1

Here’s a channel listing taken from www.rogerebert.com/festivals-and-awards and below it, simplified view code:

[Screenshot: Ebert channel page]
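The view boils down to two nested cache blocks, roughly like this (a sketch: the attribute names and markup are stand-ins, but the cache keys are the ones discussed below):

    <%# Outer doll: keyed on the page number and the channel %>
    <% cache [:index, :page, @contents.current_page, @channel] do %>
      <h1><%= @channel.name %></h1>

      <% @contents.each do |content| %>
        <%# Inner doll: one fragment per piece of content %>
        <% cache [:listing, content] do %>
          <article class="listing">
            <h2><%= link_to content.title, content %></h2>
            <p><%= content.summary %></p>
          </article>
        <% end %>
      <% end %>
    <% end %>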

What's the Key?

The cache view helper takes an array as its first argument. It concatenates all the pieces together, calling #cache_key for any objects that respond to that method. We’ll get to cache_key’s implementation soon, but for now, know that each model object has a unique cache key. A naïve implementation might be "#{model_name.cache_key}/#{id}" (e.g. "channels/123")2.

[:index, :page, @contents.current_page, @channel] would create a cache key like views/index/page/1/channels/123, and nested within that cache, [:listing, content] would turn into views/listing/contents/12345. An advantage of this is that it lets us easily share a cached fragment across pages. If a piece of content falls off to the next page and we have to re-render the main index, we can still use the other individual content fragments from the cache.
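You can watch this expansion happen from a Rails console. ActiveSupport::Cache.expand_cache_key is what the cache helper uses under the hood, with a :views namespace (the records and IDs below are made up, and the output assumes the naive cache_key above):

    channel = Channel.first
    content = channel.contents.first

    ActiveSupport::Cache.expand_cache_key([:index, :page, 1, channel], :views)
    # => "views/index/page/1/channels/123"

    ActiveSupport::Cache.expand_cache_key([:listing, content], :views)
    # => "views/listing/contents/12345"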

This still leaves us with keys that must be invalidated. How do we account for that? What if #cache_key were smarter than our previous implementation? What if every model instance had some persistent attribute that changed every time the instance was updated? It’s a good thing we’ve got updated_at handy, and the default implementation of #cache_key already takes advantage of it.

If you look at the source, you’ll see that the key is a combination of the model name, its ID, and its updated_at timestamp. So if you call #update_attributes or #save on a model, that updated_at field will get the new time and voila! #cache_key will be different the next time you ask for it.
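Paraphrased (this is not the exact Rails source, and Mongoid’s version differs in the details, but the shape is the same):

    # Roughly the three cases: a new record, a timestamped record, and a plain id.
    def cache_key
      if new_record?
        "#{model_name.cache_key}/new"
      elsif timestamp = self[:updated_at]
        # The timestamp format varies by Rails version and cache_timestamp_format.
        "#{model_name.cache_key}/#{id}-#{timestamp.utc.to_s(:number)}"
      else
        "#{model_name.cache_key}/#{id}"
      end
    end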

So a normal ActiveRecord cache key might look like contents/1234-20130519024351. Mongoid will look a little crazier: contents/5194dd8d4206c510c2000001-20130528101518.

Combine that with a context like :listing, and you have a unique cache key for each fragment that will be updated any time you update the object.

And thanks to the touch: true option on the belongs_to :primary_channel association, any time you update a piece of content, it will also touch the channel and give it a new updated_at timestamp.
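In model terms, that looks something like this (a simplified sketch; fields and the secondary-channel associations are omitted, and the class and option names are approximations of our actual models):

    class Channel
      include Mongoid::Document
      include Mongoid::Timestamps

      has_many :contents, inverse_of: :primary_channel
    end

    class Content
      include Mongoid::Document
      include Mongoid::Timestamps

      # touch: true is the key piece: saving a Content bumps its channel's updated_at.
      belongs_to :primary_channel, class_name: "Channel", inverse_of: :contents, touch: true
    end

The same touch: true option is available on ActiveRecord’s belongs_to.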

So, imagine that we have views/index/page/1/channels/1234-20130528101518 as a cache key. We still have to look up the channel from the database, but we don’t have to hit the database for the @channel.contents association (it’s still just a Mongoid::Criteria or ActiveRecord::Relation at this point, waiting to be lazily loaded). If we update the title of one of the pieces of content, we get a new cache key for that content and for the channel. The next time someone visits the page, we do have to hit the database to get the contents’ updated_at fields, but we only have to render 1 of the 10 :listing fragments (there are ten posts per page); all the rest are fetched from the cache.

What about the old cached fragment? Don’t we need to delete it? The short answer is “who cares” and “no.” If you’re using a store like Memcached and it needs more space, it will automatically purge any key/values that haven’t been accessed recently.

There are still things to be aware of, such as what happens when your code changes: how do you invalidate the cache then? That’s where cache digests come in, but that’s a subject for another post.

In the meantime, start nesting your fragments and speed up your site's performance!


1 A piece of content can belong to additional channels, but I’m leaving that out for simplicity. The primary channel is the important one, since it's what we use for determining the canonical URL of a piece of content.

2 In fact, this is the return value of one of the three cases in the default cache_key implementation.

Published by: Jason Hanggi in Developers
