When your data model gets complicated, and your APIs hit that sad one-second response time, there’s usually an easy fix: `:includes`. When you preload your model’s associations, you won’t make as many SQL calls. And that can save you a ton of time.
But then your site slows down again, and you think about caching responses. And now you have a problem. Because once you start reading responses from the cache, you’ve lost all your `:includes`. Can you have both? How do you get fast responses for your cached objects, and still quickly load the objects that aren’t in the cache?
There’s a lot to do, so thinking about it is tough. It’s easier when you break the problem apart into smaller pieces, and come up with a simple next step.
So what’s the first thing you can do? To do much of anything, you need to know which objects are in your cache, and which ones you still need to find.
Separate the cached from the uncached
So, say you have a bunch of cache keys:
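```ruby
# A sketch: hypothetical cache keys for a made-up Product model,
# one key per record, versioned by updated_at.
cache_keys = [
  "products/1-20140327030629",
  "products/2-20140327030630",
  "products/3-20140327030631",
]
```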
How can you tell which of these are in a cache?
`ActiveSupport::Cache` has a handy method called `read_multi`:
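```ruby
# Look up every key in a single cache round-trip. Assume product 2
# is the cache miss here.
cached_values = Rails.cache.read_multi(*cache_keys)
# => { "products/1-20140327030629" => "<product 1 JSON>",
#      "products/3-20140327030631" => "<product 3 JSON>" }
```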
`read_multi` returns a hash of `{key => value}` pairs for each key it found in the cache. But how do you find all the keys that aren’t in the cache? You can do it the straightforward way: loop through all the cache keys, and reject the ones that show up in the hash `read_multi` returned:
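```ruby
# Anything read_multi didn't return is a cache miss.
uncached_keys = cache_keys.reject { |key| cached_values.key?(key) }
# => ["products/2-20140327030630"]
```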
So, what do you have now?
- An array of all the cache keys you wanted objects for.
- A hash of `{key => value}` pairs for each object you found in the cache.
- A list of the keys that weren’t in the cache.
And what do you need next?
- The values for the keys that weren’t in the cache. Preferably fetched all at once.
That’s your next step.
Preload the uncached values
Soon, you’ll have to find an object using a cache key. To make things easier, you can change the code to something like:
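```ruby
# A sketch: instead of a bare array of keys, keep a hash that maps
# each cache key to the id of the record behind it.
cache_identifiers = {
  "products/1-20140327030629" => 1,
  "products/2-20140327030630" => 2,
  "products/3-20140327030631" => 3,
}
cache_keys = cache_identifiers.keys
```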
So `cache_identifiers` now keeps track of each cache key and the id of the object to fetch.
Now, with your uncached keys:
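```ruby
uncached_keys
# => ["products/2-20140327030630"]
```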
And your `cache_identifiers` hash:
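```ruby
cache_identifiers
# => { "products/1-20140327030629" => 1,
#      "products/2-20140327030630" => 2,
#      "products/3-20140327030631" => 3 }
```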
You can fetch, preload, and serialize all those objects at once:
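```ruby
# One query for all the missing records, with their associations
# preloaded. (:reviews is a hypothetical association.)
uncached_ids   = cache_identifiers.values_at(*uncached_keys)
products_by_id = Product.includes(:reviews)
                        .where(id: uncached_ids)
                        .index_by(&:id)

# Serialize in the same order as uncached_keys, so the keys and
# values can be zipped back together later.
uncached_values = uncached_ids.map { |id| products_by_id[id].to_json }
```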
So what do you have now?
- An array of all the cache keys you wanted objects for to begin with.
- A hash of `{key => value}` pairs for each object found in the cache.
- A list of the keys that weren’t in the cache.
- All the values that weren’t found in the cache.
And what do you need next?
- To cache all the values you just fetched, so you don’t have to go through this whole process next time.
- The final list of all your objects, whether they came from the cache or not.
Cache the uncached values
You have two lists: one of uncached keys and another of uncached values. But to cache them, it’d be easier to have a single list of `[key, value]` pairs, so each `value` sits right next to its `key`. This is an excuse to use one of my favorite methods, `zip`:
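```ruby
uncached_keys.zip(uncached_values)
# => [["products/2-20140327030630", "<product 2 JSON>"]]
```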
With `zip`, you can cache your fetched values easily:
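```ruby
uncached_keys.zip(uncached_values).each do |key, value|
  Rails.cache.write(key, value)
end
```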
What do you have now?
- An array of all the cache keys you wanted objects for to begin with.
- A hash of `{key => value}` pairs for each object found in the cache.
- A list of formerly-uncached values that you just cached.
And what do you still need?
- One big list of all your objects, whether they came from the cache or not.
Bring it all together
Now, you have an ordered list of cache keys:
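```ruby
cache_keys
# => ["products/1-20140327030629",
#     "products/2-20140327030630",
#     "products/3-20140327030631"]
```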
Your list of the objects you fetched from the cache:
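```ruby
cached_values
# => { "products/1-20140327030629" => "<product 1 JSON>",
#      "products/3-20140327030631" => "<product 3 JSON>" }
```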
And your list of objects you just now grabbed from the database:
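```ruby
uncached_values
# => ["<product 2 JSON>"]
```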
Now you just need one last loop to put everything together:
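```ruby
# Walk the keys in their original order: take the cached value if
# there is one, otherwise shift the next freshly loaded value.
results = cache_keys.map do |key|
  cached_values.fetch(key) { uncached_values.shift }
end
```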
That is, for each cache key, you grab the object you found in the cache for that key. If that key wasn’t originally in the cache, you grab the next object you pulled from the database.
After that, you’re done!
Here’s what the whole thing looks like:
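```ruby
# A sketch of the whole pattern, assuming a hypothetical Product
# model with a :reviews association, and a cache_identifiers hash
# mapping cache keys to record ids.
def fetch_products(cache_identifiers)
  cache_keys    = cache_identifiers.keys
  cached_values = Rails.cache.read_multi(*cache_keys)
  uncached_keys = cache_keys.reject { |key| cached_values.key?(key) }

  # Load every cache miss in one query, associations preloaded,
  # and serialize the results in uncached_keys order.
  uncached_ids   = cache_identifiers.values_at(*uncached_keys)
  products_by_id = Product.includes(:reviews)
                          .where(id: uncached_ids)
                          .index_by(&:id)
  uncached_values = uncached_ids.map { |id| products_by_id[id].to_json }

  # Cache the freshly fetched values for next time.
  uncached_keys.zip(uncached_values).each do |key, value|
    Rails.cache.write(key, value)
  end

  # Stitch cached and freshly fetched values back into key order.
  cache_keys.map do |key|
    cached_values.fetch(key) { uncached_values.shift }
  end
end
```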
Was it worth it? Maybe. It’s a lot of code. But if you’re caching objects with lots of associations, it could save you dozens or hundreds of SQL calls. And that can shave a ton of time off your API responses.
At Avvo, this pattern has been incredibly useful: a lot of our JSON APIs rely on it to return cached responses quickly. It was useful enough, in fact, that I wrapped it up in a gem called bulk_cache_fetcher. So if you ever find yourself trying to cache big, complicated data models, give it a try!