When your data model gets complicated, and your APIs hit that sad one-second response time, there’s usually an easy fix: `:includes`. When you preload your model’s associations, you won’t make as many SQL calls. And that can save you a ton of time.
But then your site slows down again, and you think about caching responses. And now you have a problem. Because once you start reading responses from the cache, you’ve lost all your `:includes`. Can you have both? How do you get fast responses for your cached objects, and still quickly load the objects that aren’t in the cache?
There’s a lot to do, so thinking about it is tough. It’s easier when you break the problem apart into smaller pieces, and come up with a simple next step.
So what’s the first thing you can do? To do much of anything, you need to know which objects are in your cache, and which ones you still need to find.
Separate the cached from the uncached
So, say you have a bunch of cache keys:
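```ruby
# A sketch: hypothetical cache keys for a made-up Product model,
# one key per record, versioned by updated_at.
cache_keys = [
  "products/1-20140327030629",
  "products/2-20140327030630",
  "products/3-20140327030631",
]
```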
How can you tell which of these are in a cache?
`ActiveSupport::Cache` has a handy method called `read_multi`:
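```ruby
# Look up every key in a single cache round-trip. Assume product 2
# is the cache miss here.
cached_values = Rails.cache.read_multi(*cache_keys)
# => { "products/1-20140327030629" => "<product 1 JSON>",
#      "products/3-20140327030631" => "<product 3 JSON>" }
```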
`read_multi` returns a hash of `{key => value}` pairs for each key it found in the cache. But how do you find all the keys that aren’t in the cache? You can do it the straightforward way: loop through all the cache keys, and reject the ones that show up in the hash `read_multi` returned:
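```ruby
# Anything read_multi didn't return is a cache miss.
uncached_keys = cache_keys.reject { |key| cached_values.key?(key) }
# => ["products/2-20140327030630"]
```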
So, what do you have now?
- An array of all the cache keys you wanted objects for.
- A hash of `{key => value}` pairs for each object you found in the cache.
- A list of the keys that weren’t in the cache.
And what do you need next?
- The values for the keys that weren’t in the cache. Preferably fetched all at once.
That’s your next step.
Preload the uncached values
Soon, you’ll have to find an object using a cache key. To make things easier, you can change the code to something like:
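```ruby
# A sketch: instead of a bare array of keys, keep a hash that maps
# each cache key to the id of the record behind it.
cache_identifiers = {
  "products/1-20140327030629" => 1,
  "products/2-20140327030630" => 2,
  "products/3-20140327030631" => 3,
}
cache_keys = cache_identifiers.keys
```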
So `cache_identifiers` now keeps track of each cache key and the id of the object to fetch.
Now, with your uncached keys:
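```ruby
uncached_keys
# => ["products/2-20140327030630"]
```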
And your `cache_identifiers` hash:
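```ruby
cache_identifiers
# => { "products/1-20140327030629" => 1,
#      "products/2-20140327030630" => 2,
#      "products/3-20140327030631" => 3 }
```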
You can fetch, preload, and serialize all those objects at once:
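```ruby
# One query for all the missing records, with their associations
# preloaded. (:reviews is a hypothetical association.)
uncached_ids   = cache_identifiers.values_at(*uncached_keys)
products_by_id = Product.includes(:reviews)
                        .where(id: uncached_ids)
                        .index_by(&:id)

# Serialize in the same order as uncached_keys, so the keys and
# values can be zipped back together later.
uncached_values = uncached_ids.map { |id| products_by_id[id].to_json }
```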
So what do you have now?
- An array of all the cache keys you wanted objects for to begin with.
- A hash of `{key => value}` pairs for each object found in the cache.
- A list of the keys that weren’t in the cache.
- All the values that weren’t found in the cache.
And what do you need next?
- To cache all the values you just fetched, so you don’t have to go through this whole process next time.
- The final list of all your objects, whether they came from the cache or not.
Cache the uncached values
You have two lists: one of uncached keys and another of uncached values. But to cache them, it’d be easier to have a single list of `[key, value]` pairs, so each `value` sits right next to its `key`. This is an excuse to use one of my favorite methods, `zip`:
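```ruby
uncached_keys.zip(uncached_values)
# => [["products/2-20140327030630", "<product 2 JSON>"]]
```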
With `zip`, you can cache your fetched values easily:
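```ruby
uncached_keys.zip(uncached_values).each do |key, value|
  Rails.cache.write(key, value)
end
```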
What do you have now?
- An array of all the cache keys you wanted objects for to begin with.
- A hash of `{key => value}` pairs for each object found in the cache.
- A list of formerly-uncached values that you just cached.
And what do you still need?
- One big list of all your objects, whether they came from the cache or not.
Bring it all together
Now, you have an ordered list of cache keys:
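```ruby
cache_keys
# => ["products/1-20140327030629",
#     "products/2-20140327030630",
#     "products/3-20140327030631"]
```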
Your list of the objects you fetched from the cache:
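```ruby
cached_values
# => { "products/1-20140327030629" => "<product 1 JSON>",
#      "products/3-20140327030631" => "<product 3 JSON>" }
```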
And your list of objects you just now grabbed from the database:
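```ruby
uncached_values
# => ["<product 2 JSON>"]
```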
Now you just need one last loop to put everything together:
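```ruby
# Walk the keys in their original order: take the cached value if
# there is one, otherwise shift the next freshly loaded value.
results = cache_keys.map do |key|
  cached_values.fetch(key) { uncached_values.shift }
end
```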
That is, for each cache key, you grab the object you found in the cache for that key. If that key wasn’t originally in the cache, you grab the next object you pulled from the database.
After that, you’re done!
Here’s what the whole thing looks like:
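```ruby
# A sketch of the whole pattern, assuming a hypothetical Product
# model with a :reviews association, and a cache_identifiers hash
# mapping cache keys to record ids.
def fetch_products(cache_identifiers)
  cache_keys    = cache_identifiers.keys
  cached_values = Rails.cache.read_multi(*cache_keys)
  uncached_keys = cache_keys.reject { |key| cached_values.key?(key) }

  # Load every cache miss in one query, associations preloaded,
  # and serialize the results in uncached_keys order.
  uncached_ids   = cache_identifiers.values_at(*uncached_keys)
  products_by_id = Product.includes(:reviews)
                          .where(id: uncached_ids)
                          .index_by(&:id)
  uncached_values = uncached_ids.map { |id| products_by_id[id].to_json }

  # Cache the freshly fetched values for next time.
  uncached_keys.zip(uncached_values).each do |key, value|
    Rails.cache.write(key, value)
  end

  # Stitch cached and freshly fetched values back into key order.
  cache_keys.map do |key|
    cached_values.fetch(key) { uncached_values.shift }
  end
end
```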
Was it worth it? Maybe. It’s a lot of code. But if you’re caching objects with lots of associations, it could save you dozens or hundreds of SQL calls. And that can shave a ton of time off your API responses.
At Avvo, this pattern has been incredibly useful: a lot of our JSON APIs rely on it to return cached responses quickly. It was useful enough, in fact, that I wrapped it up in a gem called bulk_cache_fetcher. So if you ever find yourself trying to cache big, complicated data models, give it a try!