System Design: Cache me if you can
Have you ever woken up in the middle of the night, thirsty as hell — especially after a night of partying — and reached for a water bottle on your side table, only to realize it’s empty? Now you’re cranky, slipping on your slippers, and embarking on what feels like a mile-long trek to the kitchen. If this sounds familiar, you’re not alone. And if it doesn’t, maybe I’m just weird.
But let me introduce you to the brilliant concept of caching, because sometimes life calls for a little preemptive hydration strategy. I’m stretching the analogy a bit now, so let’s get into caching.
What is Caching? The easy part
Simply put, I don’t want to keep going to the kitchen every time I need water. I know I’ll need it before I fall asleep, maybe in the middle of the night, and definitely in the morning. To save myself multiple trips to the kitchen, I fill up a water bottle and keep it by my bedside.
This is similar to how caching works.
In technical terms, caching is the process of storing copies of frequently accessed data in a temporary storage location, allowing quick access for your service or application.
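To make that concrete, here’s a minimal sketch in Python. The names are mine for illustration: `fetch_from_source` stands in for any slow trip to the “kitchen” (a database query, an external API call), and the cache is just an in-memory dictionary.

```python
import time

cache = {}  # our bedside water bottle: a simple in-memory store

def fetch_from_source(key):
    """Stand-in for a slow trip to the 'kitchen' (a DB call, an API, etc.)."""
    time.sleep(2)  # simulate network/disk latency
    return f"value-for-{key}"

def get(key):
    # Fast path: the data is already by the bedside.
    if key in cache:
        return cache[key]
    # Slow path: walk to the kitchen, then refill the bottle for next time.
    value = fetch_from_source(key)
    cache[key] = value
    return value

get("config")  # slow: ~2 seconds, goes to the source
get("config")  # fast: served straight from the cache
```

The first call pays the full cost; every call after that is nearly free. That gap is the entire point of caching.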
When should you use caching?
I will not beat around the bush. If your application or logic involves making external system calls for frequently accessed data, implementing a cache is an excellent solution.
Remember
Systems work best if they are kept simple rather than made complex. Don’t create a problem to fit a solution.
Simple use case: Caching credential storage results
Remember that caching is a concept, not a specific application. The use case here is just one example of application-level caching; the idea shows up all over computer science. In computer architecture, OS-level caching optimizes process execution and disk access. In content distribution, web caching (e.g., CDNs) reduces latency by storing content closer to end users.
Imagine you’re building a microservice that processes large amounts of data in real time and frequently needs to access secrets, tokens, or configurations from a service like HashiCorp Vault.
The typical flow might look like this (sketched in code after the list):
- Authenticate with Vault
- Retrieve the secret
- Use the secret
- Repeat as needed
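Here’s roughly what that naive flow looks like in Python, using the community hvac client for Vault. The Vault URL, AppRole credentials, and secret paths are placeholders; your auth method and secrets engine may differ.

```python
import hvac

def read_secret(path: str) -> dict:
    # 1. Authenticate with Vault (AppRole here; your auth method may differ).
    client = hvac.Client(url="https://vault.example.com:8200")
    client.auth.approle.login(role_id="my-role-id", secret_id="my-secret-id")

    # 2. Retrieve the secret (KV v2 engine assumed).
    response = client.secrets.kv.v2.read_secret_version(path=path)

    # 3. Return the secret for use.
    return response["data"]["data"]

# 4. Repeat as needed -- every single call re-authenticates and re-fetches.
db_creds = read_secret("myapp/database")
api_key = read_secret("myapp/external-api")
```

Notice that every call pays for a fresh login and a fresh network round trip, which is exactly what causes the problems below.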
The Challenge
- Increased response time: Frequent API calls to Vault add latency.
- Rate limits: Vault enforces rate limits, which can throttle excessive requests, impacting availability.
- Overhead: Repeated authentication and secret retrieval introduce unnecessary computational and network load.
The Solution: Add a Caching Layer
By introducing a caching layer, you can significantly optimize the flow (see the sketch after this list):
- Authenticate with Vault and cache the token.
- Pre-fetch critical secrets and store them in the cache.
- During operation, check the cache for the requested secret:
  - If the secret exists in the cache, use the cached version.
  - If the secret is missing or expired, fetch it from Vault, update the cache, and then use it.
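Here’s a minimal cache-aside sketch of that flow, under the same assumptions as before (hvac client, placeholder URL, credentials, and paths), with a simple TTL so entries eventually expire:

```python
import time

import hvac

CACHE_TTL_SECONDS = 300  # assumption: five minutes of staleness is acceptable
_secret_cache: dict[str, tuple[float, dict]] = {}  # path -> (expires_at, secret)

# Authenticate once and reuse the client (and its cached token) instead of
# re-authenticating on every request. URL and credentials are placeholders.
client = hvac.Client(url="https://vault.example.com:8200")
client.auth.approle.login(role_id="my-role-id", secret_id="my-secret-id")

def get_secret(path: str) -> dict:
    now = time.monotonic()

    # Cache hit: the entry exists and has not expired yet.
    entry = _secret_cache.get(path)
    if entry is not None and now < entry[0]:
        return entry[1]

    # Cache miss (or expired entry): fetch from Vault, then refresh the cache.
    response = client.secrets.kv.v2.read_secret_version(path=path)
    secret = response["data"]["data"]
    _secret_cache[path] = (now + CACHE_TTL_SECONDS, secret)
    return secret

# Pre-fetch critical secrets at startup so the first real request is a hit.
for critical_path in ("myapp/database", "myapp/external-api"):
    get_secret(critical_path)
```

A production version would also renew the Vault token before it expires and guard the cache against concurrent access; this sketch keeps only the cache-aside logic from the list above.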
Tradeoffs: The hard part, not covered in this blog. LOL
Like everything in life, caching isn’t perfect and comes with its tradeoffs. The most challenging aspect is cache invalidation — determining when and how to remove or update cached data to ensure it remains accurate and relevant.
Cache invalidation is famously one of the hardest problems in computer science, so if you’re wrestling with it, I hope you find a solution that fits your system.
I’ll write a separate blog post that dives deeper into this problem, from both low-level and high-level perspectives. Until then, cheers!