
Fairness in multi-tenant systems

The Amazon Builder's Library is a great set of deep dive papers into the challenges with modern systems. This post highlights some of the challenges in dealing with multi-tenant systems.


“Fairness in multi-tenant systems” looks at the challenges of balancing load within multi-tenant systems, specifically the issues around handling API requests to these systems.

I call out a few more details in the Twitter thread below…

Tweet 1/16

today we’re looking at “Fairness in multi-tenant systems” by @dyanacek from the Amazon Builder’s Library

it’s available at https://aws.amazon.com/builders-library/fairness-in-multi-tenant-systems/

🧵☁️ cloud devops @awscloud

Tweet 2/16

you can find this thread unrolled at https://t.co/henuJwkAKm

…and yesterday’s thread on “Making retries safe with idempotent APIs” at https://markn.ca/2021/making-retries-safe-with-idempotent-apis/

🧵☁️ cloud devops

Tweet 3/16

I’ve helped a lot of teams build out multi-tenant systems. it’s a fascinating problem space. you’re trying to find the right balance between isolation & economy of scale

it’s not easy

🧵☁️ cloud devops

Tweet 4/16

before we dive into this 📑, let me just add the “SaaS Lens” from the Well-Architected Framework to your reading list as well

this Lens explains a few different multi-tenancy models

it’s at https://docs.aws.amazon.com/wellarchitected/latest/saas-lens/saas-lens.html

🧵☁️ cloud devops

Tweet 5/16

ok, back on track, this paper from @dyanacek looks at how Amazon manages API requests in order to avoid overload

what does that have to do with multi-tenancy? you ask (you did ask, right?)

well…

🧵☁️ cloud devops

Tweet 6/16

it’s the economy of scale bit. you optimize resource usage (& thus spend) by making sure that you’re streamlining the use of your service

@dyanacek takes it a step further in this 📑 & shows how you can use this to reduce pressures on your systems as well

🧵☁️ cloud devops

Tweet 7/16

first up is the case for multi-tenancy. there are a few well-structured arguments in the paper, but basically it’s all about resource optimization

idle = bad (generally)

🧵☁️ cloud devops

Tweet 8/16

one of the biggest downsides is rightfully called out by @dyanacek as well: tenants impacting each other

if we’re each sharing a resource, what if I grab more than my fair share?

this is something we need to solve for…

🧵☁️ cloud devops

Tweet 9/16

which is why this 📑 is a great one, it transitions out of the example highlighting the +/- and into “fairness”

summed up as, “every client in a multi-tenant system is provided with a single-tenant experience” << or at least they SHOULD be

🧵☁️ cloud devops

Tweet 10/16

one area that gets more attention is the case when demand is outpacing supply (which should be increasing as scaling catches up). what do you do then?

the author introduces the concept of “load shedding”

🧵☁️ cloud devops

Tweet 11/16

…a/k/a saying 🚫 quickly with little resource cost. this can relieve the pressure on the backend & clients can easily retry using the techniques we highlighted in previous threads 👆

🧵☁️ cloud devops
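
To make the “say 🚫 quickly” idea a bit more concrete, here is a minimal Python sketch, assuming a simple cap on the number of requests in flight. The names `LoadShedder` and `handle`, the cap itself, and the 503-style response are my own illustration, not something taken from the paper.

```python
import threading
from typing import Callable, Tuple


class LoadShedder:
    """Hypothetical helper: rejects work cheaply once too many requests are in flight."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # The check is a counter comparison, so saying "no" costs almost nothing.
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1


def handle(shedder: LoadShedder, work: Callable[[], str]) -> Tuple[int, str]:
    """Run `work` if there is capacity, otherwise reject right away."""
    if not shedder.try_acquire():
        # Assumed response shape: a retryable "overloaded" status.
        return 503, "overloaded, retry later"
    try:
        return 200, work()
    finally:
        shedder.release()
```

The rejection path never touches the backend, which is the point: the client gets a fast answer and can retry later.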

Tweet 12/16

load shedding isn’t enough to solve the issue of fairness, that’s where “rate limiting” comes into play. you can use this technique to “shape unplanned increases in traffic”

the paper details how & what to look out for when adding this pattern to your design

🧵☁️ cloud devops
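
One common way to rate limit per tenant is a token bucket. The sketch below is a rough illustration under that assumption, not the paper's implementation. `TokenBucket`, `PerTenantLimiter`, the `rate` and `burst` parameters, and the injectable `clock` are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `burst` tokens."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._tokens = burst
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Add the tokens earned since the last call, capped at the burst size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerTenantLimiter:
    """One bucket per tenant, so a noisy tenant only drains its own tokens."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._burst = burst
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._burst, self._clock)
            self._buckets[tenant_id] = bucket
        return bucket.allow()
```

Keeping a separate bucket per tenant is what ties this back to fairness: one tenant's spike is throttled without affecting anyone else's allowance.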

Tweet 13/16

quotas go hand-in-hand with rate limiting & the author spends quite a bit of time on them as well (rightfully so!)

there’s a fine art to implementing a quota system & the 📑 does a good job of providing a solid overview

🧵☁️ cloud devops
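
As a rough sketch of what a quota system can look like, assuming a default limit with per-tenant overrides and a usage counter that something else resets each period: `QuotaTable` and its fields are hypothetical names for this example, and a real system would need shared storage and a reset schedule that this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class QuotaTable:
    """Hypothetical in-memory quota table: default limit plus per-tenant overrides."""

    default_limit: int
    overrides: Dict[str, int] = field(default_factory=dict)
    used: Dict[str, int] = field(default_factory=dict)

    def limit_for(self, tenant_id: str) -> int:
        # Tenants with a negotiated quota get their override, everyone else the default.
        return self.overrides.get(tenant_id, self.default_limit)

    def try_consume(self, tenant_id: str, amount: int = 1) -> bool:
        current = self.used.get(tenant_id, 0)
        if current + amount > self.limit_for(tenant_id):
            return False
        self.used[tenant_id] = current + amount
        return True

    def reset(self) -> None:
        # Assumed to be called at the start of each quota period.
        self.used.clear()
```

Raising a single tenant's quota is then just an entry in `overrides`, with no change to the default that everyone else shares.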

Tweet 14/16

all of these techniques are known as “admission control systems”

the paper highlights how Amazon uses these (and why, and when), showing a few different models and patterns that could help you out

🧵☁️ cloud devops

Tweet 15/16

finally the paper dives into architecture design patterns that can help reduce your need for these techniques.

nothing is ever perfect, so knowing all of the tools ⚒️ at your disposal is critical

🧵☁️ cloud devops

Tweet 16/16

all-in-all, this is a FANTASTIC paper by @dyanacek highlighting years of hard-won learning.

it builds nicely on the other two papers I’ve mini-reviewed this week. I really think you should add this one to your reading list!

/🧵☁️ cloud devops
