This post was written 3 years ago. It may be out of date, my opinion might have changed, and/or the writing may be embarrassingly bad. Read with caution.

Timeouts, retries, and backoff with jitter

Fall/2021 – 2 min read

The Amazon Builder's Library is a great set of deep-dive papers on the challenges of modern systems. The paper covered in this post highlights some of the challenges of dealing with failure at scale.

“Timeouts, retries, and backoff with jitter” looks at various types of failures and their potential impact on both your service and its consumers.

I call out a few more details in the Twitter thread below…

Tweet 1/6

last week, I looked at a number of @awscloud white papers. this week, I'll be diving into the Amazon Builder's Library

first up: "Timeouts, retries, and backoff with jitter", by @MarcJBrooker, https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/

🧡☁️ #cloud #devops

Tweet 2/6

this thread is available unrolled at https://t.co/dqEcUqffGB

Friday's thread is up at https://markn.ca/2021/machine-learning-best-practices-for-public-sector-organizations/

🧡☁️ #cloud #devops

Tweet 3/6

tldr: 💩 happens. plan for it. make sure to keep your customers' perspective in mind

more 👇

🀣

🧡☁️ #cloud #devops

Tweet 4/6

. @MarcJBrooker calls out the 3 primary techniques that Amazon uses for handling failures:

1. timeouts
2. retries
3. backoff

the rest of the 📑 details the how, why, & when of each of these techniques

🧡☁️ #cloud #devops

Tweet 5/6

the discussion of timeouts is of particular note. looking at the impacts of server vs. client timeouts & how to manage both without causing a flood of retries is really interesting

🧡☁️ #cloud #devops
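To make the client-timeout point concrete: one common way to pick a client-side timeout is to look at the downstream service's observed latency and choose a high percentile, so that only a small, deliberately chosen fraction of normal requests would time out. The sketch below is a minimal illustration under that assumption; the function name, the `false_timeout_rate` parameter, and the 0.1% default are hypothetical choices for this example, not taken from the paper.

```python
import math
from typing import Sequence


def timeout_from_latencies(latencies_ms: Sequence[float],
                           false_timeout_rate: float = 0.001) -> float:
    """Return a client timeout (ms) that at most roughly `false_timeout_rate`
    of the observed latencies exceed.

    Hypothetical helper for illustration: the 0.1% default is an assumed
    acceptable rate of timing out requests that would have succeeded.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0.0 < false_timeout_rate < 1.0:
        raise ValueError("false_timeout_rate must be between 0 and 1")
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: the smallest sample such that at least
    # (1 - false_timeout_rate) of all samples are at or below it.
    rank = max(1, math.ceil((1.0 - false_timeout_rate) * len(ordered)))
    return ordered[min(rank, len(ordered)) - 1]
```

With 1,000 samples and the default rate, this lands on roughly the p99.9 sample, so about one normal request in a thousand would be cut off. That is the trade-off the thread points at: a tighter timeout means more spurious timeouts, and each one can turn into a retry.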

Tweet 6/6

retries are where the paper goes next. the author calls out retries as "selfish": they can amplify failures. this is why Amazon combines them with the backoff technique to avoid a flood of requests

🧡☁️ #cloud #devops
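The retry-plus-backoff idea can be sketched in a few lines. This is a minimal sketch, not Amazon's implementation: it retries only errors assumed to be transient (`TimeoutError` and `ConnectionError` here), caps the number of attempts, and between attempts sleeps for a random duration drawn with "full jitter", i.e. uniformly between 0 and min(cap, base × 2^attempt). The function names, the defaults, and the choice of full jitter over other jitter variants are all assumptions for illustration.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def full_jitter_delay(attempt: int, base: float = 0.1, cap: float = 10.0,
                      rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retrying after failed attempt `attempt` (0-based)."""
    # Exponential growth, clamped to `cap`; the exponent is bounded so very
    # large attempt counts cannot overflow the float arithmetic.
    ceiling = min(cap, base * 2.0 ** min(attempt, 64))
    # Full jitter: pick uniformly in [0, ceiling] so that clients that failed
    # at the same moment spread their retries out instead of retrying in sync.
    source = rng if rng is not None else random
    return source.uniform(0.0, ceiling)


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base: float = 0.1,
    cap: float = 10.0,
    retryable: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `operation`, retrying assumed-transient failures with capped, jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(full_jitter_delay(attempt, base, cap, rng))
    raise AssertionError("unreachable")
```

Capping `max_attempts` matters because of the "selfish" point above: every retry is extra load on a dependency that may already be struggling. Errors outside `retryable` propagate immediately, and passing a fake `sleep` (for example `sleep=delays.append`) makes the behaviour easy to test without actually waiting.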