Timeouts, retries, and backoff with jitter

The Amazon Builder’s Library is a great set of deep-dive papers on the challenges of building modern systems. This post highlights one of them, which covers some of the challenges in dealing with failure at scale.

"Timeouts, retries, and backoff with jitter" looks at various types of failures and their potential impact on both your service and its consumers.

I call out a few more details in the Twitter thread below…

Tweet 1/6

last week, I looked at a number of @awscloud white papers. this week, I'll be diving into the Amazon Builder's Library

first up: "Timeouts, retries, and backoff with jitter", by @MarcJBrooker

🧵☁️ #cloud #devops

Tweet 2/6

this 🧵 is available unrolled at

Friday's thread is up at

🧵☁️ #cloud #devops

Tweet 3/6

tldr: 💩 happens. plan for it. make sure to keep your customer's perspective in mind

more 👇

🧵☁️ #cloud #devops

Tweet 4/6

. @MarcJBrooker calls out the 3 primary techniques that Amazon uses for handling failures:

1. timeouts
2. retries
3. backoff

the rest of the 📑 details the how, why, & when of each of these techniques

🧵☁️ #cloud #devops

Tweet 5/6

the discussion of timeouts is of particular note. looking at the impacts of server vs. client timeouts & how to manage both without causing a flood of retries is really interesting

๐Ÿงตโ˜๏ธ #cloud #devops

Tweet 6/6

retries are where the paper goes next. the author calls retries "selfish" & notes that they can amplify failures. this is why Amazon combines them with the backoff technique to avoid a flood of requests

🧵☁️ #cloud #devops