Making retries safe with idempotent APIs

The Amazon Builders’ Library is a great set of deep-dive papers on the challenges of building modern systems. This post highlights some of the challenges that the retry pattern presents.

The paper, “Making retries safe with idempotent APIs,” follows up on yesterday’s thread about the “Timeouts, retries, and backoff with jitter” paper.

This one takes a much deeper dive into the challenges that a simple retry poses to an API. It’s all about balancing the customer experience with the systems’ stability & performance.

I call out a few more details in the Twitter thread below…

Tweet 1/9

diving into the Amazon Builder's Library again today. this time with, "Making retries safe with idempotent APIs", by @mfeatonby

📑: https://aws.amazon.com/builders-library/making-retries-safe-with-idempotent-APIs/

this is a level 300 paper, digging a bit deeper than yesterday's level 200

🧵☁️ #cloud #devops @awscloud

Tweet 2/9

this thread is available unrolled at https://t.co/nEPvsF8Awt

yesterday's thread is up at https://markn.ca/2021/timeouts-retries-and-backoff-with-jitter/

🧵☁️ #cloud #devops

Tweet 3/9

idempotent is one of my all time favourite words, especially in tech.

if you're unfamiliar, in this context it means that you can run operations more than once and the results won't change

more at https://en.wikipedia.org/wiki/Idempotence

🧵☁️ #cloud #devops
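
To make that definition concrete, here is a minimal Python sketch (mine, not from the paper). The function names `add_tag` and `append_entry` are made up for illustration: the first is idempotent, the second is not.

```python
# Hypothetical helpers for illustration only; neither comes from the paper.

def add_tag(tags: set, tag: str) -> set:
    """Idempotent: adding the same tag again leaves the set unchanged."""
    return tags | {tag}


def append_entry(log: list, entry: str) -> list:
    """Not idempotent: every call grows the list, so a retry duplicates the entry."""
    return log + [entry]


# Simulate "the original call plus one retry" for each operation.
tags = add_tag(add_tag(set(), "prod"), "prod")
log = append_entry(append_entry([], "charge $10"), "charge $10")

print(len(tags))  # 1: the retry changed nothing
print(len(log))   # 2: the retry was applied a second time
```

The second case is the one the rest of the thread worries about: a retried "charge $10" that lands twice.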

Tweet 4/9

for this paper, the author explores the concept of idempotency (see, awesome word) within the "retry" pattern

basically, how can the backend service make sure that a retry doesn't end up being a duplicate or something worse

🧵☁️ #cloud #devops

Tweet 5/9

excellent quote to build by, "We’ve found that in many cases the simplest solution is the best solution", @mfeatonby, @awscloud

followed by, "a surprisingly large number of transient or random faults can be overcome by simply retrying the call"

🧵☁️ #cloud #devops

Tweet 6/9

the 📑 walks through some of the potential downsides of the retry pattern

it then moves on to a topic that isn't discussed enough: reducing complexity

the author discusses API design & how @awscloud uses an identifier handled by the SDKs to manage retries

🧵☁️ #cloud #devops
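
Here is a rough sketch of what a caller-supplied identifier buys the service, assuming a simple in-memory store. `InstanceService`, `create_instance`, `client_token`, and `ParameterMismatch` are hypothetical names of mine, and this is not how AWS actually implements it. The idea is that the caller generates the identifier once and reuses it on every retry, so the service can recognize a repeat and return the original result instead of creating a second resource.

```python
import uuid


class ParameterMismatch(Exception):
    """Hypothetical error: a token was reused with different parameters."""


class InstanceService:
    """Hypothetical in-memory service, for illustration only."""

    def __init__(self) -> None:
        # token -> (parameters of the first request, id of the created resource)
        self._seen = {}

    def create_instance(self, client_token: str, params: dict) -> str:
        if client_token in self._seen:
            saved_params, instance_id = self._seen[client_token]
            if saved_params != params:
                # Same token, different intent: refuse rather than guess.
                raise ParameterMismatch(client_token)
            return instance_id  # a retry: same answer, no new resource
        instance_id = "i-" + uuid.uuid4().hex[:8]
        self._seen[client_token] = (dict(params), instance_id)
        return instance_id

    def instance_count(self) -> int:
        return len(self._seen)


service = InstanceService()
token = str(uuid.uuid4())  # generated once by the caller, reused on every retry

first = service.create_instance(token, {"size": "small"})
retry = service.create_instance(token, {"size": "small"})

print(first == retry)            # True: the retry got the original result
print(service.instance_count())  # 1: only one resource was created
```

Rejecting a reused token that carries different parameters is a design choice in this sketch: it treats the token as a statement of intent, so a mismatch looks like a caller bug rather than a retry.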

Tweet 7/9

this approach avoids lots of problems on the service side, but issues remain. that brings us to the various strategies that can be used to implement a retry pattern

📑 uses @awscloud EC2 as an example & this really helps drive some of these key points home

🧵☁️ #cloud #devops

Tweet 8/9

one fascinating edge case is that of late arriving requests. in any distributed system (especially one over the internet) this is a distinct possibility

📑 explores these challenges & explains how @awscloud looks at making reasonable trade-offs to handle them

🧵☁️ #cloud #devops
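
One way to picture the trade-off, assuming the service only remembers identifiers for a bounded window. The 60-second window, `TokenStore`, and `handle` are all made up here, and the paper may weigh this differently. A retry inside the window is deduplicated, while one that arrives after the record has expired looks like a brand new request.

```python
from typing import Optional


class TokenStore:
    """Hypothetical store that forgets tokens after ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries = {}  # token -> (time stored, saved result)

    def lookup(self, token: str, now: float) -> Optional[str]:
        entry = self._entries.get(token)
        if entry is None:
            return None
        stored_at, result = entry
        if now - stored_at > self._ttl:
            del self._entries[token]  # record expired: the token is forgotten
            return None
        return result

    def remember(self, token: str, result: str, now: float) -> None:
        self._entries[token] = (now, result)


def handle(store: TokenStore, token: str, now: float, created: list) -> str:
    """Return the saved result for a known token, else create a new resource."""
    saved = store.lookup(token, now)
    if saved is not None:
        return saved
    created.append(token)
    result = "resource-" + str(len(created))
    store.remember(token, result, now)
    return result


created = []
store = TokenStore(ttl_seconds=60)  # assumed window, chosen arbitrarily

on_time = handle(store, "tok-1", now=0.0, created=created)
retry = handle(store, "tok-1", now=30.0, created=created)   # inside the window
late = handle(store, "tok-1", now=120.0, created=created)   # after the window

print(on_time, retry, late)  # resource-1 resource-1 resource-2
```

The time is passed in explicitly so the behaviour is easy to reason about. A longer window catches more stragglers but means keeping more state, which is the kind of trade-off the tweet is pointing at.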

Tweet 9/9

overall, this is a fantastic paper. it dives deep into an area that most assume is simple. at scale, nothing is

however, these patterns & tips can help you apply the same approach in your own services to deliver a better customer experience

worth the 🕙 to read

/🧵☁️ #cloud #devops
