When using the AWS SDK for Go v2, you may want to retry SDK calls.
There are several patterns for doing this, so I have summarized them here.
Assumptions
The examples in this article use Go v1.18.7 and aws-sdk-go-v2 v1.17.5.
Retries can also be configured centrally for all SDK calls when the client instance is created. This article, however, assumes that you want to set or change the retry behavior for each SDK call (that is, for each API call).
For example, you may want different retry behavior for s3.ListObjects and iam.DeleteRole.
Repository
The source code for this project is available on GitHub.
[Retry Patterns 1.] Options.RetryMaxAttempts
Let's start with the simplest pattern.
The AWS SDK's Options struct is used both when creating a client and when making an API call. It has two retry-related parameters, RetryMaxAttempts and RetryMode, which can be set as shown below.
- Implementation Example
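```go
input := &iam.DeleteRoleInput{
	RoleName: roleName,
}
// Per-call option: overrides the client's retry settings for this call only.
optFn := func(o *iam.Options) {
	o.RetryMaxAttempts = 3 // total attempts, including the initial call
	o.RetryMode = aws.RetryModeStandard
}
_, err := i.client.DeleteRole(ctx, input, optFn)
```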
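Setting these two parameters is enough. The SDK then retries errors it considers retryable (such as throttling errors) using exponential backoff with jitter. It makes at most RetryMaxAttempts attempts in total, including the initial call.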
[Retry Patterns 2.] Options.Retryer
Options also has a parameter called Retryer for implementing a fine-tuned retry algorithm.
If this parameter is set, the retry behavior implemented there is applied instead of the RetryMaxAttempts and RetryMode settings described above.
The value assigned to Options.Retryer must implement the Retryer interface or its successor, RetryerV2.
These interfaces have methods for the retry decision logic (IsErrorRetryable), the maximum number of attempts (MaxAttempts), and the wait time before each retry (RetryDelay). Together, these let you customize the retry behavior.
Specifically, IsErrorRetryable lets you specify in more detail when to retry. RetryDelay lets you implement logic such as "wait a random number of seconds (jitter)" instead of plain exponential backoff.
- Code in the SDK's aws package (not an example implementation)
```go
type Retryer interface {
	// IsErrorRetryable returns if the failed attempt is retryable. This check
	// should determine if the error can be retried, or if the error is
	// terminal.
	IsErrorRetryable(error) bool

	// MaxAttempts returns the maximum number of attempts that can be made for
	// an attempt before failing. A value of 0 implies that the attempt should
	// be retried until it succeeds if the errors are retryable.
	MaxAttempts() int

	// RetryDelay returns the delay that should be used before retrying the
	// attempt. Will return error if the if the delay could not be determined.
	RetryDelay(attempt int, opErr error) (time.Duration, error)

	// GetRetryToken attempts to deduct the retry cost from the retry token pool.
	// Returning the token release function, or error.
	GetRetryToken(ctx context.Context, opErr error) (releaseToken func(error) error, err error)

	// GetInitialToken returns the initial attempt token that can increment the
	// retry token pool if the attempt is successful.
	GetInitialToken() (releaseToken func(error) error)
}

// RetryerV2 is an interface to determine if a given error from an attempt
// should be retried, and if so what backoff delay to apply. The default
// implementation used by most services is the retry package's Standard type.
// Which contains basic retry logic using exponential backoff.
//
// RetryerV2 replaces the Retryer interface, deprecating the GetInitialToken
// method in favor of GetAttemptToken which takes a context, and can return an error.
//
// The SDK's retry package's Attempt middleware, and utilities will always
// wrap a Retryer as a RetryerV2. Delegating to GetInitialToken, only if
// GetAttemptToken is not implemented.
type RetryerV2 interface {
	Retryer

	// GetInitialToken returns the initial attempt token that can increment the
	// retry token pool if the attempt is successful.
	//
	// Deprecated: This method does not provide a way to block using Context,
	// nor can it return an error. Use RetryerV2, and GetAttemptToken instead.
	GetInitialToken() (releaseToken func(error) error)

	// GetAttemptToken returns the send token that can be used to rate limit
	// attempt calls. Will be used by the SDK's retry package's Attempt
	// middleware to get a send token prior to calling the temp and releasing
	// the send token after the attempt has been made.
	GetAttemptToken(context.Context) (func(error) error, error)
}
```
If you use this pattern, define a struct named Retryer in a separate file.
The caller defines the retry decision function that IsErrorRetryable delegates to and passes it to the constructor. This keeps the Retryer reusable across different API calls.
The RetryDelay method in the middle of the example below waits a random number of seconds before each retry.
- Example implementation (retryer_options.go)
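```go
package retryer

import (
	"context"
	"math/rand"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
)

// MaxRetryCount bounds the retries (returned by MaxAttempts and used by Retry).
const MaxRetryCount = 10

// Compile-time check that *Retryer implements aws.RetryerV2.
var _ aws.RetryerV2 = (*Retryer)(nil)

// Retryer is a custom retryer whose retry decision is supplied by the caller.
type Retryer struct {
	isErrorRetryableFunc func(error) bool
	delayTimeSec         int
}

// NewRetryer builds a Retryer from a retry decision function and an upper
// bound (in seconds) for the random wait before each retry.
func NewRetryer(isErrorRetryableFunc func(error) bool, delayTimeSec int) *Retryer {
	return &Retryer{
		isErrorRetryableFunc: isErrorRetryableFunc,
		delayTimeSec:         delayTimeSec,
	}
}

// IsErrorRetryable delegates the decision to the caller-supplied function.
func (r *Retryer) IsErrorRetryable(err error) bool {
	return r.isErrorRetryableFunc(err)
}

// MaxAttempts returns the maximum number of attempts.
func (r *Retryer) MaxAttempts() int {
	return MaxRetryCount
}

// RetryDelay waits a random whole number of seconds in [1, delayTimeSec].
func (r *Retryer) RetryDelay(int, error) (time.Duration, error) {
	rand.Seed(time.Now().UnixNano())
	waitTime := 1
	if r.delayTimeSec > 1 {
		waitTime += rand.Intn(r.delayTimeSec)
	}
	return time.Duration(waitTime) * time.Second, nil
}

// The token methods return no-op release functions, so no retry quota applies.
func (r *Retryer) GetRetryToken(context.Context, error) (func(error) error, error) {
	return func(error) error { return nil }, nil
}

func (r *Retryer) GetInitialToken() func(error) error {
	return func(error) error { return nil }
}

func (r *Retryer) GetAttemptToken(context.Context) (func(error) error, error) {
	return func(error) error { return nil }, nil
}
```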
Next, use this Retryer when calling the SDK.
In the following example, the variable retryable holds the decision logic "retry if the SDK error message contains api error Throttling: Rate exceeded".
optFn sets Options.Retryer to a Retryer created from retryable and SleepTimeSec. It is passed as the third argument of the SDK call (DeleteRole in this case).
- Example Implementation (Caller)(iam.go)
```go
const SleepTimeSec = 5

// ...

input := &iam.DeleteRoleInput{
	RoleName: roleName,
}
// Retry only when IAM reports throttling.
retryable := func(err error) bool {
	return strings.Contains(err.Error(), "api error Throttling: Rate exceeded")
}
// Use the custom Retryer for this call only.
optFn := func(o *iam.Options) {
	o.Retryer = retryer.NewRetryer(retryable, SleepTimeSec)
}
_, err := i.client.DeleteRole(ctx, input, optFn)
```
[Retry Patterns 3.] Golang Generics
Options.Retryer lets you define your own logic within the SDK's official retry mechanism. The next pattern goes a step further and implements the retry loop itself.
It uses Go's relatively new generics feature.
With Options.Retryer, it is difficult to freely customize the error returned when retries are exhausted (for example, to include the name of the resource where the error occurred).
This pattern makes such customization more flexible.
First, define the retry function with generics in a separate file.
- Example implementation (retryer_generics.go)
```go
package retryer

import (
	"context"
	"fmt"
	"math/rand"
	"strconv"
	"time"
)

// T: Input type for API Request.
// U: Output type for API Response.
// V: Options type for API Request.
type RetryInput[T, U, V any] struct {
	Ctx              context.Context
	SleepTimeSec     int
	TargetResource   *string
	Input            *T
	ApiOptions       []func(*V)
	ApiCaller        func(ctx context.Context, input *T, optFns ...func(*V)) (*U, error)
	RetryableChecker func(error) bool
}

// Retry calls ApiCaller until it succeeds, returns a non-retryable error,
// exceeds MaxRetryCount (defined in retryer_options.go) retries, or ctx is canceled.
// T: Input type for API Request.
// U: Output type for API Response.
// V: Options type for API Request.
func Retry[T, U, V any](
	in *RetryInput[T, U, V],
) (*U, error) {
	retryCount := 0
	for {
		output, err := in.ApiCaller(in.Ctx, in.Input, in.ApiOptions...)
		if err == nil {
			return output, nil
		}
		if in.RetryableChecker(err) {
			retryCount++
			if err := waitForRetry(in.Ctx, retryCount, in.SleepTimeSec, in.TargetResource, err); err != nil {
				return nil, err
			}
			continue
		}
		return nil, err
	}
}

// waitForRetry returns a custom error once MaxRetryCount is exceeded; otherwise
// it sleeps for a random interval, returning early if ctx is canceled.
func waitForRetry(ctx context.Context, retryCount int, sleepTimeSec int, targetResource *string, err error) error {
	if retryCount > MaxRetryCount {
		errorDetail := err.Error() + "\nRetryCount(" + strconv.Itoa(MaxRetryCount) + ") over, but failed to delete. "
		return fmt.Errorf("RetryCountOverError: %v, %v", *targetResource, errorDetail)
	}

	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(getRandomSleepTime(sleepTimeSec)):
	}
	return nil
}

// getRandomSleepTime returns a random whole number of seconds in [1, sleepTimeSec].
func getRandomSleepTime(sleepTimeSec int) time.Duration {
	rand.Seed(time.Now().UnixNano())
	waitTime := 1
	if sleepTimeSec > 1 {
		waitTime += rand.Intn(sleepTimeSec)
	}
	return time.Duration(waitTime) * time.Second
}
```
Here is how it works. Retry is the function that performs the retries, and it takes a RetryInput as its input.
For the type parameters [T, U, V any] of RetryInput, the caller passes iam.DeleteRoleInput for T, iam.DeleteRoleOutput for U, and iam.Options for V.
ApiCaller is the SDK method itself (e.g., the client's DeleteRole method).
RetryableChecker is a function that decides when to retry.
```go
// T: Input type for API Request.
// U: Output type for API Response.
// V: Options type for API Request.
type RetryInput[T, U, V any] struct {
	Ctx              context.Context
	SleepTimeSec     int
	TargetResource   *string
	Input            *T
	ApiOptions       []func(*V)
	ApiCaller        func(ctx context.Context, input *T, optFns ...func(*V)) (*U, error)
	RetryableChecker func(error) bool
}

// T: Input type for API Request.
// U: Output type for API Response.
// V: Options type for API Request.
func Retry[T, U, V any](
	in *RetryInput[T, U, V],
) (*U, error) {
	retryCount := 0
	for {
		output, err := in.ApiCaller(in.Ctx, in.Input, in.ApiOptions...)
		if err == nil {
			return output, nil
		}
		if in.RetryableChecker(err) {
			retryCount++
			if err := waitForRetry(in.Ctx, retryCount, in.SleepTimeSec, in.TargetResource, err); err != nil {
				return nil, err
			}
			continue
		}
		return nil, err
	}
}
```
The waitForRetry function returns an error with a custom message once the number of retries exceeds the maximum, MaxRetryCount.
Before each retry, it also uses the context passed as an argument to check whether the context has been canceled (Done). Cancellation means that some other process has failed and the program should terminate abnormally. In that case, waitForRetry returns ctx.Err() without sleeping before the next retry.
```go
func waitForRetry(ctx context.Context, retryCount int, sleepTimeSec int, targetResource *string, err error) error {
	if retryCount > MaxRetryCount {
		errorDetail := err.Error() + "\nRetryCount(" + strconv.Itoa(MaxRetryCount) + ") over, but failed to delete. "
		return fmt.Errorf("RetryCountOverError: %v, %v", *targetResource, errorDetail)
	}

	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(getRandomSleepTime(sleepTimeSec)):
	}
	return nil
}
```
getRandomSleepTime, which is called from the waitForRetry function above, contains the logic that determines the sleep time before each retry.
Here it waits a random number of seconds (jitter), from 1 up to the specified upper limit (sleepTimeSec).
```go
func getRandomSleepTime(sleepTimeSec int) time.Duration {
	rand.Seed(time.Now().UnixNano())
	waitTime := 1
	if sleepTimeSec > 1 {
		waitTime += rand.Intn(sleepTimeSec)
	}
	return time.Duration(waitTime) * time.Second
}
```
Here is an example of calling this Retry function.
- Example implementation (caller)(iam.go)
```go
input := &iam.DeleteRoleInput{
	RoleName: roleName,
}
retryable := func(err error) bool {
	return strings.Contains(err.Error(), "api error Throttling: Rate exceeded")
}

_, err := retryer.Retry(
	&retryer.RetryInput[iam.DeleteRoleInput, iam.DeleteRoleOutput, iam.Options]{
		Ctx:              ctx,
		SleepTimeSec:     SleepTimeSec,
		TargetResource:   roleName,
		Input:            input,
		ApiCaller:        i.client.DeleteRole,
		RetryableChecker: retryable,
	},
)
```
The advantage of this approach is type safety. Although Retry is a user-defined function, generics let the compiler check that the input, output, and options types match the SDK method being called.
However, unless you have a particular reason, I think it is better to use the officially provided Options.Retryer.
Finally
This article presented several retry patterns for the AWS SDK for Go v2. Options.Retryer in particular may be unfamiliar to some of you, so please take this opportunity to try it!