Resilience APIs to Transient Faults using Polly
In our previous article (.NET Nakama, 2022 July), we saw that transient faults are inevitable temporary errors, especially in microservice and cloud-based applications. An API is resilient when it can recover from transient failures and continue functioning in a way that avoids downtime or data loss. In that article, we learned the main strategies to handle transient faults. Transient fault handling may seem complicated, but libraries like Polly can simplify it.
In this article, we will use the Polly library to apply and combine the Retries, Circuit-Breaker, Network Timeout, and Fallbacks strategies. In our use case, we need to execute an action in our Web API (GetData), which retrieves and merges data from two third-party providers (via HTTP) to return them to our consumers (clients), as shown in Figure 1.
Polly is a .NET resilience and transient-fault-handling library. Simply put, it provides a quick way to define and apply strategies (policies) to handle transient faults. Polly targets .NET Standard 1.1 and .NET Standard 2.0+, which means that it can be used practically everywhere:
- .NET Framework 4.6.1 and 4.7.2
- .NET Core 2.0+, .NET Core 3.0, and later.
The Polly NuGet package can be installed via the NuGet UI or the NuGet Package Manager Console (Install-Package Polly).
Each of the strategies that we learned in .NET Nakama (2022, July), and that we will use in this article, has an equivalent Polly policy, as we can see in the following table.
Table 1. - The Polly Policies-Strategies to Handle Transient Faults
|Strategy||Polly Policy||Description|
|Retries||Retry Policy (with and without waiting)||Let’s retry after a short delay. Then, maybe the fault will self-correct.|
|Circuit-Breaker||Circuit Breaker Policy||When a system is seriously struggling, it is better to fail fast and give that system a break.|
|Network Timeout||Timeout Policy||Don’t wait forever! Beyond a specific waiting time, a successful result is unlikely and worthless.|
|Fallbacks||Fallback Policy||Things will still fail! Plan what you will do when that happens.|
|Combination of Multiple Strategies||Policy Wrap||Different faults require different strategies. By combining multiple policies we increase the resiliency.|
We need to apply some policies (strategies) to handle transient faults. So, our first step is to define how to recognize these faults. We can recognize them from either:
- Exceptions thrown (such as Exception, etc.), or
- Returned results (specifying the fault, e.g., in a related property such as Status, ErrorCode, etc.). In this case, we assume that the exceptions are handled with a try-catch, and a corresponding result will be returned.
In the following example code, we will handle the
In the following example code, we will handle all Exceptions. In addition, we state that our execution code and our fallback policies will return a nullable MyCodesResponse class. In this way, we can define policies for execution code that is not void (i.e., that returns something).
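A minimal sketch of such a policy builder could look as follows. It assumes Polly v7 and a hypothetical MyCodesResponse record (its members are placeholders chosen for illustration, not the original class).

```csharp
#nullable enable
using System;
using Polly;

// Hypothetical response type; the members are placeholders for illustration.
public sealed record MyCodesResponse(int Code, string Message);

public static class ExceptionPolicyBuilders
{
    // Treats every thrown Exception as a fault, for delegates that
    // return a nullable MyCodesResponse.
    public static PolicyBuilder<MyCodesResponse?> HandleAllExceptions() =>
        Policy<MyCodesResponse?>.Handle<Exception>();
}
```

The returned builder is not a policy yet. It only describes what counts as a fault, and the policies are created from it in the next step.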
In the following example code, we will get a MyResponseDTOClass object and handle the cases in which the MyStatusCode property has one of the values that indicate a fault.
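As a hedged sketch, handling returned results could look like this. The MyStatus enum and its two fault values are hypothetical, since the values handled in the original example are not shown here.

```csharp
using Polly;

// Hypothetical status values, chosen only for illustration.
public enum MyStatus { Ok, ServiceUnavailable, TooManyRequests }

public sealed class MyResponseDTOClass
{
    public MyStatus MyStatusCode { get; init; }
}

public static class ResultPolicyBuilders
{
    // Treats a returned result as a fault when its status matches
    // one of the (assumed) transient-fault values.
    public static PolicyBuilder<MyResponseDTOClass> HandleFaultyResults() =>
        Policy
            .HandleResult<MyResponseDTOClass>(r => r.MyStatusCode == MyStatus.ServiceUnavailable)
            .OrResult(r => r.MyStatusCode == MyStatus.TooManyRequests);
}
```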
In this step, we define the policies (scenarios) with their thresholds and how we will combine them. The following code samples show how we can define policies based on the policyBuilder of Step 1. However, there are cases, such as the TimeoutPolicy, in which we should use the Polly static methods. To learn more about the different ways to construct each policy, see the Polly documentation.
It’s time to apply the policy in the code that communicates with the third-party provider. In the following examples, we use the circuit breaker policy to execute the ThirdPartyProviderCommunication() function. In addition, we can see how to get the response.
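The following sketch shows one way to execute a function through a policy and read its response, assuming Polly v7. The ProviderClient class, the MyCodesResponse record, and the stubbed body of ThirdPartyProviderCommunication() are hypothetical. The thresholds (six failures, one minute) mirror the values used later in this article.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using Polly;

// Hypothetical response type, for illustration only.
public sealed record MyCodesResponse(int Code, string Message);

public static class ProviderClient
{
    // Kept static so the consecutive-failure count is shared across requests.
    private static readonly IAsyncPolicy<MyCodesResponse?> CircuitBreakerPolicy =
        Policy<MyCodesResponse?>
            .Handle<Exception>()
            .CircuitBreakerAsync(6, TimeSpan.FromMinutes(1)); // break after 6 faults, for 1 minute

    // Stub standing in for the real HTTP call to the provider.
    private static Task<MyCodesResponse?> ThirdPartyProviderCommunication() =>
        Task.FromResult<MyCodesResponse?>(new MyCodesResponse(200, "OK"));

    // The policy runs the delegate and passes its result back to the caller.
    public static Task<MyCodesResponse?> GetDataAsync() =>
        CircuitBreakerPolicy.ExecuteAsync(() => ThirdPartyProviderCommunication());
}
```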
To use Polly, we have two options. We can use Policy Objects or the HttpClient factory (ASP.NET Core 2.1 onwards). As their names suggest:
- Policy Objects: Can be used everywhere we want to apply the Polly policies.
- HttpClient Factory: Add Polly policies directly on the HttpClient Factory to be applied to every outgoing call.
The HttpClient factory (IHttpClientFactory) in ASP.NET Core can be registered and used to pre-configure and create HttpClient instances in an application. The IHttpClientFactory offers additional benefits over using the HttpClient directly. If you are interested, Larkin K., et al. (2022, June 29) describe how we can use the IHttpClientFactory (basic usage, named clients, etc.).
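For comparison, a rough sketch of the HttpClient factory option is shown here. It assumes the Microsoft.Extensions.Http.Polly NuGet package, and the client name and base address are hypothetical.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Polly;

public static class HttpClientRegistration
{
    // Registers a named client whose outgoing calls are retried on transient
    // HTTP errors (5xx responses, 408 responses, and HttpRequestException).
    public static IServiceCollection AddProvider2Client(this IServiceCollection services)
    {
        services
            .AddHttpClient("Provider2", client =>
                client.BaseAddress = new Uri("https://localhost:5001/")) // hypothetical address
            .AddTransientHttpErrorPolicy(builder =>
                builder.WaitAndRetryAsync(
                    2,
                    retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt))));

        return services;
    }
}
```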
We aim to apply the Polly policies in the communication code with our third-party providers. In our case, the communication is performed via HTTP. So, both options are applicable. However, this will not always be the case. For example, we may use different communication protocols or existing HTTP client libraries that do not support Polly by default. For such cases, we can select the Policy Objects option.
In this article, we will start from the basics and use the Policy Objects to apply Polly policies, which we can use everywhere.
For the sake of our use case, we implemented two dummy providers (ProviderExampleApi1 & ProviderExampleApi2), which return random weather forecasts. In addition, we can define their execution delay and error response in each API request to simulate the transient faults.
The WebApiPolly project represents our API, which provides endpoints to simulate the different transient-fault scenarios. These endpoints communicate with our two providers to retrieve and combine the available results. For that purpose, two separate services have been implemented (assuming that each provider has a different API contract). In our example, we have applied Polly policies only to the integration of Provider2.
In the following sections we will see in detail:
- How we defined each policy (as async),
- How we combined them, and
- How we simulated transient-fault scenarios to investigate our system’s behavior.
In the Provider2Integration.cs file, we can see how we implemented the HTTP communication and applied the Polly policies. Let’s see some important details here:
- We have registered the service with a Transient lifetime in the Dependency Injection (DI). You can decide how to register your services depending on your project requirements. For a better understanding of Dependency Injection and Lifetime, read the .NET Nakama (2020, November) article.
- The HttpClient is static. The HttpClient is intended to be instantiated once per application rather than per use (.NET 6.0 Documentation).
- Our policy object is static (AsyncPolicyWrap). This object holds the state that the policies need. For example, we might need to store the consecutive errors. Therefore, we cannot create a new instance of it per request.
We will handle all Exceptions and define that our execution code and fallback policies will return a nullable
We intend to use several policies to reduce and handle transient faults. However, there will be actions that will still fail. Using a fallback policy, we plan what we will do in those cases. In the following example, we will log (in console) these cases and return a null value. We could return a default or substitute value depending on each use case.
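A fallback policy along these lines could be sketched as follows, assuming Polly v7. The MyCodesResponse record is a hypothetical stand-in for the actual return type, and the log message is illustrative.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using Polly;

// Hypothetical response type, for illustration only.
public sealed record MyCodesResponse(int Code, string Message);

public static class FallbackPolicies
{
    // When the wrapped execution still fails, log the reason and return null.
    public static IAsyncPolicy<MyCodesResponse?> CreateFallback() =>
        Policy<MyCodesResponse?>
            .Handle<Exception>()
            .FallbackAsync(
                (MyCodesResponse?)null, // the cast selects the fallback-value overload
                outcome =>
                {
                    Console.WriteLine($"Fallback: returning null. Reason: {outcome.Exception?.Message}");
                    return Task.CompletedTask;
                });
}
```

Replacing the null with a default or cached object is enough to return a substitute value instead.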
In this policy, we will retry the failed executions up to maxRetries times (e.g., 2) and wait between the retries for a duration calculated based on the number of retry attempts. So, if we set the max retries to two, the maximum number of executions would be three (initial execution + two retries). In this example, we are using a simple function (waitTime = 2 ^ retryAttempt) to calculate the wait time:
- 2 ^ 1 = 2 seconds
- 2 ^ 2 = 4 seconds
- 2 ^ 3 = 8 seconds
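The wait-time calculation and the retry policy could be sketched like this, assuming Polly v7. The class name, the MyCodesResponse record, and the log message are hypothetical.

```csharp
#nullable enable
using System;
using Polly;

// Hypothetical response type, for illustration only.
public sealed record MyCodesResponse(int Code, string Message);

public static class RetryPolicies
{
    // waitTime = 2 ^ retryAttempt, expressed in seconds.
    public static TimeSpan WaitTime(int retryAttempt) =>
        TimeSpan.FromSeconds(Math.Pow(2, retryAttempt));

    // Retries up to maxRetries times, waiting longer before each new attempt.
    public static IAsyncPolicy<MyCodesResponse?> CreateRetry(int maxRetries) =>
        Policy<MyCodesResponse?>
            .Handle<Exception>()
            .WaitAndRetryAsync(
                maxRetries,
                retryAttempt => WaitTime(retryAttempt),
                (outcome, waitTime, retryAttempt, context) =>
                    Console.WriteLine(
                        $"Retry {retryAttempt}/{maxRetries} in {waitTime.TotalSeconds}s: {outcome.Exception?.Message}"));
}
```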
In this circuit-breaker policy, we break the circuit after breakCurcuitAfterErrors consecutive exceptions and keep the circuit broken for keepCurcuitBreakForMinutes minutes. In addition, we define what to do when the circuit state changes to open (onBreak) and when the circuit state changes to closed (onReset). In our case, we write an informational console log.
It is essential to notice that we have used an additional fallback policy for the circuit-breaker to handle the BrokenCircuitException, keep a related log, and return an alternative response. We needed this because we would like to stop the retry policy when the circuit is open (blocked).
When choosing a value for breakCurcuitAfterErrors, we must consider that the circuit-breaker also counts the failed retry executions.
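A sketch of the circuit-breaker policy and its dedicated fallback is given here, assuming Polly v7. The parameter names follow the article's identifiers, while the class name, the MyCodesResponse record, and the log messages are hypothetical.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical response type, for illustration only.
public sealed record MyCodesResponse(int Code, string Message);

public static class CircuitBreakerPolicies
{
    // Opens the circuit after N consecutive handled faults and keeps it open for M minutes.
    public static IAsyncPolicy<MyCodesResponse?> CreateCircuitBreaker(
        int breakCurcuitAfterErrors, int keepCurcuitBreakForMinutes) =>
        Policy<MyCodesResponse?>
            .Handle<Exception>()
            .CircuitBreakerAsync(
                breakCurcuitAfterErrors,
                TimeSpan.FromMinutes(keepCurcuitBreakForMinutes),
                onBreak: (outcome, breakDuration) =>
                    Console.WriteLine($"Circuit opened for {breakDuration.TotalMinutes} min: {outcome.Exception?.Message}"),
                onReset: () => Console.WriteLine("Circuit closed again."));

    // Turns the BrokenCircuitException into a null result, so an outer retry
    // policy sees a result instead of an exception and stops retrying.
    public static IAsyncPolicy<MyCodesResponse?> CreateCircuitBreakerFallback() =>
        Policy<MyCodesResponse?>
            .Handle<BrokenCircuitException>()
            .FallbackAsync(
                (MyCodesResponse?)null,
                outcome =>
                {
                    Console.WriteLine("Circuit is open; the call was skipped.");
                    return Task.CompletedTask;
                });
}
```

With two retries per request, each failing request contributes three failed executions to the breaker's count, which is why the threshold should be chosen with the retry count in mind.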
In our example, we are using an HttpClient in which we can set the Timeout. However, this will not always be the case. We may communicate using a client that does not support a timeout. In such cases, we can use the Polly timeout policy. In our example, we will time out after timeoutInSeconds seconds and write a related log. The TimeoutStrategy has the following two options:
- Optimistic: The called code honors the CancellationToken and cancels when needed.
- Pessimistic: The called code may not honor the CancellationToken.
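The timeout policy is created through the Polly static methods. A minimal sketch, assuming Polly v7 and the optimistic strategy, could be the following. The class name, the MyCodesResponse record, and the log message are hypothetical.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Timeout;

// Hypothetical response type, for illustration only.
public sealed record MyCodesResponse(int Code, string Message);

public static class TimeoutPolicies
{
    // Optimistic strategy: Polly cancels the token it passes to the delegate,
    // and the delegate is expected to stop when that token is cancelled.
    public static IAsyncPolicy<MyCodesResponse?> CreateTimeout(int timeoutInSeconds) =>
        Policy.TimeoutAsync<MyCodesResponse?>(
            timeoutInSeconds,
            TimeoutStrategy.Optimistic,
            (context, timeout, abandonedTask) =>
            {
                Console.WriteLine($"Timed out after {timeout.TotalSeconds}s.");
                return Task.CompletedTask;
            });
}
```

For the optimistic strategy to work, the delegate should be executed through an ExecuteAsync overload that takes a CancellationToken, and that token should be passed on to the underlying call. When the time limit is exceeded, the policy throws a TimeoutRejectedException.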
The Polly policies can be combined in any order using a PolicyWrap. However, we should consider the ordering points that are described in the Polly documentation. In the following example, we combined all the studied policy strategies based on the typical policy ordering.
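As a sketch of such a combination, assuming Polly v7, the policies can be wrapped from the outermost to the innermost. The ordering shown (fallback, retry, circuit-breaker fallback, circuit-breaker, timeout) is our reading of the behavior described in this article rather than a copy of the project code, and the default thresholds and the MyCodesResponse record are hypothetical.

```csharp
#nullable enable
using System;
using Polly;
using Polly.CircuitBreaker;
using Polly.Timeout;
using Polly.Wrap;

// Hypothetical response type, for illustration only.
public sealed record MyCodesResponse(int Code, string Message);

public static class ResiliencePolicy
{
    public static AsyncPolicyWrap<MyCodesResponse?> Create(
        int maxRetries = 2, int breakAfterErrors = 6,
        int breakForMinutes = 1, int timeoutInSeconds = 5)
    {
        // Step 1: what counts as a fault.
        var policyBuilder = Policy<MyCodesResponse?>.Handle<Exception>();

        // Step 2: the individual policies.
        var fallback = policyBuilder.FallbackAsync((MyCodesResponse?)null);
        var retry = policyBuilder.WaitAndRetryAsync(
            maxRetries, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));
        var circuitBreakerFallback = Policy<MyCodesResponse?>
            .Handle<BrokenCircuitException>()
            .FallbackAsync((MyCodesResponse?)null);
        var circuitBreaker = policyBuilder.CircuitBreakerAsync(
            breakAfterErrors, TimeSpan.FromMinutes(breakForMinutes));
        var timeout = Policy.TimeoutAsync<MyCodesResponse?>(
            timeoutInSeconds, TimeoutStrategy.Optimistic);

        // Outermost first: each policy wraps everything listed after it.
        return Policy.WrapAsync<MyCodesResponse?>(
            fallback, retry, circuitBreakerFallback, circuitBreaker, timeout);
    }
}
```

Because the timeout is the innermost policy, it applies to each individual attempt, and the retry and circuit-breaker policies treat a TimeoutRejectedException like any other failure.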
The tutorial project is configured as “Multiple Startup Projects” to start the two example providers and our main Web API project together. So, you just need to click the Start button as shown in Figure 2.
The following table shows the endpoints that simulate the different transient-fault scenarios and their names in the provided Postman collection. The complete code of the tutorial is available on GitHub, together with the Postman collection to test it quickly.
|Postman Request Name||API GET Endpoints|
|Happy Path Scenario: No errors||https://localhost:7083/weatherforecasts|
|Continuous-Failures (Provider 2 is down)||https://localhost:7083/weatherforecasts/continuous-failures|
|Timeout-Errors (Provider 2 delay to respond)||https://localhost:7083/weatherforecasts/timeout-errors|
|Transient-Faults (Random errors or/and delays on Provider 2)||https://localhost:7083/weatherforecasts/transient-faults|
To test the retry and fallback policies, we can send the Continuous-Failures and the Timeout-Errors requests and investigate the produced console logs. For example, in the following figures, we can see:
- All executions fail either by general exception or timeout error (Figures 3 & 4).
- The initial execution and the two retries (Figures 3 & 4).
- The fallback policy returns a null value when the communication is not possible (Figures 3 & 4).
- The circuit-breaker policy opened (blocked) the circuit on the 6th consecutive failed execution (Figure 5).
- The circuit remained open for one minute (as configured) and did not accept messages to give that system a break (Figure 5).
- After one minute, one execution was attempted, and because a failure occurred, it opened the circuit again for another minute (Figure 5).
The Transient-Faults endpoint produces random errors and/or delays. In the following figure, we can see an execution example in which the first two executions failed (due to an error and a timeout). However, the third attempt was successful. Thus, in this request, the client received the results.
Transient fault handling may seem complicated, but libraries like Polly can simplify it. In this article, we learned the three basic steps to use the Polly library. In addition, we applied and combined the Retries, Circuit-Breaker, Network Timeout, and Fallbacks policies to improve the resiliency of our Web API.
Using the provided source code and Postman collection, we simulated continuous and random failures (exceptions and/or timeouts). Finally, we investigated how our system behaves when the Polly policies are applied. As we saw, combining these policies provides a powerful tool that reduces and handles transient faults to provide resilient APIs.
- .NET 6.0 Documentation (Accessed on 2022 July). HttpClient Class. https://docs.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?view=net-6.0
- .NET Nakama (2020, November 4). ASP.NET Core Web API Fundamentals. https://www.dotnetnakama.com/blog/asp-dotnetcore-webapi-fundamentals/#basics-of-dependency-injection-in-aspnet-core
- .NET Nakama (2022, July 4). Strategies to Handle Transient Faults in Web APIs. https://www.dotnetnakama.com/blog/strategies-to-handle-transient-faults-in-web-apis/
- Larkin K., et al. (2022, June 29). Make HTTP requests using IHttpClientFactory in ASP.NET Core. https://docs.microsoft.com/en-us/aspnet/core/fundamentals/http-requests?view=aspnetcore-6.0