Prerequisites
To understand this topic fully, familiarity with transactions and microservices is recommended.
Here are some of my previous blog posts that will be useful:
Introduction
In distributed systems, workflows often span multiple microservices.
Let’s consider the example of a travel application that I have used in my previous blog posts. This application coordinates an airline service, a hotel service, and a car rental service, which allows users to book all the essential services for their vacation in one place.
Example Workflow
- Reserve airline tickets
- Reserve a hotel
- Reserve a rental car
In a monolithic application, with all data in a single database, wrapping all three reservations in one transaction is straightforward, because everything is handled within the same database context.
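To make the contrast concrete, here is a minimal sketch of the monolithic case. It assumes a single hypothetical `bookings` table in an in-memory SQLite database, using Python's built-in `sqlite3` module. Because all three inserts share one transaction, a failure on the car step rolls back the airline and hotel inserts as well.

```python
import sqlite3


def book_trip(conn: sqlite3.Connection, fail_car: bool) -> None:
    # All three reservations live in one database, so one transaction covers them.
    # Using the connection as a context manager commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.execute("INSERT INTO bookings (kind) VALUES ('airline')")
        conn.execute("INSERT INTO bookings (kind) VALUES ('hotel')")
        if fail_car:
            raise RuntimeError("rental car unavailable")
        conn.execute("INSERT INTO bookings (kind) VALUES ('car')")


conn = sqlite3.connect(":memory:")
# Hypothetical schema used only for this illustration.
conn.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, kind TEXT)")
conn.commit()

try:
    book_trip(conn, fail_car=True)
except RuntimeError as err:
    print(f"Booking failed: {err}")

count = conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0]
print(count)  # 0: the airline and hotel inserts were rolled back
```

This guarantee comes from the database itself, which is exactly what disappears once each service owns a separate database.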
However, in a distributed system, each service (airline, hotel, car rental) operates independently, with its own database. Ensuring that all services either succeed or fail as a unit becomes much more challenging!
Problem
To illustrate this concern, imagine the following scenario:
- ✔️ Airline reservation is successful.
- ✔️ Hotel reservation is successful.
- ❌ Rental car reservation fails.
Now, the user is left with partial reservations: an inconsistent state where only some services are booked, which leads to a poor user experience.
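The scenario above can be reproduced with a small simulation. This is a sketch under stated assumptions: `ReservationService` is a hypothetical in-memory stand-in for a microservice, and each instance keeps its own private set of reservations to mimic a separate database. Calling the services one after another with no coordination leaves the first two bookings in place when the third fails.

```python
class ReservationService:
    """Hypothetical stand-in for one microservice with its own data store."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available
        self.reservations: set[str] = set()  # this service's own "database"

    def reserve(self, trip_id: str) -> None:
        if not self.available:
            raise RuntimeError(f"{self.name} reservation failed")
        self.reservations.add(trip_id)


airline = ReservationService("airline")
hotel = ReservationService("hotel")
car = ReservationService("car rental", available=False)  # simulate the failure

try:
    # Each call succeeds or fails inside its own service; nothing ties them together.
    for service in (airline, hotel, car):
        service.reserve("trip-42")
except RuntimeError as err:
    print(err)

booked = [s.name for s in (airline, hotel, car) if "trip-42" in s.reservations]
print(booked)  # ['airline', 'hotel']: a partial booking
```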
Solution: The Saga Pattern
The Saga pattern helps by treating the entire workflow as a series of steps, each with an associated compensating action (a rollback step to undo previous work if something goes wrong).
Recall the example:
- ✔️ Airline reservation is successful.
- ✔️ Hotel reservation is successful.
- ❌ Rental car reservation fails.
Using the Saga pattern, we can roll back the completed steps by running their compensating actions in reverse order:
- ↩️ Cancel hotel reservation
- ↩️ Cancel airline reservation
As illustrated in the image above, the workflow behaves as a single unit from the user's point of view: either all services are booked, or the completed bookings are cancelled, so users are not left with partial bookings.
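The rollback flow described above can be sketched as a small, orchestration-style saga runner. This is a minimal in-process illustration, not a production implementation: `SagaStep`, `run_saga`, and the in-memory `bookings` set are hypothetical names chosen for this example. Each step pairs an action with its compensating action, and when a step fails, the runner compensates the completed steps in reverse order (hotel first, then airline).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # forward step, e.g. reserve a hotel
    compensation: Callable[[], None]  # undoes the action, e.g. cancel the hotel


def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception as err:
            print(f"{step.name} failed: {err}")
            for done in reversed(completed):
                print(f"Compensating: {done.name}")
                done.compensation()
            return False
        completed.append(step)
    return True


# Hypothetical in-memory bookings standing in for the three services' databases.
bookings: set[str] = set()


def reserve(kind: str) -> Callable[[], None]:
    return lambda: bookings.add(kind)


def cancel(kind: str) -> Callable[[], None]:
    return lambda: bookings.discard(kind)


def fail_car() -> None:
    raise RuntimeError("no cars available")


steps = [
    SagaStep("airline", reserve("airline"), cancel("airline")),
    SagaStep("hotel", reserve("hotel"), cancel("hotel")),
    SagaStep("car rental", fail_car, cancel("car")),
]

ok = run_saga(steps)
print(ok, bookings)  # False set()
```

A real saga would also need to persist its progress and handle compensations that themselves fail (for example, by retrying them); this sketch leaves those concerns out to keep the core idea visible.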
Conclusion
- The Saga pattern is an approach to managing transactions that span multiple services in a distributed system.
- This is achieved by giving each step in the workflow an associated compensating action, i.e. a rollback step that undoes that step's work if a later step fails.
- This improves the reliability and data consistency of a system.