Paxos represents a family of distributed algorithms used to reach consensus in distributed systems. In this post we will briefly discuss about single decree Paxos. This algorithm was first proposed by Leslie Lamport in his paper "The Part-Time Parliament" and later described more directly in "Paxos Made Simple". Paxos finds a commonplace in almost all the distributed system environments.
Why do we need consensus in the first place?
Most of these distributed systems comprise of tens of thousands of machine working in unison. One employs replication for the purpose of redundancy as well efficiency. But, once you have replicas, you have a challenge to keep them consistent. Failure is a norm in such environments. There are multiple scenarios whereas consensus plays an important role: keeping the data consistent across replicas, leader election or even resource sharing/locking.
Let's keep all these aside and start with a simple example
Say there are 4 friends. They want to do something over the weekend together. They don't care about what they will do specifically, just that they want to do something together. Here is a timeline of various events.
Once a majority agrees on a proposal that is consensus.
In Paxos world, involved parties want to agree on a result, not on their proposal. Communication channels may be faulty, leading to message loss
Paxos basis- Proposers: Propose a new value to reach consensus
- Acceptors: Participate in reaching consensus
- Learners: Learn about the new value and can be queried
- Paxos nodes can take multiple roles, even all of them
- Paxos nodes must know how many acceptors a majority is (quorum)
- Paxos nodes must be persistent, they can't forget what they accepted
- Paxos run aims at reaching a single consensus . Can agree on only one value, and this cannot change.
- Paxos (there might be other complex variations) assumes fail-stop failures and not Byzantine failures. If you want to mutate a value, a different Paxos run will be needed.
Let's say we have one proposer and 5 acceptors. Majority in this case is 3.
Proposer wants to propose a value. It sends a PREPARE ID to all. the acceptors (or a majority of them). ID must be unique across rounds. If it doesn't listen back from acceptors, it will timeout and retry with a new (higher) ID. This ID should not be confused with actual value on which consensus must be reached. This ID can be seen as a round ID.
Acceptor will get a PREPARE message with ID 10. It will check if it has promised to ignore requests with this Id.
If Yes, it ignores the message
If No, it will promise to ignore any requests
lower than ID, subject to some conditions which we will see later. If majority of acceptors promise, no ID < 10 will make through
Proposer gets majority of PROMISE messages for ID = 10. It sends an ACCEPT-REQUEST(10,"VAL1") to a majority or all acceptors. Subject to some condition which we will see below.
Acceptor receives an ACCEPT-REQUEST message for an ID, value pair. It checks if it has promised to ignore this ID value.
If Yes then ignore this message.
If No, reply with ACCEPT(ID, val1). Also send to all Learners. If a majority of acceptors accept an (ID, val) pair, consensus is reached. Consensus will always be on this value called the "chosen" value, It will never change this for round.
Proposers and learners get ACCEPT messages for ID, value pair, If a proposer/learner gets majority of accepts for this (ID,value) pair, they know that consensus has been reached on value. The acceptors themselves don't know that consensus has been reached.
Let's say a new proposer comes along and sends a PREPARE 9 message. Since, majority of acceptors have promised ID 10, this request will time-out, Now, say the proposer tries with a higher value 11.
Acceptors checks the ID which is greater than 10. So they reply back with a PROMISE message subject to :
- If ever accepted anything?
- If YES, reply with promise ID, accepted ID1, value
- If No, reply with PROMISE ID.
Proposer gets majority of PROMISE messages, Here it checks:
- Have i got any already accepted value from promises
- If Yes, it picks value with the highest ID that it got
- If No, it picks any value which it wants
So, the run proceeds as,
If a majority of acceptors accept a request with ID and value, consensus has been reached and is that value.
No accept requests with lower ID will be accepted by a majority.
No accept requests with higher ID and a different value will be accepted by a majority, at least an acceptor will piggy back on ID-value pair which will propagate.
What could go wrong ?
- One or more acceptor fails: still works as long as majority are up
- Proposer fails in prepare phase: NO-OP , another proposer can make progress
- Proposer fails in accept phase: Another proposer finishes the job.
- Safety: Only a single value may be chosen
- Liveness: As long as majority of servers are up and communicating with reasonable timeliness, some proposed value is eventually chosen. If a value is chosen, servers eventually learn about it.