19. Quorums
Mar 17, 2021
Recall transition system modeled using signature, state, transitions \(t_1, t_2, \dots\) where each
transition has type NAME (args)
, a precondition (true/false), and effect: \(s' \leftarrow s\).
Actions take place in a sequence: \(s_0, a_1, s_1, a_2, \dots\)
When a transition is a synchronized input actions, the preconditions is true. Server cannot refuse input. In automata, skip the precondition (channel cannot refuse to accept messages). This says nothing about delivery of messages → add a concept of fairness.
Sequence of send actions alone is a valid execution → can reason about liveness after the fact
Use variables to represent phases, e.g.
1 2 3 4 |
|
state phase \(\in {1, 2}\) init 1 → enforces good behavior on the client of the service.
well-formedness condition: either applied to automata or the environment and primarily for convenience: exclude behaviors that should not be allowed → does not affect good behaviors in the environment.
Reasoning about atomicity
Recall atomicity: for every execution, there exists a partial order on completed operations such that
- if \(\pi_1\) and \(\pi_2\) are operations such that \(\pi_1\) completes before \(\pi_2\), then \(\pi_2 \nprec \pi_1\)
- if \(\omega_1\) and \(\omega_2\) are write operations then \(\omega_1 \prec \omega_2\) or \(\omega_2 \prec \omega_1\)
- reads are ordered so that \(r_1\) returns value of immediately preceding write or the initial value if there is no such write
Lifecycle or a write:
1 |
|
Lifecycle of a read:
1 |
|
\(\tau\) = timestamp of operation
- for write:
new ts
- for read:
choose ts
partial order (P.O.):
- \(\omega_1\), \(\omega_2\) then \(\omega_1 \prec \omega_2\) if \(\tau(\omega_1) \prec \tau(\omega_2)\)
- \(\omega\), \(r\) then \(\omega \prec r\) if \(\tau(\omega) \prec \tau(r)\)
note: writer only does 1 write at a time. Timestamps of writer are monotonic and do not roll back. Can reason about this behavior by observing the code. Timestamps never get lower.
Regarding rule 1 of atomicity, is it possible for the tags to be inverted?
- \(\tau(\pi_1) \leq \tau(\pi_2)\) therefore \(\pi_1 \leq \pi_2\)
- \(\exists s \in M_1 \cap M_2\)
- \(\tau(\pi_2)\) = \(max\{ ts\) from \(M_2\}\)
- atomicity rule 3: (W) \(\tau(\pi_1) = \tau(\pi_2)\) (R) → we have writer before reader
Multiple writers
The previous is for a single writer case. How to deal with multiple writers in MWMR case? If a writer wakes up it has no knowledge of current state of the system. The writer must learn about the current state.
writer
1 2 3 4 5 6 7 |
|
server
1 2 3 4 5 6 7 8 |
|
If multiple writers operate in the exact same time it is not possible to distinguish writers based on timestamp alone. In this scenario, tag is a pair: \(<ts, id>\) and are compared lexicographically.
Reasoning about behavior: \(s_1, \dots, s_k, a_{k+1}, s_{k+1}\) (similar to induction) → next state maintains properties of the system. → reason about interleavings of the system and consider all possible behaviors. This is in no relation to length of execution; execution may be infinite.
Performance & majorities → Quorums
When waiting for a majority, R/W does a broadcast to \(N\) servers and awaits for majority response. While awaiting the R/W is bombarded with responses and needs to collect \(\frac{N}{2}\) messages which can take a while.
What we actually care about is \(M_1 \cup M_2 \neq \varnothing\).
- A quorum over set \(S\): any \(Q \subseteq S\)
- \(\forall Q_i, Q_j: Q_i \cup Q_j \neq \varnothing\)
- \(|Q_i| = |Q_j|\) and minimal → how small can you get and still maintain a quorum that is equal? \(|Q_i| \geq \sqrt{N} + o(\sqrt{N})\) → but this is hard to construct and not practical
Another strategy:
From a set of servers, form a square. If not possible directly, apply padding. A quorum is a union or row and column = \(R \cup C\).
(the numbers in the grid are servers)
1 2 3 4 5 6 |
|
\(|Q| = \sqrt{N} + \sqrt{N} - 1 = 2 \sqrt{N} - 1\)
- Wait for response from a \((row, col)\) instead of majority
- Any quorum intersection response is sufficient
What about failures?
- in the majority response case, any minority failure model if fine → with quorums much more complicated
- case about failures: \(f = \sqrt{N}\) can break the system
A bad adversary can kill just the right servers:
1 2 3 4 5 6 |
|
But as long as a quorum is maintained, the system is fine. This would be OK:
1 2 3 4 5 6 |
|
The motivation for using quorums is not needing to wait for a majority response. Input buffers are a costly resource and quorums help to reduce the need for buffers. Can also design local optimizations for quorums that replay messages to historically responsive quorums.