Skip to content

17. Atomicity

Mar 10, 2021

Readers must write

Given:

  • asynchronous distributed system with arbitrary delays
  • uses message passing
    • reordering of messages due to asynchrony
    • no spontaneous messages
    • messages are eventually delivered
  • crash prone (for \(N\) servers \(f < \frac{N}{2}\))
  • want fault tolerance → require redundancy
  • single-writer, multi-reader

When a server in this system boots it knows its initial state and nothing else. The server will use majorities \(M_1 \cup M_2\) to discover current state → majority intersection is non-empty.

writer:

  • input: v:type, sequence_no: int = 0, accumulator: int = 0
  • state: <ts:0, lv: type>

server:

  • state: <ts, lv>

reader:

  • state: ts:int, v:type, sequence_no: int = 0, accumulator: int = 0
1
2
3
4
5
6
7
8
9
init acc = 0;
ts++, seq++;

broadcast('W', v, ts, seq)  // to N servers
await acc > N/2:            // wait for majority response
  then return

upon recv(w_ack, s):
  if s == seq: acc++
1
2
3
4
5
6
7
8
upon recv('W', v, ts, seq):
  if server.ts > ts:
     server.ts = ts
     server.lv = v
  send(w_ack, seq)

upon recv('R', seq):
  send(r_ack, lv, ts, seq)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
acc = 0, ts = 0, seq++

broadcast('R', seq)
await acc > N/2:
  acc = 0, seq++
  broadcast('W', lv, ts, seq)  // → "reader must write"
  await acc > N/2
  return lv

upon recv('r_ack', v, t, seq):
    if s == seq then
      acc++
      if t > ts:
        ts = t
        lv = v

Sequence number:

  • messages may be reordered, this number will account for reordered messages
  • Dutta & Guerraoui: (maybe this?)
    • if \(R < \frac{S}{f}\)
    • readers are inversely proportional to number of failures
    • \(R < (S / S/ 2) = 2\)\(R\) must be 1 (this is not practical)
    • we say "fast" algorithm if it does only 1 broadcast (not 2)
    • here "fast" means one-round broadcast

There is opportunity to optimize between broadcast & await, for example, can track which servers respond fast and prioritize those servers in the future.

Execution

execution means alternating sequence of states and actions, e.g. \(s_0, a_1, s_1, a_2, s_2, a_3,\dots\) There is no parallelism in this sequence.

  • writer does broadcast(...) → to S servers.
  • communication channel state changes because it now includes the sent message
  • adding messages to communication channel is visible to outside observer
  • the sequence is ordered by timestamps → no parallelism
  • every execution has interleaving semantics
  • outside observer can see this interleaving with timestamps
  • there are no infinite actions (no zeno)

Atomicity

Regarding read/write (RW) service, execution of RW is atomic is there exists a partial order (P.O.) on all completed operations such that:

  1. P.O. respects real time; means if \(\pi_1\) and \(\pi_2\) are operations such that \(\pi_1\) completes before \(\pi_2\), then \(\pi_2 \nprec \pi_1\)
  2. All writes are totally ordered. If \(\omega_1\) and \(\omega_2\) are write operations then \(\omega_1 \prec \omega_2\) or \(\omega_2 \prec \omega_1\).
  3. Reads are ordered so that \(r_1\) returns value of immediately preceding write or the initial value if there is no such write

Example of partial order of messages:

img

Partial order is asymmetric and transitive

  • if conditions 1, 2, 3 hold the implementation of R/W service is atomic if all executions are atomic
  • empty execution is atomic by definition

Safety & Liveness

  • safety means things never go wrong, or formally: no state is incorrect
  • liveness means the system eventually does something good → eventually system returns messages

Note:

  • Responding to messages is conditionless such that "something good"/response will occur without preconditions
  • If some condition exists, it is called performance not liveness
  • a R/W algorithm has a majority condition → this is an example of performance not liveness

Attya, Bar-Noy, Dolev (sp?) 1996: possibly the most important algorithm of R/W systems in a distributed setting. Pseudocode issue: timestamp, sequence number will eventually overflow; especially problematic if the init value is large and already close to overflow.