Instant Finality in Byzantine Atomic Broadcast Under Unknown/Dynamic Participation

Authors: Dahlia Malkhi (Chainlink Labs), Atsuki Momose (UIUC), Ling Ren (UIUC)

Synopsis

In a previous post, we presented a simple solution for one-shot, binary Byzantine agreement with Unknown/Dynamic participants that may be replaced at any time, conditioned on two thirds of the active participants being honest, that guarantees deterministic and unconditional finality and has a (small) constant expected latency. 

In this post, we extend the approach to multi-valued consensus on a sequence of values, i.e., solve the Byzantine atomic broadcast (BAB) problem. Our protocol inherits from the previous post the following advantages over existing solutions: 1) small 3-round latency, 2) deterministic and unconditional safety, 3) does not require eventual stable participation, 4) allow fluctuating participation of faulty nodes. 

Recap of the model. The set of active participants – also called nodes – is unknown, their count is unknown, and in each round, they may be replaced entirely, subject to the following assumptions:

  • PKI. Participating nodes are taken from a finite universe, each can be identified by a public key for which it owns the private key.
    • Note, PKI is needed only for VRF and liveness; messages do not need to be signed by their senders
  • Active nodes. Each round r has an unknown set of active nodes, whose (unknown) count nr satisfies nr ≥ 3fr + 1 , where fr are Byzantine.
  • Synchronous communication. In round r, honest and active nodes receive all the messages broadcast by honest nodes in round r-1.

We also note that any message from faulty nodes cannot be delayed to future rounds.

Atomic broadcast. The protocol allows nodes to agree on a growing sequence of values (input by nodes) such that:

  • Safety. Two honest nodes do not decide on a value at the same position of sequence. 
  • Liveness. Honest node’s input value is eventually decided.

Our protocol 

Chaining. We use the common technique of block chaining. Each block contains some value (e.g., new transactions) as well as the hash of a predecessor block. The first block includes the hash of a special predefined block called the “genesis” block. We say a block B’ extends a block B if the sequence ending in B is a prefix of the sequence ending in B. If two blocks do not extend one another, we say they are conflicting

Graded agreement (GA). Our BA protocol in the previous post has two communication steps that are very similar. Each step implements an abstraction called graded agreement (GA). We first present this GA building block, and then use it to build our atomic broadcast protocol.

In the context of unknown and dynamic participation, the set of active nodes at the beginning of this one-round (GA) may be different from the set of active nodes at the end of the GA (active nodes in the next round). Every active node at the beginning of the GA has an input block B. Every (newly) active node at the end of the GA determines a set { (B, g) } of pairs of block B and grade bit g as an output of GA. We emphasize again that we use block chaining, so a block B uniquely identifies a whole sequence of blocks starting from genesis and ending in B.

Our GA protocol

At the beginning of the GA, every active node p sends (vote, B) for its input B. 

(The set of active nodes from the beginning of GA may now be replaced, i.e., a different set of nodes may now receive and tally the votes.)

At the end of the GA, every active node tallies votes. If B’ extends B, then a vote for B’ also counts as a vote for B. Votes for conflicting blocks from the same voter are ignored.

  • Outputs (B, 1) if over ⅔ of the received votes vote for B. 
  • Outputs (B, 0) if over ⅓ of the received votes vote for B. 

Properties of GA. By the UDQ properties in the previous post, we have the following guarantees. 

  • Graded consistency. If an honest node outputs (B, 1), then all honest nodes output (B, *).
  • Integrity. If an honest node outputs (B, *), then at least one honest node inputs B’ that extends B.
  • Validity. Let B be the highest common ancestor of honest nodes’ inputs. Then, all honest nodes output (B, 1)
  • Uniqueness. If an honest node outputs (B, 1) and another honest node outputs (B’, 1), then B and B’ do not conflict with each other.
  • Bounded divergence. Each node outputs (with grade 0) at most two conflicting blocks.

Our atomic broadcast protocol. We now give our atomic broadcast protocol. Variables C, L are initialized to genesis. (C: candidate, L: lock)

Each view has two rounds. There is an extra initial round before the first view v = 1 starts.

Initial round 0: Send (propose, B, VRF) where B is a block that extends the genesis.

Round 1 of view v: 

If v > 1: 

Decide any block B such that GA2 of view v-1 outputs (B, 1)

Set L to the highest block such that GA2 of view v-1 outputs (L, *)

Input to GA1 a block with the highest VRF among “propose” messages from the previous round that extend L.

If there is no such block, input L to GA1.

Round 2 of view v: 

Set B to the highest block such that GA1 outputs (B, 1).

Set C to a highest block such that GA1 outputs (C, *). (If there are two such C, pick one at random.) 

Input B to GA2.

Send (propose, B’, VRF) where B’ extends C.

Proof sketch 

Safety. Suppose an honest active node decides on a block B in round 1 of view v, i.e., it outputs (B, 1) from GA2 that was started in view v-1. Due to graded consistency, any honest node p active in round 1 of view v must output (B, *). Also observe that GA2 of view v-1 could not have output any conflicting block. This is because no honest node could have input any conflicting block to GA2 of view v-1 (due to uniqueness of GA1) and GA2 could not have output any conflicting block without some honest node’s input (due to integrity). So p must have set L to a block extending B in round 1 of view v. Both GAs in view v, and inductively in all views after v, output only blocks extending B (due to validity). So no honest node decides any conflicting block.

Liveness. Let us call the node who sent the highest VRF in view v-1 the leader of view v. It is clear that if all honest active nodes input an honest leader’s proposal to GA1 of view v, the proposed block must be decided. The only reason an honest node p does not input the leader’s proposal to GA1 is that it conflicts with its lock L. Node p must have output (L, *) from the GA2 that started in the previous view v-1. Then, some honest node q must have input L (or a block extending L) to GA2 after outputting (L, 1) in GA1 of view v-1. Thus, the leader of view v outputs (L, 0) in GA1 of view v-1 (due to graded consistency). Due to bounded divergence, the leader outputs at most one other conflicting block from GA1 of view v-1. Therefore, with probability at least ½, the leader proposes a block extending L and all honest nodes accept the leader’s proposal. 

Discussion

Of the four foundational advantages listed at the beginning of this post, we highlight two important practical advantages over existing solutions. 

3-round latency. One important practical advantage of our protocol is the practical latency: only 3 rounds. In contrast, the previous state-of-the-art [Momose-Ren 2022] costs 16 rounds. This improvement is attributed to two factors. Firstly, we improved the latency of GA from 3 rounds to 1 round by sacrificing the fault tolerance from ½ to ⅓. Secondly, and more importantly, we improved the atomic broadcast protocol by reducing the number of GA invocations from 5 to 2.

Faulty nodes’ fluctuation. As mentioned in the previous post, our protocol allows not only honest nodes but also faulty nodes to join/leave during executions. This is an important practical advantage as all existing solutions without proof-of-work (or verifiable delay function) assume faulty nodes are online all the time (as shown in [Deb-Kannan-Tse, 2021]). This restriction on the adversary is hard to justify. Suppose only 10 nodes are active at the beginning, but 10 years later, a million nodes are active. Then, existing solutions will have to assume that the adversary controls at most 10 out of the 1 million active nodes. Our protocol still tolerates ⅓ million faulty active nodes. 

For the latest research on blockchains, smart contracts, and oracles, subscribe to the Chainlink newsletter and follow the official Chainlink Twitter.