
Shoal Framework: How to Reduce Bullshark Latency on Aptos

Overview

Aptos Labs has solved two important open problems in DAG BFT, significantly reducing latency and, for the first time, eliminating the need for timeouts in deterministic practical protocols. Overall, Bullshark's latency is improved by 40% in the fault-free case and by 80% in the presence of faults.

Shoal is a framework that enhances Narwhal-based consensus protocols (such as DAG-Rider, Tusk, and Bullshark) through pipelining and leader reputation. Pipelining reduces DAG ordering latency by introducing an anchor in every round, while leader reputation further improves latency by ensuring that anchors are associated with the fastest validators. In addition, leader reputation allows Shoal to leverage asynchronous DAG construction to eliminate timeouts in all scenarios. This lets Shoal provide a property called prevalent responsiveness, which subsumes the usually sought optimistic responsiveness.

The technique is very simple: it runs multiple instances of the underlying protocol sequentially, one after another. So, when instantiated with Bullshark, we get a "shoal" of sharks running a relay race.


Motivation

In the pursuit of high performance in blockchain networks, the focus has long been on reducing communication complexity. However, this approach has not led to a significant increase in throughput. For example, the HotStuff implementation in early versions of Diem achieved only 3,500 TPS, far below the target of 100k+ TPS.

The recent breakthrough stems from the realization that data propagation is the main bottleneck of leader-based protocols, and that it can benefit from parallelization. The Narwhal system separates data propagation from the core consensus logic, proposing an architecture in which all validators propagate data simultaneously while the consensus component orders only a small amount of metadata. The Narwhal paper reports a throughput of 160,000 TPS.

In a previous article, we introduced Quorum Store, our Narwhal implementation that separates data propagation from consensus, and described how we use it to scale our current consensus protocol, Jolteon. Jolteon is a leader-based protocol that combines Tendermint's linear fast path with PBFT-style view changes, reducing HotStuff's latency by 33%. However, leader-based consensus protocols clearly cannot fully exploit Narwhal's throughput potential: even with data propagation separated from consensus, the HotStuff/Jolteon leader remains a bottleneck as throughput grows.

Therefore, we decided to deploy Bullshark, a consensus protocol with zero communication overhead, on top of the Narwhal DAG. Unfortunately, compared to Jolteon, the DAG structure that enables Bullshark's high throughput incurs a 50% latency cost.

This article describes how Shoal significantly reduces Bullshark latency.

DAG-BFT Background

Each vertex in the Narwhal DAG is associated with a round. To enter round r, a validator must first obtain n-f vertices that belong to round r-1. Each validator can broadcast one vertex per round, and each vertex must reference at least n-f vertices from the previous round. Due to the asynchronicity of the network, different validators may observe different local views of the DAG at any given time.

A key property of the DAG is non-equivocation: if two validators have the same vertex v in their local views of the DAG, then they have exactly the same causal history of v.
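The round-advancement rule above can be sketched in a few lines. This is an illustrative model, not Aptos code; the names `Vertex`, `can_advance`, and the parameters `N`, `F` are all hypothetical:

```python
from dataclasses import dataclass

# Minimal sketch of a Narwhal-style DAG round structure.
# n validators, up to f faults; a validator enters round r only after
# obtaining n - f vertices of round r - 1, and each vertex must reference
# at least n - f vertices of the previous round.

N, F = 4, 1  # e.g. n = 3f + 1 with f = 1

@dataclass(frozen=True)
class Vertex:
    author: str    # validator that broadcast this vertex
    round: int
    parents: tuple # references to vertices of the previous round

def can_advance(round_vertices):
    """A validator may move to the next round once it holds n - f
    vertices of the current round."""
    return len(round_vertices) >= N - F

# Round 1: all four validators broadcast a vertex with no parents.
r1 = [Vertex(a, 1, ()) for a in "ABCD"]

# Seeing n - f = 3 of them is enough to enter round 2.
assert can_advance(r1[:3])
assert not can_advance(r1[:2])

# A round-2 vertex must reference at least n - f round-1 vertices.
v2 = Vertex("A", 2, tuple(r1[:3]))
assert len(v2.parents) >= N - F
```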


Total Ordering

Consensus on the total order of all vertices in the DAG can be achieved without additional communication overhead. To this end, validators in DAG-Rider, Tusk, and Bullshark interpret the structure of the DAG as a consensus protocol, where vertices represent proposals and edges represent votes.

Although the quorum-intersection logic embedded in the DAG structure differs, all existing Narwhal-based consensus protocols share the following structure:

  1. Anchors: every few rounds (e.g., every two rounds in Bullshark) there is a pre-determined leader, and the leader's vertex is called the anchor;

  2. Ordering anchors: validators independently but deterministically decide which anchors to order and which to skip;

  3. Ordering causal histories: validators process their ordered anchor lists one by one and, for each anchor, order all previously unordered vertices in its causal history according to some deterministic rule.
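Step 3 above can be sketched as a small piece of deterministic local computation. This is illustrative only (not the production Aptos code); the DAG is a plain vertex-to-parents mapping, and sorting by name stands in for "some deterministic rule":

```python
# Illustrative sketch of ordering: given the already-agreed list of ordered
# anchors, linearize each anchor's causal history, skipping vertices that
# were ordered by an earlier anchor.

def causal_history(vertex, dag):
    """All vertices reachable from `vertex` via parent edges, itself included."""
    seen, stack = set(), [vertex]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(dag.get(v, ()))
    return seen

def order_dag(ordered_anchors, dag):
    """Process ordered anchors one by one; for each, append its
    not-yet-ordered causal history under a deterministic rule."""
    total_order, delivered = [], set()
    for anchor in ordered_anchors:
        fresh = causal_history(anchor, dag) - delivered
        total_order.extend(sorted(fresh))  # any deterministic tie-break works
        delivered |= fresh
    return total_order

# Tiny DAG: vertex -> parents. a2 and a4 are the anchors of rounds 2 and 4.
dag = {"v1": [], "a2": ["v1"], "v3": ["a2"], "a4": ["v3", "a2"]}
print(order_dag(["a2", "a4"], dag))  # ['a2', 'v1', 'a4', 'v3']
```

Because the anchor list shares a common prefix across honest validators and the per-anchor rule is deterministic, the resulting total orders also share a common prefix.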

The key to safety is ensuring that in step (2), all honest validators create ordered anchor lists that share the same prefix. In Shoal, we make the following observation about all of the above protocols:

All validators agree on the first ordered anchor point.


Bullshark latency

The latency of Bullshark depends on the number of rounds between ordered anchors in the DAG. While the partially synchronous version of Bullshark is more practical and has better latency than the asynchronous version, it is still far from optimal.

Problem 1: average block latency. In Bullshark, every even round has an anchor, and the vertices of every odd round are interpreted as votes. In the common case, two rounds of the DAG are needed to order an anchor; however, the vertices in an anchor's causal history need additional rounds while waiting for an anchor to be ordered. In the common case, vertices in odd rounds need three rounds, while non-anchor vertices in even rounds need four.
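The common-case figures above can be captured in a back-of-the-envelope check. This is purely illustrative bookkeeping of the numbers quoted in the text, not a model from the Shoal paper:

```python
# Common-case Bullshark latency, measured in DAG rounds until a vertex is
# ordered, for each vertex category described above (illustrative only).

def rounds_to_order(round_parity, is_anchor):
    if is_anchor:              # anchors live in even rounds
        return 2               # ordered after one round of votes
    if round_parity == "odd":  # votes wait for the next anchor to commit
        return 3
    return 4                   # even-round non-anchor vertices wait longest

assert rounds_to_order("even", is_anchor=True) == 2
assert rounds_to_order("odd", is_anchor=False) == 3
assert rounds_to_order("even", is_anchor=False) == 4
```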

Problem 2: failure-case latency. The above analysis applies to the fault-free case. On the other hand, if a round's leader fails to broadcast its anchor quickly enough, the anchor cannot be ordered (and is therefore skipped); all unordered vertices from the preceding rounds must then wait for the next anchor to be ordered. This significantly degrades the performance of a geo-replicated network, especially because Bullshark uses timeouts to wait for leaders.


Shoal Framework

Shoal addresses both latency problems by enhancing Bullshark (or any other Narwhal-based BFT protocol) with pipelining, allowing an anchor in every round and reducing the latency of all non-anchor vertices in the DAG to three rounds. Shoal also introduces a zero-cost leader reputation mechanism into the DAG, which biases leader selection toward fast validators.

Challenge

In the context of DAG protocols, pipelining and leader reputation were considered difficult problems, for the following reasons:

  1. Previous pipelining attempts tried to modify the core Bullshark logic, which seems fundamentally impossible.

  2. Leader reputation, introduced in DiemBFT and formalized in Carousel, dynamically selects future leaders (in Bullshark, the anchors) according to validators' past performance. While disagreement on leader identity does not violate safety in those protocols, in Bullshark it can lead to completely different orderings. This gets to the heart of the problem: dynamically and deterministically choosing round anchors is necessary to resolve consensus, yet validators need to agree on an ordered history in order to select future anchors.

As evidence of the problem's difficulty, we note that existing Bullshark implementations, including the one currently in production, do not support these features.

![Shoal framework: reducing Bullshark latency on Aptos](https://img-cdn.gateio.im/webp-social/moments-0b0928cb6240e994c1514c75e080a4b2.webp)

Protocol

Despite the aforementioned challenges, as the saying goes, the solution is often hidden in simplicity.

In Shoal, we rely on the ability to perform local computation over the DAG, and realize the ability to save and re-interpret information from earlier rounds. Building on the core insight that all validators agree on the first ordered anchor, Shoal sequentially composes multiple Bullshark instances to pipeline them, making the first ordered anchor the switch-over point between instances, and using the anchors' causal histories to compute leader reputation.

Pipeline

Recall that each Bullshark instance uses a predefined mapping F from rounds to leaders. Shoal runs instances of Bullshark one after another, so for each instance the anchors are predetermined by its mapping F. Each instance orders one anchor, which triggers the switch to the next instance.

Initially, Shoal launches the first instance of Bullshark in the first round of the DAG and runs it until the first ordered anchor is determined, say in round r. All validators agree on this anchor. Therefore, they can all reliably agree to re-interpret the DAG starting from round r+1. Shoal simply launches a new Bullshark instance at round r+1.

In the best case, this allows Shoal to order one anchor per round. The anchor of the first round is ordered by the first instance. Then Shoal starts a new instance in the second round, which has its own anchor that is ordered by that instance; then another new instance orders an anchor in the third round, and the process continues.
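The best-case relay described above can be sketched as a small loop. This is a hypothetical simulation under the assumption that every instance orders its anchor immediately; `run_shoal_pipeline` and `anchor_of` are illustrative names, not Shoal APIs:

```python
# Sketch of Shoal's pipelining in the fault-free best case: each Bullshark
# instance orders one anchor, and the next instance starts at the next round
# of the same DAG, yielding one ordered anchor per round.

def run_shoal_pipeline(first_round, num_rounds, anchor_of):
    """anchor_of(instance, round) -> leader. Best case: each instance
    orders its anchor immediately, then hands over to the next instance."""
    ordered, r, instance = [], first_round, 0
    while r < first_round + num_rounds:
        leader = anchor_of(instance, r)
        ordered.append((r, leader))  # instance orders its anchor in round r
        instance += 1                # switch point: a new instance...
        r += 1                       # ...is started at round r + 1
    return ordered

validators = ["A", "B", "C", "D"]
anchors = run_shoal_pipeline(1, 4, lambda inst, r: validators[r % 4])
print(anchors)  # [(1, 'B'), (2, 'C'), (3, 'D'), (4, 'A')]
```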

![Shoal pipelining: one anchor per round](https://img-cdn.gateio.im/webp-social/moments-859e732e16c3eee0e2c93422474debc2.webp)

Leader Reputation

During Bullshark ordering, skipped anchors increase latency. In that case, pipelining alone is powerless, since a new instance cannot be started before the previous instance orders an anchor. Shoal uses a reputation mechanism to assign each validator a score based on its recent activity, ensuring that leaders whose anchors are likely to be missed are unlikely to be selected in the future. Validators that respond to and participate in the protocol receive high scores; otherwise, a validator is assigned a low score, as it may be crashed, slow, or malicious.

The idea is to deterministically recompute the predefined round-to-leader mapping F on every score update, biasing it toward leaders with higher scores. For validators to agree on the new mapping, they must agree on the scores, and therefore on the history used to derive them.

In Shoal, pipelining and leader reputation combine naturally, since they both use the same core technique: re-interpreting the DAG after reaching agreement on the first ordered anchor.

In fact, the only difference is that after ordering the anchor of round r, validators need only compute a new mapping F', valid from round r+1 onward, based on the causal history of the ordered anchor of round r. They then execute a new instance of Bullshark using the updated anchor-selection function F', starting from round r+1.
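This score-then-remap step can be sketched as follows. It is an illustrative toy, not the Aptos implementation: the scoring rule (count a validator's vertices in the ordered causal history) and the ranking rule are stand-in assumptions:

```python
# Sketch of recomputing the round-to-leader mapping F' from reputation
# scores after an anchor is ordered (illustrative scoring and ranking).

def update_scores(scores, causal_history_authors):
    """Validators whose vertices appear in the ordered anchor's causal
    history were responsive; bump their scores."""
    new = dict(scores)
    for author in causal_history_authors:
        new[author] = new.get(author, 0) + 1
    return new

def new_mapping(scores):
    """Deterministically favour high-score validators as future leaders:
    rotate through validators sorted by (-score, name)."""
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return lambda rnd: ranked[rnd % len(ranked)]

scores = {"A": 0, "B": 0, "C": 0, "D": 0}
# Suppose D was slow: its vertices are missing from the causal history.
scores = update_scores(scores, ["A", "B", "C", "A", "B"])
F_prime = new_mapping(scores)
print([F_prime(r) for r in range(4)])  # ['A', 'B', 'C', 'D']
```

Because every validator derives the scores from the same agreed causal history, they all compute the same F' without any extra communication.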

![Shoal leader reputation: recomputing the anchor-selection mapping](https://img-cdn.gateio.im/webp-social/moments-9f789cb669f6fcc244ea7ff7648e48b4.webp)

No more timeouts

Timeouts play a crucial role in all leader-based deterministic partially synchronous BFT implementations. However, the complexity they introduce increases the amount of internal state that must be managed and observed, which complicates debugging and demands more observability techniques.

Timeouts can also significantly increase latency, because configuring them properly is critical and often requires dynamic adjustment, as they are highly dependent on the environment (the network). Before moving on to the next leader, the protocol pays the full timeout latency penalty for a faulty leader. Therefore, timeout settings cannot be too conservative; but if they are too short, the protocol may skip good leaders.
