Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Delayed Prevote Prototype #1460

Closed
wants to merge 41 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
5a90ec5
delays precommits
staheri14 Jul 25, 2024
86b556d
calculates delay based on 11s after start time
staheri14 Jul 26, 2024
7a727d7
minor replacement
staheri14 Jul 26, 2024
9ee8842
Merge branch 'v0.34.x-celestia' into sanaz/delayed-precommit
staheri14 Jul 26, 2024
710b5c9
increases timeout in the test to account for the delayed precommit
staheri14 Jul 26, 2024
064b55b
Merge branch 'v0.34.x-celestia' into sanaz/delayed-precommit
staheri14 Jul 26, 2024
691fd4e
increases the waittime based on the expected block time
staheri14 Jul 26, 2024
6b9cdf5
fixes the TestProvider
staheri14 Jul 26, 2024
b4c184b
increases waittime for TestApp_Tx
staheri14 Jul 26, 2024
6120f68
increases to block time
staheri14 Jul 26, 2024
8d67ff1
fixes failure in TestMempoolNoProgressUntilTxsAvailable
staheri14 Jul 26, 2024
c009f22
fixes TestHeaderEvents
staheri14 Jul 26, 2024
91a0bd5
adds a todo
staheri14 Jul 26, 2024
c1f8105
adds a todo
staheri14 Jul 26, 2024
ed4bd1c
adds todos and reminders
staheri14 Jul 26, 2024
2598052
adds a todo to fix TestBroadcastTx
staheri14 Jul 26, 2024
9d6925e
extends contex timeout to make the TestBroadcastTx pass
staheri14 Jul 26, 2024
d44bb3b
increases TimeoutBroadcastTxCommit
staheri14 Jul 26, 2024
35f921b
fixes TestNodeStartStop by taking the new block time into account
staheri14 Jul 26, 2024
39ace76
fixes TestReactorVotingPowerChange
staheri14 Jul 26, 2024
5ddeaf3
accounts for block time variance in TestApp_Tx
staheri14 Jul 29, 2024
7ca9f7c
clarifies comments about TimeoutBroadcastTxCommit
staheri14 Jul 29, 2024
65e5442
sets wait time higher for the sake of CIs
staheri14 Jul 29, 2024
8a90067
adjust waiting time if higher than 11s
staheri14 Jul 29, 2024
a7468f6
traces precommit time table
staheri14 Jul 31, 2024
8095e52
adds PRecommitTimeTable to the consensus tables
staheri14 Jul 31, 2024
de810b1
adds consensus prefix to the table name
staheri14 Jul 31, 2024
c544f8a
does not adjust wait time
staheri14 Jul 31, 2024
b73fb1a
adds a comment
staheri14 Aug 6, 2024
9b7daab
captures the time where a new proposal arrives
staheri14 Aug 6, 2024
35a0e3e
traces when a height starts by cs.StartTime or by the arrival of a ne…
staheri14 Aug 7, 2024
fd04b4f
traces the start time of each height
staheri14 Aug 8, 2024
c44cdc4
adds an alternative implementation
staheri14 Aug 9, 2024
b008c8e
implements delayed prevote
staheri14 Aug 21, 2024
bb0a5a0
Merge remote-tracking branch 'origin/v0.34.x-celestia' into sanaz/del…
staheri14 Aug 21, 2024
f48658d
adds some comments about the waiting time before prevotes
staheri14 Aug 21, 2024
f64fe43
traces prevote delays
staheri14 Aug 21, 2024
3a75988
traces the proposer
staheri14 Aug 21, 2024
ab97854
updates go version to 1.22.6
staheri14 Aug 26, 2024
41156de
Merge branch 'sanaz/bumps-to-go1.22.6' into sanaz/delayed-prevote
staheri14 Aug 26, 2024
4b39913
increases delay to 10 seconds
staheri14 Aug 26, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/check-generated.yml
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ jobs:
steps:
- uses: actions/setup-go@v4
with:
go-version: "1.22.5"
go-version: "1.22.6"

- uses: actions/checkout@v4
with:
Expand Down
6 changes: 3 additions & 3 deletions .github/workflows/coverage.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-go@v4
with:
go-version: "1.22.5"
go-version: "1.22.6"
- name: Create a file with all the pkgs
run: go list ./... > pkgs.txt
- name: Split pkgs into 4 files
Expand Down Expand Up @@ -48,7 +48,7 @@ jobs:
steps:
- uses: actions/setup-go@v4
with:
go-version: "1.22.5"
go-version: "1.22.6"
- uses: actions/checkout@v3
- uses: technote-space/get-diff-action@v6
with:
Expand All @@ -70,7 +70,7 @@ jobs:
steps:
- uses: actions/setup-go@v4
with:
go-version: "1.22.5"
go-version: "1.22.6"
- uses: actions/checkout@v3
- uses: technote-space/get-diff-action@v6
with:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/e2e-manual.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ jobs:
steps:
- uses: actions/setup-go@v4
with:
go-version: '1.22.5'
go-version: '1.22.6'

- uses: actions/checkout@v4

Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/e2e-nightly-34x.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ jobs:
steps:
- uses: actions/setup-go@v4
with:
go-version: '1.22.5'
go-version: '1.22.6'

- uses: actions/checkout@v4
with:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/e2e.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ jobs:
steps:
- uses: actions/setup-go@v4
with:
go-version: '1.22.5'
go-version: '1.22.6'
- uses: actions/checkout@v3
- uses: technote-space/get-diff-action@v6
with:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/fuzz-nightly.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ jobs:
steps:
- uses: actions/setup-go@v4
with:
go-version: '1.22.5'
go-version: '1.22.6'

- uses: actions/checkout@v4

Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/govulncheck.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ jobs:
steps:
- uses: actions/setup-go@v3
with:
go-version: "1.22.5"
go-version: "1.22.6"
- uses: actions/checkout@v3
- uses: technote-space/get-diff-action@v6
with:
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/pre-release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ jobs:

- uses: actions/setup-go@v4
with:
go-version: '1.22.5'
go-version: '1.22.6'

# Similar check to ./release-version.yml, but enforces this when pushing
# tags. The ./release-version.yml check can be bypassed and is mainly
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/release-version.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ jobs:

- uses: actions/setup-go@v4
with:
go-version: '1.22.5'
go-version: '1.22.6'

- name: Check version
run: |
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ jobs:

- uses: actions/setup-go@v4
with:
go-version: '1.22.5'
go-version: '1.22.6'

- name: Generate release notes
run: |
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ jobs:
steps:
- uses: actions/setup-go@v4
with:
go-version: "1.22.5"
go-version: "1.22.6"
- uses: actions/checkout@v3
- uses: technote-space/get-diff-action@v6
with:
Expand Down Expand Up @@ -121,7 +121,7 @@ jobs:
# steps:
# - uses: actions/setup-go@v3
# with:
# go-version: "1.22.5"
# go-version: "1.22.6"
# - uses: actions/checkout@v3
# - uses: technote-space/get-diff-action@v6
# with:
Expand Down
2 changes: 1 addition & 1 deletion DOCKER/Dockerfile
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Use a build arg to ensure that both stages use the same,
# hopefully current, go version.
ARG GOLANG_BASE_IMAGE=golang:1.22.5-alpine
ARG GOLANG_BASE_IMAGE=golang:1.22.6-alpine

# stage 1 Generate CometBFT Binary
FROM --platform=$BUILDPLATFORM $GOLANG_BASE_IMAGE as builder
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ This repo intends on preserving the minimal possible diff with [cometbft/cometbf
- **specific to Celestia**: consider if [celestia-app](https://github.com/celestiaorg/celestia-app) is a better target
- **not specific to Celestia**: consider making the contribution upstream in CometBFT

1. [Install Go](https://go.dev/doc/install) 1.22.5+
1. [Install Go](https://go.dev/doc/install) 1.22.6+
2. Fork this repo
3. Clone your fork
4. Find an issue to work on (see [good first issues](https://github.com/celestiaorg/celestia-core/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22))
Expand Down
5 changes: 3 additions & 2 deletions config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -433,8 +433,9 @@ func DefaultRPCConfig() *RPCConfig {
MaxSubscriptionClients: 100,
MaxSubscriptionsPerClient: 5,
SubscriptionBufferSize: defaultSubscriptionBufferSize,
TimeoutBroadcastTxCommit: 10 * time.Second,
WebSocketWriteBufferSize: defaultSubscriptionBufferSize,
// this value is changed to align with the new consistent block time of 12ish seconds + 3 seconds block time variance (15 seconds)
TimeoutBroadcastTxCommit: 15 * time.Second,
WebSocketWriteBufferSize: defaultSubscriptionBufferSize,

MaxBodyBytes: int64(1000000), // 1MB
MaxHeaderBytes: 1 << 20, // same as the net/http default
Expand Down
2 changes: 1 addition & 1 deletion consensus/byzantine_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -305,7 +305,7 @@ func TestByzantinePrevoteEquivocation(t *testing.T) {
assert.Equal(t, prevoteHeight, ev.Height())
}
}
case <-time.After(20 * time.Second):
case <-time.After(1 * time.Minute):
for i, reactor := range reactors {
t.Logf("Consensus Reactor %d\n%v", i, reactor)
}
Expand Down
2 changes: 1 addition & 1 deletion consensus/common_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ type cleanupFunc func()
var (
config *cfg.Config // NOTE: must be reset for each _test.go file
consensusReplayConfig *cfg.Config
ensureTimeout = time.Millisecond * 200
ensureTimeout = time.Second * 12 // time.Millisecond * 200
)

func ensureDir(dir string, mode os.FileMode) {
Expand Down
4 changes: 2 additions & 2 deletions consensus/reactor_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -709,9 +709,9 @@ func timeoutWaitGroup(t *testing.T, n int, f func(int), css []*State) {
close(done)
}()

// we're running many nodes in-process, possibly in in a virtual machine,
// we're running many nodes in-process, possibly in a virtual machine,
// and spewing debug messages - making a block could take a while,
timeout := time.Second * 120
timeout := (12 * time.Second) * 120

select {
case <-done:
Expand Down
44 changes: 38 additions & 6 deletions consensus/state.go
Original file line number Diff line number Diff line change
Expand Up @@ -106,17 +106,17 @@ type State struct {
privValidatorPubKey crypto.PubKey

// state changes may be triggered by: msgs from peers,
// msgs from ourself, or by timeouts
// msgs from ourselves, or by timeouts
peerMsgQueue chan msgInfo
internalMsgQueue chan msgInfo
timeoutTicker TimeoutTicker

// information about about added votes and block parts are written on this channel
// information about added votes and block parts are written on this channel
// so statistics can be computed by reactor
statsMsgQueue chan msgInfo

// we use eventBus to trigger msg broadcasts in the reactor,
// and to notify external subscribers, eg. through a websocket
// and to notify external subscribers, e.g., through a websocket
eventBus *types.EventBus

// a Write-Ahead Log ensures we can recover from any kind of crash
Expand Down Expand Up @@ -842,7 +842,7 @@ func (cs *State) handleMsg(mi msgInfo) {
// if the proposal is complete, we'll enterPrevote or tryFinalizeCommit
added, err = cs.addProposalBlockPart(msg, peerID)

// We unlock here to yield to any routines that need to read the the RoundState.
// We unlock here to yield to any routines that need to read the RoundState.
// Previously, this code held the lock from the point at which the final block
// part was received until the block executed against the application.
// This prevented the reactor from being able to retrieve the most updated
Expand Down Expand Up @@ -915,6 +915,10 @@ func (cs *State) handleMsg(mi msgInfo) {

func (cs *State) handleTimeout(ti timeoutInfo, rs cstypes.RoundState) {
cs.Logger.Debug("received tock", "timeout", ti.Duration, "height", ti.Height, "round", ti.Round, "step", ti.Step)
if ti.Height == rs.Height {
schema.WriteRoundState(cs.traceClient, ti.Height, ti.Round,
schema.StartTimeIsReached)
}

// timeouts must be for current height, round, step
if ti.Height != rs.Height || ti.Round < rs.Round || (ti.Round == rs.Round && ti.Step < rs.Step) {
Expand Down Expand Up @@ -942,6 +946,8 @@ func (cs *State) handleTimeout(ti timeoutInfo, rs cstypes.RoundState) {

cs.enterPrevote(ti.Height, ti.Round)

case cstypes.RoundStepDelaydPrevote:
cs.enterPrevote(ti.Height, ti.Round)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you call doPrevotehere?
If you call enterPrevote here again, the defer in method enterPrevote will be called twice. Not catastrophic, but there'll be a few duplicated events that may confuse external observers (e.g., relayers)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sergio-mena for your comment, you are right, the duplicate events needs to be considered carefully which is not yet the case in the current prototype implementation. But, will certainly address this in the full version.

case cstypes.RoundStepPrevoteWait:
if err := cs.eventBus.PublishEventTimeoutWait(cs.RoundStateEvent()); err != nil {
cs.Logger.Error("failed publishing timeout wait", "err", err)
Expand Down Expand Up @@ -1010,6 +1016,11 @@ func (cs *State) enterNewRound(height int64, round int32) {
return
}

if cs.Round == round && cs.Step == cstypes.RoundStepNewHeight {
schema.WriteRoundState(cs.traceClient, cs.Height, cs.Round,
schema.NewHeightByStartTime)
}

if now := cmttime.Now(); cs.StartTime.After(now) {
logger.Debug("need to set a buffer and log message here for sanity", "start_time", cs.StartTime, "now", now)
}
Expand Down Expand Up @@ -1104,7 +1115,7 @@ func (cs *State) enterPropose(height int64, round int32) {
defer func() {
// Done enterPropose:
cs.updateRoundStep(round, cstypes.RoundStepPropose)
cs.newStep()
cs.newStep() // announce the new step

// If we have the whole proposal + POL, then goto Prevote now.
// else, we'll enterPrevote when the rest of the proposal is received (in AddProposalBlockPart),
Expand Down Expand Up @@ -1141,6 +1152,8 @@ func (cs *State) enterPropose(height int64, round int32) {
}

if cs.isProposer(address) {
// trace if the node is the proposer
schema.WriteProposer(cs.traceClient, height, round, address.String())
logger.Debug("propose step; our turn to propose", "proposer", address)
cs.decideProposal(height, round)
} else {
Expand Down Expand Up @@ -1252,10 +1265,21 @@ func (cs *State) createProposalBlock() (block *types.Block, blockParts *types.Pa
return cs.blockExec.CreateProposalBlock(cs.Height, cs.state, commit, proposerAddr)
}

// isReadyToPrevote calculates if the process has waited at least 7 seconds
// from their start time before they can vote
func (cs *State) isReadyToPrevote() (bool, time.Duration) {
// We wait for 12 seconds. Why 12?
// 10s for block propagation and 2s for processing it
prevoteVoteTime := cs.StartTime.Add(12 * time.Second)
waitTime := time.Until(prevoteVoteTime)
schema.WritePrevoteDelay(cs.traceClient, cs.Height, cs.Round, waitTime.Seconds())
return waitTime <= 0, waitTime
}

// Enter: `timeoutPropose` after entering Propose.
// Enter: proposal block and POL is ready.
// Prevote for LockedBlock if we're locked, or ProposalBlock if valid.
// Otherwise vote nil.
// Otherwise, vote nil.
func (cs *State) enterPrevote(height int64, round int32) {
logger := cs.Logger.With("height", height, "round", round)

Expand All @@ -1267,6 +1291,12 @@ func (cs *State) enterPrevote(height int64, round int32) {
return
}

if ready, waitTime := cs.isReadyToPrevote(); !ready {
// this will reenter prevote after the waitTime
cs.scheduleTimeout(waitTime, height, round, cstypes.RoundStepDelaydPrevote)
return
}

defer func() {
// Done enterPrevote:
cs.updateRoundStep(round, cstypes.RoundStepPrevote)
Expand Down Expand Up @@ -1898,6 +1928,7 @@ func (cs *State) defaultSetProposal(proposal *types.Proposal) error {

proposal.Signature = p.Signature
cs.Proposal = proposal
schema.WriteRoundState(cs.traceClient, cs.Height, cs.Round, schema.NewProposalArrived)
// We don't update cs.ProposalBlockParts if it is already set.
// This happens if we're already in cstypes.RoundStepCommit or if there is a valid block in the current round.
// TODO: We can check if Proposal is for a different block as this is a sign of misbehavior!
Expand Down Expand Up @@ -2004,6 +2035,7 @@ func (cs *State) handleCompleteProposal(blockHeight int64) {
// procedure at this point.
}

// I think this happens before the cs.StartTime starts.
if cs.Step <= cstypes.RoundStepPropose && cs.isProposalComplete() {
// Move onto the next step
cs.enterPrevote(blockHeight, cs.Round)
Expand Down
1 change: 1 addition & 0 deletions consensus/types/round_state.go
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ const (
RoundStepPrecommit = RoundStepType(0x06) // Did precommit, gossip precommits
RoundStepPrecommitWait = RoundStepType(0x07) // Did receive any +2/3 precommits, start timeout
RoundStepCommit = RoundStepType(0x08) // Entered commit state machine
RoundStepDelaydPrevote = RoundStepType(0x09) // Delayed prevote
// NOTE: RoundStepNewHeight acts as RoundStepCommitWait.

// NOTE: Update IsValid method if you change this!
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
module github.com/tendermint/tendermint

go 1.22.5
go 1.22.6

require (
github.com/BurntSushi/toml v1.2.1
Expand Down
8 changes: 7 additions & 1 deletion light/provider/http/http_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,13 @@ func TestProvider(t *testing.T) {
require.Nil(t, lb)
assert.Equal(t, provider.ErrHeightTooHigh, err)

_, err = p.LightBlock(context.Background(), 1)
// when we reach here, height 1 is expected to be pruned
// however, due to the delayed precommit logic which makes the node to
// sleep to match the 12s block time,
// it seems the pruning does not happen immediately,
// so adding a sleep here to see if that fixes the issue
time.Sleep(12 * 10 * time.Second)
lb, err = p.LightBlock(context.Background(), 1)
require.Error(t, err)
require.Nil(t, lb)
assert.Equal(t, provider.ErrLightBlockNotFound, err)
Expand Down
3 changes: 2 additions & 1 deletion node/node_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,11 +50,12 @@ func TestNodeStartStop(t *testing.T) {
// wait for the node to produce a block
blocksSub, err := n.EventBus().Subscribe(context.Background(), "node_test", types.EventQueryNewBlock)
require.NoError(t, err)
blockTime := 12 * time.Second
select {
case <-blocksSub.Out():
case <-blocksSub.Cancelled():
t.Fatal("blocksSub was cancelled")
case <-time.After(10 * time.Second):
case <-time.After(blockTime):
t.Fatal("timed out waiting for the node to produce a block")
}

Expand Down
Loading
Loading