db_stress: Rebuild expected state from DB when trace is truncated in blackbox crash tests #14587
xingbowang wants to merge 3 commits into facebook:main from
Conversation
…blackbox crash tests
Summary:
In blackbox crash tests, db_stress can be killed at any point — including
while it is actively appending records to the trace file. This leaves the
recovered DB in a consistent state (thanks to WAL recovery), but the trace
file shorter than expected: fewer operations were recorded than the DB's
recovered sequence number implies.
Previously, when FinishInitDb detected this situation it called
FileExpectedStateManager::Restore() which ended up returning
Status::Corruption("Trace ended before replaying all expected write ops").
db_stress treated that as a hard failure and called exit(1), causing spurious
test failures that had nothing to do with actual data corruption.
This PR fixes the problem in two parts:
1. expected_state.cc – FileExpectedStateManager::Restore():
When the trace replay finishes (either because Prepare() signals the trace
is already at EOF, or because Next() exhausts all records) but the handler
has not replayed all expected write operations, return Status::Incomplete()
instead of Status::Corruption(). Incomplete means "the trace ran out early,
which is expected in a blackbox crash"; Corruption is reserved for actual
data integrity problems.
2. db_stress_test_base.cc – StressTest::FinishInitDb() + new
RebuildExpectedStateFromDb():
When Restore() returns Incomplete, fall back to scanning the recovered DB
directly and rebuilding the expected state from its actual contents (the
same approach used when there is no history at all). If the rebuild scan
fails, or if Restore() returns any other non-OK status, still exit(1).
The new RebuildExpectedStateFromDb() method iterates every column family,
calls ClearColumnFamily() + SyncPut() for each key found, and validates wide
column consistency and key-range sanity along the way.
Test Plan:
Covered by existing blackbox crash test modes (blackbox, cf_consistency, etc.)
which can now survive a truncated trace without false failures.
Reviewers:
Subscribers:
Tasks:
Tags:
| Check | Count |
|---|---|
| cert-err58-cpp | 2 |
| clang-analyzer-deadcode.DeadStores | 1 |
| concurrency-mt-unsafe | 1 |
| Total | 4 |
Details
db/internal_stats.cc (2 warning(s))
db/internal_stats.cc:332:26: warning: initialization of 'last_recovered_wal_batch_write_count' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
db/internal_stats.cc:408:35: warning: initialization of 'kLastRecoveredWalBatchWriteCount' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
db_stress_tool/db_stress_test_base.cc (1 warning(s))
db_stress_tool/db_stress_test_base.cc:532:9: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/expected_state.cc (1 warning(s))
db_stress_tool/expected_state.cc:1239:9: warning: Value stored to 'reached_trace_end' is never read [clang-analyzer-deadcode.DeadStores]
✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit cb232a7

Code Review: db_stress: Rebuild expected state from DB when trace is truncated in blackbox crash tests

Overall Assessment: APPROVE with minor suggestions

The approach is sound, well-scoped, and follows existing patterns. The changes only affect the db_stress test tool, and the fallback mechanism correctly rebuilds state from the source of truth (the recovered DB).

Findings

SUGGESTION S1: Defensive
| Concern | Why Invalid |
|---|---|
| Merge operator breaks GetValueBase | DBStressWideMergeOperator preserves value format |
| PreparedTxn ordering race | Prepared txns not visible to iterator |
| latest_ corruption | Validly points to Open() state; rebuild correctly overwrites |
Full report written to review-findings.md.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
- `/claude-review [context]` — Request a code review
- `/claude-query <question>` — Ask about the PR or codebase
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D100056511.
How often does it return this status? In this case, we only expect at most 1 trace record deviation. Is it possible to just read the WAL, and replay the last entry there?
# Conflicts:
#   db_stress_tool/expected_state.cc
```cpp
options.write_buffer_size = 1024 * 1024;
Reopen(options);

WriteBatch batch;
```

nit: can we add two write batches and ensure only stats for the last one are applied
```cpp
uint64_t replayed_write_ops = 0;
uint64_t expected_write_ops = 0;
if (!ParseExpectedStateRestoreShortfallStatus(
```

This is pretty fragile as it relies on string parsing of the Restore status. Can we instead just thread the counts through Restore?

```cpp
Status Restore(DB* db, uint64_t* replayed_write_ops = nullptr,
               uint64_t* expected_write_ops = nullptr);
```
```cpp
        missing_write_ops, s.ToString().c_str());
    s = RebuildExpectedStateFromDb(shared);
  } else {
    s = AppendExpectedStateRestoreShortfallContext(
```

Is this branch still possible? Should we have a different status message?
Summary
In blackbox crash tests, db_stress can be killed at any point — including while it is actively appending records to the trace file. This leaves the recovered DB in a consistent state (thanks to WAL recovery), but the trace file shorter than expected: fewer operations were recorded than the DB's recovered sequence number implies.
Previously, when `FinishInitDb` detected this situation it called `FileExpectedStateManager::Restore()`, which ended up returning `Status::Corruption("Trace ended before replaying all expected write ops")`. db_stress treated that as a hard failure and called `exit(1)`, causing spurious test failures that had nothing to do with actual data corruption.

This PR fixes the problem in three parts:

1. `expected_state.cc` – Structured `Incomplete` status with shortfall details

When the trace replay finishes but the handler has not replayed all expected write operations, return `Status::Incomplete()` instead of `Status::Corruption()`. The status now carries structured metadata — `replayed_write_ops=N; expected_write_ops=M` — via `MakeExpectedStateRestoreShortfallStatus()`. A corresponding `ParseExpectedStateRestoreShortfallStatus()` allows callers to extract these counts and make informed decisions about whether the shortfall is benign.

2. `db.h` / `db_impl` – New `kLastRecoveredWalBatchWriteCount` DB property

A new uint64 DB property, `rocksdb.last-recovered-wal-batch-write-count`, reports the number of data write operations in the last WAL batch recovered during `DB::Open()`, and returns 0 when no WAL recovery was needed. This gives db_stress a way to cross-check the trace shortfall against what was actually replayed from the WAL.

3. `db_stress_test_base.cc` – Tightened rebuild guard in `FinishInitDb()`

When `Restore()` returns `Incomplete`, the rebuild is now only triggered if the shortfall exactly matches the last recovered WAL batch write count. In that case the gap is fully explained by a single partial WAL batch that was traced but not yet committed before the crash — a safe condition. If the counts don't match (suggesting a deeper problem), the original `Incomplete` error is preserved with appended context and db_stress still exits with failure.

The `RebuildExpectedStateFromDb()` method iterates every column family, calls `ClearColumnFamily()` + `SyncPut()` for each key found, and validates wide column consistency and key-range sanity along the way.

Test Plan

- `LastRecoveredWalBatchWriteCountProperty` verifies the property returns the correct count after recovery and 0 after a clean reopen.
- Covered by existing blackbox crash test modes (`blackbox`, `cf_consistency`, etc.), which can now survive a truncated trace without false failures.