db_stress: Rebuild expected state from DB when trace is truncated in blackbox crash tests#14587

Open
xingbowang wants to merge 3 commits into facebook:main from xingbowang:2026_04_08_T263723982

Conversation

Contributor

@xingbowang xingbowang commented Apr 8, 2026

Summary

In blackbox crash tests, db_stress can be killed at any point, including while it is actively appending records to the trace file. WAL recovery leaves the recovered DB in a consistent state, but the trace file ends up shorter than expected: fewer operations were recorded than the DB's recovered sequence number implies.

Previously, when FinishInitDb detected this situation it called FileExpectedStateManager::Restore() which ended up returning Status::Corruption("Trace ended before replaying all expected write ops"). db_stress treated that as a hard failure and called exit(1), causing spurious test failures that had nothing to do with actual data corruption.

This PR fixes the problem in three parts:

1. expected_state.cc – Structured Incomplete status with shortfall details

When the trace replay finishes but the handler has not replayed all expected write operations, return Status::Incomplete() instead of Status::Corruption(). The status now carries structured metadata — replayed_write_ops=N; expected_write_ops=M — via MakeExpectedStateRestoreShortfallStatus(). A corresponding ParseExpectedStateRestoreShortfallStatus() allows callers to extract these counts and make informed decisions about whether the shortfall is benign.
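To make the round trip concrete, here is a minimal sketch of formatting and parsing the `replayed_write_ops=N; expected_write_ops=M` payload. The helper names `MakeShortfallMessage` and `ParseShortfallMessage` are hypothetical stand-ins that work on plain strings; they are not the PR's actual functions, which operate on `Status`.

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the structured payload described above.
std::string MakeShortfallMessage(uint64_t replayed, uint64_t expected) {
  return "replayed_write_ops=" + std::to_string(replayed) +
         "; expected_write_ops=" + std::to_string(expected);
}

// Hypothetical helper: returns true and fills both counts only when the
// message contains both fields in the expected order.
bool ParseShortfallMessage(const std::string& msg, uint64_t* replayed,
                           uint64_t* expected) {
  const std::string kPrefix = "replayed_write_ops=";
  const std::size_t pos = msg.find(kPrefix);
  if (pos == std::string::npos) {
    return false;
  }
  uint64_t r = 0;
  uint64_t e = 0;
  const int matched = std::sscanf(
      msg.c_str() + pos,
      "replayed_write_ops=%" SCNu64 "; expected_write_ops=%" SCNu64, &r, &e);
  if (matched != 2) {
    return false;
  }
  *replayed = r;
  *expected = e;
  return true;
}
```

A message without the fields, such as the old "Trace ended before replaying all expected write ops" text, fails to parse, so a caller can tell a structured shortfall from any other Incomplete status.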

2. db.h / db_impl – New kLastRecoveredWalBatchWriteCount DB property

A new uint64 DB property rocksdb.last-recovered-wal-batch-write-count reports the number of data write operations in the last WAL batch recovered during DB::Open(). Returns 0 when no WAL recovery was needed. This gives db_stress a way to cross-check the trace shortfall against what was actually replayed from the WAL.

3. db_stress_test_base.cc – Tightened rebuild guard in FinishInitDb()

When Restore() returns Incomplete, the rebuild is now triggered only if the shortfall (expected_write_ops minus replayed_write_ops) exactly matches the last recovered WAL batch write count. In that case the gap is fully explained by a single WAL batch that was recovered into the DB but whose trace records were cut off by the crash, which is a safe condition. If the counts don't match (suggesting a deeper problem), the original Incomplete error is preserved with appended context and db_stress still exits with failure.
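The guard can be sketched as a small pure function. This is an illustration under assumptions: `RestoreAction` and `DecideOnShortfall` are hypothetical names, and the real check lives inside FinishInitDb() and works on `Status` values.

```cpp
#include <cstdint>

// Hypothetical outcome of the guard.
enum class RestoreAction { kRebuildFromDb, kFail };

// Rebuild only when the number of missing write ops equals the number of
// write ops in the last WAL batch recovered at DB::Open().
RestoreAction DecideOnShortfall(uint64_t replayed_write_ops,
                                uint64_t expected_write_ops,
                                uint64_t last_recovered_wal_batch_write_count) {
  if (expected_write_ops <= replayed_write_ops) {
    // No shortfall at all: an Incomplete status here would be unexplained.
    return RestoreAction::kFail;
  }
  const uint64_t missing = expected_write_ops - replayed_write_ops;
  return missing == last_recovered_wal_batch_write_count
             ? RestoreAction::kRebuildFromDb
             : RestoreAction::kFail;
}
```

For example, with 97 replayed ops, 100 expected ops, and a last recovered batch of 3 writes, the sketch chooses the rebuild; with a last recovered batch of 2 writes it fails, because one missing op is unexplained.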

The RebuildExpectedStateFromDb() method iterates every column family, calls ClearColumnFamily() + SyncPut() for each key found, and validates wide column consistency and key-range sanity along the way.
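The shape of that rebuild loop, reduced to in-memory stand-ins, might look like the sketch below. Everything here is an assumption for illustration: `ColumnFamily` and `RebuildExpectedState` are hypothetical, ordered maps replace the DB iterator and the expected-state file, and the wide column validation is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for one column family: integer key -> value.
using ColumnFamily = std::map<int64_t, std::string>;

// Clears the expected state, then copies every key found in `db` into it.
// Returns false if a key falls outside [0, max_key), which mirrors the
// key-range sanity check mentioned above.
bool RebuildExpectedState(const std::vector<ColumnFamily>& db, int64_t max_key,
                          std::vector<ColumnFamily>* expected) {
  expected->assign(db.size(), ColumnFamily());  // clear every column family
  for (std::size_t cf = 0; cf < db.size(); ++cf) {
    for (const auto& entry : db[cf]) {
      if (entry.first < 0 || entry.first >= max_key) {
        return false;  // unexpected key: treat as a failed rebuild
      }
      (*expected)[cf][entry.first] = entry.second;  // record the DB's value
    }
  }
  return true;
}
```

The point of the sketch is the ordering: the expected state is cleared first and then filled only from what the recovered DB actually contains, so stale entries cannot survive the rebuild.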

Test Plan

  • New unit test LastRecoveredWalBatchWriteCountProperty verifies the property returns the correct count after recovery and 0 after a clean reopen.
  • Covered by existing blackbox crash test modes (blackbox, cf_consistency, etc.) which can now survive a truncated trace without false failures.

…blackbox crash tests

Summary:
In blackbox crash tests, db_stress can be killed at any point — including
while it is actively appending records to the trace file. This leaves the
recovered DB in a consistent state (thanks to WAL recovery), but the trace
file shorter than expected: fewer operations were recorded than the DB's
recovered sequence number implies.

Previously, when FinishInitDb detected this situation it called
FileExpectedStateManager::Restore() which ended up returning
Status::Corruption("Trace ended before replaying all expected write ops").
db_stress treated that as a hard failure and called exit(1), causing spurious
test failures that had nothing to do with actual data corruption.

This PR fixes the problem in two parts:

1. expected_state.cc – FileExpectedStateManager::Restore():
   When the trace replay finishes (either because Prepare() signals the trace
   is already at EOF, or because Next() exhausts all records) but the handler
   has not replayed all expected write operations, return Status::Incomplete()
   instead of Status::Corruption(). Incomplete means "the trace ran out early,
   which is expected in a blackbox crash"; Corruption is reserved for actual
   data integrity problems.

2. db_stress_test_base.cc – StressTest::FinishInitDb() + new
   RebuildExpectedStateFromDb():
   When Restore() returns Incomplete, fall back to scanning the recovered DB
   directly and rebuilding the expected state from its actual contents (the
   same approach used when there is no history at all). If the rebuild scan
   fails, or if Restore() returns any other non-OK status, still exit(1).

The new RebuildExpectedStateFromDb() method iterates every column family,
calls ClearColumnFamily() + SyncPut() for each key found, and validates wide
column consistency and key-range sanity along the way.

Test Plan:
Covered by existing blackbox crash test modes (blackbox, cf_consistency, etc.)
which can now survive a truncated trace without false failures.

@meta-cla meta-cla bot added the CLA Signed label Apr 8, 2026

github-actions bot commented Apr 8, 2026

⚠️ clang-tidy: 4 warning(s) on changed lines

Completed in 150.4s.

Summary by check

| Check | Count |
|---|---|
| cert-err58-cpp | 2 |
| clang-analyzer-deadcode.DeadStores | 1 |
| concurrency-mt-unsafe | 1 |
| Total | 4 |

Details

db/internal_stats.cc (2 warning(s))
db/internal_stats.cc:332:26: warning: initialization of 'last_recovered_wal_batch_write_count' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
db/internal_stats.cc:408:35: warning: initialization of 'kLastRecoveredWalBatchWriteCount' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
db_stress_tool/db_stress_test_base.cc (1 warning(s))
db_stress_tool/db_stress_test_base.cc:532:9: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/expected_state.cc (1 warning(s))
db_stress_tool/expected_state.cc:1239:9: warning: Value stored to 'reached_trace_end' is never read [clang-analyzer-deadcode.DeadStores]


github-actions bot commented Apr 8, 2026

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit cb232a7


Code Review: db_stress: Rebuild expected state from DB when trace is truncated in blackbox crash tests

Overall Assessment: APPROVE with minor suggestions

The approach is sound, well-scoped, and follows existing patterns. The changes only affect the db_stress test tool, and the fallback mechanism correctly rebuilds state from the source of truth (the recovered DB).

Findings

SUGGESTION S1: Defensive Prepare() Incomplete Handling is Dead Code

expected_state.cc:776-780: ReplayerImpl::Prepare() never returns Incomplete. Good defensive coding, but worth a brief comment documenting why.

SUGGESTION S2: GetIntVal String Copy Per Key

db_stress_test_base.cc:538: iter->key().ToString() allocates per key. Cold path, negligible impact, but GetIntVal could accept Slice for efficiency.
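As a rough illustration of the idea behind S2, a decoder that takes a non-owning view avoids the per-key string allocation. This sketch is an assumption throughout: `DecodeBigEndianKey` is a hypothetical function, `std::string_view` stands in for `Slice`, and the key is assumed to be exactly eight big-endian bytes, which is simpler than db_stress's real key format.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical, simplified decoder: reads an 8-byte big-endian integer from
// a view without copying the underlying bytes.
bool DecodeBigEndianKey(std::string_view key, uint64_t* out) {
  if (key.size() != sizeof(uint64_t)) {
    return false;  // this sketch only handles fixed-width keys
  }
  uint64_t value = 0;
  for (unsigned char byte : key) {
    value = (value << 8) | byte;  // most significant byte comes first
  }
  *out = value;
  return true;
}
```

Because the view only borrows the iterator's buffer, the caller must finish decoding before advancing the iterator.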

SUGGESTION S3: Temp File Left After Incomplete

When Restore() returns Incomplete, LATEST.state.tmp remains on disk. Not a correctness issue — Clean() handles it on next Open().

LOW L1: saved_seqno_ Not Reset

Self-correcting: next SaveAtAndAfter() establishes fresh state.

LOW L2: No Dedicated Unit Tests

Consistent with existing db_stress testing practices (integration via crash tests).

Validated Correctness

  • GetValueBase works for all value formats — merge operator preserves format, wide-column default column has same format
  • ProcessRecoveredPreparedTxns ordering is correct — prepared txns not visible to iterator
  • Error handling flow is correct — Incomplete triggers rebuild; other errors exit(1)
  • Locking is correct — per-CF lock covers clear + rebuild during init (no concurrent writers)

Invalidated Concerns

| Concern | Why Invalid |
|---|---|
| Merge operator breaks GetValueBase | DBStressWideMergeOperator preserves value format |
| PreparedTxn ordering race | Prepared txns not visible to iterator |
| latest_ corruption | Validly points to Open() state; rebuild correctly overwrites |

Full report written to review-findings.md.




meta-codesync bot commented Apr 8, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D100056511.

Contributor

How often does it return Status::Incomplete()? I wonder if this could end up hiding many failures, since it rebuilds the entire expected state from the DB.

In this case, we only expect at most 1 trace record deviation. Is it possible to just read the WAL, and replay the last entry there?


meta-codesync bot commented Apr 9, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D100056511.

options.write_buffer_size = 1024 * 1024;
Reopen(options);

WriteBatch batch;

nit: can we add two write batches and ensure that only the stats for the last one are applied?


uint64_t replayed_write_ops = 0;
uint64_t expected_write_ops = 0;
if (!ParseExpectedStateRestoreShortfallStatus(

This is pretty fragile as it relies on string parsing of the Restore status.

Can we instead just thread the counts through Restore?

Status Restore(DB* db, uint64_t* replayed_write_ops = nullptr,
                 uint64_t* expected_write_ops = nullptr);
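A self-contained sketch of that out-parameter shape is below. It is only an illustration: `RestoreStatus` and `RestoreSketch` are hypothetical stand-ins for `Status` and `Restore()`, and the two leading arguments replace the trace replay that the real method would perform.

```cpp
#include <cstdint>

// Stand-in for rocksdb::Status in this sketch.
enum class RestoreStatus { kOk, kIncomplete };

// Hypothetical stand-in for Restore(): reports the counts through optional
// out-parameters, so callers never parse the status message.
RestoreStatus RestoreSketch(uint64_t replayed_in_trace, uint64_t expected,
                            uint64_t* replayed_write_ops = nullptr,
                            uint64_t* expected_write_ops = nullptr) {
  if (replayed_write_ops != nullptr) {
    *replayed_write_ops = replayed_in_trace;
  }
  if (expected_write_ops != nullptr) {
    *expected_write_ops = expected;
  }
  return replayed_in_trace < expected ? RestoreStatus::kIncomplete
                                      : RestoreStatus::kOk;
}
```

Defaulting both pointers to nullptr keeps existing call sites compiling unchanged, while FinishInitDb() could pass both and compute the shortfall directly.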

missing_write_ops, s.ToString().c_str());
s = RebuildExpectedStateFromDb(shared);
} else {
s = AppendExpectedStateRestoreShortfallContext(

Is this branch still possible? If so, should we have a different status message?
