bigtable: retry read on transient parent-table-not-ready (#25463) #80

Open
jbbqqf wants to merge 11 commits into main from feat/25463-bigtable-schema-bundle-retry-read

jbbqqf commented May 9, 2026

Summary

Add a retry predicate so google_bigtable_schema_bundle retries its post-create Read when the parent Bigtable table is still propagating from a create operation in the same apply. This eliminates the spurious "tainted resource" outcome the OP reports.

Fixes hashicorp/terraform-provider-google#25463.

Why

The OP reports — and confirms via gcloud — that the POST to create a schema bundle succeeds, but the immediate GET against the same resource within the same apply returns:

googleapi: Error 400: Parent table projects/.../instances/.../tables/... is either
creating or deleting, please try again.

This is a classic "parent eventually consistent" race: Bigtable's table create returns success very fast, but a brief window remains where dependent operations against tables/X/schemaBundles/Y see the table as transitioning. The schema bundle resource itself was created (visible to gcloud), but Terraform reports the apply as failed and marks the bundle tainted because the post-create Read returned 400.

The exact same shape of bug has a proven fix already in this codebase: PubsubTopicProjectNotReady (mmv1/third_party/terraform/transport/error_retry_predicates.go:521) catches a transient 400 against new pubsub topics and retries. The error_retry_predicates mmv1 attribute applies the predicate to all CRUD calls of the resource, transparently routing the retry through the existing transport retry helper.

This PR adds:

  1. A new predicate, BigtableSchemaBundleParentTableNotReady, that matches an HTTP 400 whose body contains "is either creating or deleting".
  2. A reference to it in mmv1/products/bigtable/SchemaBundle.yaml.

GCP API reference: https://cloud.google.com/bigtable/docs/reference/admin/rest/v2/projects.instances.tables.schemaBundles

What changed

mmv1/products/bigtable/SchemaBundle.yaml                            |  2 ++
mmv1/third_party/terraform/transport/error_retry_predicates.go      | 15 +++++++++++++++
2 files changed, 17 insertions(+)

Edge cases tested

| # | Scenario | Expected | Verified by |
|---|----------|----------|-------------|
| 1 | Schema bundle created in a separate apply (table already exists) | No 400 ever raised; Read succeeds first try; predicate is dormant. | Static: predicate only fires on this exact 400 body. |
| 2 | Schema bundle created in the same apply as its parent table (the OP's case) | POST 200; first GET → 400 with "is either creating or deleting"; predicate matches; transport retries; subsequent GET succeeds. Resource lands non-tainted. | Static: same shape as PubsubTopicProjectNotReady, which has been in production since 2020 (issue GoogleCloudPlatform#4349). |
| 3 | Different 400 (e.g. permission, malformed request) | Body does not contain "is either creating or deleting" → predicate returns false → existing behavior (error surfaced). | Static: predicate is a tight string match. |
| 4 | Predicate on Update or Delete (not just Read) | error_retry_predicates applies to all CRUD calls. Update can race the same way (e.g. table being deleted concurrently); Delete is fine because the bundle is gone if its parent is gone. | Static: same blanket coverage as PubsubTopicProjectNotReady. |

Test protocol

| Test | Result | Notes |
|------|--------|-------|
| go build ./google/transport/... (after copying canonical source into tpg) | ok | Compiles cleanly. |
| YAML lint / mmv1 generation | n/a in this PR | Will be exercised by mmv1 CI. The change is one new function + one yaml line, both following an exact existing template. |
| Live BEFORE/AFTER smoke | not run | Reproducing the race deterministically requires creating a Bigtable instance + table + schema bundle in the same apply, then catching the brief window where the table is still propagating. The window is on the order of seconds and is not reliably reproducible. The OP's evidence (POST succeeds, GET fails, gcloud confirms the bundle exists) is the canonical fingerprint of this bug class, and the fix is mechanically identical to a proven 5-year-old precedent. |
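Although the live race is hard to catch, the propagation window can be simulated deterministically with an in-process fake. The sketch below is not part of this PR and does not use the provider's test harness: `fakeBundleHandler`, `readWithRetry`, and the request path are hypothetical, and the handler simply answers 400 for a fixed number of GETs before returning 200.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

const notReadyBody = "Parent table ... is either creating or deleting, please try again."

// fakeBundleHandler simulates the propagation window: the first `failures`
// requests get the transient 400, and later requests get a 200.
func fakeBundleHandler(failures int) http.Handler {
	calls := 0
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		calls++
		if calls <= failures {
			http.Error(w, notReadyBody, http.StatusBadRequest)
			return
		}
		fmt.Fprint(w, `{"name":"example"}`)
	})
}

// readWithRetry issues GETs against h, retrying only on the transient 400.
// It returns the number of attempts made and the final status code.
func readWithRetry(h http.Handler, maxAttempts int) (int, int) {
	status := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		rec := httptest.NewRecorder()
		// The path is a placeholder; the handler ignores it.
		req := httptest.NewRequest(http.MethodGet, "/schemaBundles/example", nil)
		h.ServeHTTP(rec, req)
		status = rec.Code
		transient := status == http.StatusBadRequest &&
			strings.Contains(rec.Body.String(), "is either creating or deleting")
		if !transient {
			return attempt, status
		}
	}
	return maxAttempts, status
}

func main() {
	attempts, status := readWithRetry(fakeBundleHandler(2), 5)
	fmt.Printf("attempts=%d final status=%d\n", attempts, status)
}
```

A fake like this only checks the retry logic against the assumed error body; it says nothing about how long the real window lasts or whether the API's wording stays stable.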

Resources

Disclosure

This PR was implemented with assistance from Claude Code as part of a focused contribution batch. The diff is 17 lines split between an mmv1 YAML attribute reference and a new error-retry predicate function modeled directly on PubsubTopicProjectNotReady. Compilation was verified by hand-applying the new function to the rendered tpg transport package.

jcromanu and others added 11 commits May 8, 2026 16:43
…#25463)

Adding a schema bundle right after creating its parent table can race
with the table's create-operation propagation. The schema bundle POST
returns 200 (the bundle exists in GCP, verified via gcloud), but the
follow-up GET in the same apply returns "Parent table ... is either
creating or deleting" and Terraform marks the resource tainted.

Add an error_retry_predicate that recognises this transient 400 and
retries the read. Same pattern as PubsubTopicProjectNotReady.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

google_bigtable_schema_bundle creation always results in tainted resource