Add chunked payload generation and network transport by marcobambini · Pull Request #50 · sqliteai/sqlite-sync

marcobambini · 2026-05-29T09:58:53Z

Summary

This PR adds chunk-aware payload generation and send-path transport support to sqlite-sync. It keeps the existing monolithic payload APIs intact while adding a streaming path for large rowsets and oversized individual BLOB/TEXT values.

Implemented changes:

- Adds cloudsync_payload_chunks():
  - SQLite virtual table interface.
  - PostgreSQL set-returning function interface.
  - Three optional inputs: since_db_version, filter_site_id / local site id, and until_db_version.
  - Emits per-chunk metadata including chunk index, payload size, row count, db version range, and stable watermark.
- Adds global persisted setting payload_max_chunk_size:
  - Default: 5 MB.
  - Technical minimum: 256 KB, with lower values clamped.
  - Controls only payload generation, not apply-time acceptance.
- Adds v3 fragment payload support for oversized single BLOB/TEXT values:
  - Large values can be split across multiple transport chunks transparently.
  - Receivers stage fragments internally and apply the final value once all fragments arrive.
  - Duplicate fragment delivery is idempotent and stale incomplete fragment groups are cleaned up.
- Keeps backward compatibility:
  - Existing cloudsync_payload_encode() remains supported for monolithic payloads.
  - cloudsync_payload_apply() accepts legacy payloads, monolithic payloads, and v3 fragment payloads regardless of the local chunk-size setting.
- Updates cloudsync_network_send_changes():
  - Streams outgoing changes from cloudsync_payload_chunks() instead of building one large payload first.
  - Sends each chunk through the existing /apply backend contract, either inline as a blob or through an upload URL.
  - Advances the local send checkpoint only after the chunk stream completes successfully.
  - Merges remote status responses monotonically while multiple chunks are in flight.
- Updates PostgreSQL Docker debug images to build/install dblink, required by the test suite.
- Updates API, performance docs, README, and changelog.
- Adds extensive SQLite unit coverage and PostgreSQL SQL coverage for chunk sizing, large-value fragmentation, out-of-order fragments, stale fragment cleanup, and legacy monolithic apply compatibility.
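
The receiver-side behavior listed above (stage fragments, tolerate duplicates and out-of-order delivery, apply once complete) can be illustrated with a minimal in-memory sketch. This is not the extension's implementation, which stages fragments internally in a way this page does not describe; the fragment_group type, its fields, and all function names below are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical staging record for one oversized value split into fragments. */
typedef struct {
    size_t total_size;      /* size of the complete value in bytes */
    size_t fragment_count;  /* number of fragments expected */
    size_t received_count;  /* distinct fragments staged so far */
    bool *received;         /* received[i] is true once fragment i is staged */
    uint8_t *buffer;        /* reassembly buffer of total_size bytes */
} fragment_group;

/* Allocate staging state. Returns 0 on success, -1 on bad input or OOM. */
int fragment_group_init(fragment_group *g, size_t total_size, size_t fragment_count) {
    memset(g, 0, sizeof *g);
    if (total_size == 0 || fragment_count == 0) return -1;
    g->received = calloc(fragment_count, sizeof *g->received);
    g->buffer = malloc(total_size);
    if (g->received == NULL || g->buffer == NULL) {
        free(g->received);
        free(g->buffer);
        memset(g, 0, sizeof *g);
        return -1;
    }
    g->total_size = total_size;
    g->fragment_count = fragment_count;
    return 0;
}

void fragment_group_free(fragment_group *g) {
    free(g->received);
    free(g->buffer);
    memset(g, 0, sizeof *g);
}

/* Stage one fragment, in any order. Returns 1 when the value is complete,
   0 when more fragments are needed, -1 on invalid input. Re-delivering a
   fragment that is already staged changes nothing (idempotent). */
int fragment_group_add(fragment_group *g, size_t index, size_t offset,
                       const uint8_t *data, size_t len) {
    if (index >= g->fragment_count) return -1;
    if (offset > g->total_size || len > g->total_size - offset) return -1;
    if (!g->received[index]) {
        if (len > 0) memcpy(g->buffer + offset, data, len);
        g->received[index] = true;
        g->received_count++;
    }
    return g->received_count == g->fragment_count ? 1 : 0;
}
```

A caller would apply g->buffer as the final column value only when fragment_group_add() returns 1, and would discard groups that stay incomplete for too long.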

Compatibility

Existing users of cloudsync_payload_encode() and current network APIs continue to work. The new chunking behavior is opt-in for direct SQL callers via cloudsync_payload_chunks(), and automatic for the built-in network send path. Incoming payload apply remains format-compatible with older payloads and does not reject payloads based on the local payload_max_chunk_size.
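
Since payload_max_chunk_size governs only generation, its effective value can be resolved once on the sending side. The sketch below shows one way the documented default (5 MB) and minimum (256 KB, lower values clamped) could be applied. It assumes binary units (1 MB = 1024 * 1024 bytes) and uses a NULL pointer to mean "no value persisted"; both are assumptions, and effective_chunk_size() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed binary units; the PR text only says "5 MB" and "256 KB". */
#define CHUNK_SIZE_DEFAULT ((int64_t)5 * 1024 * 1024)
#define CHUNK_SIZE_MIN     ((int64_t)256 * 1024)

/* Resolve the chunk size used for payload generation.
   configured == NULL models "setting not persisted" (hypothetical convention). */
int64_t effective_chunk_size(const int64_t *configured) {
    if (configured == NULL) return CHUNK_SIZE_DEFAULT;
    if (*configured < CHUNK_SIZE_MIN) return CHUNK_SIZE_MIN; /* clamp low values */
    return *configured;
}
```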

Companion backend PR

cloudsync: https://github.com/sqliteai/cloudsync/pull/45

Testing

- make
- make unittest
- make test: reached and passed the SQLite/unit portion, then stopped at the remote e2e stage because INTEGRATION_TEST_DATABASE_ID is not set locally.
- make postgres-docker-debug-rebuild
- Full PostgreSQL suite via Docker container: psql -U postgres -d postgres -f test/postgresql/full_test.sql (Failures: 0). This was run inside the debug container because the debug PostgreSQL build listens on loopback inside the container, so the connection made by the host-side make postgres-docker-run-test target is closed.
- git diff --check

…elpers

- cloudsync_payload_chunks: add exclude_filter_site_id flag (SQLite hidden column / PG 4th arg) to stream changes from all sites except filter_site_id, as the /check download path needs; setting it without a site_id is an error
- add cloudsync_uuid_text()/cloudsync_uuid_blob() scalar functions on SQLite and PostgreSQL to convert site_id between its 16-byte binary form and the canonical UUID string (tolerant of dashed/undashed input), so string-based callers can pass a site_id to cloudsync_payload_chunks
- sqlite vtab: rewrite best_index to assign argv in canonical column order, fixing a latent argument-ordering bug
- perf: throttle the v3 fragment stale-group GC to at most once per 60s per connection (cloudsync_context.last_fragment_cleanup), removing an O(n^2) full-table scan that ran on every applied fragment
- add PostgreSQL 1.0->1.1 migration for the new chunked-payload SQL surface
- build: neutralize the ambient build env for curl's ./configure (CURL_CONFIG_ENV) so exported LDFLAGS/CPPFLAGS/LIBS don't break it
- test: rename PG 39_payload_chunks.sql -> 52 (39 was duplicated); add multi-site exclude, UUID roundtrip and stale-GC-throttle coverage (SQLite unit + PG)
- docs: API.md (new argument + two functions) and CHANGELOG

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

andinux · 2026-05-30T02:11:54Z

Update — commit `92a048c`

Adds the /check download-path support to cloudsync_payload_chunks plus related fixes.

Changes

- exclude_filter_site_id flag on cloudsync_payload_chunks (SQLite hidden column / PG 4th arg): stream all sites except filter_site_id; error if set without a site_id.
- New cloudsync_uuid_text() / cloudsync_uuid_blob() (SQLite + PG) to convert site_id between 16-byte binary and canonical UUID string, so string callers (the /check endpoint) can pass a site_id.
- perf: throttled the v3 fragment stale-GC to once/60s per connection, removing an O(n²) full-table scan that ran on every applied fragment.
- best_index rewritten to assign argv in canonical column order (latent arg-ordering bug fix).
- PostgreSQL 1.0→1.1 migration for the new SQL surface.
- build: hermetic env for curl ./configure (CURL_CONFIG_ENV).
- tests: renamed 39_payload_chunks.sql → 52 (39 was duplicated); added multi-site exclude, UUID roundtrip, and stale-GC-throttle coverage.
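
The site_id conversion between a 16-byte binary value and canonical UUID text, tolerant of dashed and undashed input, can be sketched as follows. This is an illustration rather than the code behind cloudsync_uuid_text() / cloudsync_uuid_blob(): the function names are hypothetical, and the parser is assumed to skip dashes wherever they appear, which may be more lenient than the real functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Map one hex digit to its value, or -1 if the character is not hex. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse dashed or undashed UUID text into 16 bytes.
   Returns 0 on success, -1 if the text is not exactly 32 hex digits
   once dashes are ignored. */
int uuid_text_to_blob(const char *text, uint8_t out[16]) {
    size_t nibbles = 0;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '-') continue;              /* lenient: dash position not checked */
        int v = hex_value(*p);
        if (v < 0 || nibbles >= 32) return -1;
        if (nibbles % 2 == 0) out[nibbles / 2] = (uint8_t)(v << 4);
        else out[nibbles / 2] |= (uint8_t)v;
        nibbles++;
    }
    return nibbles == 32 ? 0 : -1;
}

/* Format 16 bytes as lowercase 8-4-4-4-12 text. out must hold 37 bytes. */
void uuid_blob_to_text(const uint8_t in[16], char out[37]) {
    static const char digits[] = "0123456789abcdef";
    size_t pos = 0;
    for (size_t i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10) out[pos++] = '-';
        out[pos++] = digits[in[i] >> 4];
        out[pos++] = digits[in[i] & 0x0F];
    }
    out[pos] = '\0';
}
```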

Verified: SQLite unittest green; PG full_test.sql → Failures: 0.
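
The stale-GC throttle listed in the changes above comes down to a timestamp check per connection. A minimal sketch, assuming the caller supplies the current time in seconds and that 0 means "never run": only the last_fragment_cleanup field name comes from the commit message, while the sync_context struct and fragment_cleanup_due() are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

#define FRAGMENT_CLEANUP_INTERVAL_SEC 60

/* Hypothetical stand-in for the per-connection context. */
typedef struct {
    int64_t last_fragment_cleanup; /* seconds; 0 means cleanup never ran */
} sync_context;

/* Returns true when the stale-group cleanup should run now, and records
   the run time so later calls within the interval are skipped. */
bool fragment_cleanup_due(sync_context *ctx, int64_t now_sec) {
    if (ctx->last_fragment_cleanup != 0 &&
        now_sec - ctx->last_fragment_cleanup < FRAGMENT_CLEANUP_INTERVAL_SEC) {
        return false; /* ran less than 60s ago: skip the scan */
    }
    ctx->last_fragment_cleanup = now_sec;
    return true;
}
```

With this check in front of the cleanup, applying many fragments in quick succession triggers at most one scan per interval instead of one scan per fragment.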

TODO

- Cross-engine v3 fragment round-trip test (SQLite ↔ PG) to lock the payload interop invariant.
- Integration tests on CI: network send path + receive split-delivery (needs INTEGRATION_TEST_CHUNKED_DATABASE_ID).
- Backend (sqliteai/cloudsync#45): make /check download client-capability-aware (monolithic/v2 for ≤1.0.x clients, chunked/v3 only for new ones); adopt the generate-once download model. EDIT: new implementation plan proposed with commit https://github.com/sqliteai/cloudsync/commit/a7ee135
- Minor cleanups: strtoll(...,10) in dbutils_settings_get_value; contract comment on the memset-from-eof vtab reset; named constant for the fragment-sizing iteration count.

Add dedicated integration coverage for chunked network sync using INTEGRATION_TEST_CHUNKED_DATABASE_ID and a single-table chunked_payload_items schema. Exercise both oversized TEXT values split into multiple v3 fragment payloads and multi-row non-v3 payload streams, then send cleanup deletes to limit remote storage growth. Rename the network trace build switch from SYNC_BENCH_DEBUG to the generic NETWORK_TRACE=1 so commands such as make NETWORK_TRACE=1 e2e and make NETWORK_TRACE=1 sync-bench compile with CLOUDSYNC_NETWORK_TRACE.

Expose INTEGRATION_TEST_CHUNKED_DATABASE_ID from repository secrets to the main build job so the dedicated chunked payload e2e tests can run in CI. Forward the same variable into the linux-musl arm64 Docker container and Android emulator test script, matching the existing integration test secret handling.

Add a chunked network failure-path integration test using INTEGRATION_TEST_CHUNKED_DATABASE_ID and a local-only chunked_payload_failure_items schema. Generate multiple non-v3 chunks, expect remote apply to fail because the table is absent remotely, and verify send_dbversion does not advance after the failed send.
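
The invariant this test checks (send_dbversion must not advance after a failed send) can be sketched as a loop that computes a candidate checkpoint and commits it only when every chunk was accepted. The types, the callback signature, and the demo transport below are hypothetical, and the real send path may derive its checkpoint differently, for example from the stable watermark.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk metadata: highest db_version the chunk covers. */
typedef struct {
    int64_t db_version_max;
} chunk_info;

/* Hypothetical transport callback: returns 0 when the remote accepted the chunk. */
typedef int (*send_chunk_fn)(const chunk_info *chunk, void *user_data);

/* Send chunks in order. On the first failure, return its code and leave
   *send_dbversion untouched; on full success, advance it and return 0. */
int send_all_chunks(const chunk_info *chunks, size_t count,
                    send_chunk_fn send_fn, void *user_data,
                    int64_t *send_dbversion) {
    int64_t candidate = *send_dbversion;
    for (size_t i = 0; i < count; i++) {
        int rc = send_fn(&chunks[i], user_data);
        if (rc != 0) return rc; /* checkpoint is not modified */
        if (chunks[i].db_version_max > candidate) {
            candidate = chunks[i].db_version_max;
        }
    }
    *send_dbversion = candidate; /* commit only after the whole stream succeeded */
    return 0;
}

/* Demo transport: succeeds while *remaining > 0, then fails with -1. */
int demo_send(const chunk_info *chunk, void *user_data) {
    int *remaining = user_data;
    (void)chunk;
    if (*remaining == 0) return -1;
    (*remaining)--;
    return 0;
}
```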

Remove the dead old_eof placeholder from payload_chunks_filter. Document the cursor layout contract around the memset-from-eof reset so future field moves preserve cursor-lifetime state and per-scan state ownership.

Introduce a shared CLOUDSYNC_PAYLOAD_FRAGMENT_SIZE_FIXPOINT_ITERATIONS constant for payload fragment planning. Use the constant in both SQLite and PostgreSQL chunk planners and document why the bounded fixpoint loop is sufficient.

Change dbutils_settings_get_value to parse text-backed integer settings with base 10 instead of base 0. This avoids surprising octal handling for values with leading zeroes while preserving the documented decimal byte values used by payload_max_chunk_size. Add a unit assertion that '010' reads back as 10.
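
The difference between the two bases is standard strtoll behavior and is easy to reproduce in isolation. The two wrapper functions below are hypothetical helpers written for this illustration; they are not the body of dbutils_settings_get_value.

```c
#include <stdlib.h>

/* Base 0 auto-detects the radix: a leading "0" means octal, "0x" means hex. */
long long parse_setting_base0(const char *text) {
    return strtoll(text, NULL, 0);
}

/* Base 10 always reads decimal digits and stops at the first non-digit. */
long long parse_setting_base10(const char *text) {
    return strtoll(text, NULL, 10);
}
```

With base 0, the text '010' parses as octal and yields 8; with base 10 it yields 10, which matches the unit assertion described above.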

marcobambini and others added 2 commits May 29, 2026 11:58

Add chunked payload transport

f1ace81

andinux added 6 commits June 1, 2026 18:55

chore: document payload chunk cursor reset

763e9eb


chore: name fragment sizing iteration cap

1cf4a4d


marcobambini wants to merge 8 commits into main from codex/chunked-payloads-network
