feat: drone swarm telemetry tuning — configurable timeouts, queue capacity, sync fix, shutdown() by thewoodfish · Pull Request #59 · algorealmInc/SwarmNL

thewoodfish · 2026-04-25T09:06:08Z

Summary

Four general-purpose improvements extracted from the ds-swarm drone-swarm integration, each a good upstream candidate.

Configurable recv_from_network polling (prelude.rs, mod.rs): replaces hardcoded NETWORK_READ_TIMEOUT = 30 s / TASK_SLEEP_DURATION = 3 s with CoreBuilder::with_network_timeout(max_polls, poll_interval_ms). Defaults are identical to the old behaviour (10 × 3 000 ms = 30 s). Also fixes the MutexGuard being held across the sleep call, which blocked other tasks from writing responses.
Configurable event queue capacity (prelude.rs, mod.rs): DataQueue previously had a hardcoded capacity of 300. Added DataQueue::with_capacity() and CoreBuilder::with_event_queue_capacity(). Default unchanged.
Fix: ReplNetworkConfig::Custom sync_wait_time ignored (replication.rs): sync_with_eventual_consistency was always sleeping SYNC_WAIT_TIME (5 s) regardless of the user-supplied value. Now reads sync_wait_time from self.config, matching the pattern used for data_aging_period in the same function.
Core::shutdown() (mod.rs): stores AbortHandles for the two background tasks spawned by build() in a shared Arc<Mutex<Vec<AbortHandle>>>. Core::shutdown() aborts them, releasing the libp2p Swarm and its TCP/UDP listeners immediately — enabling port reuse in tests and graceful restarts. Tokio-runtime only, no behaviour change unless shutdown() is called.
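
The polling change in the first item can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `ResponseBuffer` and `poll_for_response` are hypothetical names, and the sketch uses blocking `std` threads where the real `recv_from_network` presumably uses the async runtime's mutex and sleep. What it shows is the guard being dropped before the sleep, so writers are not blocked between polls, and the timeout ceiling being `max_polls × poll_interval_ms`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the shared response buffer, keyed by request id.
type ResponseBuffer = Arc<Mutex<HashMap<u64, Vec<u8>>>>;

/// Polls `buffer` up to `max_polls` times, sleeping `poll_interval_ms`
/// between attempts. Returns `None` once the ceiling
/// (`max_polls * poll_interval_ms`) is reached without a response.
fn poll_for_response(
    buffer: &ResponseBuffer,
    request_id: u64,
    max_polls: u32,
    poll_interval_ms: u64,
) -> Option<Vec<u8>> {
    for _ in 0..max_polls {
        // Scope the guard so the lock is released before sleeping;
        // holding it across the sleep would block response writers.
        let found = {
            let mut guard = buffer.lock().unwrap();
            guard.remove(&request_id)
        };
        if found.is_some() {
            return found;
        }
        thread::sleep(Duration::from_millis(poll_interval_ms));
    }
    None
}
```

With the defaults quoted above (10 polls at 3 000 ms) the ceiling stays at 30 s; a telemetry workload could pass something like 50 polls at 10 ms for a 500 ms ceiling instead.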

Test plan

cargo check --features tokio-runtime passes (verified clean)
Confirm defaults are unchanged: with_network_timeout not called → 30 s ceiling as before
Confirm with_event_queue_capacity not called → queue capacity 300 as before
Confirm ReplNetworkConfig::Default → sync_wait_time still 5 s
Call core.shutdown().await after GossipsubExitNetwork and verify port is released for re-use
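
The port-release check in the last step could look roughly like the sketch below. It is an assumption-laden stand-in: it uses a plain `std::net::TcpListener` and `drop` in place of a running `Core` and `core.shutdown().await`, and `port_release_check` is a hypothetical helper name. It only demonstrates the shape of the assertion: the port is busy while the listener is alive and can be bound again once it is released.

```rust
use std::net::{SocketAddr, TcpListener};

/// Binds an OS-assigned loopback port, then reports
/// `(busy_while_held, free_after_release)` for that same address.
fn port_release_check() -> std::io::Result<(bool, bool)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr: SocketAddr = listener.local_addr()?;

    // While the listener is alive, a second bind on the same address fails.
    let busy_while_held = TcpListener::bind(addr).is_err();

    // Stand-in for `core.shutdown().await` releasing the listener.
    drop(listener);

    // After release, the same address can be bound again.
    let free_after_release = TcpListener::bind(addr).is_ok();
    Ok((busy_while_held, free_after_release))
}
```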

🤖 Generated with Claude Code

…figurable

Three targeted fixes for high-frequency telemetry workloads (drone swarm at 200 ms gossip interval, 50 nodes):

- CoreBuilder::with_network_timeout(max_polls, poll_interval_ms): replaces the hardcoded 3 s poll sleep and 10-retry limit in recv_from_network. Defaults are unchanged (10 × 3 000 ms = 30 s). Also fixes the MutexGuard being held across the sleep call, which blocked concurrent response writers.
- CoreBuilder::with_event_queue_capacity(capacity): DataQueue capacity was a hardcoded 300-element constant. It is now a per-instance runtime value with the same default.
- Fix: ReplNetworkConfig::Custom sync_wait_time was stored but never read by the eventual-consistency background loop, which always slept for the constant SYNC_WAIT_TIME (5 s). The configured value is now honoured.

See CHANGES_FOR_DRONE_SWARM.md for full rationale and upstream PR guidance.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
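
A per-instance queue capacity, as described in this commit, might be sketched as follows. This is a simplified, hypothetical `DataQueue`, not the one in the PR: it uses `std::sync::Mutex` where the real type may use an async mutex, and the drop-oldest policy on overflow is an assumption made for illustration. Only the 300-element default and the `with_capacity` constructor name come from the description above.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Default capacity, matching the previously hardcoded value.
const DEFAULT_QUEUE_CAPACITY: usize = 300;

/// Simplified, hypothetical bounded queue with a runtime capacity.
struct DataQueue<T> {
    buffer: Mutex<VecDeque<T>>,
    capacity: usize,
}

impl<T> DataQueue<T> {
    /// Keeps the old behaviour: capacity of 300.
    fn new() -> Self {
        Self::with_capacity(DEFAULT_QUEUE_CAPACITY)
    }

    /// Creates a queue bounded at `capacity` items (assumed to be > 0).
    fn with_capacity(capacity: usize) -> Self {
        Self {
            buffer: Mutex::new(VecDeque::with_capacity(capacity)),
            capacity,
        }
    }

    fn capacity(&self) -> usize {
        self.capacity
    }

    /// Appends an item; when full, the oldest entry is dropped first
    /// (assumed overflow policy).
    fn push(&self, item: T) {
        let mut buffer = self.buffer.lock().unwrap();
        if buffer.len() >= self.capacity {
            buffer.pop_front();
        }
        buffer.push_back(item);
    }

    /// Removes and returns the oldest item, if any.
    fn pop(&self) -> Option<T> {
        self.buffer.lock().unwrap().pop_front()
    }

    fn len(&self) -> usize {
        self.buffer.lock().unwrap().len()
    }
}
```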

…listeners

Stores AbortHandles for the two tokio tasks spawned by build() in a shared Arc<Mutex<Vec<AbortHandle>>> on Core. Core::shutdown() aborts all handles, releasing the libp2p Swarm (and its TCP/UDP listeners) immediately, enabling port reuse in tests and graceful drone restarts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sacha-l · 2026-04-27T13:07:37Z

@thewoodfish - can you refer to the specific issues this closes where relevant?

thewoodfish · 2026-05-02T01:40:59Z

Closes issue #57 — "Core::recv_from_network polls with hardcoded 3s TASK_SLEEP_DURATION — every RPC has a 3s floor"

Added CoreBuilder::with_network_timeout(max_polls, poll_interval_ms) to replace the hardcoded 3 s/30 s polling constants.
Bonus fix: the MutexGuard on stream_response_buffer was held across the sleep call (blocking concurrent response writers); that's also fixed.

Partially addresses #50 — "Gossipsub mesh takes ~5 s to form; broadcasts before then silently fail" - Not a direct fix for mesh formation, but the tunable poll interval means callers are no longer forced to wait up to 30 s for a response — reducing the symptom window.

Silent bug fix (no issue filed) — ReplNetworkConfig::Custom { sync_wait_time } was stored but the eventual consistency loop always slept 5 s regardless. Now correctly honored.
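
The sync_wait_time fix amounts to resolving the sleep duration from the config instead of the constant. A minimal sketch, assuming a trimmed-down `ReplNetworkConfig` with only the field discussed here and assuming the value is in seconds (the real enum carries more fields, and `sync_wait` is a hypothetical helper):

```rust
use std::time::Duration;

/// Default wait between sync rounds, in seconds (5 s per the discussion above).
const SYNC_WAIT_TIME: u64 = 5;

/// Hypothetical, trimmed-down version of the replication config.
enum ReplNetworkConfig {
    Custom { sync_wait_time: u64 },
    Default,
}

impl ReplNetworkConfig {
    /// Resolves the wait from the config; before the fix, the loop
    /// slept for `SYNC_WAIT_TIME` regardless of the variant.
    fn sync_wait(&self) -> Duration {
        match self {
            ReplNetworkConfig::Custom { sync_wait_time } => {
                Duration::from_secs(*sync_wait_time)
            }
            ReplNetworkConfig::Default => Duration::from_secs(SYNC_WAIT_TIME),
        }
    }
}
```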

thewoodfish and others added 2 commits April 25, 2026 02:31

thewoodfish wants to merge 2 commits into main from feature/drone-swarm-telemetry-tuning
