feat: IPNS resolution improvement with Someguy sidecar and latency metrics#284
Conversation
Entire-Checkpoint: ebb8758497ee
Entire-Checkpoint: 0749171a608b
Entire-Checkpoint: 7ea5e67a2d6b
Entire-Checkpoint: 0562d2d9fb3c
Entire-Checkpoint: 73574a9ffaa7
- ghcr.io/ipfs/someguy:v0.11.1 with standard DHT mode - Listen on 0.0.0.0:8190 for Docker inter-container networking - Health check on /version endpoint (no /health available) - 768M memory limit, 0.5 CPU limit - No exposed host ports, no depends_on from api service Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 1bce7ad51f1c
- deploy-staging.yml: DELEGATED_ROUTING_URL=http://someguy:8190 - .env.example: document Someguy as recommended routing provider - Legacy delegated-ipfs.dev noted for reference - TEE worker config unchanged Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 7a7fafa49724
- Add ipnsResolveDuration histogram with source labels (network/db_cache/network_stale) - Add ipnsPublishDuration histogram with outcome labels (success/error/timeout) - Buckets tuned for IPNS operation latencies (50ms-30s resolve, 100ms-60s publish) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- 19-01-SUMMARY.md with execution results - STATE.md updated with plan progress and metrics - ROADMAP.md updated with phase 19 progress - REQUIREMENTS.md: IPNS-01, IPNS-02, IPNS-03 marked complete Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Inject MetricsService into IpnsService constructor - Instrument resolveRecord() with timing: source=network|db_cache|network_stale - Instrument publishRecord() with timing: outcome=success|error|timeout - Skip histogram observation for null resolve results (not found) - Add MetricsService mock to unit, integration, and security test suites - Add 9 new test cases verifying histogram observation behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 6dec8936738b
- Create 19-02-SUMMARY.md with plan execution results - Update STATE.md: Phase 19 complete (2/2 plans) - Update ROADMAP.md: Phase 19 progress to Complete - Mark IPNS-04 requirement complete in REQUIREMENTS.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 8fedcf15782d
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 4f802c5c8dea
Entire-Checkpoint: 8b67e0b7f8ce
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
WalkthroughReplaces delegated-ipfs.dev with a self-hosted Someguy sidecar in staging (docker-compose + deploy workflow + docs) and adds Prometheus histograms and instrumentation for IPNS resolve/publish in MetricsService and IpnsService, plus corresponding test updates and Phase 19 planning docs. Changes
Sequence Diagram(s)sequenceDiagram
actor Client
participant Api as CipherBox API (IpnsService)
participant Someguy as Someguy (delegated routing)
participant DB as DB Cache
participant Metrics as MetricsService
Client->>Api: request resolveRecord(key)
Api->>Metrics: start timer (ipfsIpnsDuration)
Api->>Someguy: delegated routing resolve (network)
alt network returns fresh
Someguy-->>Api: network record
Api->>DB: optionally compare/store
Api->>Metrics: observe ipnsResolveDuration(source: network, outcome: success)
Api-->>Client: record
else network stale or bad gateway
Someguy-->>Api: error / stale result
Api->>DB: read cached record
alt DB usable and fresher
DB-->>Api: db record (fresher)
Api->>Metrics: observe ipnsResolveDuration(source: network_stale, outcome: success)
Api-->>Client: record (from DB)
else DB fallback used or not found
DB-->>Api: not found / error
Api->>Metrics: observe ipnsResolveDuration(source: db_cache, outcome: error)
Api-->>Client: not found/error
end
end
sequenceDiagram
actor Client
participant Api as CipherBox API (IpnsService)
participant Someguy as Someguy (delegated routing)
participant DB as DB
participant Metrics as MetricsService
Client->>Api: request publishRecord(payload)
Api->>Metrics: start timer (ipfsIpnsDuration)
Api->>DB: write local DB
Api->>Someguy: delegated routing publish
alt publish success
Someguy-->>Api: success
Api->>Metrics: observe ipnsPublishDuration(outcome: success)
Api-->>Client: success
else publish error
Someguy-->>Api: error (maybe AbortError)
Api->>Metrics: observe ipnsPublishDuration(outcome: error/timeout)
Api-->>Client: success (DB write) or error depending on path
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #284 +/- ##
==========================================
+ Coverage 48.13% 48.32% +0.18%
==========================================
Files 109 110 +1
Lines 8387 9097 +710
Branches 652 653 +1
==========================================
+ Hits 4037 4396 +359
- Misses 4177 4528 +351
Partials 173 173
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR improves IPNS reliability in staging by introducing a self-hosted delegated routing sidecar (Someguy) and adds IPNS resolve/publish latency histograms to the API for observability.
Changes:
- Add a
someguycontainer to the staging Docker Compose stack and update staging deployment env to route delegated routing traffic to it. - Add Prometheus histograms for IPNS resolve/publish latency and instrument
IpnsServiceto observe them. - Update Jest test suites and planning docs/state artifacts for Phase 19 execution.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| docker/docker-compose.staging.yml | Adds the Someguy delegated routing sidecar service to the staging stack. |
| .github/workflows/deploy-staging.yml | Switches DELEGATED_ROUTING_URL in staging deploy to the in-network Someguy URL. |
| apps/api/.env.example | Documents recommended delegated routing URLs (Someguy for staging/prod, mock for local). |
| apps/api/src/metrics/metrics.service.ts | Registers new cipherbox_ipns_*_duration_seconds histograms in the central MetricsService registry. |
| apps/api/src/ipns/ipns.service.ts | Instruments publish/resolve paths with hrtime-based histogram observations. |
| apps/api/src/ipns/ipns.service.spec.ts | Extends MetricsService mocking and adds unit tests for histogram observation behavior. |
| apps/api/src/ipns/tests/ipns.security.spec.ts | Updates MetricsService mock to include new histogram hooks. |
| apps/api/src/ipns/tests/ipns.integration.spec.ts | Updates MetricsService mocks to include new histogram hooks across integration scenarios. |
| .planning/phases/19-ipns-resolution-improvement/19-CONTEXT.md | Adds Phase 19 context and locked decisions for the work. |
| .planning/phases/19-ipns-resolution-improvement/19-RESEARCH.md | Captures research/background on Someguy and delegated routing tradeoffs. |
| .planning/phases/19-ipns-resolution-improvement/19-SCOPING_RATIONALE.md | Documents key scoping decisions (e.g., network-first vs DB-first). |
| .planning/phases/19-ipns-resolution-improvement/19-VALIDATION.md | Adds validation strategy/checklist for Phase 19. |
| .planning/phases/19-ipns-resolution-improvement/19-VERIFICATION.md | Adds verification report for Phase 19 must-haves and artifacts. |
| .planning/phases/19-ipns-resolution-improvement/19-01-PLAN.md | Adds execution plan for deploying Someguy + staging config updates. |
| .planning/phases/19-ipns-resolution-improvement/19-01-SUMMARY.md | Adds summary of Phase 19 plan 01 execution/results. |
| .planning/phases/19-ipns-resolution-improvement/19-02-PLAN.md | Adds execution plan for IPNS latency metrics instrumentation. |
| .planning/phases/19-ipns-resolution-improvement/19-02-SUMMARY.md | Adds summary of Phase 19 plan 02 execution/results. |
| .planning/STATE.md | Updates overall planning state to reflect Phase 19 completion. |
| .planning/ROADMAP.md | Marks Phase 19 as complete and updates roadmap progress sections. |
| .planning/REQUIREMENTS.md | Marks IPNS requirements complete and updates traceability table. |
You can also share your feedback on Copilot code review. Take the survey.
…uting Replace hardcoded delegated-ipfs.dev references with generic "delegated routing service" in code comments and API descriptions, and update planning docs to reflect the self-hosted Someguy sidecar on staging. Production and recovery tool references are preserved since they still use the public endpoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: b943eb7bcab5
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/api/src/ipns/ipns.service.spec.ts (1)
947-961:⚠️ Potential issue | 🟡 MinorThis DB-failure test is asserting the wrong boundary.
In
apps/api/src/ipns/ipns.service.ts,ipnsPublishDurationonly starts afterupsertFolderIpns()succeeds. A repository save failure never reachesdelegatedRouting.publish(), so this case should assert thatipnsPublishDuration.observewas not called; otherwise the test name suggests provider latency includes DB work.Suggested fix
- it('should observe publish duration with error label on DB failure', async () => { + it('should not observe publish duration when DB write fails before delegated publish', async () => { mockFolderIpnsRepo.findOne.mockResolvedValue(null); mockFolderIpnsRepo.create.mockReturnValue({ ...mockFolderEntity, sequenceNumber: '1' }); mockFolderIpnsRepo.save.mockRejectedValue(new Error('DB write error')); @@ ).rejects.toThrow('DB write error'); expect(mockEndTimer).toHaveBeenCalledWith({ result: 'error' }); + expect(mockMetricsService.ipnsPublishDuration.observe).not.toHaveBeenCalled(); });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/api/src/ipns/ipns.service.spec.ts` around lines 947 - 961, The test currently expects the publish-duration timer to be ended on DB save failure, but since ipnsPublishDuration is only started after upsertFolderIpns() succeeds, update the assertion so the timer is not observed/ended when mockFolderIpnsRepo.save rejects; specifically change the final expectation from calling mockEndTimer with an error to asserting that the ipnsPublishDuration mock (or mockEndTimer) was not called (e.g., expect(mockEndTimer).not.toHaveBeenCalled() or expect(ipnsPublishDuration.observe).not.toHaveBeenCalled()), and optionally adjust the test name to reflect that the publish duration is not recorded on DB failure.
🧹 Nitpick comments (1)
docker/docker-compose.staging.yml (1)
87-109: Consider addingsomeguytoapiservice dependencies.The
apiservice usesDELEGATED_ROUTING_URL=http://someguy:8190but doesn't declare a dependency onsomeguy. If the API starts before Someguy is healthy, IPNS operations may fail during startup.💡 Suggested fix
services: api: image: ghcr.io/${GITHUB_REPOSITORY_OWNER:-OWNER}/cipherbox-api:${TAG:-latest} restart: unless-stopped env_file: .env.staging ports: - '127.0.0.1:3000:3000' logging: *default-logging depends_on: postgres: condition: service_healthy redis: condition: service_healthy + someguy: + condition: service_healthy🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/docker-compose.staging.yml` around lines 87 - 109, The api service starts without depending on someguy, yet it uses DELEGATED_ROUTING_URL=http://someguy:8190; add a dependency so api waits for someguy to be healthy: update the api service definition to include depends_on referencing someguy (use the service_healthy condition if your compose version supports it) so Docker waits for someguy’s healthcheck before starting api; reference the api service block and the someguy service (which has a healthcheck) and add the depends_on entry accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/api/src/ipns/ipns.service.ts`:
- Around line 299-301: The resolveFound flag is currently set to true and can
cause ipnsResolveDuration to record latency for failed/exceptional resolves;
change the initialization of resolveFound to false and only set it to true at
the point where a successful non-null resolve is confirmed (after the database
findOne() call and any BigInt(...) conversions succeed and just before returning
the resolved value). Update the finally block to only call
ipnsResolveDuration.observe(...) when resolveFound is true so thrown errors or
early returns do not record latency; ensure the same change is applied to the
other mirror locations around lines 365–375 where resolves and observations
occur.
---
Outside diff comments:
In `@apps/api/src/ipns/ipns.service.spec.ts`:
- Around line 947-961: The test currently expects the publish-duration timer to
be ended on DB save failure, but since ipnsPublishDuration is only started after
upsertFolderIpns() succeeds, update the assertion so the timer is not
observed/ended when mockFolderIpnsRepo.save rejects; specifically change the
final expectation from calling mockEndTimer with an error to asserting that the
ipnsPublishDuration mock (or mockEndTimer) was not called (e.g.,
expect(mockEndTimer).not.toHaveBeenCalled() or
expect(ipnsPublishDuration.observe).not.toHaveBeenCalled()), and optionally
adjust the test name to reflect that the publish duration is not recorded on DB
failure.
---
Nitpick comments:
In `@docker/docker-compose.staging.yml`:
- Around line 87-109: The api service starts without depending on someguy, yet
it uses DELEGATED_ROUTING_URL=http://someguy:8190; add a dependency so api waits
for someguy to be healthy: update the api service definition to include
depends_on referencing someguy (use the service_healthy condition if your
compose version supports it) so Docker waits for someguy’s healthcheck before
starting api; reference the api service block and the someguy service (which has
a healthcheck) and add the depends_on entry accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cf89c5a6-a2c9-4823-95e6-f2ef523ef495
📒 Files selected for processing (20)
.github/workflows/deploy-staging.yml.planning/REQUIREMENTS.md.planning/ROADMAP.md.planning/STATE.md.planning/phases/19-ipns-resolution-improvement/19-01-PLAN.md.planning/phases/19-ipns-resolution-improvement/19-01-SUMMARY.md.planning/phases/19-ipns-resolution-improvement/19-02-PLAN.md.planning/phases/19-ipns-resolution-improvement/19-02-SUMMARY.md.planning/phases/19-ipns-resolution-improvement/19-CONTEXT.md.planning/phases/19-ipns-resolution-improvement/19-RESEARCH.md.planning/phases/19-ipns-resolution-improvement/19-SCOPING_RATIONALE.md.planning/phases/19-ipns-resolution-improvement/19-VALIDATION.md.planning/phases/19-ipns-resolution-improvement/19-VERIFICATION.mdapps/api/.env.exampleapps/api/src/ipns/__tests__/ipns.integration.spec.tsapps/api/src/ipns/__tests__/ipns.security.spec.tsapps/api/src/ipns/ipns.service.spec.tsapps/api/src/ipns/ipns.service.tsapps/api/src/metrics/metrics.service.tsdocker/docker-compose.staging.yml
…nsistency - Fix resolveFound initialization (false by default, true only on successful resolve paths) to prevent error cases from skewing the ipnsResolveDuration histogram - Add outcome label to ipnsResolveDuration histogram for parity with ipnsPublishDuration - Fix ROADMAP.md progress table column shift for Phase 19 - Check completed plan boxes in ROADMAP.md Phase 19 section - Uncheck IPNS-02/03 requirements (DB-first resolution and recovery tool changes not yet delivered in Phase 19) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 02c46a4c0e6f
…ailure Rename test and add assertion that ipnsPublishDuration.observe is not called when DB save fails before reaching the delegated routing publish step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 5ba9a53d7a12
|
Addressing remaining CodeRabbit review items: Outside-diff comment (ipns.service.spec.ts:947-961): Fixed in 6fa1054. Renamed test to "should not observe ipnsPublishDuration when DB write fails before delegated publish" and added Nitpick (docker-compose.staging.yml — add |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.planning/codebase/ARCHITECTURE.md:
- Around line 56-57: The architecture doc currently lists the delegated-routing
service under the TEE layer; move that dependency to the Backend API layer and
update wording to reflect that DelegatedRoutingClient (instantiated in
delegated-routing.module.ts and injected into republish.service.ts and
ipns.service.ts) is part of the Backend API, not tee-worker; also note that
tee-worker only handles decrypting/signing IPNS records and that the backend
scheduler calls delegatedRouting.publish() after receiving the signed record.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fa5c7090-7234-4ba3-9cad-70adf4aaac94
📒 Files selected for processing (16)
.planning/ENVIRONMENTS.md.planning/REQUIREMENTS.md.planning/ROADMAP.md.planning/codebase/ARCHITECTURE.md.planning/codebase/CONCERNS.md.planning/codebase/INTEGRATIONS.md.planning/codebase/STACK.mdapps/api/src/ipns/ipns.controller.tsapps/api/src/ipns/ipns.service.spec.tsapps/api/src/ipns/ipns.service.tsapps/api/src/metrics/metrics.service.tsapps/desktop/src-tauri/src/api/ipns.rsapps/web/src/api/ipns/ipns.tsapps/web/src/services/ipns.service.tspackages/api-client/openapi.jsontests/e2e/utils/conflict-helpers.ts
✅ Files skipped from review due to trivial changes (6)
- apps/web/src/services/ipns.service.ts
- .planning/codebase/INTEGRATIONS.md
- packages/api-client/openapi.json
- apps/web/src/api/ipns/ipns.ts
- tests/e2e/utils/conflict-helpers.ts
- apps/desktop/src-tauri/src/api/ipns.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- apps/api/src/metrics/metrics.service.ts
- .planning/ROADMAP.md
| - Depends on: Backend republish schedule, delegated routing service (Someguy on staging, delegated-ipfs.dev on production) | ||
| - Used by: Backend cron jobs (every 6 hours) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- delegated-routing references in tee-worker and api ---'
rg -n -C3 'DELEGATED_ROUTING_URL|delegatedRouting|DelegatedRouting|someguy|delegated-ipfs\.dev' tee-worker apps/api/src
echo
echo '--- publish / republish ownership hints ---'
rg -n -C3 'publishRecord|publishBatch|republish|sign' tee-worker apps/api/srcRepository: FSM1/cipher-box
Length of output: 50372
Move delegated-routing routing dependency under Backend API layer, not TEE layer.
The DelegatedRoutingClient is instantiated in apps/api/src/ipns/delegated-routing.module.ts and injected into apps/api/src/republish/republish.service.ts (line 36) and apps/api/src/ipns/ipns.service.ts (line 33). The TEE layer (tee-worker) does not reference delegated routing directly; it only decrypts and signs IPNS records. The backend scheduler invokes delegatedRouting.publish() after receiving the signed record from TEE. Relocate the "delegated routing service" dependency to the Backend API section of the architecture document.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.planning/codebase/ARCHITECTURE.md around lines 56 - 57, The architecture
doc currently lists the delegated-routing service under the TEE layer; move that
dependency to the Backend API layer and update wording to reflect that
DelegatedRoutingClient (instantiated in delegated-routing.module.ts and injected
into republish.service.ts and ipns.service.ts) is part of the Backend API, not
tee-worker; also note that tee-worker only handles decrypting/signing IPNS
records and that the backend scheduler calls delegatedRouting.publish() after
receiving the signed record.
- Remove unreachable AbortError timeout classification in publishRecord (DelegatedRoutingClient wraps all errors into HttpException BAD_GATEWAY) - Update test to match actual error behavior - Fix IPNS-02/03 traceability table to show Pending (matches unchecked checklist) - Remove stale duplicate import warnings from verification report - Update verification evidence to reflect outcome label on resolve histogram - Add URL schemes to DELEGATED_ROUTING_URL in environment matrix Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Entire-Checkpoint: 161c4a1bf1af
Summary
DELEGATED_ROUTING_URLat it, eliminating reliance on the publicdelegated-ipfs.devendpoint that was causing 502 errors on IPNS resolutionIpnsServicewith Prometheus histograms (ipns_resolve_duration_seconds,ipns_publish_duration_seconds) broken down by outcome (success/failure/timeout) for observability into IPNS performanceTest plan
delegated-ipfs.dev/metricsendpoint exposes the newipns_resolve_duration_secondsandipns_publish_duration_secondshistograms🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Chores