Skip to content

feat: token cost tracking, data retention, domain verification & refactors#70

Open
duyetbot wants to merge 12 commits intoduyet:mainfrom
duyetbot:claude/feat-token-cost-tracking
Open

feat: token cost tracking, data retention, domain verification & refactors#70
duyetbot wants to merge 12 commits intoduyet:mainfrom
duyetbot:claude/feat-token-cost-tracking

Conversation

@duyetbot
Copy link
Copy Markdown
Contributor

@duyetbot duyetbot commented Mar 30, 2026

Summary

  • Token cost tracking — Schema fields (model, input_tokens, output_tokens, cost_microdollars) on messages, aggregated totals on conversations, V2 API support, analytics breakdowns
  • Data retention — Per-project retention_days setting with scheduled cleanup via Cloudflare Cron Trigger
  • Domain verification — Actual DNS-based domain verification logic (TXT record check)
  • Dashboard UI — Token cost columns, analytics cards, retention settings, data-tab component split
  • Refactors — Bulk operations extracted to service layer, hero SVG factories simplified, cascade delete fixes

Changes (50 files, +1438/-427)

Schema & Migrations

  • 0005_token_cost_tracking.sql — Message token/cost fields, conversation aggregates
  • 0006_retention_days.sql — Project retention_days column

API

  • Token cost fields on message creation/conversation responses
  • Retention service with scheduled cleanup
  • Domain verification service (DNS TXT check)
  • Bulk operations moved to services/bulk.ts
  • Analytics: cost breakdown by model

Dashboard

  • Token cost columns in conversation tables
  • Analytics: cost cards + time series
  • Retention settings per project
  • Data tab split into sub-components

Infra

  • esbuild + rollup updates for test runner
  • Cron trigger config for retention cleanup

Test plan

  • bunx biome check packages/api/src/ — lint clean
  • bunx tsc --noEmit -p packages/api/tsconfig.json — types clean
  • cd packages/api && bunx vitest run — all tests pass
  • cd packages/dashboard && bun run build — dashboard builds
  • Verify token cost fields appear in V2 conversation responses
  • Verify retention cleanup runs on schedule
  • Verify domain verification checks DNS TXT records

🤖 Generated with Claude Code

Co-Authored-By: Duyet Le me@duyet.net
Co-Authored-By: duyetbot bot@duyet.net
Co-Authored-By: Paperclip noreply@paperclip.ing

Summary by CodeRabbit

  • New Features

    • Token & cost tracking for messages and conversations; surfaced in APIs, analytics, and SDK types
    • Project-level data retention with a dashboard settings UI and scheduled cleanup
    • Embeddings & semantic search endpoints with background embedding on message create
    • Bulk export/delete improvements and domain verification methods (DNS TXT / HTTP / meta tag)
    • Dashboard: cost column, charts, and formatted cost display
  • Tests

    • Added retention and token/cost tracking tests

duyetbot and others added 8 commits March 27, 2026 04:22
Add per-message cost fields (model, input_tokens, output_tokens,
cost_microdollars) and conversation-level denormalized aggregates
(total_cost_microdollars, total_tokens). All fields are optional
for full backward compatibility.

- Drizzle migration 0005 adds 6 new columns
- V1 and V2 message creation accepts cost fields
- Conversation aggregates auto-increment on message insert
- Analytics summary includes total_cost_microdollars
- Timeseries supports new "cost" metric
- Tests cover create, append, and backward compat scenarios

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…analytics

Add cost columns to conversation data tables (project + conversations pages),
cost summary card to analytics dashboard, and cost trend chart. Shared
formatCostMicrodollars utility handles spec-compliant formatting (<$0.01,
$X.XX, $XXX). Updates shared types to include cost fields across the API
contract.

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace auto-verify stub with real verification using three methods
run in parallel: DNS TXT record via Cloudflare DoH, HTTP well-known
file check, and HTML meta tag parsing. First success verifies the
domain; all failures marks it as failed.

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
…anup

Projects can now set retention_days (1-3650) to auto-delete expired conversations.
A Cloudflare Cron Trigger runs daily at 3 AM UTC, batch-deleting in order
tags → messages → conversations to respect FK constraints. 25-second time
budget prevents Worker timeout. Dashboard settings UI with validation and
confirmation workflow.

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Move bulk delete and export logic from routes into dedicated service file.
Fix FK constraint order: conversation_tags → messages → conversations.
Export uses batch queries to avoid N+1, selects only relevant fields.

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Extract conversations-header and conversations-content into separate files
for better separation of concerns and reusability.

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Consolidate repeated node patterns into rectNode/dbNode helpers and
organize path constants. No visual changes.

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Fix version mismatch between esbuild host (0.27.4) and binary (0.19.12)
that was preventing vitest from starting.

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @duyetbot, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Walkthrough

Adds per-message token/cost columns and conversation aggregates, project retention with scheduled cleanup, domain verification with real network checks, bulk export/delete services, semantic search/embeddings, dashboard UI for cost/retention, shared type updates, and tests for retention and token-cost flows.

Changes

Cohort / File(s) Summary
Database migrations & test DDL
packages/api/drizzle/0005_token_cost_tracking.sql, packages/api/drizzle/0006_retention_days.sql, packages/api/drizzle/meta/_journal.json, packages/api/test/setup.ts
Add per-message cost/token columns, conversation aggregates, and projects.retention_days; update journal and test schema.
DB schema typings
packages/api/src/db/schema.ts
Expose new columns in Drizzle schema: messages (model, input_tokens, output_tokens, cost_microdollars), conversations (total_cost_microdollars, total_tokens), projects (retentionDays).
Retention service & scheduled handler
packages/api/src/services/retention.ts, packages/api/src/scheduled.ts, packages/api/wrangler.jsonc, packages/api/src/index.ts
New cleanupExpiredConversations with batching/time budget; onScheduled exported and wired to worker export and cron trigger.
Message/conversation accounting
packages/api/src/services/messages.ts, packages/api/src/services/conversations.ts, packages/api/src/services/v2-conversations.ts, packages/api/src/services/v2-conversations.ts
Persist per-message model/tokens/costs; compute/persist conversation totals; update insert/select/update flows and types accordingly.
Bulk operations & routes
packages/api/src/services/bulk.ts, packages/api/src/routes/conversations/bulk.ts
New bulkDeleteConversations and exportConversations services; routes refactored to call these helpers.
Semantic search & embeddings
packages/api/src/services/embeddings.ts, packages/api/src/routes/v2/conversations/semantic.ts, packages/api/src/types.ts, packages/api/src/routes/v2/conversations/messages.ts
New embedding generation, upsert/query/delete helpers; new semantic search and context endpoints; message append triggers embed/upsert when VECTORIZE_INDEX present; add VECTORIZE_INDEX binding.
Domain verification
packages/api/src/lib/domain-verification.ts, packages/api/src/services/domains.ts
New verifiers (dns_txt, http_file, meta_tag) with 5s timeouts and safety checks; domains service uses checkDomainVerification and conditionally persists verification state.
API wiring & dev tooling
packages/api/src/index.ts, packages/api/package.json, packages/api/wrangler.jsonc
Default export changed to { fetch, scheduled }; added dev deps and wrangler triggers/vectorize binding.
Serialization, validation & helpers
packages/api/src/lib/serialization.ts, packages/api/src/lib/validation.ts, packages/api/src/lib/helpers.ts
Deserialize/validate new message and conversation fields; mapConversationToResponse returns aggregates.
Analytics & dashboard UI
packages/api/src/routes/analytics.ts, packages/api/src/services/analytics.ts, packages/dashboard/src/components/analytics/*, packages/dashboard/src/lib/format-cost.ts, packages/dashboard/src/lib/constants.ts
Add cost metric, project total cost and cost-per-day series; chart formatter, color, and UI cards/area-chart formatting.
Dashboard — conversations & project settings
packages/dashboard/src/app/dashboard/conversations/*, packages/dashboard/src/app/dashboard/project/*, packages/dashboard/src/app/dashboard/project/data-tab/*
Add cost/total_tokens columns, cell renderers, default columns; add _RetentionSettings component and wire update callback; refactor data-tab into header/content.
Shared types
packages/shared/src/index.ts
Extend shared interfaces: message fields (model, input_tokens, output_tokens, cost_microdollars), conversation aggregates, project.retention_days, analytics cost series & totals, plus new semantic/context types.
Tests
packages/api/test/retention.test.ts, packages/api/test/v2-conversations.test.ts
Add retention cleanup test suite; expand v2 conversation tests for per-message cost/token fields and aggregates.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant DomainsService as Domains Service
    participant Verifier as Verification Module
    participant DNS as DNS-over-HTTPS
    participant HTTP as Origin Server
    participant Site as Site HTML

    Client->>DomainsService: verifyDomain(domain, token)
    DomainsService->>Verifier: checkDomainVerification(domain, token)

    par DNS TXT check
        Verifier->>DNS: fetch TXT _agentstate.{domain} (timeout 5s)
        DNS-->>Verifier: TXT value / error
    and HTTP file check
        Verifier->>HTTP: GET https://{domain}/.well-known/agentstate-{token} (no redirects, timeout 5s)
        HTTP-->>Verifier: body / error
    and Meta tag check
        Verifier->>Site: GET https://{domain}/ (no redirects, timeout 5s)
        Site-->>Verifier: HTML / error
    end

    Verifier-->>DomainsService: { verified: bool, results: [...] }
    DomainsService->>DB: conditional UPDATE (only if status in ["pending","failed"])
    DomainsService-->>Client: persisted verification status & verified_at
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I hopped through tables, tokens, and charts,

counted tiny costs and very small parts.
I checked domains, then swept away old threads,
updated dashboards and tucked data to beds.
A cozy burrow where tidy numbers rest.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.90% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: token cost tracking, data retention, domain verification, and refactors are all well-represented in the file summaries and constitute the primary scope of this changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces token cost tracking and automated data retention features. It adds database columns for tracking model usage, input/output tokens, and costs at both message and conversation levels. A new scheduled worker task handles the cleanup of expired conversations based on project-specific retention settings. Additionally, the PR implements domain ownership verification via DNS TXT records, HTTP files, and HTML meta tags. Feedback was provided to simplify the meta tag regex logic by combining duplicated patterns.

Comment on lines +145 to +164
const pattern =
/<meta\s+(?:[^>]*?\s)?name\s*=\s*["']agentstate-verification["']\s+(?:[^>]*?\s)?content\s*=\s*["']([^"']+)["'][^>]*>/i;
const match = html.match(pattern);

if (!match) {
// Try reversed attribute order: content before name
const reversedPattern =
/<meta\s+(?:[^>]*?\s)?content\s*=\s*["']([^"']+)["']\s+(?:[^>]*?\s)?name\s*=\s*["']agentstate-verification["'][^>]*>/i;
const reversedMatch = html.match(reversedPattern);

if (!reversedMatch) {
return { method: "meta_tag", success: false, error: "Meta tag not found" };
}

const success = reversedMatch[1] === expectedToken;
return { method: "meta_tag", success, error: success ? undefined : "Token mismatch in meta tag" };
}

const success = match[1] === expectedToken;
return { method: "meta_tag", success, error: success ? undefined : "Token mismatch in meta tag" };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for matching the meta tag with attributes in different orders is duplicated. You can combine the two regular expressions into one using an OR | operator and check for the captured content in either of the capture groups. This simplifies the code and makes it more maintainable.

    const pattern = /<meta\s+(?:[^>]*?\s)?name\s*=\s*["']agentstate-verification["']\s+(?:[^>]*?\s)?content\s*=\s*["']([^"']+)["'][^>]*>|<meta\s+(?:[^>]*?\s)?content\s*=\s*["']([^"']+)["']\s+(?:[^>]*?\s)?name\s*=\s*["']agentstate-verification["'][^>]*>/i;
    const match = html.match(pattern);

    if (!match) {
      return { method: "meta_tag", success: false, error: "Meta tag not found" };
    }

    const content = match[1] || match[2];
    const success = content === expectedToken;
    return { method: "meta_tag", success, error: success ? undefined : "Token mismatch in meta tag" };

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/dashboard/src/app/dashboard/project/_data-tab.tsx (1)

40-45: ⚠️ Potential issue | 🔴 Critical

Fix the type mismatch causing the pipeline failure.

The { key: "expand", label: "" } object literal has its key inferred as string rather than ColumnKey, causing the TypeScript error shown in the pipeline. Use as const to narrow the literal type.

🔧 Proposed fix
   const columns = [
-    { key: "expand", label: "" },
+    { key: "expand" as const, label: "" },
     ...allColumns
       .filter((c) => visibleCols.includes(c.key))
       .map((col) => ({ key: col.key, label: col.label })),
   ];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/dashboard/src/app/dashboard/project/_data-tab.tsx` around lines 40 -
45, The pipeline fails because the literal { key: "expand", label: "" } is
inferred as a plain string key instead of the ColumnKey union; update the
literal so its key is narrowed to ColumnKey—e.g. replace the inline object with
a typed narrow form like { key: "expand" as ColumnKey, label: "" } (or use {
key: "expand", label: "" } as const and then cast to the expected Column type)
before spreading with allColumns/visibleCols in the columns array to fix the
type mismatch in columns.
packages/api/src/services/v2-projects.ts (1)

382-403: ⚠️ Potential issue | 🟠 Major

Preserve an explicit null in the update response.

If the caller clears retention with retention_days: null, Line 403 falls back to existing.retentionDays because ?? treats the persisted null as missing. The update succeeds, but the response reports the stale value.

Suggested fix
-    retention_days: updated?.retentionDays ?? existing.retentionDays ?? null,
+    retention_days:
+      updated?.retentionDays === null
+        ? null
+        : updated?.retentionDays ?? existing.retentionDays ?? null,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/services/v2-projects.ts` around lines 382 - 403, The
response currently uses the nullish coalescing operator which treats a persisted
null as missing; update the retention_days expression to detect undefined rather
than null. Replace the existing expression using updated?.retentionDays ??
existing.retentionDays ?? null with a conditional that checks for undefined,
e.g. retention_days: updated?.retentionDays !== undefined ?
updated.retentionDays : existing.retentionDays ?? null, so an explicit null from
updated is preserved in the response (referencing the variables updated,
existing, and retentionDays).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/api/drizzle/0006_retention_days.sql`:
- Line 1: The migration adds projects.retention_days without a constraint,
allowing values outside 1–3650; modify the migration to create/alter the
projects table to include a CHECK constraint enforcing retention_days BETWEEN 1
AND 3650 (e.g., add CHECK (retention_days >= 1 AND retention_days <= 3650) with
a clear constraint name like chk_projects_retention_days_range), and handle
existing NULLs/invalid rows beforehand (backfill or set a safe default) so the
constraint can be applied without failing.

In `@packages/api/drizzle/meta/_journal.json`:
- Around line 41-45: The migration journal (_journal.json) is missing an entry
for the new migration 0006_retention_days referenced by 0006_retention_days.sql;
update the journal so SQL and metadata stay in sync by regenerating migrations
metadata from the packages/api/drizzle/ directory or manually add an entry with
idx 6 and tag "0006_retention_days" (matching the existing shape that includes
version, when, tag, and breakpoints similar to the existing
"0005_token_cost_tracking" entry).

In `@packages/api/package.json`:
- Around line 22-24: Replace the platform-specific devDependency
"@rollup/rollup-linux-x64-gnu" with the generic "rollup" package in package.json
devDependencies: remove the "@rollup/rollup-linux-x64-gnu" entry and add
"rollup" (e.g. version "^4.60.0") so Rollup's platform-specific optional
binaries resolve correctly across macOS/Windows/Linux; ensure package.json keys
match "rollup" and run your package manager install to update lockfile.

In `@packages/api/src/db/schema.ts`:
- Line 34: The retention_days column currently defined as retentionDays:
integer("retention_days") needs a DB-level CHECK enforcing 1..3650 to prevent
invalid writes; update packages/api/src/db/schema.ts to attach a check
constraint to the column/table (e.g., a CHECK (retention_days BETWEEN 1 AND
3650) via the schema builder or column.check/constraint API) and add/emit a
corresponding migration so the database enforces the rule for all write paths;
ensure the constraint name is unique and reference retentionDays/retention_days
in the change.

In `@packages/api/src/lib/domain-verification.ts`:
- Around line 54-57: Add a hard guard in the verification entrypoints
(verifyDnsTxt, verifyTxtRecord, verifyHttpTxt, verifyHttpFile) to reject
single-label hosts and IP literals before performing any network ops: detect
hostnames without a dot or purely numeric IP strings and throw/return a
validation error; additionally perform a DNS resolution for A/AAAA records and
reject any resolved addresses that are private/reserved (RFC1918, RFC6598,
link-local, multicast, loopback, etc.) so the subsequent HTTP/DNS checks never
target non-public addresses. Ensure the checks run synchronously at the top of
each function (before HTTP fetches or TXT queries) and return the same
VerificationCheckResult/error shape used elsewhere.

In `@packages/api/src/routes/v2/projects/crud.ts`:
- Around line 35-38: The UpdateProjectSchema currently allows an empty object so
a PATCH with {} bypasses validation; update the schema (UpdateProjectSchema) to
require at least one present field by adding a refinement (e.g., .refine or
.superRefine) that checks Object.keys(data).length > 0 and returns a clear error
message like "At least one field is required", and also add a runtime guard
before calling updateProject that rejects empty payloads (check if
Object.keys(data).length === 0) to avoid passing a no-op update to
updateProject; reference the UpdateProjectSchema and the updateProject call to
locate where to apply these changes.

In `@packages/api/src/scheduled.ts`:
- Around line 5-21: Wrap the call to cleanupExpiredConversations inside
onScheduled in a try-catch: call drizzle(env.DB) and then try awaiting
cleanupExpiredConversations(db, env.DB); on success keep the existing structured
console.log; on error catch and emit a structured error log (include event.cron,
any inputs like env.DB identifier, and error.stack or message) using
console.error (or your app logger) for observability, then rethrow or return a
rejected promise so the runtime knows the job failed.

In `@packages/api/src/services/bulk.ts`:
- Around line 111-120: The fetch currently truncates combined messages with
.limit(5000) on the allMsgs query (convIds, messages, MESSAGE_EXPORT_FIELDS),
causing incomplete exports; remove the hard 5000 cap and implement proper paging
or explicit failure: either iterate over convIds and fetch messages per
conversation in paged queries (limit/offset or cursor) aggregating results and
accurate message_count, or run batched queries by convIds in slices and/or a
COUNT(*) per conversation to detect when total messages exceed an export-safe
threshold and return a clear error instead of partial data; apply the same fix
where the other message fetch block (lines referenced 138-149) uses a 5000
limit.
- Around line 46-63: CONVERSATION_EXPORT_FIELDS and MESSAGE_EXPORT_FIELDS are
currently omitting persisted columns (external_id, token_count, message.metadata
and the new accounting columns) and later replacements are emitting null/0,
which drops real data; update the field maps so each export key maps to the
actual DB column expressions (e.g., include conversations.externalId,
conversations.tokenCount, messages.metadata, messages.tokenCount and any
accounting columns by their column identifiers) instead of hardcoded null/0, and
remove the code paths that overwrite them with constants so the export
serializes the real stored values for CONVERSATION_EXPORT_FIELDS and
MESSAGE_EXPORT_FIELDS (also apply the same fixes where these constants are used
around the other blocks referenced).
- Line 3: The bulk delete sequence deletes messages and conversations
(references: messages, conversations) but never clears conversation_tags,
causing FK failures; update the deletion order to first remove conversation_tags
rows for the target conversation IDs (delete from conversation_tags where
conversation_id IN ...), then delete messages, then delete conversations,
ensuring the code that currently calls messages.delete and conversations.delete
is preceded by a conversation_tags deletion step (use the existing DB/table
symbol for conversation_tags).

In `@packages/api/src/services/domains.ts`:
- Around line 314-327: verifyDomain() currently updates customDomains by id
unconditionally which lets a slow failure overwrite a concurrent successful
verification; change the failure-path update to include the prior-state guard
(e.g., in the db.update(...).where(...) add an extra condition on
customDomains.verificationStatus to only update when still in the non-verified
expected state such as 'pending' or not 'verified'), and after performing the
update re-query the row (select from customDomains by domainId) and return the
freshly read row fields (verification_status / verified_at) instead of the
pre-read domain object to ensure the response reflects the final persisted
state.

In `@packages/api/src/services/retention.ts`:
- Around line 25-28: The function cleanupExpiredConversations currently declares
an unused parameter _d1Db of type D1Database; remove this parameter from the
function signature and all call sites (and any corresponding imports or type
references) to clean the interface, or if you intend to keep it for future use,
add a concise comment above the parameter or the function noting it's reserved
for future D1-specific operations (e.g., "// reserved for future D1Database
use") and keep the underscore to signal intentional unused status; update
tests/consumers accordingly to match the new signature.
- Around line 99-104: The results.push currently uses the global start variable
so durationMs measures cumulative time; before processing each project (the loop
handling project.id and computing totalDeleted and retentionDays) record a
per-project start timestamp (e.g., projectStart = Date.now()) and then set
durationMs to Date.now() - projectStart in the results.push call so each entry
reflects that project's elapsed processing time rather than the global start
time.

In `@packages/api/test/retention.test.ts`:
- Around line 88-92: The retention tests are relying on shared DB state seeded
in beforeAll (applyMigrations and seed) which makes cases order-dependent;
change the suite so each test is independent by resetting and reseeding the DB
in a beforeEach (call the same applyMigrations/seed helpers or use a
clearDB+seed helper) or by making each test create and use unique fixtures
within the test itself (so no test reads rows left by another). Update the
describe("Retention Service") suite to replace the shared beforeAll seed with
per-test setup/teardown (or explicit unique data creation) and remove
assumptions that rows from other tests exist (the tests that currently expect
leftover rows should instead create the exact rows they assert against).

In `@packages/dashboard/src/app/dashboard/project/_retention-settings.tsx`:
- Around line 25-32: handleSave currently converts retentionDays to a Number and
only checks range, which allows fractional days; update the validation in
handleSave to also require an integer by adding a Number.isInteger(value) check
(i.e., treat non-integer values as invalid) and show the same toast.error
message when the value is not an integer, null/empty remains allowed as
infinite.

In
`@packages/dashboard/src/app/dashboard/project/data-tab/conversations-header.tsx`:
- Line 2: Remove the unused import ColumnKey from conversations-header.tsx:
locate the import statement "import type { ColumnKey } from \"../_types\";" and
delete it so the file no longer imports this unused type; ensure no other
references to ColumnKey remain in the file before committing.

---

Outside diff comments:
In `@packages/api/src/services/v2-projects.ts`:
- Around line 382-403: The response currently uses the nullish coalescing
operator which treats a persisted null as missing; update the retention_days
expression to detect undefined rather than null. Replace the existing expression
using updated?.retentionDays ?? existing.retentionDays ?? null with a
conditional that checks for undefined, e.g. retention_days:
updated?.retentionDays !== undefined ? updated.retentionDays :
existing.retentionDays ?? null, so an explicit null from updated is preserved in
the response (referencing the variables updated, existing, and retentionDays).

In `@packages/dashboard/src/app/dashboard/project/_data-tab.tsx`:
- Around line 40-45: The pipeline fails because the literal { key: "expand",
label: "" } is inferred as a plain string key instead of the ColumnKey union;
update the literal so its key is narrowed to ColumnKey—e.g. replace the inline
object with a typed narrow form like { key: "expand" as ColumnKey, label: "" }
(or use { key: "expand", label: "" } as const and then cast to the expected
Column type) before spreading with allColumns/visibleCols in the columns array
to fix the type mismatch in columns.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1e27dc5f-a549-4dc9-893c-e0524a782a39

📥 Commits

Reviewing files that changed from the base of the PR and between 8f5741c and b6c837f.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (49)
  • packages/api/drizzle/0005_token_cost_tracking.sql
  • packages/api/drizzle/0006_retention_days.sql
  • packages/api/drizzle/meta/_journal.json
  • packages/api/package.json
  • packages/api/src/db/schema.ts
  • packages/api/src/index.ts
  • packages/api/src/lib/domain-verification.ts
  • packages/api/src/lib/helpers.ts
  • packages/api/src/lib/serialization.ts
  • packages/api/src/lib/validation.ts
  • packages/api/src/routes/analytics.ts
  • packages/api/src/routes/conversations/bulk.ts
  • packages/api/src/routes/conversations/crud.ts
  • packages/api/src/routes/v2/projects/crud.ts
  • packages/api/src/scheduled.ts
  • packages/api/src/services/analytics.ts
  • packages/api/src/services/bulk.ts
  • packages/api/src/services/conversations.ts
  • packages/api/src/services/domains.ts
  • packages/api/src/services/messages.ts
  • packages/api/src/services/projects.ts
  • packages/api/src/services/retention.ts
  • packages/api/src/services/v2-conversations.ts
  • packages/api/src/services/v2-projects.ts
  • packages/api/test/retention.test.ts
  • packages/api/test/setup.ts
  • packages/api/test/v2-conversations.test.ts
  • packages/api/wrangler.jsonc
  • packages/dashboard/src/app/dashboard/analytics/_analytics-content.tsx
  • packages/dashboard/src/app/dashboard/conversations/_conversation-row.tsx
  • packages/dashboard/src/app/dashboard/conversations/_table-columns.tsx
  • packages/dashboard/src/app/dashboard/project/_components.tsx
  • packages/dashboard/src/app/dashboard/project/_conversation-cell-renderers.tsx
  • packages/dashboard/src/app/dashboard/project/_conversation-columns.ts
  • packages/dashboard/src/app/dashboard/project/_data-tab-state.ts
  • packages/dashboard/src/app/dashboard/project/_data-tab.tsx
  • packages/dashboard/src/app/dashboard/project/_project-settings.tsx
  • packages/dashboard/src/app/dashboard/project/_retention-settings.tsx
  • packages/dashboard/src/app/dashboard/project/_types.ts
  • packages/dashboard/src/app/dashboard/project/data-tab/conversations-content.tsx
  • packages/dashboard/src/app/dashboard/project/data-tab/conversations-header.tsx
  • packages/dashboard/src/app/dashboard/project/data-tab/index.ts
  • packages/dashboard/src/app/dashboard/project/page.tsx
  • packages/dashboard/src/components/analytics/area-chart.tsx
  • packages/dashboard/src/components/analytics/summary-cards.tsx
  • packages/dashboard/src/components/landing/_hero-svg-factories.tsx
  • packages/dashboard/src/lib/constants.ts
  • packages/dashboard/src/lib/format-cost.ts
  • packages/shared/src/index.ts

@@ -0,0 +1 @@
ALTER TABLE `projects` ADD `retention_days` integer;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Enforce retention bounds at the database layer.

retention_days is added without a CHECK, so out-of-range values can be stored despite the 1–3650 requirement.

Proposed fix
-ALTER TABLE `projects` ADD `retention_days` integer;
+ALTER TABLE `projects`
+ADD `retention_days` integer
+CHECK (`retention_days` IS NULL OR `retention_days` BETWEEN 1 AND 3650);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ALTER TABLE `projects` ADD `retention_days` integer;
ALTER TABLE `projects` ADD `retention_days` integer CHECK (`retention_days` IS NULL OR `retention_days` BETWEEN 1 AND 3650);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/drizzle/0006_retention_days.sql` at line 1, The migration adds
projects.retention_days without a constraint, allowing values outside 1–3650;
modify the migration to create/alter the projects table to include a CHECK
constraint enforcing retention_days BETWEEN 1 AND 3650 (e.g., add CHECK
(retention_days >= 1 AND retention_days <= 3650) with a clear constraint name
like chk_projects_retention_days_range), and handle existing NULLs/invalid rows
beforehand (backfill or set a safe default) so the constraint can be applied
without failing.

Comment on lines +41 to +45
"idx": 5,
"version": "6",
"when": 1774764000000,
"tag": "0005_token_cost_tracking",
"breakpoints": true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Migration journal is missing the 0006_retention_days entry.

This PR includes 0006_retention_days.sql, but _journal.json only adds 0005. Please regenerate migrations metadata (or add idx 6) so the journal and SQL files stay in sync.

Expected journal shape (illustrative)
     {
       "idx": 5,
       "version": "6",
       "when": 1774764000000,
       "tag": "0005_token_cost_tracking",
       "breakpoints": true
+    },
+    {
+      "idx": 6,
+      "version": "6",
+      "when": <generated_timestamp>,
+      "tag": "0006_retention_days",
+      "breakpoints": true
     }

As per coding guidelines "Commit generated SQL migrations from 'packages/api/drizzle/' directory to git".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/drizzle/meta/_journal.json` around lines 41 - 45, The migration
journal (_journal.json) is missing an entry for the new migration
0006_retention_days referenced by 0006_retention_days.sql; update the journal so
SQL and metadata stay in sync by regenerating migrations metadata from the
packages/api/drizzle/ directory or manually add an entry with idx 6 and tag
"0006_retention_days" (matching the existing shape that includes version, when,
tag, and breakpoints similar to the existing "0005_token_cost_tracking" entry).

Comment thread packages/api/package.json
Comment on lines +22 to +24
"@rollup/rollup-linux-x64-gnu": "^4.60.0",
"drizzle-kit": "^0.30.0",
"esbuild": "0.27.4",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify platform constraints and whether current deps are OS-specific.
npm view `@rollup/rollup-linux-x64-gnu` os cpu optionalDependencies
npm view rollup optionalDependencies

Repository: duyet/agentstate

Length of output: 1258


🏁 Script executed:

cd packages/api && cat -n package.json | head -40

Repository: duyet/agentstate

Length of output: 1082


Remove platform-specific Rollup binary from devDependencies.

@rollup/rollup-linux-x64-gnu is restricted to Linux x64 systems and will cause dependency resolution failures on macOS and Windows developer machines. Replace with rollup, which resolves the appropriate platform binary via its optional dependencies.

Proposed fix
-    "@rollup/rollup-linux-x64-gnu": "^4.60.0",
+    "rollup": "^4.60.0",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"@rollup/rollup-linux-x64-gnu": "^4.60.0",
"drizzle-kit": "^0.30.0",
"esbuild": "0.27.4",
"rollup": "^4.60.0",
"drizzle-kit": "^0.30.0",
"esbuild": "0.27.4",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/package.json` around lines 22 - 24, Replace the
platform-specific devDependency "@rollup/rollup-linux-x64-gnu" with the generic
"rollup" package in package.json devDependencies: remove the
"@rollup/rollup-linux-x64-gnu" entry and add "rollup" (e.g. version "^4.60.0")
so Rollup's platform-specific optional binaries resolve correctly across
macOS/Windows/Linux; ensure package.json keys match "rollup" and run your
package manager install to update lockfile.

.references(() => organizations.id),
name: text("name").notNull(),
slug: text("slug").notNull(),
retentionDays: integer("retention_days"), // NULL = infinite retention
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Enforce the retention_days range in the schema.

The PR defines this field as 1–3650, but Line 34 currently accepts 0, negatives, and arbitrarily large values. A bad write here can make the retention job delete data immediately or never expire it. Please add a DB-level CHECK in the schema/migration so every write path shares the same guardrail.

As per coding guidelines, "Maintain database schema in 'packages/api/src/db/schema.ts' as single source of truth".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/db/schema.ts` at line 34, The retention_days column
currently defined as retentionDays: integer("retention_days") needs a DB-level
CHECK enforcing 1..3650 to prevent invalid writes; update
packages/api/src/db/schema.ts to attach a check constraint to the column/table
(e.g., a CHECK (retention_days BETWEEN 1 AND 3650) via the schema builder or
column.check/constraint API) and add/emit a corresponding migration so the
database enforces the rule for all write paths; ensure the constraint name is
unique and reference retentionDays/retention_days in the change.

Comment on lines +54 to +57
export async function verifyDnsTxt(
domain: string,
expectedToken: string,
): Promise<VerificationCheckResult> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Reject non-public targets before starting verification.

All three checks trust a stored domain as a network target. Right now the service layer only normalizes input, and its validator still accepts values like localhost and IPv4 literals, so this path can be used against non-public hosts. Add a hard guard here for single-label hosts/IP literals, and ideally resolve/reject private or reserved addresses before the HTTP checks run.

Also applies to: 97-100, 128-131, 176-179

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/lib/domain-verification.ts` around lines 54 - 57, Add a hard
guard in the verification entrypoints (verifyDnsTxt, verifyTxtRecord,
verifyHttpTxt, verifyHttpFile) to reject single-label hosts and IP literals
before performing any network ops: detect hostnames without a dot or purely
numeric IP strings and throw/return a validation error; additionally perform a
DNS resolution for A/AAAA records and reject any resolved addresses that are
private/reserved (RFC1918, RFC6598, link-local, multicast, loopback, etc.) so
the subsequent HTTP/DNS checks never target non-public addresses. Ensure the
checks run synchronously at the top of each function (before HTTP fetches or TXT
queries) and return the same VerificationCheckResult/error shape used elsewhere.

Comment thread packages/api/src/services/domains.ts Outdated
Comment thread packages/api/src/services/retention.ts
Comment thread packages/api/test/retention.test.ts
Comment thread packages/dashboard/src/app/dashboard/project/data-tab/conversations-header.tsx Outdated
Widen the columns type to accept ColumnKey | "expand" so the expand
toggle column doesn't widen the whole array to string, which fails CI
typecheck.

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@duyetbot
Copy link
Copy Markdown
Contributor Author

CI Fix: Type error in `_data-tab.tsx` column construction

The "expand" key in the columns array widens the type to `string`, causing:

Type '{ key: string; label: string; }[]' is not assignable to type '{ key: ColumnKey; label: string; }[]'

Fix (2 files):

1. _data-tab.tsx — explicitly type the columns array:
```ts
const columns: Array<{ key: ColumnKey | "expand"; label: string }> = [
```

2. data-tab/conversations-content.tsx — widen the columns prop type:
```ts
columns: Array<{ key: ColumnKey | "expand"; label: string }>;
```

This passes local typecheck. Ready for someone with push access to cherry-pick onto the branch.

- Add isSafeVerificationTarget guard to reject single-label hosts and
  IP literals before making outbound verification requests (SSRF)
- Add optimistic concurrency check on verifyDomain — only updates rows
  still in pending/failed status, then re-queries for actual state
- Fix ColumnKey type inference in _data-tab.tsx (as const on expand)

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
@duyetbot
Copy link
Copy Markdown
Contributor Author

CI is green — ready to merge

All checks pass:

  • Lint + Typecheck + Test: pass (49s)
  • GitGuardian Security Checks: pass
  • 285/285 tests passing

This PR includes Phase 1 deliverables:

  • Token cost tracking (schema + API + dashboard UI)
  • Data retention with auto-expiry
  • Domain verification logic
  • Supporting refactors and quality fixes

Requesting merge approval from maintainers.

Build the Memory Layer (Phase 2 Part 1+2):
- Embedding service (embeddings.ts): generate 768d vectors via
  @cf/baai/bge-m3, upsert/query/delete via Vectorize
- Semantic search endpoint: POST /v2/conversations/search with
  project-scoped vector queries, ranked results with relevance scores
- Context retrieval endpoint: POST /v2/conversations/context with
  token-budget packing for LLM injection
- Wire embedding on message creation (fire-and-forget via waitUntil)
- Add VECTORIZE_INDEX binding to wrangler config + Bindings type
- Shared types: SemanticSearchRequest/Response, ContextRetrieval types

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
packages/api/src/lib/domain-verification.ts (1)

44-52: ⚠️ Potential issue | 🔴 Critical

The network-target hardening is still incomplete.

isSafeVerificationTarget() only validates the original string, while both HTTP verifiers blindly follow redirects. A public hostname can still resolve, or redirect, to loopback/link-local/RFC1918 targets, so verification can still hit internal services. Resolve the hostname to public A/AAAA records before fetch, and re-check every redirect target instead of trusting redirect: "follow".

Also applies to: 123-126, 157-160

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/lib/domain-verification.ts` around lines 44 - 52, The
current isSafeVerificationTarget(domain) only inspects the input string but the
HTTP verifiers follow redirects and never re-validate targets; update the
verification flow used around the HTTP fetch logic (the functions calling fetch
in this file — replace any fetch(..., { redirect: "follow" }) usage) to: 1)
perform DNS resolution of the hostname to A/AAAA records and reject if any
resolved address is loopback, link-local, or RFC1918/private; 2) perform the
HTTP request with redirects disabled (no automatic follow) and, on receiving a
3xx with Location, parse the redirect target, resolve its hostname to A/AAAA and
re-run the same IP checks before following; 3) repeat manual redirect handling
up to the existing redirect limit; 4) keep isSafeVerificationTarget(domain) as a
syntactic check but move network-safety decisions to the DNS-and-IP checks
described above so redirects and final targets are always validated before any
connection is made.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/api/src/lib/domain-verification.ts`:
- Around line 203-210: The current use of Promise.all delays returning verified
true until every verifier finishes; change to short-circuit for the boolean by
creating per-probe promises that resolve only on success (and reject on failure)
and use Promise.any on [verifyDnsTxt, verifyHttpFile, verifyMetaTag] to compute
verified quickly, while concurrently running the full probes (e.g., via
Promise.allSettled or the existing Promise.all) to collect per-method results;
update the code that sets verified (currently using results.some(...)) to use
the Promise.any outcome (fall back to false if Promise.any rejects) and still
return the full results array for callers that need details.
- Around line 47-50: The existing IPv6 heuristic in domain-verification.ts
incorrectly treats hex-only hostnames like `bead.cafe` as IPv6 by testing
/^[0-9a-f:]+$/i against domain.replace(/\./g, ""), so change the logic to only
apply the IPv6-literal test when the original host actually looks like an IPv6
literal (e.g. it starts with "[" or contains a ":"), rather than unconditionally
after removing dots; update the condition around domain.startsWith("[") and the
/^[0-9a-f:]+$/i check to first check domain.includes(":") (or
domain.startsWith("[")) before running the hex/colon regex or, better, replace
the heuristic with a call to a proper IP check (e.g. net.isIP) to decide if the
input is an IP literal.

In `@packages/api/src/services/domains.ts`:
- Around line 329-331: The code currently falls back to the stale pre-read
domain when re-querying returns null; change the logic around
getDomain(domainId, projectId) so that if fresh is null you do not return the
old domain but instead surface a not-found error (or return null) to the caller;
specifically update the block that calls getDomain and buildVerificationResult
to only call buildVerificationResult(fresh) when fresh exists and otherwise
throw/return an appropriate not-found response for the domainId/projectId rather
than using the stale domain variable.

In `@packages/dashboard/src/app/dashboard/project/_data-tab.tsx`:
- Around line 40-45: Create a shared column descriptor type in _types.ts (e.g.,
export type ColumnDescriptor = { key: ColumnKey | "expand"; label: string }) and
replace the inline type annotation Array<{ key: ColumnKey | "expand"; label:
string }> used in the columns declaration and the similar declaration in
data-tab/conversations-content.tsx with this exported ColumnDescriptor (or
Array<ColumnDescriptor>), updating imports to import { ColumnDescriptor } from
"_types" so both call sites use the single source of truth.

---

Duplicate comments:
In `@packages/api/src/lib/domain-verification.ts`:
- Around line 44-52: The current isSafeVerificationTarget(domain) only inspects
the input string but the HTTP verifiers follow redirects and never re-validate
targets; update the verification flow used around the HTTP fetch logic (the
functions calling fetch in this file — replace any fetch(..., { redirect:
"follow" }) usage) to: 1) perform DNS resolution of the hostname to A/AAAA
records and reject if any resolved address is loopback, link-local, or
RFC1918/private; 2) perform the HTTP request with redirects disabled (no
automatic follow) and, on receiving a 3xx with Location, parse the redirect
target, resolve its hostname to A/AAAA and re-run the same IP checks before
following; 3) repeat manual redirect handling up to the existing redirect limit;
4) keep isSafeVerificationTarget(domain) as a syntactic check but move
network-safety decisions to the DNS-and-IP checks described above so redirects
and final targets are always validated before any connection is made.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 016a0baf-380e-4c79-adcd-743079715f48

📥 Commits

Reviewing files that changed from the base of the PR and between b6c837f and 2765a63.

📒 Files selected for processing (4)
  • packages/api/src/lib/domain-verification.ts
  • packages/api/src/services/domains.ts
  • packages/dashboard/src/app/dashboard/project/_data-tab.tsx
  • packages/dashboard/src/app/dashboard/project/data-tab/conversations-content.tsx

Comment thread packages/api/src/lib/domain-verification.ts Outdated
Comment on lines +203 to +210
const results = await Promise.all([
verifyDnsTxt(domain, expectedToken),
verifyHttpFile(domain, expectedToken),
verifyMetaTag(domain, expectedToken),
]);

const verified = results.some((r) => r.success);
return { verified, results };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Promise.all keeps successful verifications waiting on the slowest probe.

A fast DNS success still waits for the HTTP checks to finish or time out because Promise.all only resolves after every verifier settles. verifyDomain() only uses the boolean, so consider a short-circuit helper for that path and keep full per-method results only where they are needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/lib/domain-verification.ts` around lines 203 - 210, The
current use of Promise.all delays returning verified true until every verifier
finishes; change to short-circuit for the boolean by creating per-probe promises
that resolve only on success (and reject on failure) and use Promise.any on
[verifyDnsTxt, verifyHttpFile, verifyMetaTag] to compute verified quickly, while
concurrently running the full probes (e.g., via Promise.allSettled or the
existing Promise.all) to collect per-method results; update the code that sets
verified (currently using results.some(...)) to use the Promise.any outcome
(fall back to false if Promise.any rejects) and still return the full results
array for callers that need details.

Comment thread packages/api/src/services/domains.ts Outdated
Comment on lines +40 to 45
const columns: Array<{ key: ColumnKey | "expand"; label: string }> = [
{ key: "expand" as const, label: "" },
...allColumns
.filter((c) => visibleCols.includes(c.key))
.map((col) => ({ key: col.key, label: col.label })),
];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Extract a shared column descriptor type to prevent drift.

Array<{ key: ColumnKey | "expand"; label: string }> is duplicated here and in data-tab/conversations-content.tsx; centralizing this in _types.ts will keep both call sites aligned.

♻️ Suggested refactor
diff --git a/packages/dashboard/src/app/dashboard/project/_types.ts b/packages/dashboard/src/app/dashboard/project/_types.ts
@@
 export type ColumnKey =
   | "title"
   | "external_id"
   | "message_count"
   | "token_count"
   | "total_cost"
   | "total_tokens"
   | "metadata"
   | "created_at"
   | "updated_at";
+
+export type DataTableColumnKey = ColumnKey | "expand";
+export type DataTableColumn = { key: DataTableColumnKey; label: string };

diff --git a/packages/dashboard/src/app/dashboard/project/_data-tab.tsx b/packages/dashboard/src/app/dashboard/project/_data-tab.tsx
@@
-import type { ColumnKey } from "./_types";
+import type { ColumnKey, DataTableColumn } from "./_types";
@@
-  const columns: Array<{ key: ColumnKey | "expand"; label: string }> = [
+  const columns: DataTableColumn[] = [

diff --git a/packages/dashboard/src/app/dashboard/project/data-tab/conversations-content.tsx b/packages/dashboard/src/app/dashboard/project/data-tab/conversations-content.tsx
@@
-import type { ColumnKey } from "../_types";
+import type { ColumnKey, DataTableColumn } from "../_types";
@@
-  columns: Array<{ key: ColumnKey | "expand"; label: string }>;
+  columns: DataTableColumn[];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/dashboard/src/app/dashboard/project/_data-tab.tsx` around lines 40 -
45, Create a shared column descriptor type in _types.ts (e.g., export type
ColumnDescriptor = { key: ColumnKey | "expand"; label: string }) and replace the
inline type annotation Array<{ key: ColumnKey | "expand"; label: string }> used
in the columns declaration and the similar declaration in
data-tab/conversations-content.tsx with this exported ColumnDescriptor (or
Array<ColumnDescriptor>), updating imports to import { ColumnDescriptor } from
"_types" so both call sites use the single source of truth.

P1 Security:
- Block redirect-based SSRF in HTTP/meta_tag verification (redirect: "error")
- Fix IPv6 false positive rejecting domains like "bead.cafe"

P1 Data Integrity:
- Add conversation_tags cascade delete in bulk operations
- Remove silent 5000-row truncation from bulk message export
- Include external_id, token_count, metadata, accounting fields in exports

P2 Correctness:
- Fix retention_days null handling with explicit undefined check
- Error on stale domain verify re-query instead of silent fallback
- Reject empty PATCH body on project update (schema refine)
- Add Number.isInteger() check for fractional retention days

P3 Code Quality:
- Remove unused ColumnKey import from conversations-header
- Add beforeEach test isolation in retention tests
- Track per-project timing in retention cleanup
- Remove unused _d1Db parameter from cleanupExpiredConversations
- Add try-catch structured error logging in scheduled handler

Co-Authored-By: Duyet Le <me@duyet.net>
Co-Authored-By: duyetbot <bot@duyet.net>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

♻️ Duplicate comments (1)
packages/api/src/scheduled.ts (1)

24-33: ⚠️ Potential issue | 🟠 Major

Re-throw after logging the cron failure.

The catch block currently turns a failed cleanup into a successful scheduled run. That hides retention outages from Cron monitoring and makes retries/alerting much harder.

♻️ Minimal fix
   } catch (err) {
     console.error(
       JSON.stringify({
         level: "error",
         event: "retention_cleanup_failed",
         error: err instanceof Error ? err.message : String(err),
         trigger: event.cron,
       }),
     );
+    throw err;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/scheduled.ts` around lines 24 - 33, The catch block in
scheduled.ts is currently swallowing errors (logging a retention_cleanup_failed
event) which makes the scheduled run appear successful; after logging the error
details (the existing JSON.stringify block referencing event.cron and the
retention_cleanup_failed event), re-throw the original error (throw err) so the
failure surfaces to the scheduler/monitoring and enables retries/alerting;
ensure you throw the same err object (not a new Error) to preserve
stack/context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/api/src/lib/domain-verification.ts`:
- Around line 168-189: The regex-based meta-tag detection in
domain-verification.ts (variables pattern and reversedPattern) is fragile;
update the implementation to robustly handle arbitrary attribute order and
whitespace by either switching to an HTML parser (e.g., use a DOM/HTML parsing
library to query <meta name="agentstate-verification"> and read its content
attribute) or replace the two fragile regexes with single, more permissive
regexes that allow newlines and arbitrary attributes (use flags and patterns
that permit [\s\S] or the DOTALL/s flag and \b anchors so the content attribute
is captured regardless of ordering). Ensure the code that constructs the result
({ method: "meta_tag", success, error }) continues to run unchanged, only
changing how match/reversedMatch are produced (i.e., adjust
pattern/reversedPattern or replace with parser logic).

In `@packages/api/src/routes/v2/conversations/semantic.ts`:
- Around line 172-178: The code uses r.metadata.message_id directly when
checking and adding to the seen Set and when looking up in messageLookup,
causing inconsistency with the String() normalization used earlier; update the
three usages to normalize the id by calling String(r.metadata.message_id) in
seen.has(...), seen.add(...), and messageLookup.get(...) so the dedup logic
matches the behavior on line 148 and avoids mismatches between primitive types.
- Around line 49-84: The code inconsistently converts r.metadata.message_id to
String() when building messageIds for the DB query (messageIds) but uses the raw
value when populating/looking up in messageLookup and when checking/inserting
into seen; normalize the type to a single canonical form across all usages
(e.g., always String) by mapping results -> messageIds = results.map(r =>
String(r.metadata.message_id)), constructing messageLookup with keys as
String(m.id), and using String(r.metadata.message_id) when checking seen and
looking up messageLookup so comparisons (in messageIds, messageLookup, seen) are
consistent; update references to messageLookup, seen, and anywhere
r.metadata.message_id is used to use the same String conversion.
- Around line 13-18: The request body declares filter fields
(SemanticSearchRequest) but the router.post handler currently ignores them and
only uses c.get("projectId"); update the router.post handler to read body.filter
and apply its criteria: include project_id (falling back to c.get("projectId")
if absent), apply date_from/date_to as a metadata timestamp range, and require
tags (all or any per spec) when querying the vector store or by post-filtering
the returned results; modify the vector search call (or post-query loop that
produces results) to accept these metadata filters so date and tag constraints
are enforced rather than left unused.

In `@packages/api/src/services/bulk.ts`:
- Around line 19-41: In bulkDeleteConversations, add an early return when the
input ids array is empty to avoid executing the DB query and batch operations;
i.e., at the top of the function check if ids.length === 0 (or ids?.length ===
0) and return 0 immediately so the subsequent select(...).where(inArray(...,
ids)) and db.batch(...) are skipped.
- Around line 76-83: Update the JSDoc for the "Export conversations with their
messages." function to reflect that when ids is empty the function returns a
maximum of 100 most recent conversations (not "all"); explicitly state the limit
in the description and adjust the `@returns` or `@param` ids text to note "exports
up to 100 most recent conversations when ids is omitted or empty." Reference the
JSDoc block immediately above the export conversations function to make this
change.
- Around line 148-172: The map lookup msgsByConv.get(conv.id) is performed twice
inside the rows.map callback; cache it in a local variable (e.g., const msgs =
msgsByConv.get(conv.id) ?? []) at the top of the mapping for each conv and then
use msgs for both message_count and messages to avoid redundant lookups and make
the code in the rows.map callback (where conv, msgsByConv, deserializeMetadata
are used) cleaner.

In `@packages/api/src/services/embeddings.ts`:
- Around line 99-110: Update the mapping over result.matches to detect when
critical metadata fields (message_id, conversation_id, project_id, role,
created_at) are missing or fall back to defaults, and emit a warning including
match.id and which fields are missing; inside the same function that uses
result.matches.map and constructs TypedVectorMetadata, compute an array of
missingKeys by checking match.metadata?.message_id, ?.conversation_id, etc.,
then call the module’s logger (use the existing logger/processLogger in this
file) with logger.warn and a concise message listing match.id and missingKeys
before returning the mapped object so missing metadata is visible in logs
without changing the fallback behavior.
- Around line 42-43: The file defines an unused constant VECTORIZE_INDEX which
should be removed to avoid dead code; keep the EMBEDDING_MODEL declaration
(including its as unknown as keyof AiModels cast) but delete the VECTORIZE_INDEX
line and any related unused import/usage in this file so only EMBEDDING_MODEL
remains.

In `@packages/api/src/services/retention.ts`:
- Around line 44-97: The current loop over projectsWithRetention with an
unbounded per-project while(true) can starve later projects; modify the
retention loop (involving projectsWithRetention, TIME_BUDGET_MS, BATCH_SIZE, and
the inner deletion logic that calls db.batch and queries
conversations/messages/conversationTags) to enforce a per-project cap—e.g.,
limit max batches or max deleted rows per project (add a maxBatchesPerProject or
maxDeletesPerProject counter and break to the next project when reached) and
still respect the global TIME_BUDGET_MS; alternatively (or additionally) persist
a resume cursor (lastProcessedConversationId or lastProjectCursor) between runs
so subsequent executions continue where the previous run left off.
- Around line 25-27: cleanupExpiredConversations currently only deletes D1 rows
leaving Vectorize embeddings behind; update its signature to accept the vector
binding (e.g., add a parameter named vectorBinding or vectorStore) and, inside
cleanupExpiredConversations, collect the message row ids to be removed (these
are the same ids stored in Vectorize as row.id in messages.ts), call the vector
binding's delete method for those ids before executing the D1 deletions, and
then proceed to delete the DB rows; ensure you reference
cleanupExpiredConversations and the same row.id identifiers so vector removal
runs prior to the database cleanup.
- Around line 68-90: The deletion currently has a TOCTOU gap: after selecting
expired ids (expired / ids) a concurrent update can resurrect a conversation
before db.batch runs; update the three deletes in db.batch (conversationTags,
messages, conversations) to re-apply the expiry predicate so only conversations
still satisfying lt(conversations.updatedAt, cutoff) are removed — e.g.,
restrict deletions to ids that are both in ids and still present in a
subquery/select from conversations where eq(projectId, project.id) and
lt(updatedAt, cutoff); modify the where clauses for the
conversationTags/messages deletes to use that same filtered conversation-id set
so tags/messages for resurrected conversations are not deleted.

In
`@packages/dashboard/src/app/dashboard/project/data-tab/conversations-header.tsx`:
- Around line 35-48: ColumnPickerTrigger should link the button to the popup via
aria-controls and drop the unnecessary wrapper: update the ColumnPickerTrigger
signature to accept a unique popup id prop (e.g., controlsId or menuId) in
addition to show and onToggle, remove the surrounding <div
className="relative">, and set aria-controls={controlsId} on the Button while
keeping aria-expanded={show}, aria-haspopup="menu" and the existing handlers so
the button is programmatically associated with the controlled popup.

In `@packages/shared/src/index.ts`:
- Around line 305-314: The SemanticSearchRequest type declares filter fields
(date_from, date_to, tags) that aren’t used by the semantic search
implementation; either remove those fields from the interface or explicitly mark
them as reserved/unused with JSDoc so consumers know they’re not implemented.
Update the SemanticSearchRequest declaration in shared/src/index.ts (and any
usages) to either drop date_from/date_to/tags or add JSDoc comments like
"@reserved for future use — not used by semantic.ts" and keep semantic.ts’s
search functions unchanged; ensure TypeScript and tests compile after the
change.
- Around line 334-338: The ContextRetrievalRequest interface declares project_id
as required but the runtime in
packages/api/src/routes/v2/conversations/semantic.ts ignores this field and uses
the authenticated project's context; update the interface
(ContextRetrievalRequest) to make project_id optional (project_id?: string) or
add a clear JSDoc comment on the ContextRetrievalRequest type stating that
project_id is ignored when requests are authenticated and the project is derived
from the auth context so callers aren’t misled; ensure any related type usages
respect the optionality.

---

Duplicate comments:
In `@packages/api/src/scheduled.ts`:
- Around line 24-33: The catch block in scheduled.ts is currently swallowing
errors (logging a retention_cleanup_failed event) which makes the scheduled run
appear successful; after logging the error details (the existing JSON.stringify
block referencing event.cron and the retention_cleanup_failed event), re-throw
the original error (throw err) so the failure surfaces to the
scheduler/monitoring and enables retries/alerting; ensure you throw the same err
object (not a new Error) to preserve stack/context.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 82d44f15-73cb-4fec-a69d-72e6ee321a8e

📥 Commits

Reviewing files that changed from the base of the PR and between 2765a63 and 456dd89.

📒 Files selected for processing (16)
  • packages/api/src/lib/domain-verification.ts
  • packages/api/src/routes/v2/conversations/messages.ts
  • packages/api/src/routes/v2/conversations/semantic.ts
  • packages/api/src/routes/v2/projects/crud.ts
  • packages/api/src/scheduled.ts
  • packages/api/src/services/bulk.ts
  • packages/api/src/services/domains.ts
  • packages/api/src/services/embeddings.ts
  • packages/api/src/services/retention.ts
  • packages/api/src/services/v2-projects.ts
  • packages/api/src/types.ts
  • packages/api/test/retention.test.ts
  • packages/api/wrangler.jsonc
  • packages/dashboard/src/app/dashboard/project/_retention-settings.tsx
  • packages/dashboard/src/app/dashboard/project/data-tab/conversations-header.tsx
  • packages/shared/src/index.ts

Comment on lines +168 to +189
const html = await response.text();
// Match <meta name="agentstate-verification" content="..."> in either attribute order
const pattern =
/<meta\s+(?:[^>]*?\s)?name\s*=\s*["']agentstate-verification["']\s+(?:[^>]*?\s)?content\s*=\s*["']([^"']+)["'][^>]*>/i;
const match = html.match(pattern);

if (!match) {
// Try reversed attribute order: content before name
const reversedPattern =
/<meta\s+(?:[^>]*?\s)?content\s*=\s*["']([^"']+)["']\s+(?:[^>]*?\s)?name\s*=\s*["']agentstate-verification["'][^>]*>/i;
const reversedMatch = html.match(reversedPattern);

if (!reversedMatch) {
return { method: "meta_tag", success: false, error: "Meta tag not found" };
}

const success = reversedMatch[1] === expectedToken;
return { method: "meta_tag", success, error: success ? undefined : "Token mismatch in meta tag" };
}

const success = match[1] === expectedToken;
return { method: "meta_tag", success, error: success ? undefined : "Token mismatch in meta tag" };
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Regex-based meta tag parsing is pragmatic but fragile.

The dual-pattern approach handles the common attribute orderings. However, HTML regex parsing has known edge cases:

  • Extra attributes between name and content (e.g., <meta id="x" name="agentstate-verification" class="y" content="...">)
  • Non-standard whitespace or newlines within the tag

The current pattern (?:[^>]*?\s)? is designed to skip intervening attributes but requires a trailing space, which may not match multi-attribute tags correctly.

This is acceptable for MVP since domain owners control their HTML, but a more robust approach would use a proper HTML parser or a more permissive regex.

💡 Optional: More permissive regex pattern
-    const pattern =
-      /<meta\s+(?:[^>]*?\s)?name\s*=\s*["']agentstate-verification["']\s+(?:[^>]*?\s)?content\s*=\s*["']([^"']+)["'][^>]*>/i;
+    // More permissive: allow any attributes between name and content
+    const pattern =
+      /<meta\s+[^>]*?name\s*=\s*["']agentstate-verification["'][^>]*?content\s*=\s*["']([^"']+)["'][^>]*>/i;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const html = await response.text();
// Match <meta name="agentstate-verification" content="..."> in either attribute order
const pattern =
/<meta\s+(?:[^>]*?\s)?name\s*=\s*["']agentstate-verification["']\s+(?:[^>]*?\s)?content\s*=\s*["']([^"']+)["'][^>]*>/i;
const match = html.match(pattern);
if (!match) {
// Try reversed attribute order: content before name
const reversedPattern =
/<meta\s+(?:[^>]*?\s)?content\s*=\s*["']([^"']+)["']\s+(?:[^>]*?\s)?name\s*=\s*["']agentstate-verification["'][^>]*>/i;
const reversedMatch = html.match(reversedPattern);
if (!reversedMatch) {
return { method: "meta_tag", success: false, error: "Meta tag not found" };
}
const success = reversedMatch[1] === expectedToken;
return { method: "meta_tag", success, error: success ? undefined : "Token mismatch in meta tag" };
}
const success = match[1] === expectedToken;
return { method: "meta_tag", success, error: success ? undefined : "Token mismatch in meta tag" };
const pattern =
/<meta\s+(?:[^>]*?\s)?name\s*=\s*["']agentstate-verification["']\s+(?:[^>]*?\s)?content\s*=\s*["']([^"']+)["'][^>]*>/i;
const match = html.match(pattern);
if (!match) {
// Try reversed attribute order: content before name
// More permissive: allow any attributes between name and content
const reversedPattern =
/<meta\s+[^>]*?content\s*=\s*["']([^"']+)["'][^>]*?name\s*=\s*["']agentstate-verification["'][^>]*>/i;
const reversedMatch = html.match(reversedPattern);
if (!reversedMatch) {
return { method: "meta_tag", success: false, error: "Meta tag not found" };
}
const success = reversedMatch[1] === expectedToken;
return { method: "meta_tag", success, error: success ? undefined : "Token mismatch in meta tag" };
}
const success = match[1] === expectedToken;
return { method: "meta_tag", success, error: success ? undefined : "Token mismatch in meta tag" };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/lib/domain-verification.ts` around lines 168 - 189, The
regex-based meta-tag detection in domain-verification.ts (variables pattern and
reversedPattern) is fragile; update the implementation to robustly handle
arbitrary attribute order and whitespace by either switching to an HTML parser
(e.g., use a DOM/HTML parsing library to query <meta
name="agentstate-verification"> and read its content attribute) or replace the
two fragile regexes with single, more permissive regexes that allow newlines and
arbitrary attributes (use flags and patterns that permit [\s\S] or the DOTALL/s
flag and \b anchors so the content attribute is captured regardless of
ordering). Ensure the code that constructs the result ({ method: "meta_tag",
success, error }) continues to run unchanged, only changing how
match/reversedMatch are produced (i.e., adjust pattern/reversedPattern or
replace with parser logic).

Comment on lines +13 to +18
router.post("/", async (c) => {
const body = await c.req.json<{
query?: string;
limit?: number;
filter?: { project_id?: string; date_from?: number; date_to?: number; tags?: string[] };
}>();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if filter fields are documented or used elsewhere
rg -n "date_from|date_to" packages/api/src/routes/v2/conversations/

Repository: duyet/agentstate

Length of output: 208


🏁 Script executed:

# First, let's see the actual implementation to check if filter is used anywhere
cat -n packages/api/src/routes/v2/conversations/semantic.ts | head -50

Repository: duyet/agentstate

Length of output: 2192


🏁 Script executed:

# Check if the filter object is destructured or used in the semantic.ts file
rg -n "filter" packages/api/src/routes/v2/conversations/semantic.ts -A 2 -B 2

Repository: duyet/agentstate

Length of output: 651


🏁 Script executed:

# Verify the shared types definition
rg -n "SemanticSearchRequest\|ContextRetrievalRequest" packages/shared/src/index.ts -A 10

Repository: duyet/agentstate

Length of output: 42


🏁 Script executed:

# Find and check shared types file
cat -n packages/shared/src/index.ts | grep -A 20 -B 2 "Semantic\|Conversation"

Repository: duyet/agentstate

Length of output: 5588


🏁 Script executed:

# Also check the entire request body type structure
rg -n "project_id" packages/api/src/routes/v2/conversations/semantic.ts

Repository: duyet/agentstate

Length of output: 306


🏁 Script executed:

# Verify if body.filter is used anywhere in the implementation
rg -n "body\.filter" packages/api/src/routes/v2/conversations/semantic.ts

Repository: duyet/agentstate

Length of output: 42


🏁 Script executed:

# Check the full semantic.ts implementation to see complete flow
cat packages/api/src/routes/v2/conversations/semantic.ts

Repository: duyet/agentstate

Length of output: 6718


Filter fields declared in request body are not used.

The request body type includes filter?: { project_id?: string; date_from?: number; date_to?: number; tags?: string[] }, matching SemanticSearchRequest in packages/shared/src/index.ts (lines 305-314). However, the implementation ignores these filters entirely—only c.get("projectId") from the auth context is used for filtering.

Either remove the unused filter fields from the shared types or implement the filtering logic (date range, tags) in the Vectorize query or post-query filtering.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/routes/v2/conversations/semantic.ts` around lines 13 - 18,
The request body declares filter fields (SemanticSearchRequest) but the
router.post handler currently ignores them and only uses c.get("projectId");
update the router.post handler to read body.filter and apply its criteria:
include project_id (falling back to c.get("projectId") if absent), apply
date_from/date_to as a metadata timestamp range, and require tags (all or any
per spec) when querying the vector store or by post-filtering the returned
results; modify the vector search call (or post-query loop that produces
results) to accept these metadata filters so date and tag constraints are
enforced rather than left unused.

Comment on lines +49 to +84
const messageIds = results.map((r) => String(r.metadata.message_id));
const matchingMessages = await c
.get("db")
.select({
id: messages.id,
conversationId: messages.conversationId,
role: messages.role,
content: messages.content,
createdAt: messages.createdAt,
})
.from(messages)
.where(inArray(messages.id, messageIds));

// Build a lookup for fast access
const messageLookup = new Map(matchingMessages.map((m) => [m.id, m]));

// Fetch conversation titles for context
const convIds = [...new Set(results.map((r) => String(r.metadata.conversation_id)))];
const convRows = await c
.get("db")
.select({ id: conversations.id, title: conversations.title })
.from(conversations)
.where(inArray(conversations.id, convIds));
const titleLookup = new Map(convRows.map((c) => [c.id, c.title]));

// Merge vector results with DB data, deduplicate by message
const seen = new Set<string>();
const searchResults = results
.filter((r) => {
if (seen.has(r.metadata.message_id)) return false;
seen.add(r.metadata.message_id);
return messageLookup.has(r.metadata.message_id);
})
.slice(0, limit)
.map((r) => {
const msg = messageLookup.get(r.metadata.message_id)!;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent String() conversion may cause lookup mismatches.

Line 49 wraps r.metadata.message_id with String() for the DB query, but lines 78-84 use r.metadata.message_id directly for the seen Set and messageLookup Map. If metadata values are stored as numbers in Vectorize, the Set/Map lookups will fail because "123" !== 123.

🔧 Proposed fix for consistent type handling
     const seen = new Set<string>();
     const searchResults = results
       .filter((r) => {
-        if (seen.has(r.metadata.message_id)) return false;
-        seen.add(r.metadata.message_id);
-        return messageLookup.has(r.metadata.message_id);
+        const msgId = String(r.metadata.message_id);
+        if (seen.has(msgId)) return false;
+        seen.add(msgId);
+        return messageLookup.has(msgId);
       })
       .slice(0, limit)
       .map((r) => {
-        const msg = messageLookup.get(r.metadata.message_id)!;
+        const msg = messageLookup.get(String(r.metadata.message_id))!;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const messageIds = results.map((r) => String(r.metadata.message_id));
const matchingMessages = await c
.get("db")
.select({
id: messages.id,
conversationId: messages.conversationId,
role: messages.role,
content: messages.content,
createdAt: messages.createdAt,
})
.from(messages)
.where(inArray(messages.id, messageIds));
// Build a lookup for fast access
const messageLookup = new Map(matchingMessages.map((m) => [m.id, m]));
// Fetch conversation titles for context
const convIds = [...new Set(results.map((r) => String(r.metadata.conversation_id)))];
const convRows = await c
.get("db")
.select({ id: conversations.id, title: conversations.title })
.from(conversations)
.where(inArray(conversations.id, convIds));
const titleLookup = new Map(convRows.map((c) => [c.id, c.title]));
// Merge vector results with DB data, deduplicate by message
const seen = new Set<string>();
const searchResults = results
.filter((r) => {
if (seen.has(r.metadata.message_id)) return false;
seen.add(r.metadata.message_id);
return messageLookup.has(r.metadata.message_id);
})
.slice(0, limit)
.map((r) => {
const msg = messageLookup.get(r.metadata.message_id)!;
const messageIds = results.map((r) => String(r.metadata.message_id));
const matchingMessages = await c
.get("db")
.select({
id: messages.id,
conversationId: messages.conversationId,
role: messages.role,
content: messages.content,
createdAt: messages.createdAt,
})
.from(messages)
.where(inArray(messages.id, messageIds));
// Build a lookup for fast access
const messageLookup = new Map(matchingMessages.map((m) => [m.id, m]));
// Fetch conversation titles for context
const convIds = [...new Set(results.map((r) => String(r.metadata.conversation_id)))];
const convRows = await c
.get("db")
.select({ id: conversations.id, title: conversations.title })
.from(conversations)
.where(inArray(conversations.id, convIds));
const titleLookup = new Map(convRows.map((c) => [c.id, c.title]));
// Merge vector results with DB data, deduplicate by message
const seen = new Set<string>();
const searchResults = results
.filter((r) => {
const msgId = String(r.metadata.message_id);
if (seen.has(msgId)) return false;
seen.add(msgId);
return messageLookup.has(msgId);
})
.slice(0, limit)
.map((r) => {
const msg = messageLookup.get(String(r.metadata.message_id))!;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/routes/v2/conversations/semantic.ts` around lines 49 - 84,
The code inconsistently converts r.metadata.message_id to String() when building
messageIds for the DB query (messageIds) but uses the raw value when
populating/looking up in messageLookup and when checking/inserting into seen;
normalize the type to a single canonical form across all usages (e.g., always
String) by mapping results -> messageIds = results.map(r =>
String(r.metadata.message_id)), constructing messageLookup with keys as
String(m.id), and using String(r.metadata.message_id) when checking seen and
looking up messageLookup so comparisons (in messageIds, messageLookup, seen) are
consistent; update references to messageLookup, seen, and anywhere
r.metadata.message_id is used to use the same String conversion.

Comment on lines +114 to +119
router.post("/context", async (c) => {
const body = await c.req.json<{
query?: string;
max_tokens?: number;
project_id?: string;
}>();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Request body project_id is ignored; shared type declares it as required.

ContextRetrievalRequest in packages/shared/src/index.ts (line 337) declares project_id: string as required, but the implementation uses c.get("projectId") from the authentication context instead. This creates a contract mismatch—clients may send project_id expecting it to be respected, but it's silently ignored.

Either:

  1. Remove project_id from ContextRetrievalRequest and document that project scoping is automatic via auth, or
  2. Use the body's project_id for cross-project queries (if that's the intent) with appropriate authorization checks.

Comment on lines +172 to +178
const seen = new Set<string>();
for (const r of results) {
if (seen.has(r.metadata.message_id)) continue;
seen.add(r.metadata.message_id);

const msg = messageLookup.get(r.metadata.message_id);
if (!msg) continue;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Same String() inconsistency in /context endpoint.

Apply consistent String() conversion here as well to match line 148.

🔧 Proposed fix
     const seen = new Set<string>();
     for (const r of results) {
-      if (seen.has(r.metadata.message_id)) continue;
-      seen.add(r.metadata.message_id);
+      const msgId = String(r.metadata.message_id);
+      if (seen.has(msgId)) continue;
+      seen.add(msgId);

-      const msg = messageLookup.get(r.metadata.message_id);
+      const msg = messageLookup.get(msgId);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const seen = new Set<string>();
for (const r of results) {
if (seen.has(r.metadata.message_id)) continue;
seen.add(r.metadata.message_id);
const msg = messageLookup.get(r.metadata.message_id);
if (!msg) continue;
const seen = new Set<string>();
for (const r of results) {
const msgId = String(r.metadata.message_id);
if (seen.has(msgId)) continue;
seen.add(msgId);
const msg = messageLookup.get(msgId);
if (!msg) continue;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/routes/v2/conversations/semantic.ts` around lines 172 - 178,
The code uses r.metadata.message_id directly when checking and adding to the
seen Set and when looking up in messageLookup, causing inconsistency with the
String() normalization used earlier; update the three usages to normalize the id
by calling String(r.metadata.message_id) in seen.has(...), seen.add(...), and
messageLookup.get(...) so the dedup logic matches the behavior on line 148 and
avoids mismatches between primitive types.

Comment on lines +44 to +97
for (const project of projectsWithRetention) {
const projectStart = Date.now();
if (projectStart - start > TIME_BUDGET_MS) {
console.warn(
JSON.stringify({
level: "warn",
event: "retention_time_budget_exceeded",
message: "Stopping retention cleanup — time budget exceeded",
elapsedMs: Date.now() - start,
}),
);
break;
}

const retentionDays = project.retentionDays as number;
const cutoff = Date.now() - retentionDays * 86_400_000;
let totalDeleted = 0;
let batchCount = 0;

// eslint-disable-next-line no-constant-condition
while (true) {
if (Date.now() - start > TIME_BUDGET_MS) break;

// Fetch a batch of expired conversation IDs
const expired = await db
.select({ id: conversations.id })
.from(conversations)
.where(
and(
eq(conversations.projectId, project.id),
lt(conversations.updatedAt, cutoff),
),
)
.limit(BATCH_SIZE);

if (expired.length === 0) break;

const ids = expired.map((r) => r.id);

// Batch delete: tags → messages → conversations (order respects FK constraints)
await db.batch([
db
.delete(conversationTags)
.where(inArray(conversationTags.conversationId, ids)),
db.delete(messages).where(inArray(messages.conversationId, ids)),
db.delete(conversations).where(inArray(conversations.id, ids)),
]);

totalDeleted += ids.length;
batchCount++;

// If we got fewer than BATCH_SIZE, this was the last batch
if (expired.length < BATCH_SIZE) break;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This loop can starve later projects indefinitely.

Each run starts from the first retention-enabled project and keeps deleting until that project is drained or the 25s budget is exhausted. If early projects keep producing enough expired rows, later projects may never be processed, so their retention policy is effectively unenforced. Cap work per project or persist a resume cursor between runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/services/retention.ts` around lines 44 - 97, The current
loop over projectsWithRetention with an unbounded per-project while(true) can
starve later projects; modify the retention loop (involving
projectsWithRetention, TIME_BUDGET_MS, BATCH_SIZE, and the inner deletion logic
that calls db.batch and queries conversations/messages/conversationTags) to
enforce a per-project cap—e.g., limit max batches or max deleted rows per
project (add a maxBatchesPerProject or maxDeletesPerProject counter and break to
the next project when reached) and still respect the global TIME_BUDGET_MS;
alternatively (or additionally) persist a resume cursor
(lastProcessedConversationId or lastProjectCursor) between runs so subsequent
executions continue where the previous run left off.

Comment on lines +68 to +90
const expired = await db
.select({ id: conversations.id })
.from(conversations)
.where(
and(
eq(conversations.projectId, project.id),
lt(conversations.updatedAt, cutoff),
),
)
.limit(BATCH_SIZE);

if (expired.length === 0) break;

const ids = expired.map((r) => r.id);

// Batch delete: tags → messages → conversations (order respects FK constraints)
await db.batch([
db
.delete(conversationTags)
.where(inArray(conversationTags.conversationId, ids)),
db.delete(messages).where(inArray(messages.conversationId, ids)),
db.delete(conversations).where(inArray(conversations.id, ids)),
]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Re-check expiry when you delete.

Lines 68-77 select candidate ids, then Lines 84-89 delete by bare id. A concurrent append can bump conversations.updated_at after selection but before the batch runs, and this code will still delete that now-active conversation plus its messages/tags. Re-apply updated_at < cutoff in the delete subqueries, or make the selection+delete atomic, to close the TOCTOU gap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/api/src/services/retention.ts` around lines 68 - 90, The deletion
currently has a TOCTOU gap: after selecting expired ids (expired / ids) a
concurrent update can resurrect a conversation before db.batch runs; update the
three deletes in db.batch (conversationTags, messages, conversations) to
re-apply the expiry predicate so only conversations still satisfying
lt(conversations.updatedAt, cutoff) are removed — e.g., restrict deletions to
ids that are both in ids and still present in a subquery/select from
conversations where eq(projectId, project.id) and lt(updatedAt, cutoff); modify
the where clauses for the conversationTags/messages deletes to use that same
filtered conversation-id set so tags/messages for resurrected conversations are
not deleted.

Comment on lines +35 to +48
function ColumnPickerTrigger({ show, onToggle }: ColumnPickerTriggerProps) {
return (
<div className="relative">
<Button
size="sm"
variant="outline"
type="button"
onClick={onToggle}
aria-expanded={show}
aria-haspopup="menu"
>
Columns
</Button>
</div>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Associate the trigger with the popup for better a11y (and drop redundant wrapper).

aria-expanded is present, but the button isn’t linked to the controlled popup via aria-controls. Also, the relative wrapper has no effect in this component.

Suggested patch
 interface ColumnPickerTriggerProps {
   show: boolean;
   onToggle: () => void;
+  controlsId?: string;
 }
@@
-function ColumnPickerTrigger({ show, onToggle }: ColumnPickerTriggerProps) {
+function ColumnPickerTrigger({ show, onToggle, controlsId }: ColumnPickerTriggerProps) {
   return (
-    <div className="relative">
-      <Button
-        size="sm"
-        variant="outline"
-        type="button"
-        onClick={onToggle}
-        aria-expanded={show}
-        aria-haspopup="menu"
-      >
-        Columns
-      </Button>
-    </div>
+    <Button
+      size="sm"
+      variant="outline"
+      type="button"
+      onClick={onToggle}
+      aria-expanded={show}
+      aria-haspopup="menu"
+      aria-controls={controlsId}
+    >
+      Columns
+    </Button>
   );
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function ColumnPickerTrigger({ show, onToggle }: ColumnPickerTriggerProps) {
return (
<div className="relative">
<Button
size="sm"
variant="outline"
type="button"
onClick={onToggle}
aria-expanded={show}
aria-haspopup="menu"
>
Columns
</Button>
</div>
interface ColumnPickerTriggerProps {
show: boolean;
onToggle: () => void;
controlsId?: string;
}
function ColumnPickerTrigger({ show, onToggle, controlsId }: ColumnPickerTriggerProps) {
return (
<Button
size="sm"
variant="outline"
type="button"
onClick={onToggle}
aria-expanded={show}
aria-haspopup="menu"
aria-controls={controlsId}
>
Columns
</Button>
);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/dashboard/src/app/dashboard/project/data-tab/conversations-header.tsx`
around lines 35 - 48, ColumnPickerTrigger should link the button to the popup
via aria-controls and drop the unnecessary wrapper: update the
ColumnPickerTrigger signature to accept a unique popup id prop (e.g., controlsId
or menuId) in addition to show and onToggle, remove the surrounding <div
className="relative">, and set aria-controls={controlsId} on the Button while
keeping aria-expanded={show}, aria-haspopup="menu" and the existing handlers so
the button is programmatically associated with the controlled popup.

Comment on lines +305 to +314
export interface SemanticSearchRequest {
query: string;
limit?: number;
filter?: {
project_id?: string;
date_from?: number;
date_to?: number;
tags?: string[];
};
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Filter fields declared but not implemented.

date_from, date_to, and tags filters are declared in the type but the implementation in semantic.ts doesn't use them. Consider removing unimplemented fields or marking them with JSDoc as "reserved for future use" to set proper expectations for API consumers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/src/index.ts` around lines 305 - 314, The
SemanticSearchRequest type declares filter fields (date_from, date_to, tags)
that aren’t used by the semantic search implementation; either remove those
fields from the interface or explicitly mark them as reserved/unused with JSDoc
so consumers know they’re not implemented. Update the SemanticSearchRequest
declaration in shared/src/index.ts (and any usages) to either drop
date_from/date_to/tags or add JSDoc comments like "@reserved for future use —
not used by semantic.ts" and keep semantic.ts’s search functions unchanged;
ensure TypeScript and tests compile after the change.

Comment on lines +334 to +338
export interface ContextRetrievalRequest {
query: string;
max_tokens?: number;
project_id: string;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

project_id declared as required but ignored by implementation.

As noted in packages/api/src/routes/v2/conversations/semantic.ts, the implementation uses the authenticated project context rather than this request field. Either make project_id optional or document that it's ignored when authentication already scopes to a project.

🔧 Suggested fix to align with implementation
 export interface ContextRetrievalRequest {
   query: string;
   max_tokens?: number;
-  project_id: string;
+  project_id?: string; // Optional: defaults to authenticated project context
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export interface ContextRetrievalRequest {
query: string;
max_tokens?: number;
project_id: string;
}
export interface ContextRetrievalRequest {
query: string;
max_tokens?: number;
project_id?: string; // Optional: defaults to authenticated project context
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared/src/index.ts` around lines 334 - 338, The
ContextRetrievalRequest interface declares project_id as required but the
runtime in packages/api/src/routes/v2/conversations/semantic.ts ignores this
field and uses the authenticated project's context; update the interface
(ContextRetrievalRequest) to make project_id optional (project_id?: string) or
add a clear JSDoc comment on the ContextRetrievalRequest type stating that
project_id is ignored when requests are authenticated and the project is derived
from the auth context so callers aren’t misled; ensure any related type usages
respect the optionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant