feat(kilo-vscode): migrate legacy sessions into new extension #7924

Merged
imanolmzd-svg merged 63 commits into main from imanol/migration on Mar 31, 2026

Conversation

Contributor

@imanolmzd-svg imanolmzd-svg commented Mar 30, 2026

Context

  • This change adds session migration from the legacy VS Code extension into the new Kilo extension.
  • It focuses on preserving enough conversation structure for users to continue migrated sessions in the new UI.

Implementation

  • Backend:

    • Added Kilo-specific session import endpoints for project, session, message, and part under the Kilo routes to minimize upstream merge conflicts.
  • Frontend:

    • Added legacy session migration alongside the existing legacy settings/provider migration flow.
    • Detects legacy sessions from VS Code global storage.
    • Tracks migrated sessions locally so they are not repeatedly offered for migration.
    • Parses legacy sessions into the backend import format for projects, sessions, messages, and parts.
    • Reuses the backend-resolved projectID when inserting the migrated session.
    • Refreshes the session list after migration so successfully migrated sessions appear immediately.
  • Session parsing:

    • Preserves user and assistant messages with stable deterministic ids.
    • Preserves text content, completion results, tool activity, and reasoning content in a form the new extension can continue from.
    • Filters legacy prompt scaffolding such as standalone <environment_details> blocks from visible migrated conversation text.
    • Keeps attempt_completion visible as normal assistant text, so some final completion content may appear duplicated for now.

Out of Scope

  • Subagent sessions are not fully represented yet; the parent conversation is the main preserved experience.
  • Token counts, costs, and provider metadata from ui_messages.json are not migrated yet.
  • Rendering parity for every legacy tool case is not complete; the priority here is continuity and readable migrated history.

Video/Screenshots

  • Before: testing-7-before, testing-8-before (screenshots)

Testing

  • Ran bun test for the new legacy migration unit tests in packages/kilo-vscode/tests/unit/legacy-migration/.
  • Ran bun run format in packages/kilo-vscode.
  • Ran bun run typecheck in packages/kilo-vscode.
  • Verified migrated sessions are inserted into kilo.db, appear in the correct project when the canonical project id is reused, and can be continued in the new extension.
  • Verified migrated sessions do not get offered repeatedly once marked as migrated.

Fixes after review feedback

  • [ERROR] messages are now parsed as hidden parts: they are not shown in the UI but remain in the new conversation, so the agent keeps the right context.
  • Some user messages were hidden because they arrived as tool_result-type parts; fixed so they are shown.
  • Verified that images are ignored and the conversation is parsed without them.
  • Verified tool continuity: told the agent "Use the edit tool to add X" in the old conversation, then "Use the tool again" in the new one, and it works.
  • You can delete your migrated conversations from kilo.db and run the migration wizard again; the parsing will run again.

Comment thread packages/kilo-vscode/src/legacy-migration/sessions/lib/parts/parts.ts Outdated
…indexing

Move the role filter from inside `parseParts` to `parsePartsFromConversation`
so that message indices are derived only from the filtered list. This ensures
part message IDs stay aligned with the imported messages and skipped entries
(e.g. system role) do not cause index gaps in the generated IDs.
Comment thread packages/kilo-vscode/src/legacy-migration/sessions/lib/ids.ts
@markijbema
Contributor

Note: This review was LLM-generated (Claude). Take it with a grain of salt -- some observations may be off-base or miss context that a human reviewer would have.

Summary

Overall this is solid work. The architecture is well-structured with clean separation between parsing, orchestration, and backend insertion. The deterministic ID scheme and upsert-everywhere approach make the migration idempotent, which is the right call for a feature like this.

Below are the issues I found, ordered roughly by severity.


Bugs / Correctness Issues

1. Conversation file is read twice per session (P2)

File: parser.ts:20-24

const messages = await createMessages(id, dir, item)
const parts = await createParts(id, dir, item)

Both createMessages and createParts independently call getApiConversationHistory(id, dir) which reads and parses the same api_conversation_history.json file from disk. For large legacy sessions this doubles the I/O and JSON parsing work.

Recommendation: Read and parse the file once in parseSession, then pass the parsed LegacyApiMessage[] array into both parseMessagesFromConversation and parsePartsFromConversation (which already exist as exported functions that accept a conversation array). The createMessages / createParts async wrappers become unnecessary.
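A minimal sketch of the recommended shape (function and type names here are stand-ins, not the real module exports): the conversation payload is parsed once, and the same array is handed to both parsing passes.

```typescript
// Hypothetical shapes; the real types live in legacy-types.ts.
type LegacyApiMessage = { role: string; content: string }

// Stand-ins for the exported pure functions mentioned above.
function parseMessagesFromConversation(id: string, conv: LegacyApiMessage[]) {
  return conv.map((m, i) => ({ id: `${id}_msg_${i}`, role: m.role }))
}
function parsePartsFromConversation(id: string, conv: LegacyApiMessage[]) {
  return conv.map((m, i) => ({ id: `${id}_part_${i}`, text: m.content }))
}

// Parse the file contents once, then feed the parsed array to both
// passes, eliminating the duplicate read/parse.
function parseSessionFromText(id: string, text: string) {
  const conversation = JSON.parse(text) as LegacyApiMessage[]
  return {
    messages: parseMessagesFromConversation(id, conversation),
    parts: parsePartsFromConversation(id, conversation),
  }
}
```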

2. cleanLegacyTaskText uses a non-greedy <task> regex (P3)

File: parts-util.ts:114

const task = input.match(/<task>([\s\S]*?)<\/task>/i)?.[1]?.trim()

The *? (non-greedy) quantifier means if a legacy message contains multiple <task> blocks (unlikely but possible with nested prompts or copy-paste), only the content of the first block up to the first </task> is kept. Everything between subsequent <task>...</task> pairs would be silently dropped.

This is probably fine in practice since the legacy extension only emits one <task> wrapper, but worth noting.
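For reference, a global variant that collects every `<task>` block instead of only the first — purely illustrative, since (as noted) the legacy extension emits a single wrapper:

```typescript
// Collect the trimmed contents of every <task>…</task> block.
// Illustrative only; the existing single-match regex is fine in
// practice given the legacy extension emits one wrapper.
function extractTaskBlocks(input: string): string[] {
  return [...input.matchAll(/<task>([\s\S]*?)<\/task>/gi)].map((m) => m[1].trim())
}
```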

3. toText and toTextWithinMessage are identical functions (P3)

File: parts-builder.ts:47-97

toText and toTextWithinMessage have exactly the same implementation -- both call cleanLegacyTaskText(text), build an identical Text part, and return it. There is no behavioral difference between them.

Recommendation: Remove one and use the other everywhere, or add a comment explaining why the distinction exists if there's a planned divergence.

4. Empty-worktree sessions are silently skipped with no user feedback (P3)

File: service.ts:13-15 and project.ts:10

The backend throws if worktree is empty:

if (!input.worktree.trim()) {
  throw new Error("Legacy project import requires a non-empty worktree")
}

But the frontend happily creates projects with empty worktrees (project.ts:10: project.worktree = item?.workspace ?? ""). If a legacy session has no workspace field, the project POST will throw, migrate.ts catches it, and the session is reported as failed -- but the user just sees "Session migration failed" with no indication that the problem was a missing workspace path.

Recommendation: Either skip sessions with no workspace during detection (don't offer them for migration), or provide a more descriptive error message so the user understands why it failed.

5. isReasoning and isProviderSpecificReasoning can both fire, producing duplicate reasoning parts (P3)

File: parts.ts:61-73

if (isReasoning(entry)) {
  parts.push(toReasoning(..., entry.text))
}

if (isProviderSpecificReasoning(entry)) {
  const reasoning = getReasoningText(entry)
  if (reasoning) {
    parts.push(toReasoning(..., reasoning))
  }
}

If a message has type: "reasoning" with text: "foo" AND also has reasoning_content: "foo", both checks pass and two reasoning parts are emitted with the same (or overlapping) content. The createExtraPartID calls use different kind strings ("reasoning" vs "provider-reasoning") so the IDs won't collide, but the user sees the thinking duplicated.

Recommendation: Make the second check an else if, or deduplicate by content.
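A sketch of the else-if approach (predicates simplified from the description above; the real type guards live in parts.ts): at most one reasoning text is produced per entry, with the explicit `type: "reasoning"` field taking precedence.

```typescript
// Simplified entry shape for illustration.
type Entry = { type?: string; text?: string; reasoning_content?: string }

const isReasoning = (e: Entry) =>
  e.type === "reasoning" && typeof e.text === "string"
const isProviderSpecificReasoning = (e: Entry) =>
  typeof e.reasoning_content === "string"

// Emit at most one reasoning text per entry: the explicit
// type:"reasoning" text wins over provider-specific fields.
function reasoningTextFor(entry: Entry): string | undefined {
  if (isReasoning(entry)) return entry.text
  if (isProviderSpecificReasoning(entry)) return entry.reasoning_content
  return undefined
}
```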


Design / Architecture Concerns

6. No error recovery mid-session -- partial imports leave orphaned data (P2)

File: migrate.ts:12-31

Messages and parts are inserted one by one in a loop. If the 5th message insert fails, the session row and first 4 messages are already committed to the DB. The catch block returns { ok: false } but doesn't roll back the partial data. The session is also NOT marked as migrated (line 33 is inside the try block), so the next migration attempt will re-import from scratch -- the upserts make this safe from a data perspective, but it means the user could end up with partially-visible sessions in their session list.

Recommendation: Consider wrapping the message+part inserts in a batch endpoint or a transaction on the backend side, so a session is either fully imported or not at all. Alternatively, accept the partial state but document the behavior.

7. Sequential HTTP requests per message and part -- O(n) round trips (P2)

File: migrate.ts:25-31

for (const msg of payload.messages) {
  await client.kilocode.sessionImport.message(msg, { throwOnError: true })
}
for (const part of payload.parts) {
  await client.kilocode.sessionImport.part(part, { throwOnError: true })
}

A typical legacy session might have 50+ messages and 100+ parts. That's 150+ sequential HTTP round trips to localhost. While latency is low for localhost, this could make migration of many sessions noticeably slow.

Recommendation: Consider a batch endpoint that accepts arrays of messages and parts, or at minimum use Promise.all with concurrency control (e.g. batches of 10). Since the backend upserts are independent of each other (no FK ordering issues given that the session row is already inserted), parallel insertion should be safe.
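One way to cap parallelism without pulling in a dependency — a generic concurrency-limited map that the import loop could feed message/part POSTs through (the helper name and shape are illustrative, not part of the existing client API):

```typescript
// Run async tasks over `items` with at most `limit` in flight.
// Workers pull the next index synchronously, so no two workers
// process the same item.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length)
  let next = 0
  async function worker() {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i])
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker))
  return results
}
```

With this in place, `payload.messages` could be imported in batches of, say, 10 instead of one await per round trip.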

8. Project ID computed locally then ignored in favor of backend response (P3 - not a bug, just noisy)

File: parser.ts:21-22, migrate.ts:13-14

The parser creates a project with a locally-computed deterministic ID:

const project = createProject(item) // hashes workspace to get project.id
const session = createSession(id, item, project.id)

Then migrate.ts immediately overwrites the session's projectID with whatever the backend returns:

const projectID = project.data?.id ?? payload.project.id

The locally-computed project ID is never actually used (except as a fallback if the backend somehow returns no ID). The createProjectID function and the id field on the project payload are dead code in the happy path.

This is correct behavior (the backend should be the source of truth for project resolution), but the local ID computation is misleading -- it suggests the client controls the project ID when it doesn't.

9. Backend ProjectTable import is dead code (P3)

File: service.ts:2

import { ProjectTable } from "../../project/project.sql"

ProjectTable is imported but never used. The project() function delegates to Project.fromDirectory() which handles its own DB operations. The import should be removed.


Edge Cases / Robustness

10. parseFile silently returns [] for malformed JSON (P3)

File: legacy-conversation.ts:12

const json = JSON.parse(text) as unknown

If the JSON file is corrupt or truncated, JSON.parse throws and the error bubbles up uncaught through createMessages/createParts all the way to migrate.ts's catch block. The session fails silently. This is acceptable but a warning log would help debugging.

However, if the file contains valid JSON that isn't an array (e.g. {}), line 13 returns [] silently, producing a session with no messages. That session still gets inserted into the DB as an empty conversation.
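A hypothetical hardening of the parse step (the function name is illustrative): warn-and-rethrow on corrupt JSON, and reject valid-but-non-array JSON instead of silently yielding an empty conversation.

```typescript
// Parse a legacy conversation file's text. Throws (with a warning
// log) on corrupt JSON, and throws on JSON that is not an array,
// rather than returning [] and importing an empty session.
function parseConversationText(text: string): unknown[] {
  let json: unknown
  try {
    json = JSON.parse(text)
  } catch (err) {
    console.warn("legacy conversation file is not valid JSON", err)
    throw err
  }
  if (!Array.isArray(json)) {
    throw new Error("legacy conversation file did not contain an array")
  }
  return json
}
```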

11. readSessionsInGlobalStorage uses a fragile FileType.Directory lookup (P3)

File: migration-service.ts:115

const kind = (vscode as { FileType?: { Directory?: number } }).FileType?.Directory ?? 2

This casts vscode to access FileType.Directory with a fallback to 2. The cast is presumably there to avoid a type error, but it's brittle: if VS Code ever changed the enum value (unlikely, but possible), the fallback 2 would silently become wrong. A simpler approach is to use vscode.FileType.Directory directly (it has been stable since VS Code 1.30).

12. SHA-1 truncation to 26 hex chars gives ~104 bits of collision resistance (P4)

File: ids.ts:24

return `${prefix}_${hash(value).slice(0, 26)}`

SHA-1 produces 40 hex chars (160 bits). Truncating to 26 gives ~104 bits. For a migration tool processing at most thousands of sessions this is more than enough, but it's worth noting that the ID space is smaller than it appears. Not a real issue, just documenting the design choice.
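A minimal reconstruction of that scheme using Node's crypto module (the helper name is illustrative; the real implementation is in ids.ts):

```typescript
import { createHash } from "node:crypto"

// Deterministic ID: SHA-1 of the input value, hex-encoded and
// truncated to 26 chars (~104 bits), prefixed by the entity kind.
function deterministicId(prefix: string, value: string): string {
  const digest = createHash("sha1").update(value).digest("hex")
  return `${prefix}_${digest.slice(0, 26)}`
}
```

The same input always yields the same ID, which is what makes the upsert-everywhere migration idempotent.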


Test Coverage Gaps

13. No test for sessions with no workspace (empty worktree path)

The project test (project.test.ts:27-34) tests createProject() with no arguments and confirms empty defaults, but there's no integration test that verifies what happens when migrate() sends an empty-worktree project to the backend and gets an error back. This is the scenario from issue #4 above.

14. No test for cleanLegacyTaskText with text that contains <task> but also other content outside it

E.g. "Some preamble\n<task>actual task</task>\nSome postamble" -- the current regex would extract "actual task" and drop the preamble and postamble. Whether that's correct depends on how the legacy extension actually formats these messages.

15. No test for the double file read in parseSession

parseSession calls both createMessages(id, dir, item) and createParts(id, dir, item) which both read the file. There's no test verifying they produce consistent results from the same file (they will, but the double-read itself isn't tested or asserted against).

16. No test for tool_result with no text content

merge-tools.ts:24 does getText(result.content) ?? name -- if result.content is an empty array or contains only non-text blocks, the output falls back to the tool name. This fallback behavior isn't tested.


Minor / Nits

  • parts-builder.ts: The record() import is used in toTool but also re-imported in merge-tools.ts. Not a problem, just noting the shared utility is used from two places.

  • migration-service.ts:225: The progress callback always reports "Chat sessions" as the item name for every session being migrated, rather than the individual session title. This means the UI progress shows the same label for all sessions rather than distinguishing them.

  • legacy-types.ts: The LegacyApiMessage type extends Anthropic.MessageParam with many optional fields. Some of these (like encrypted_content, condenseId, condenseParent, truncationId, truncationParent, isTruncationMarker) are defined but never read by any parsing code. They're harmless but add noise to the type definition.

@markijbema
Contributor

Running this on my machine fails without feedback and then auto-closes the screen, which doesn't seem like the best behaviour. It should at least keep the migration wizard open when it fails, and show why it failed / what happened.

(screenshot: CleanShot 2026-03-30 at 18 25 24)

Prefix all migrated entity IDs (sessions, messages, parts) with a
`_migrated_` segment so they are clearly distinguishable from IDs
produced by the regular runtime. Updates tests to assert the new
prefix format.
…rseSession`

Remove the intermediate `createMessages` and `createParts` async
wrappers that each independently read and parsed the conversation
file. The file is now fetched and parsed once in `parseSession`,
with the resulting conversation passed directly to
`parseMessagesFromConversation` and `parsePartsFromConversation`,
eliminating redundant I/O.
…ectly

`toTextWithinMessage` was a duplicate of `toText` with identical
behavior. Replace all call sites with `toText` and delete the
redundant function.
…asoning entry exists

Guard `isProviderSpecificReasoning` with `!isReasoning` to prevent
duplicate reasoning parts when an entry carries both a `type:
"reasoning"` field and a provider-specific reasoning field (e.g.
`reasoning_content`). The explicit reasoning entry takes precedence.
…s parsing

Add unit tests for two previously uncovered scenarios:

- Task block extraction: verify only the content inside `<task>` tags
  is kept when text exists outside the legacy task wrapper
- Empty tool result: verify the tool name is used as fallback output
  when `tool_result` carries no readable text content

Also replace the dynamic `FileType.Directory` workaround with the
direct `vscode.FileType.Directory` enum reference, and harden
`parseFile` to throw on non-array JSON instead of silently returning
an empty array.
@imanolmzd-svg
Contributor Author

The error handling changes will be sent as a separate PR @markijbema

@marius-kilocode
Collaborator

@imanolmzd-svg Can you confirm that this is reversible for users? We do not risk any data loss with this migration?

@imanolmzd-svg
Contributor Author

@marius-kilocode A few things:

There's an option to delete legacy settings that is currently not shown because of a bug.

In the migration error handling PR that I'm preparing, it shows fine.

Still, I'm commenting it out so that we don't delete legacy settings, at least for now.

We don't risk data loss from old sessions, but we do store a list of sessions successfully migrated. If they run the migration wizard again, they won't be able to reinsert those conversations.

Thinking about it, I should remove that check, so session migration can be reattempted.

There are other checks in place that prevent inserting duplicates. So that way we don't insert duplicates but the migration can be attempted more than once.

I'll update the code

… state

Drop the `readSessionsToMigrate` filter and the post-migration
`globalState` flag that tracked which sessions had already been
migrated. Detection now returns all sessions found in global storage
without filtering by prior migration status.
Check for an existing session row before inserting and return a
`skipped: true` flag when the session already exists. The migration
client honours this flag to bypass message and part imports, avoiding
redundant work when migration is re-run.

Replace `onConflictDoUpdate` with `onConflictDoNothing` on the session
insert path to align with the new pre-check behaviour. Propagate the
optional `skipped` field through the result types and generated SDK
types for project, session, message, and part responses.
Tag text parts whose content begins with `[ERROR]` with `ignored: true`
and a `legacy-system-error` metadata source. Introduces a new
`isLegacySystemErrorText` utility in `parts-util` to encapsulate the
detection logic.
… visible parts

Parse `<feedback>` tags embedded in `tool_result` content and emit a
dedicated text part so user feedback is surfaced as a visible message
rather than buried inside tool result data.

Introduces `getFeedbackText` utility to extract and trim the inner
content of `<feedback>` blocks from arbitrary input.
{ throwOnError: true },
)
// Skip child imports when the session already exists so rerunning migration only imports missing sessions.
if (session.data?.skipped) {
Contributor

WARNING: Retrying a partially imported session now stops too early

If a previous migration managed to insert the session row and then failed while importing message or part records, this branch returns success on every retry and never backfills the missing children. Before this change the per-row upserts made retries self-healing; now a transient failure can leave the migrated conversation permanently incomplete unless the existing session row is deleted first.

@imanolmzd-svg imanolmzd-svg merged commit 80f187e into main Mar 31, 2026
17 checks passed
@imanolmzd-svg imanolmzd-svg deleted the imanol/migration branch March 31, 2026 15:13


Development

Successfully merging this pull request may close these issues.

Add session migration to the migration wizard · Move sessions on migration
