
feat(pipeline): ✨ rebuild AI Engine pipeline #9

Merged
sondoannam merged 17 commits into refactor/processing-pipeline from feat/rebuild-pipeline
Mar 22, 2026

Conversation

Contributor

@sondoannam sondoannam commented Mar 21, 2026

Summary by CodeRabbit

  • New Features

    • GET /media/:id/artifacts to view durable subtitle artifact inventory
    • Player placeholder screen and “Open final artifact” download action
    • “Ready” badge on media cards when subtitles are available
  • Improvements

    • Processing UI now shows artifact counts and a resume-friendly, artifact-driven status flow
    • Unified pipeline stage labeling (PROCESSING) and socket-first status hydration
    • Simplified submission flow (no processing mode selection)
  • Bug Fixes

    • Enforced monotonic progress updates and stronger event/upload ordering guarantees

- apps/ai-engine/tests/test_streaming_contracts.py
- apps/ai-engine/src/minio_client.py
- apps/ai-engine/src/schemas.py
- apps/ai-engine/ARCHITECTURE_CONTEXT.md
- .gsd/milestones/M001/slices/S02/tasks/T01-SUMMARY.md
- .gsd/milestones/M001/slices/S02/S02-PLAN.md
- .gsd/STATE.md
- apps/ai-engine/src/async_pipeline.py
- apps/mobile-app/src/types/subtitle.ts
- apps/mobile-app/src/hooks/useProcessingSubtitles.ts
- apps/ai-engine/src/scripts/test_v2_pipeline.py
- apps/ai-engine/tests/test_first_batch_streaming.py
- .gsd/KNOWLEDGE.md
- .gsd/DECISIONS.md
- apps/ai-engine/tests/test_event_discipline.py
- .gsd/milestones/M001/slices/S03/tasks/T01-PLAN.md

coderabbitai Bot commented Mar 21, 2026

📝 Walkthrough


Migrates AI Engine from a V1 LLM-centered translator to a V2 async pipeline using NMT (NLLB via CTranslate2) with optional LLM refinement; removes processingMode from contracts; adds durable MinIO artifact inventory, socket media rooms, artifact-driven mobile UI, and new artifact/event contract tests and scripts.

Changes

- AI Engine — V2 Pipeline & entry (apps/ai-engine/src/async_pipeline.py, apps/ai-engine/src/pipelines.py, apps/ai-engine/src/main.py): Added the async producer/consumer pipeline (run_v2_pipeline_async/run_v2_pipeline) and a unified entry point; removed processingMode job branching.
- AI Engine — NMT & LLM refinement (apps/ai-engine/src/core/nmt_translator.py, apps/ai-engine/src/core/llm_provider.py, apps/ai-engine/src/core/prompts.py, apps/ai-engine/src/config.py): Introduced NMTTranslator (CTranslate2) and NMT config; refactored LLMProvider to offer a generic generate() plus refine_batch(); replaced the legacy translation prompts.
- AI Engine — Removed V1 components (apps/ai-engine/src/core/translator_engine.py, apps/ai-engine/src/incremental_pipeline.py, apps/ai-engine/src/scripts/test_phase*.py, apps/ai-engine/tests/test_two_tier_streaming.py): Deleted the legacy TranslatorEngine, IncrementalPipeline, several phase test scripts, and the old two-tier streaming tests.
- AI Engine — Schema, clients, helpers (apps/ai-engine/src/schemas.py, apps/ai-engine/src/minio_client.py, apps/ai-engine/src/db.py, apps/ai-engine/src/core/semantic_merger.py, apps/ai-engine/src/core/smart_aligner.py): Added detected_lang and segment_index to Sentence; canonical MinIO key helpers; DB progress clamping; a homophone-correction API; SmartAligner now sets detected_lang.
- AI Engine — Tests & harness scripts (apps/ai-engine/tests/*, apps/ai-engine/src/scripts/test_v2_pipeline.py, apps/ai-engine/src/scripts/{convert_nllb,eval_nllb}.py): Added contract/unit tests validating artifact shapes and event ordering; added a live/fake harness plus NLLB conversion/eval scripts.
- Backend — DB & queue contracts (apps/backend-api/prisma/schema.prisma, apps/backend-api/prisma/migrations/*, apps/backend-api/src/modules/queue/queue.types.ts): Removed the ProcessingMode enum/column and dropped processingMode from queue payload types; added the artifact_summary migration.
- Backend — MinIO artifact discovery & DTOs (apps/backend-api/src/modules/minio/minio.service.ts, apps/backend-api/src/modules/media/dto/response.dto.ts, apps/backend-api/src/modules/media/media.service.ts, apps/backend-api/src/modules/media/media.controller.ts, apps/backend-api/src/modules/media/media-artifact-summary.ts): Added listProcessedArtifacts, artifact inventory types, artifact-summary utilities, GET /media/:id/artifacts, and service methods to record/merge artifact summaries.
- Backend — Socket & event handling (apps/backend-api/src/modules/socket/*): Added media-room join/leave handlers, MediaEvent parsing/validation, mapped socket event names, redaction, and artifact-cache refresh hooks; events are emitted to both user and media rooms.
- Mobile — Types & artifacts (apps/mobile-app/src/types/*, apps/mobile-app/src/constants/endpoint.ts): Replaced ProcessingMode with MediaPipelineStage; added artifact types and the MEDIA_ARTIFACTS endpoint; added detected_lang and segment_index to subtitle types; adjusted responses (jobId, numeric expiresIn).
- Mobile — Hooks, services & UI (apps/mobile-app/src/hooks/*, apps/mobile-app/src/services/media.services.ts, apps/mobile-app/src/app/(app)/processing.tsx, apps/mobile-app/src/app/(app)/player.tsx, apps/mobile-app/src/components/media/*): Switched to artifact-driven hooks (useMediaArtifacts, an updated useProcessingSubtitles); socket-first sync patches artifacts; added a player placeholder; the processing screen shows the artifact summary; media cards gain a player-ready badge; removed the SubtitlePreview re-export.
- Infrastructure & docs (.github/*, .gsd/*, .gitignore, .bg-shell/manifest.json, apps/ai-engine/ARCHITECTURE_CONTEXT.md, checkpoint.md): Updated Copilot/instruction docs to require NMT, added GSD state/milestones/plans, extended .gitignore, and added architecture/instruction documentation for the V2 pipeline.

Sequence Diagram(s)

sequenceDiagram
    participant Queue as BullMQ Queue
    participant Main as main.py
    participant AsyncPipe as async_pipeline.py
    participant Producer as SmartAligner (thread)
    participant Q as asyncio.Queue
    participant Consumer as NMT Consumer
    participant MinIO as MinIO
    participant DB as PostgreSQL
    participant Redis as Redis Events

    Queue->>Main: job(mediaId, targetLanguage)
    Main->>AsyncPipe: run_v2_pipeline_async(...)
    AsyncPipe->>AsyncPipe: prepare audio, VAD
    AsyncPipe->>Q: create bounded queue(maxsize=4)

    par Producer Thread
        Producer->>MinIO: upload Tier 1 chunk
        Producer->>DB: update detected_lang on first chunk
        Producer->>Redis: publish_progress (transcription)
        Producer->>Q: enqueue sentence chunks (backpressure)
        Producer->>Q: send sentinel (None)
    and Consumer Coroutine
        Consumer->>Consumer: prefetch NMTTranslator singleton
        loop per chunk
            alt CJK
                Consumer->>Consumer: buffer & correct_homophones
            end
            Consumer->>NMTTranslator: translate_batch()
            Consumer->>AsyncPipe: optional LLM refine via LLMProvider
            Consumer->>MinIO: upload translated batch (Tier 2)
            Consumer->>Redis: publish_batch_ready
        end
        Consumer->>MinIO: upload final.json
        Consumer->>Redis: publish_completed
        Consumer->>DB: update_media_status(COMPLETED)
    end

    AsyncPipe->>Main: return SubtitleOutput
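The bounded-queue handoff in the diagram above can be sketched roughly as follows. This is a minimal asyncio sketch: in the real pipeline the producer is a SmartAligner thread, and `translate_batch()` is the NMT step — here both sides are plain coroutines and the translation is a stand-in:

```python
import asyncio

SENTINEL = None  # producer enqueues this to signal end of stream


async def producer(queue: asyncio.Queue) -> None:
    # Enqueue sentence chunks; a bounded queue applies backpressure
    # when the consumer falls behind.
    for chunk in (["s1", "s2"], ["s3"]):
        await queue.put(chunk)
    await queue.put(SENTINEL)


async def consumer(queue: asyncio.Queue, results: list) -> None:
    while True:
        chunk = await queue.get()
        if chunk is SENTINEL:
            break
        # Stand-in for translate_batch(): uppercase each sentence.
        results.append([s.upper() for s in chunk])


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=4)  # bounded, as in the diagram
    results: list = []
    await asyncio.gather(producer(queue), consumer(queue, results))
    return results


print(asyncio.run(main()))  # [['S1', 'S2'], ['S3']]
```

The `maxsize=4` bound is what turns the queue into a backpressure mechanism: `queue.put` suspends the producer once four chunks are in flight.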
sequenceDiagram
    participant Mobile as Mobile App
    participant Backend as Backend API
    participant MinIO as MinIO
    participant Redis as Redis Pub/Sub
    participant Socket as Socket.IO Gateway

    Mobile->>Backend: GET /media/:id/artifacts
    Backend->>MinIO: listProcessedArtifacts(mediaId)
    MinIO->>Backend: return inventory
    Backend->>Mobile: MediaArtifactsResponse

    Mobile->>Socket: media_join {mediaId}
    Socket->>Backend: validate ownership
    Socket->>Mobile: join media_{mediaId} ack

    Redis->>Socket: media_updates: batch_ready
    Socket->>Mobile: media_batch_ready (validated)
    Mobile->>Mobile: patchArtifacts (dedupe, sort by index)
    Mobile->>Mobile: render artifact summary
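The mobile `patchArtifacts` step above (dedupe, sort by index) is TypeScript in the app; an illustrative Python equivalent of the merge logic, assuming each artifact carries an `index` field:

```python
def patch_artifacts(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge freshly announced batch artifacts into the cached inventory.

    Dedupes by segment index — a re-delivered socket event simply
    overwrites its earlier copy — and keeps the list sorted by index.
    """
    by_index = {a["index"]: a for a in existing}
    for artifact in incoming:
        by_index[artifact["index"]] = artifact  # latest event wins
    return sorted(by_index.values(), key=lambda a: a["index"])
```

Idempotence matters here because socket events may arrive duplicated or out of order relative to the initial `GET /media/:id/artifacts` snapshot.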

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~80 minutes


Poem

🐰 Hopped into queues with a twitchy nose,

NMT hummed softly as CTranslate2 rose.
Chunks became batches, indices aligned,
Mobile found artifacts, progress defined.
A rabbit's small cheer for pipelines refined.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 17

🧹 Nitpick comments (19)
apps/ai-engine/src/core/semantic_merger.py (1)

176-180: Consider narrowing the blind Exception catch in batch handling.

Catching all exceptions here can hide coding errors; prefer specific expected failure types (or at least logger.exception) for better triage.
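A hedged sketch of the suggested shape — the exception types are placeholders for whatever the homophone handler can actually raise, and the function name is illustrative:

```python
import logging

logger = logging.getLogger(__name__)


def correct_batch(batch_idx: int, batch_sentences: list, core_size: int, handler):
    """Run homophone correction on one batch, falling back to the
    uncorrected core slice only for expected, recoverable failures."""
    try:
        return handler(batch_sentences)
    except (ValueError, KeyError):  # placeholder: enumerate the real failure types
        # logger.exception records the traceback, so failures are triageable
        # instead of silently swallowed.
        logger.exception("Homophone batch %d failed", batch_idx + 1)
        return batch_sentences[:core_size]
```

Anything outside the enumerated types then propagates, so genuine coding errors surface instead of being masked by the fallback.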

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/semantic_merger.py` around lines 176 - 180, The
catch-all except in the homophone batch handling (the block that logs "Homophone
batch {batch_idx + 1} failed" and extends result with
batch_sentences[:core_size]) is too broad; change it to catch only expected
failure types (e.g., ValueError, RuntimeError, KeyError or any specific
exception types raised by the homophone processing code) or, if you cannot
enumerate them, replace logger.error(...) with logger.exception(...) to capture
the traceback and then re-raise non-recoverable exceptions; update the except
clause around that block (referencing batch_idx, batch_sentences, core_size)
accordingly so unexpected errors are not silently swallowed.
.gsd/milestones/M001/slices/S02/S02-PLAN.md (1)

25-29: Optional: make verification commands cross-platform.

Using ./venv/Scripts/python.exe is Windows-specific; documenting a python -m ... form (or both variants) would improve contributor portability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gsd/milestones/M001/slices/S02/S02-PLAN.md around lines 25 - 29, Update the
commands in S02-PLAN.md to be cross-platform by replacing Windows-only
invocations like "./venv/Scripts/python.exe -m pytest" and
"./venv/Scripts/python.exe -m src.scripts.test_v2_pipeline ..." with a portable
form such as "python -m pytest ..." and "python -m src.scripts.test_v2_pipeline
demo_audio_3.mp3 ..." (which uses the currently activated venv), and optionally
add the alternative venv-specific paths for Unix ("./venv/bin/python") and
Windows ("./venv/Scripts/python.exe") as fallbacks so contributors on any OS can
run pytest, test_v2_pipeline, and the other listed commands.
apps/mobile-app/src/components/media/MediaCard.tsx (1)

106-127: Consider accessibility for the "player ready" badge.

The badge uses a 10px font and 10px icon, which may be difficult to read for some users. Consider slightly increasing the text size or ensuring sufficient color contrast is verified against WCAG guidelines.

♿ Optional: Increase badge readability
   readyBadge: {
     position: "absolute",
     top: 6,
     left: 6,
     flexDirection: "row",
     alignItems: "center",
     gap: 4,
     paddingHorizontal: 8,
     paddingVertical: 4,
     borderRadius: 999,
   },
   readyBadgeText: {
-    fontSize: 10,
+    fontSize: 11,
     fontWeight: "700",
   },

And for the icon:

               <Ionicons
                 name="play"
-                size={10}
+                size={12}
                 color={theme.colors.textOnPrimary}
               />

Also applies to: 254-268

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/components/media/MediaCard.tsx` around lines 106 - 127,
The "player ready" badge (rendered when isPlayerReady) is too small and may fail
accessibility/contrast guidelines; update styles.readyBadgeText and the Ionicons
size (and possibly styles.readyBadge padding) to increase legibility (e.g., bump
fontSize and icon size slightly) and verify theme.colors.primary vs
theme.colors.textOnPrimary meet WCAG contrast; also add an accessibilityLabel on
the badge container referencing t("library.playerReady") so screen readers
announce it (check the other occurrence around lines 254-268 and apply the same
changes).
apps/mobile-app/src/app/(app)/index.tsx (1)

39-44: Avoid as any type assertion — use proper route typing.

The as any cast bypasses TypeScript's type checking for expo-router navigation. This can hide routing errors at compile time.

🔧 Suggested fix using proper expo-router typing

If ROUTES.PROCESSING is a typed route constant, consider using expo-router's typed navigation pattern:

   const handleMediaPress = (item: MediaItem) => {
-    router.push({
-      pathname: ROUTES.PROCESSING,
-      params: { id: item.id },
-    } as any);
+    router.push(`${ROUTES.PROCESSING}?id=${item.id}`);
   };

Or define proper route types in your routes configuration to avoid the cast entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/app/`(app)/index.tsx around lines 39 - 44, The code uses
an unsafe as any when calling router.push in handleMediaPress; remove the cast
and supply a properly typed route object (or a typed path string) so TypeScript
can check params. Update handleMediaPress to call router.push with either a
typed route param object that matches your app's route definitions for
ROUTES.PROCESSING (create or import the Processing route params type and use it
instead of any), or construct a typed string path like
router.push(`${ROUTES.PROCESSING}/${item.id}`) to avoid the cast; ensure you
reference handleMediaPress, router.push, ROUTES.PROCESSING and MediaItem when
making the change.
apps/ai-engine/src/core/smart_aligner.py (3)

329-334: Remove duplicate import re statement.

re is already imported at module level (line 16), making this local import redundant.

♻️ Proposed fix
     def _split_cjk_words(self, sentence: Sentence):
         """Pillar 3: Split CJK words into characters."""
-        import re
-
         new_words = []
         cjk_pattern = re.compile(r"[\u4e00-\u9fff]")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/smart_aligner.py` around lines 329 - 334, The
_split_cjk_words method currently re-imports the re module locally; remove the
redundant "import re" inside def _split_cjk_words(self, sentence: Sentence) and
rely on the module-level import instead so the function uses the existing re
symbol (cjk_pattern = re.compile(...)) without duplicating the import.

155-156: Prefix unused sr variable with underscore.

The sample rate returned by librosa.load is not used. Static analysis flagged this as unused.

♻️ Proposed fix
-        audio_full, sr = librosa.load(str(path), sr=16000)
+        audio_full, _sr = librosa.load(str(path), sr=16000)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/smart_aligner.py` around lines 155 - 156, The
assignment from librosa.load currently binds an unused variable sr; change it to
a prefixed-underscore name (e.g., _sr) to satisfy static analysis. Locate the
librosa.load call that assigns audio_full and sr (in smart_aligner.py) and
update the tuple unpacking to audio_full, _sr = librosa.load(...), leaving the
rest of the function (references to audio_full and any use of librosa)
untouched.

240-241: Variable sent shadows outer loop variable.

The inner loop variable sent (line 240) shadows the outer sent from line 232, which can cause confusion during debugging.

♻️ Proposed fix
-                    for sent in split_sents:
-                        sent.detected_lang = detected_lang
+                    for s in split_sents:
+                        s.detected_lang = detected_lang
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/smart_aligner.py` around lines 240 - 241, The inner
loop uses the name `sent`, which shadows the outer `sent` and risks confusion;
rename the inner loop variable (e.g., to `sub_sent` or `split_sent`) in the loop
over `split_sents` and update all uses (the assignment of `detected_lang`)
accordingly so the outer `sent` remains unmodified and the intent is clear.
apps/mobile-app/src/app/(app)/player.tsx (2)

23-28: Consider typing route params properly instead of as any.

The as any assertion bypasses TypeScript's route param checking. If expo-router supports typed routes in this project, consider defining the route params type to maintain type safety.

♻️ Potential improvement

If using expo-router's typed routes feature:

-          router.replace({
-            pathname: ROUTES.PROCESSING,
-            params: { id },
-          } as any)
+          router.replace({
+            pathname: ROUTES.PROCESSING,
+            params: { id },
+          })

This requires ensuring ROUTES.PROCESSING and its params are properly typed in your route definitions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/app/`(app)/player.tsx around lines 23 - 28, Replace the
unsafe cast "as any" on the router.replace call by using proper typed route
params for ROUTES.PROCESSING: define the route param interface that includes id
(e.g., ProcessingRouteParams { id: string }) in your expo-router typed routes
config, then call router.replace with the typed params for ROUTES.PROCESSING
(instead of casting) so the invocation of router.replace({ pathname:
ROUTES.PROCESSING, params: { id } }) is checked by TypeScript; update any route
definitions/exported ROUTES typing to include the new params so
ROUTES.PROCESSING and its params are strongly typed.

21-31: Add accessibility labels to interactive elements.

Both Pressable components (back button and primary action button) lack accessibilityLabel and accessibilityRole props, which impacts screen reader users.

♿ Proposed accessibility improvements
         <Pressable
           style={styles.backBtn}
+          accessibilityRole="button"
+          accessibilityLabel={t("playerPlaceholder.backAccessibility")}
           onPress={() =>
         <Pressable
           style={[styles.button, { backgroundColor: theme.colors.primary }]}
+          accessibilityRole="button"
+          accessibilityLabel={t("playerPlaceholder.back")}
           onPress={() =>

Also applies to: 71-85

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/app/`(app)/player.tsx around lines 21 - 31, The Pressable
interactive elements (the back button Pressable and the primary action Pressable
used elsewhere) are missing accessibility props; update the Pressable wrapping
the Ionicons (and the primary action Pressable referenced around the primary
action) to include accessibilityRole="button" and a descriptive
accessibilityLabel (e.g., "Back" or "Go back to processing" for the back
Pressable, and an appropriate label like "Play" or "Start playback" for the
primary action Pressable) so screen readers can identify them; ensure you update
the Pressable components around the Ionicons and the primary action handler
function names (router.replace usage / the primary action Pressable)
accordingly.
apps/backend-api/prisma/schema.prisma (1)

193-193: Consider encoding pipeline stages as a Prisma enum instead of free-form String.

Now that stage names are explicitly defined in the comment, keeping currentStep as String? risks invalid values being persisted. Constraining it at schema level would harden cross-service contract safety.

♻️ Suggested schema direction
+enum PipelineStep {
+  AUDIO_PREP
+  INSPECTING
+  VAD
+  PROCESSING
+  TRANSLATING
+  EXPORTING
+}
...
-  currentStep            String?     @map("current_step")
+  currentStep            PipelineStep? @map("current_step")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/prisma/schema.prisma` at line 193, Replace the free-form
String currentStep with a Prisma enum to enforce allowed pipeline stages: add a
Prisma enum (e.g., enum PipelineStage { AUDIO_PREP INSPECTING VAD PROCESSING
TRANSLATING EXPORTING }) and change the model field currentStep to type
PipelineStage? while preserving the `@map`("current_step") attribute; then run the
Prisma migration to update the database and regenerate the client so application
code (types/usages of currentStep) aligns with the new enum.
.gsd/milestones/M001/slices/S03/tasks/T01-PLAN.md (1)

33-36: Use cross-platform test commands in plan steps.

The hardcoded Windows interpreter path makes the plan less portable. Prefer venv activation + python -m pytest commands so the instructions work across environments.

📝 Proposed wording
-7. Run `cd apps/ai-engine && ./venv/Scripts/python.exe -m pytest tests/test_event_discipline.py -v` and confirm all tests pass.
-8. Run `cd apps/ai-engine && ./venv/Scripts/python.exe -m pytest tests/test_first_batch_streaming.py tests/test_streaming_contracts.py -q` to confirm no regressions.
+7. Run `cd apps/ai-engine && python -m pytest tests/test_event_discipline.py -v` and confirm all tests pass.
+8. Run `cd apps/ai-engine && python -m pytest tests/test_first_batch_streaming.py tests/test_streaming_contracts.py -q` to confirm no regressions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gsd/milestones/M001/slices/S03/tasks/T01-PLAN.md around lines 33 - 36,
Replace the hardcoded Windows interpreter path ("./venv/Scripts/python.exe -m
pytest") used in the two plan steps with cross-platform instructions: instruct
users to activate the virtual environment (e.g., "source venv/bin/activate" on
Unix or "venv\\Scripts\\activate" on Windows) and then run "python -m pytest
<tests...>" (or instruct to use the venv python via "./venv/bin/python -m
pytest" when suggesting a direct executable) so the steps that currently call
"./venv/Scripts/python.exe -m pytest tests/..." become portable and work on Unix
and Windows.
apps/backend-api/src/modules/socket/socket.types.ts (1)

74-76: Consider excluding arrays from the record check.

isRecord returns true for arrays since typeof [] === 'object'. While parseMediaEvent will fail on arrays due to field checks, explicitly excluding arrays makes the intent clearer.

🔧 Proposed improvement
 function isRecord(value: unknown): value is UnknownRecord {
-  return typeof value === 'object' && value !== null;
+  return typeof value === 'object' && value !== null && !Array.isArray(value);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/socket/socket.types.ts` around lines 74 - 76,
The isRecord type guard currently returns true for arrays because typeof [] ===
'object'; update isRecord to explicitly exclude arrays by returning false when
Array.isArray(value) is true so it only narrows to plain object records (adjust
the function isRecord in socket.types to check Array.isArray and null). This
keeps intent clear and prevents parseMediaEvent and other consumers from
treating arrays as records.
apps/ai-engine/tests/test_first_batch_streaming.py (1)

147-152: Annotate the mutable class attribute to silence the linter warning.

The static analysis tool flagged this mutable class attribute. Since this is intentional test infrastructure for tracking instances across tests, add a type annotation to document the intent and silence the warning.

🔧 Proposed fix
+from typing import ClassVar
+
 class FakeMinioClient:
-    instances: list["FakeMinioClient"] = []
+    instances: ClassVar[list["FakeMinioClient"]] = []
 
     def __init__(self) -> None:
         self.uploads: list[dict] = []
         FakeMinioClient.instances.append(self)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/tests/test_first_batch_streaming.py` around lines 147 - 152,
The linter flags the mutable class attribute "instances" on FakeMinioClient;
annotate it as a class variable to document intent and silence the warning by
adding a ClassVar type (e.g., ClassVar[List["FakeMinioClient"]]) and importing
ClassVar and List from typing (or typing_extensions if needed), leaving the
initializer and usage unchanged.
apps/mobile-app/src/app/(app)/processing.tsx (1)

88-99: Consider logging the error before showing the alert.

The catch block swallows the error details, which makes debugging harder in production. While the user-facing alert is appropriate, consider logging the error for diagnostics.

🔧 Proposed improvement
   const openFinalArtifact = async () => {
     const finalUrl = artifacts?.final?.url;
     if (!finalUrl) {
       return;
     }
 
     try {
       await Linking.openURL(finalUrl);
     } catch {
+      console.error("Failed to open final artifact URL:", finalUrl);
       Alert.alert(t("download.errorTitle"), t("download.errorMessage"));
     }
   };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/app/`(app)/processing.tsx around lines 88 - 99, The catch
block in openFinalArtifact currently swallows errors from Linking.openURL and
only shows Alert.alert; update it to capture the caught error (e) and log it
before showing the alert so diagnostics are preserved (e.g., call console.error
or the app's logger/Sentry with the error and context like finalUrl). Ensure you
reference the same function openFinalArtifact and keep the existing
Alert.alert(t("download.errorTitle"), t("download.errorMessage")) behavior after
logging.
apps/mobile-app/src/hooks/useProcessingSubtitles.ts (1)

17-24: Consider adding error context to the fetch helper.

The fetchArtifactJson helper throws a generic error with just the status code. For debugging, consider including the URL or more context in the error message.

🔧 Optional: Add URL context to error message
 async function fetchArtifactJson<T>(url: string): Promise<T> {
   const response = await fetch(url);
   if (!response.ok) {
-    throw new Error(`Failed to fetch artifact: ${response.status}`);
+    throw new Error(`Failed to fetch artifact from ${url}: ${response.status}`);
   }

   return (await response.json()) as T;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/hooks/useProcessingSubtitles.ts` around lines 17 - 24,
The fetchArtifactJson helper throws a generic error with only the status code;
update fetchArtifactJson to include more context (e.g., the requested url and
status or statusText) in the thrown Error so callers can debug which resource
failed—modify the function fetchArtifactJson(url: string) to construct an error
message containing the url and response.status/response.statusText (and
optionally response.url) before throwing.
apps/ai-engine/src/core/llm_provider.py (2)

178-178: Use explicit Optional type hint per PEP 484.

The parameter system_prompt: str = None uses implicit Optional, which is discouraged. Use explicit union syntax for clarity.

🔧 Proposed fix
-    def generate(self, prompt: str, system_prompt: str = None) -> str:
+    def generate(self, prompt: str, system_prompt: str | None = None) -> str:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/llm_provider.py` at line 178, Update the generate
function signature to use an explicit Optional type per PEP 484: import Optional
from typing (or use typing.Optional) and change the parameter annotation in
generate(prompt: str, system_prompt: Optional[str] = None) -> str; ensure the
import is added/merged with existing typing imports and update any references to
generate if necessary.

276-280: Consider adding strict=True to zip() for future safety.

While the length validation at lines 238-243 ensures equal lengths, adding strict=True to the zip() call provides an additional safety net if the validation logic changes in the future.

🔧 Optional: Add strict parameter
-        for i, (src, draft) in enumerate(zip(sources, nmt_translations)):
+        for i, (src, draft) in enumerate(zip(sources, nmt_translations, strict=True)):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/llm_provider.py` around lines 276 - 280, The zip
between sources and nmt_translations when building numbered source/draft pairs
should use strict=True to guard against mismatched lengths in the future; update
the enumerate(zip(sources, nmt_translations)) call inside the block that builds
lines (the loop that appends f"[{i}] SOURCE: {src} / DRAFT: {draft}") to
enumerate(zip(sources, nmt_translations, strict=True)) so any length mismatch
raises immediately (ensure target runtime supports Python 3.10+).
apps/ai-engine/src/core/nmt_translator.py (1)

172-175: Consider replacing assert with explicit error handling for production.

The assert statement ensures 1:1 mapping between inputs and outputs, but assert statements can be disabled when Python runs with optimization flags (-O or -OO). For production code, an explicit check with a raised exception is safer.

🛡️ Proposed fix: Use explicit exception
-        assert len(results) == len(texts), (
-            f"NMTTranslator: 1:1 mapping broken — "
-            f"got {len(results)} results for {len(texts)} inputs"
-        )
+        if len(results) != len(texts):
+            raise RuntimeError(
+                f"NMTTranslator: 1:1 mapping broken — "
+                f"got {len(results)} results for {len(texts)} inputs"
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/nmt_translator.py` around lines 172 - 175, Replace
the runtime-only assert in NMTTranslator that checks len(results) == len(texts)
with an explicit conditional that raises a clear exception (e.g., ValueError or
RuntimeError) when the lengths differ; locate the assertion comparing results
and texts in nmt_translator.py and change it to a guarded if that raises an
error containing the same diagnostic message (including the counts) so the check
cannot be disabled by Python optimizations.
apps/ai-engine/tests/test_streaming_contracts.py (1)

387-402: Make this test actually break positional assumptions.

all_segments is still built in ascending segment_index order, so a consumer that accidentally overlays by array position would also pass here. Reordering the final segments before filtering would make the test prove identity-based matching instead of incidental ordering.

Suggested fix
-    all_segments = [_make_sentence(j, segment_index=j) for j in range(10)]
+    all_segments = list(
+        reversed([_make_sentence(j, segment_index=j) for j in range(10)])
+    )

     # Consumer matching: find final segments that overlap the batch range
     batch_range = range(
         batch.first_segment_index,
         batch.first_segment_index + len(batch.segments),
     )
-    matched = [s for s in all_segments if s.segment_index in batch_range]
+    matched = sorted(
+        (s for s in all_segments if s.segment_index in batch_range),
+        key=lambda s: s.segment_index,
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/tests/test_streaming_contracts.py` around lines 387 - 402, The
test currently builds all_segments in ascending segment_index order so
positional-matching bugs can still pass; to force identity-based matching,
reorder all_segments (e.g., shuffle or reverse) before computing matched so the
array positions no longer align with segment_index. Update the test around
all_segments/_make_sentence and the subsequent filtering (batch_range, matched,
batch.segments) so matched is derived from a randomized final-segment ordering,
keeping the assertions the same to ensure matching relies on segment_index
rather than array position.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gsd/milestones/M001/slices/S03/tasks/T02-PLAN.md:
- Line 44: Update the phrasing in the listed bullet referencing
`apps/ai-engine/src/scripts/test_v2_pipeline.py`: replace the awkward phrase
"required automated gate" with the clearer wording "required automated
verification gate" (or optionally "required gate for milestone completion") so
the line reads e.g. "`apps/ai-engine/src/scripts/test_v2_pipeline.py` — any
interceptor support kept as an optional manual-inspection surface rather than a
required automated verification gate".

In @.gsd/STATE.md:
- Line 3: The milestone registry entry uses placeholder names (e.g., "M002:
M002", "M003", "M004"); replace each placeholder milestone identifier with a
meaningful, short description that reflects the milestone goal (for example
change "M002: M002" to "M002: Implement user onboarding flow", and similarly
update "M003" and "M004" with their intended deliverables). Ensure the updated
entries are concise, human-readable, and consistent with the project's milestone
naming convention so project tracking and references to these IDs are clear.

In `@apps/ai-engine/ARCHITECTURE_CONTEXT.md`:
- Around line 358-365: Update the "## 10. Verification and test surface" section
to include the missing test file by adding `tests/test_event_discipline.py` to
the list alongside `tests/test_streaming_contracts.py`; edit the paragraph under
that header (where the current automated test file is listed) so it enumerates
both `tests/test_streaming_contracts.py` and `tests/test_event_discipline.py` to
accurately reflect the maintained proof surface.

In `@apps/ai-engine/INSTRUCTION.md`:
- Around line 26-29: The MEDIUM profile in INSTRUCTION.md is out of sync with
the runtime (docs say "Beam Size = 5" but implementation uses 3); update the
MEDIUM entry to read "Beam Size = 3" so it matches the runtime config and keep
the rest (int8_float16 mix, Batch Size = 4) unchanged; ensure the "MEDIUM"
heading and its settings line are updated consistently.

In `@apps/ai-engine/src/async_pipeline.py`:
- Around line 430-431: The loop that pairs `sentences` and `final_translations`
currently uses zip which silently truncates if lengths differ; update the
pairing to use Python's strict zip by changing the iteration to use
zip(sentences, final_translations, strict=True) so a ValueError is raised on
length mismatch (allowing the issue to surface immediately); keep the assignment
s.translation = t and, if desired, wrap the loop in a try/except to log or
re-raise the ValueError for clearer diagnostics.

In `@apps/ai-engine/src/config.py`:
- Around line 132-135: Add validation to fail fast when env provides
non-positive values: update the settings for NMT_BEAM_SIZE and CHUNK_SIZE (the
Field declarations in the config BaseSettings) to enforce a minimum of 1 (e.g.,
use Field(..., ge=1) or gt=0). If the config class uses pydantic validators, add
a validator for "NMT_BEAM_SIZE" and "CHUNK_SIZE" that raises a ValueError when
value <= 0 so parsing fails early. Ensure the change is applied to the existing
NMT_BEAM_SIZE and CHUNK_SIZE symbols so invalid env values are rejected during
startup.
- Around line 127-131: Change the NMT_COMPUTE_TYPE annotation in config.py from
plain str to a typing.Literal that includes all supported compute-type strings
(e.g., "float16", "int8_float16", "int8", "float32", and "default") so
validation catches invalid values early; update the Field declaration for
NMT_COMPUTE_TYPE accordingly and ensure the new Literal matches the fallback
handled in nmt_translator.py and eval_nllb.py.
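The fail-fast principle behind both config suggestions can be sketched without pydantic (the helper functions below are illustrative, not code from `config.py`):

```python
# Pydantic-free sketch of startup validation: reject bad env values before the
# pipeline ever runs. Names mirror the settings discussed above.
import os

ALLOWED_COMPUTE_TYPES = {"float16", "int8_float16", "int8", "float32", "default"}

def load_positive_int(name: str, default: int) -> int:
    value = int(os.environ.get(name, default))
    if value < 1:
        raise ValueError(f"{name} must be >= 1, got {value}")
    return value

def load_compute_type(name: str, default: str) -> str:
    value = os.environ.get(name, default)
    if value not in ALLOWED_COMPUTE_TYPES:
        raise ValueError(f"{name} must be one of {sorted(ALLOWED_COMPUTE_TYPES)}, got {value!r}")
    return value
```

With pydantic, the equivalent is `Field(ge=1)` on the integer fields and a `Literal[...]` annotation on `NMT_COMPUTE_TYPE`, so `BaseSettings` parsing rejects bad env values itself.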

In `@apps/ai-engine/src/core/semantic_merger.py`:
- Around line 167-175: The homophone-only path in correct_homophones calls
self._process_batch with SAFE_MERGE_CJK_PROMPT and cjk=True which reuses merge
reconstruction and can collapse segments; fix by adding a no-merge flag (e.g.,
forbid_merge=True or merge_mode='none') to the _process_batch call in
correct_homophones and update _process_batch to honor that flag by skipping any
multi-index/merge reconstruction logic (or by returning error/unchanged segments
if multi-index outputs would occur). Alternatively, after receiving corrected
from _process_batch validate that each corrected entry preserves the original
segment boundaries (no multi-index/merged outputs) and raise/split if violations
detected; reference symbols: correct_homophones, _process_batch,
SAFE_MERGE_CJK_PROMPT, cjk.

In `@apps/ai-engine/src/db.py`:
- Around line 78-79: The SQL update currently uses GREATEST(COALESCE(progress,
0), %s) which can still allow negative persistence if the incoming progress is
negative; modify the clause construction in db.py (where set_clauses is appended
and values.append(progress) is used) to ensure the comparison is against 0 and
the supplied progress value (e.g., GREATEST(COALESCE(progress, 0), 0, %s)) so
the stored progress is never below 0; update the set_clauses entry for progress
and keep values.append(progress) as the parameter.
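The semantics of the suggested clause can be modeled in plain Python (an illustrative helper, not code from `db.py`):

```python
# Pure-Python model of GREATEST(COALESCE(progress, 0), 0, %s): the stored value
# stays monotonic and never drops below zero, even for NULL or negative inputs.
from typing import Optional

def clamp_progress(stored: Optional[int], incoming: int) -> int:
    return max(stored if stored is not None else 0, 0, incoming)
```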

In `@apps/ai-engine/src/scripts/convert_nllb.py`:
- Around line 10-12: Update the prerequisites comment in convert_nllb.py to
include the ctranslate2 dependency so users install the CLI used by the
ct2-transformers-converter; specifically, add "ct2-transformers" or instruct to
install the ctranslate2 package/CLI alongside "transformers[torch] sentencepiece
protobuf" in the top docstring, ensuring it matches the tool named in the error
fallback that references "ct2-transformers-converter" so users won’t hit the
runtime fallback on missing ctranslate2.

In `@apps/ai-engine/src/scripts/eval_nllb.py`:
- Around line 355-357: The label currently uses the requested args.compute_type
but NLLBEvaluator may have fallen back to a different compute type; change the
call to use the evaluator's actual/effective compute type when building the
label (e.g., replace label=f"primary ({args.compute_type})" with label=f"primary
({evaluator.compute_type})" or label=f"primary
({evaluator.get_effective_compute_type()})"), and if NLLBEvaluator lacks a
public property/method exposing the chosen compute type, add a simple read-only
property (e.g., compute_type or get_effective_compute_type) to NLLBEvaluator
that returns the resolved compute type and use that in run_full_evaluation.

In `@apps/ai-engine/src/scripts/test_v2_pipeline.py`:
- Around line 404-406: The upload_chunk method currently stores a live reference
to the incoming data (self.artifacts[object_key] = data), which can be mutated
later; change it to snapshot the payload before storing by assigning a deep copy
(e.g., use copy.deepcopy(data)) to self.artifacts[object_key] and add the
necessary import for copy so the stored chunk is immutable against later
pipeline mutations.
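Illustratively, the snapshot fix looks like this (the class name is a stand-in for the fake client in the test script):

```python
# Sketch: deep-copy on upload so later pipeline mutations cannot rewrite what
# the fake client recorded.
import copy

class FakeArtifactStore:
    def __init__(self) -> None:
        self.artifacts: dict = {}

    def upload_chunk(self, object_key: str, data) -> None:
        # store a snapshot, not a live reference
        self.artifacts[object_key] = copy.deepcopy(data)
```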
- Around line 524-527: The connections returned by _connect() are not being
closed by using "with self._connect() as conn" because psycopg2 connections
don't implement transactional context exit, so convert _connect into a proper
contextmanager (e.g., add `@contextlib.contextmanager` to _connect, yield the
psycopg2 connection, and ensure conn.close() in a finally block) or
alternatively ensure every caller (including places that call snapshot_media_row
and the traced status update sites) wraps the connection with
contextlib.closing(...) or explicitly calls conn.close() in a finally; update
the _connect definition and all call sites that currently use "with
self._connect() as conn" to use the new safe context so connections are always
closed.
- Around line 1204-1208: The teardown in the finally block currently calls
redis_capture.close() and db_fixture.cleanup() directly so any exception there
will replace the original test failure; change the finally to preserve and
re-raise the original exception: wrap each teardown call (redis_capture.close
and db_fixture.cleanup) in its own try/except, capture and log/suppress teardown
exceptions (or accumulate them) but do not overwrite the primary exception from
the pipeline/contract; ensure that if an original exception exists it is
re-raised after teardown exceptions are handled, and only surface teardown
errors when no primary exception was raised.
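The masking hazard and the requested teardown shape generalize to a small pattern (the helper name `run_with_teardown` is illustrative):

```python
# Sketch: teardown failures are collected and never allowed to replace the
# primary test failure; they surface only when nothing else went wrong.
def run_with_teardown(body, teardowns):
    primary = None
    teardown_errors = []
    try:
        body()
    except Exception as exc:  # capture the real failure
        primary = exc
    for teardown in teardowns:
        try:
            teardown()
        except Exception as exc:  # log/collect, do not mask
            teardown_errors.append(exc)
    if primary is not None:
        raise primary
    if teardown_errors:
        raise teardown_errors[0]
```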

In `@apps/backend-api/src/modules/media/media.service.ts`:
- Around line 249-254: getMediaStatus() and getUserMediaList() currently call
minioService.listProcessedArtifacts() per media row (using inventory.summary)
which causes N+1 MinIO scans on hot read paths; change these methods to return a
stored or cached artifact summary field from the media record (e.g.,
media.artifactSummary or artifactsSummary) instead of calling
listProcessedArtifacts(), and move full inventory lookups behind the dedicated
artifacts endpoint (or a background updater). Update getMediaStatus(),
getUserMediaList(), and any referenced code that uses inventory.summary to read
the persisted/cached summary, and implement or call the existing background job
to update that summary when artifacts change so real-time endpoints don't
perform MinIO scans (this also applies to the code block covering the later
257-289 logic).

In `@apps/backend-api/src/modules/socket/socket.gateway.ts`:
- Around line 104-123: Wrap the isMediaOwnedByUser call in a try/catch inside
the media_join handler so any thrown errors are caught and a deterministic
acknowledgement is returned; on catch, log the error (use this.logger.error or
warn) including client.id, userId and mediaId, and return { ok: false, error:
'Media ownership check failed' } (or a clear error string) instead of letting
the exception bubble—keep the existing successful flow (getMediaRoom,
client.join, logger.debug, return { ok: true, room: roomName }) unchanged.

In `@apps/mobile-app/src/i18n/locales/vi/processing.json`:
- Line 52: The JSON key artifactSummary.chunks currently uses the English string
"Chunk" in the Vietnamese locale; update the value to an appropriate Vietnamese
translation (e.g., "Phân đoạn") by replacing the value for the "chunks" key in
the vi locale file (the artifactSummary.chunks entry) so it matches the rest of
the Vietnamese translations.

---

Nitpick comments:
In @.gsd/milestones/M001/slices/S02/S02-PLAN.md:
- Around line 25-29: Update the commands in S02-PLAN.md to be cross-platform by
replacing Windows-only invocations like "./venv/Scripts/python.exe -m pytest"
and "./venv/Scripts/python.exe -m src.scripts.test_v2_pipeline ..." with a
portable form such as "python -m pytest ..." and "python -m
src.scripts.test_v2_pipeline demo_audio_3.mp3 ..." (which uses the currently
activated venv), and optionally add the alternative venv-specific paths for Unix
("./venv/bin/python") and Windows ("./venv/Scripts/python.exe") as fallbacks so
contributors on any OS can run pytest, test_v2_pipeline, and the other listed
commands.

In @.gsd/milestones/M001/slices/S03/tasks/T01-PLAN.md:
- Around line 33-36: Replace the hardcoded Windows interpreter path
("./venv/Scripts/python.exe -m pytest") used in the two plan steps with
cross-platform instructions: instruct users to activate the virtual environment
(e.g., "source venv/bin/activate" on Unix or "venv\\Scripts\\activate" on
Windows) and then run "python -m pytest <tests...>" (or instruct to use the venv
python via "./venv/bin/python -m pytest" when suggesting a direct executable) so
the steps that currently call "./venv/Scripts/python.exe -m pytest tests/..."
become portable and work on Unix and Windows.

In `@apps/ai-engine/src/core/llm_provider.py`:
- Line 178: Update the generate function signature to use an explicit Optional
type per PEP 484: import Optional from typing (or use typing.Optional) and
change the parameter annotation in generate(prompt: str, system_prompt:
Optional[str] = None) -> str; ensure the import is added/merged with existing
typing imports and update any references to generate if necessary.
- Around line 276-280: The zip between sources and nmt_translations when
building numbered source/draft pairs should use strict=True to guard against
mismatched lengths in the future; update the enumerate(zip(sources,
nmt_translations)) call inside the block that builds lines (the loop that
appends f"[{i}] SOURCE: {src} / DRAFT: {draft}") to enumerate(zip(sources,
nmt_translations, strict=True)) so any length mismatch raises immediately
(ensure target runtime supports Python 3.10+).

In `@apps/ai-engine/src/core/nmt_translator.py`:
- Around line 172-175: Replace the runtime-only assert in NMTTranslator that
checks len(results) == len(texts) with an explicit conditional that raises a
clear exception (e.g., ValueError or RuntimeError) when the lengths differ;
locate the assertion comparing results and texts in nmt_translator.py and change
it to a guarded if that raises an error containing the same diagnostic message
(including the counts) so the check cannot be disabled by Python optimizations.

In `@apps/ai-engine/src/core/semantic_merger.py`:
- Around line 176-180: The catch-all except in the homophone batch handling (the
block that logs "Homophone batch {batch_idx + 1} failed" and extends result with
batch_sentences[:core_size]) is too broad; change it to catch only expected
failure types (e.g., ValueError, RuntimeError, KeyError or any specific
exception types raised by the homophone processing code) or, if you cannot
enumerate them, replace logger.error(...) with logger.exception(...) to capture
the traceback and then re-raise non-recoverable exceptions; update the except
clause around that block (referencing batch_idx, batch_sentences, core_size)
accordingly so unexpected errors are not silently swallowed.

In `@apps/ai-engine/src/core/smart_aligner.py`:
- Around line 329-334: The _split_cjk_words method currently re-imports the re
module locally; remove the redundant "import re" inside def
_split_cjk_words(self, sentence: Sentence) and rely on the module-level import
instead so the function uses the existing re symbol (cjk_pattern =
re.compile(...)) without duplicating the import.
- Around line 155-156: The assignment from librosa.load currently binds an
unused variable sr; change it to a prefixed-underscore name (e.g., _sr) to
satisfy static analysis. Locate the librosa.load call that assigns audio_full
and sr (in smart_aligner.py) and update the tuple unpacking to audio_full, _sr =
librosa.load(...), leaving the rest of the function (references to audio_full
and any use of librosa) untouched.
- Around line 240-241: The inner loop uses the name `sent`, which shadows the
outer `sent` and risks confusion; rename the inner loop variable (e.g., to
`sub_sent` or `split_sent`) in the loop over `split_sents` and update all uses
(the assignment of `detected_lang`) accordingly so the outer `sent` remains
unmodified and the intent is clear.

In `@apps/ai-engine/tests/test_first_batch_streaming.py`:
- Around line 147-152: The linter flags the mutable class attribute "instances"
on FakeMinioClient; annotate it as a class variable to document intent and
silence the warning by adding a ClassVar type (e.g.,
ClassVar[List["FakeMinioClient"]]) and importing ClassVar and List from typing
(or typing_extensions if needed), leaving the initializer and usage unchanged.
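The annotation change is small; a self-contained sketch:

```python
# Sketch of the ClassVar annotation: documents that `instances` is shared class
# state, which also satisfies linters that flag mutable class attributes.
from typing import ClassVar, List

class FakeMinioClient:
    instances: ClassVar[List["FakeMinioClient"]] = []

    def __init__(self) -> None:
        FakeMinioClient.instances.append(self)
```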

In `@apps/ai-engine/tests/test_streaming_contracts.py`:
- Around line 387-402: The test currently builds all_segments in ascending
segment_index order so positional-matching bugs can still pass; to force
identity-based matching, reorder all_segments (e.g., shuffle or reverse) before
computing matched so the array positions no longer align with segment_index.
Update the test around all_segments/_make_sentence and the subsequent filtering
(batch_range, matched, batch.segments) so matched is derived from a randomized
final-segment ordering, keeping the assertions the same to ensure matching
relies on segment_index rather than array position.

In `@apps/backend-api/prisma/schema.prisma`:
- Line 193: Replace the free-form String currentStep with a Prisma enum to
enforce allowed pipeline stages: add a Prisma enum (e.g., enum PipelineStage {
AUDIO_PREP INSPECTING VAD PROCESSING TRANSLATING EXPORTING }) and change the
model field currentStep to type PipelineStage? while preserving the
`@map`("current_step") attribute; then run the Prisma migration to update the
database and regenerate the client so application code (types/usages of
currentStep) aligns with the new enum.

In `@apps/backend-api/src/modules/socket/socket.types.ts`:
- Around line 74-76: The isRecord type guard currently returns true for arrays
because typeof [] === 'object'; update isRecord to explicitly exclude arrays by
returning false when Array.isArray(value) is true so it only narrows to plain
object records (adjust the function isRecord in socket.types to check
Array.isArray and null). This keeps intent clear and prevents parseMediaEvent
and other consumers from treating arrays as records.

In `@apps/mobile-app/src/app/(app)/index.tsx`:
- Around line 39-44: The code uses an unsafe as any when calling router.push in
handleMediaPress; remove the cast and supply a properly typed route object (or a
typed path string) so TypeScript can check params. Update handleMediaPress to
call router.push with either a typed route param object that matches your app's
route definitions for ROUTES.PROCESSING (create or import the Processing route
params type and use it instead of any), or construct a typed string path like
router.push(`${ROUTES.PROCESSING}/${item.id}`) to avoid the cast; ensure you
reference handleMediaPress, router.push, ROUTES.PROCESSING and MediaItem when
making the change.

In `@apps/mobile-app/src/app/(app)/player.tsx`:
- Around line 23-28: Replace the unsafe cast "as any" on the router.replace call
by using proper typed route params for ROUTES.PROCESSING: define the route param
interface that includes id (e.g., ProcessingRouteParams { id: string }) in your
expo-router typed routes config, then call router.replace with the typed params
for ROUTES.PROCESSING (instead of casting) so the invocation of router.replace({
pathname: ROUTES.PROCESSING, params: { id } }) is checked by TypeScript; update
any route definitions/exported ROUTES typing to include the new params so
ROUTES.PROCESSING and its params are strongly typed.
- Around line 21-31: The Pressable interactive elements (the back button
Pressable and the primary action Pressable used elsewhere) are missing
accessibility props; update the Pressable wrapping the Ionicons (and the primary
action Pressable referenced around the primary action) to include
accessibilityRole="button" and a descriptive accessibilityLabel (e.g., "Back" or
"Go back to processing" for the back Pressable, and an appropriate label like
"Play" or "Start playback" for the primary action Pressable) so screen readers
can identify them; ensure you update the Pressable components around the
Ionicons and the primary action handler function names (router.replace usage /
the primary action Pressable) accordingly.

In `@apps/mobile-app/src/app/(app)/processing.tsx`:
- Around line 88-99: The catch block in openFinalArtifact currently swallows
errors from Linking.openURL and only shows Alert.alert; update it to capture the
caught error (e) and log it before showing the alert so diagnostics are
preserved (e.g., call console.error or the app's logger/Sentry with the error
and context like finalUrl). Ensure you reference the same function
openFinalArtifact and keep the existing Alert.alert(t("download.errorTitle"),
t("download.errorMessage")) behavior after logging.

In `@apps/mobile-app/src/components/media/MediaCard.tsx`:
- Around line 106-127: The "player ready" badge (rendered when isPlayerReady) is
too small and may fail accessibility/contrast guidelines; update
styles.readyBadgeText and the Ionicons size (and possibly styles.readyBadge
padding) to increase legibility (e.g., bump fontSize and icon size slightly) and
verify theme.colors.primary vs theme.colors.textOnPrimary meet WCAG contrast;
also add an accessibilityLabel on the badge container referencing
t("library.playerReady") so screen readers announce it (check the other
occurrence around lines 254-268 and apply the same changes).

In `@apps/mobile-app/src/hooks/useProcessingSubtitles.ts`:
- Around line 17-24: The fetchArtifactJson helper throws a generic error with
only the status code; update fetchArtifactJson to include more context (e.g.,
the requested url and status or statusText) in the thrown Error so callers can
debug which resource failed—modify the function fetchArtifactJson(url: string)
to construct an error message containing the url and
response.status/response.statusText (and optionally response.url) before
throwing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 308bd984-78c1-4e64-8c1f-df7b68cc8693

📥 Commits

Reviewing files that changed from the base of the PR and between 8f7b3b7 and 46daa0f.

📒 Files selected for processing (73)
  • .bg-shell/manifest.json
  • .github/copilot-instructions.md
  • .github/instructions/ai-engine.instructions.md
  • .gitignore
  • .gsd/STATE.md
  • .gsd/milestones/M001/slices/S02/S02-PLAN.md
  • .gsd/milestones/M001/slices/S02/tasks/T01-SUMMARY.md
  • .gsd/milestones/M001/slices/S03/S03-PLAN.md
  • .gsd/milestones/M001/slices/S03/tasks/T01-PLAN.md
  • .gsd/milestones/M001/slices/S03/tasks/T02-PLAN.md
  • apps/ai-engine/ARCHITECTURE_CONTEXT.md
  • apps/ai-engine/INSTRUCTION.md
  • apps/ai-engine/requirements.txt
  • apps/ai-engine/src/async_pipeline.py
  • apps/ai-engine/src/config.py
  • apps/ai-engine/src/core/llm_provider.py
  • apps/ai-engine/src/core/nmt_translator.py
  • apps/ai-engine/src/core/pipeline.py
  • apps/ai-engine/src/core/prompts.py
  • apps/ai-engine/src/core/semantic_merger.py
  • apps/ai-engine/src/core/smart_aligner.py
  • apps/ai-engine/src/core/translator_engine.py
  • apps/ai-engine/src/db.py
  • apps/ai-engine/src/incremental_pipeline.py
  • apps/ai-engine/src/main.py
  • apps/ai-engine/src/minio_client.py
  • apps/ai-engine/src/pipelines.py
  • apps/ai-engine/src/schemas.py
  • apps/ai-engine/src/scripts/convert_nllb.py
  • apps/ai-engine/src/scripts/eval_nllb.py
  • apps/ai-engine/src/scripts/test_phase1.py
  • apps/ai-engine/src/scripts/test_phase2.py
  • apps/ai-engine/src/scripts/test_phase2_real.py
  • apps/ai-engine/src/scripts/test_v2_pipeline.py
  • apps/ai-engine/tests/test_event_discipline.py
  • apps/ai-engine/tests/test_first_batch_streaming.py
  • apps/ai-engine/tests/test_streaming_contracts.py
  • apps/ai-engine/tests/test_two_tier_streaming.py
  • apps/backend-api/prisma/migrations/20260321103000_remove_processing_mode/migration.sql
  • apps/backend-api/prisma/schema.prisma
  • apps/backend-api/src/common/constants/enums.ts
  • apps/backend-api/src/modules/media/dto/request.dto.ts
  • apps/backend-api/src/modules/media/dto/response.dto.ts
  • apps/backend-api/src/modules/media/media.controller.ts
  • apps/backend-api/src/modules/media/media.service.ts
  • apps/backend-api/src/modules/media/workers/media.processor.ts
  • apps/backend-api/src/modules/minio/minio.service.ts
  • apps/backend-api/src/modules/queue/queue.types.ts
  • apps/backend-api/src/modules/socket/socket.gateway.ts
  • apps/backend-api/src/modules/socket/socket.module.ts
  • apps/backend-api/src/modules/socket/socket.service.ts
  • apps/backend-api/src/modules/socket/socket.types.ts
  • apps/mobile-app/src/app/(app)/index.tsx
  • apps/mobile-app/src/app/(app)/player.tsx
  • apps/mobile-app/src/app/(app)/processing.tsx
  • apps/mobile-app/src/components/index.ts
  • apps/mobile-app/src/components/media/MediaCard.tsx
  • apps/mobile-app/src/components/media/SubtitlePreview.tsx
  • apps/mobile-app/src/constants/endpoint.ts
  • apps/mobile-app/src/constants/pipeline.ts
  • apps/mobile-app/src/hooks/index.ts
  • apps/mobile-app/src/hooks/useMedia.ts
  • apps/mobile-app/src/hooks/useProcessingSubtitles.ts
  • apps/mobile-app/src/hooks/useSocketSync.ts
  • apps/mobile-app/src/i18n/locales/en/common.json
  • apps/mobile-app/src/i18n/locales/en/processing.json
  • apps/mobile-app/src/i18n/locales/vi/common.json
  • apps/mobile-app/src/i18n/locales/vi/processing.json
  • apps/mobile-app/src/services/media.services.ts
  • apps/mobile-app/src/services/socket.service.ts
  • apps/mobile-app/src/types/media.ts
  • apps/mobile-app/src/types/subtitle.ts
  • checkpoint.md
💤 Files with no reviewable changes (9)
  • apps/mobile-app/src/components/index.ts
  • apps/mobile-app/src/hooks/index.ts
  • apps/backend-api/src/modules/queue/queue.types.ts
  • apps/ai-engine/src/scripts/test_phase2_real.py
  • apps/ai-engine/src/scripts/test_phase2.py
  • apps/ai-engine/src/scripts/test_phase1.py
  • apps/ai-engine/tests/test_two_tier_streaming.py
  • apps/ai-engine/src/incremental_pipeline.py
  • apps/ai-engine/src/core/translator_engine.py


## Expected Output

- `apps/ai-engine/src/scripts/test_v2_pipeline.py` — any interceptor support kept as an optional manual-inspection surface rather than a required automated gate

⚠️ Potential issue | 🟡 Minor

Polish wording: “required automated gate” is awkward.

Use “required automated verification gate” (or “required gate for milestone completion”) for clearer phrasing.

🧰 Tools
🪛 LanguageTool

[style] ~44-~44: The double modal “required automated” is nonstandard (only accepted in certain dialects). Consider “to be automated”.
Context: ...spection surface rather than a required automated gate

(NEEDS_FIXED)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gsd/milestones/M001/slices/S03/tasks/T02-PLAN.md at line 44, Update the
phrasing in the listed bullet referencing
`apps/ai-engine/src/scripts/test_v2_pipeline.py`: replace the awkward phrase
"required automated gate" with the clearer wording "required automated
verification gate" (or optionally "required gate for milestone completion") so
the line reads e.g. "`apps/ai-engine/src/scripts/test_v2_pipeline.py` — any
interceptor support kept as an optional manual-inspection surface rather than a
required automated verification gate".

Comment thread .gsd/STATE.md
@@ -0,0 +1,21 @@
# GSD State

**Active Milestone:** M002: M002

⚠️ Potential issue | 🟡 Minor

Placeholder milestone names should be filled in.

The milestone registry contains placeholder names (M002: M002, M003, M004) that should be updated with actual milestone descriptions to provide meaningful project tracking.

📝 Suggested improvement
-**Active Milestone:** M002: M002
+**Active Milestone:** M002: <descriptive milestone name>

 ## Milestone Registry
 - ✅ **M001:** AI Engine real-time core
-- 🔄 **M002:** M002
-- ⬜ **M003:** M003
-- ⬜ **M004:** M004
+- 🔄 **M002:** <describe milestone 2>
+- ⬜ **M003:** <describe milestone 3>
+- ⬜ **M004:** <describe milestone 4>

Also applies to: 9-12

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gsd/STATE.md at line 3, The milestone registry entry uses placeholder names
(e.g., "M002: M002", "M003", "M004"); replace each placeholder milestone
identifier with a meaningful, short description that reflects the milestone goal
(for example change "M002: M002" to "M002: Implement user onboarding flow", and
similarly update "M003" and "M004" with their intended deliverables). Ensure the
updated entries are concise, human-readable, and consistent with the project's
milestone naming convention so project tracking and references to these IDs are
clear.

Comment on lines +358 to +365
## 10. Verification and test surface

Automated pytest coverage is minimal and contract-focused.

Current automated test file:

- `tests/test_streaming_contracts.py`


⚠️ Potential issue | 🟡 Minor

List tests/test_event_discipline.py in this section too.

This section currently says tests/test_streaming_contracts.py is the automated test file, but the same document also relies on tests/test_event_discipline.py as part of the maintained proof surface. Leaving it out here underreports the actual guard rails around the pipeline contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/ARCHITECTURE_CONTEXT.md` around lines 358 - 365, Update the
"## 10. Verification and test surface" section to include the missing test file
by adding `tests/test_event_discipline.py` to the list alongside
`tests/test_streaming_contracts.py`; edit the paragraph under that header (where
the current automated test file is listed) so it enumerates both
`tests/test_streaming_contracts.py` and `tests/test_event_discipline.py` to
accurately reflect the maintained proof surface.

Comment on lines +26 to +29
2. **MEDIUM (Balanced - DEFAULT)**:
    - Target: Standard usage, optimal VRAM usage.
    - Settings: `int8_float16` mix, Batch Size = 4, Beam Size = 5.
3. **HIGH (Max Speed)**:

⚠️ Potential issue | 🟡 Minor

Align MEDIUM beam-size docs with runtime config.

The instruction says MEDIUM uses Beam Size 5, but the current settings implementation uses 3 for MEDIUM. Please sync this to avoid operator confusion.

📝 Suggested doc correction
-    - Settings: `int8_float16` mix, Batch Size = 4, Beam Size = 5.
+    - Settings: `int8_float16` mix, Batch Size = 4, Beam Size = 3.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 2. **MEDIUM (Balanced - DEFAULT)**:
     - Target: Standard usage, optimal VRAM usage.
-    - Settings: `int8_float16` mix, Batch Size = 4, Beam Size = 5.
+    - Settings: `int8_float16` mix, Batch Size = 4, Beam Size = 3.
 3. **HIGH (Max Speed)**:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/INSTRUCTION.md` around lines 26 - 29, The MEDIUM profile in
INSTRUCTION.md is out of sync with the runtime (docs say "Beam Size = 5" but
implementation uses 3); update the MEDIUM entry to read "Beam Size = 3" so it
matches the runtime config and keep the rest (int8_float16 mix, Batch Size = 4)
unchanged; ensure the "MEDIUM" heading and its settings line are updated
consistently.

Comment on lines +430 to +431
            for s, t in zip(sentences, final_translations):
                s.translation = t

⚠️ Potential issue | 🟡 Minor

Add `strict=True` to detect length mismatches.

If `sentences` and `final_translations` have different lengths (e.g., the LLM returns the wrong count), silent truncation could cause data loss. Using `strict=True` raises `ValueError` on mismatch, making the bug immediately visible.

🐛 Proposed fix
-            for s, t in zip(sentences, final_translations):
+            for s, t in zip(sentences, final_translations, strict=True):
                 s.translation = t
📝 Committable suggestion


Suggested change
-for s, t in zip(sentences, final_translations):
-    s.translation = t
+for s, t in zip(sentences, final_translations, strict=True):
+    s.translation = t
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 430-430: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/async_pipeline.py` around lines 430 - 431, The loop that
pairs `sentences` and `final_translations` currently uses zip which silently
truncates if lengths differ; update the pairing to use Python's strict zip by
changing the iteration to use zip(sentences, final_translations, strict=True) so
a ValueError is raised on length mismatch (allowing the issue to surface
immediately); keep the assignment s.translation = t and, if desired, wrap the
loop in a try/except to log or re-raise the ValueError for clearer diagnostics.

Comment on lines +524 to +527
def _connect(self):
    conn = psycopg2.connect(self.dsn)
    conn.autocommit = True
    return conn

⚠️ Potential issue | 🟠 Major

Close psycopg2 connections explicitly.

`with self._connect() as conn` only wraps the transaction; it does not close the psycopg2 connection. Because `snapshot_media_row()` is called after traced status updates, repeated runs can leak connections until Postgres starts refusing new ones.
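The pattern the fix relies on can be sketched with a generator-based context manager; `sqlite3` stands in for psycopg2 here so the sketch runs without a database server (in both libraries, `with conn:` scopes a transaction without closing the connection):

```python
import sqlite3
from contextlib import contextmanager


class DbFixture:
    """Stand-in for the test fixture; dsn is a file path here, not a real DSN."""

    def __init__(self, dsn=":memory:"):
        self.dsn = dsn

    @contextmanager
    def _connect(self):
        # Yielding from a contextmanager guarantees close() even on error,
        # unlike `with connect() as conn`, which only scopes the transaction.
        conn = sqlite3.connect(self.dsn)
        try:
            yield conn
        finally:
            conn.close()


fixture = DbFixture()
with fixture._connect() as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")

# Using the connection now fails, proving close() actually ran.
try:
    conn.execute("SELECT 1")
except sqlite3.ProgrammingError:
    print("connection closed")
```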

Suggested fix
+from contextlib import contextmanager
+
 class LiveDbFixture:
-    def _connect(self):
+    @contextmanager
+    def _connect(self):
         conn = psycopg2.connect(self.dsn)
         conn.autocommit = True
-        return conn
+        try:
+            yield conn
+        finally:
+            conn.close()

Also applies to: 530-531, 560-561, 608-609, 645-646

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/scripts/test_v2_pipeline.py` around lines 524 - 527, The
connections returned by _connect() are not being closed by using "with
self._connect() as conn" because psycopg2 connections don't implement
transactional context exit, so convert _connect into a proper contextmanager
(e.g., add `@contextlib.contextmanager` to _connect, yield the psycopg2
connection, and ensure conn.close() in a finally block) or alternatively ensure
every caller (including places that call snapshot_media_row and the traced
status update sites) wraps the connection with contextlib.closing(...) or
explicitly calls conn.close() in a finally; update the _connect definition and
all call sites that currently use "with self._connect() as conn" to use the new
safe context so connections are always closed.

Comment on lines +1204 to +1208
finally:
    if redis_capture is not None:
        redis_capture.close()
    if db_fixture is not None:
        db_fixture.cleanup()

⚠️ Potential issue | 🟡 Minor

Keep teardown failures from hiding the real failure.

If `redis_capture.close()` or `db_fixture.cleanup()` raises inside the `finally` block, it replaces the pipeline/contract exception you actually need to debug.
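The masking behavior can be reproduced without the Redis and DB fixtures; `run_with_teardown` and the failing callables below are hypothetical stand-ins:

```python
def run_with_teardown(body, teardown):
    try:
        body()
    finally:
        teardown()  # an exception here replaces whatever body() raised


def run_with_safe_teardown(body, teardown):
    try:
        body()
    finally:
        try:
            teardown()
        except Exception as exc:
            print(f"teardown failed (suppressed): {exc}")


def failing_body():
    raise RuntimeError("real pipeline failure")


def failing_teardown():
    raise OSError("redis socket already gone")


# Naive version: the OSError from teardown masks the RuntimeError.
try:
    run_with_teardown(failing_body, failing_teardown)
except Exception as exc:
    naive_seen = type(exc).__name__
print(naive_seen)  # OSError

# Guarded version: the original RuntimeError survives.
try:
    run_with_safe_teardown(failing_body, failing_teardown)
except Exception as exc:
    guarded_seen = type(exc).__name__
print(guarded_seen)  # RuntimeError
```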

Suggested fix
     finally:
         if redis_capture is not None:
-            redis_capture.close()
+            try:
+                redis_capture.close()
+            except Exception:
+                logger.exception("Failed to close RedisEventCapture")
         if db_fixture is not None:
-            db_fixture.cleanup()
+            try:
+                db_fixture.cleanup()
+            except Exception:
+                logger.exception("Failed to clean LiveDbFixture rows")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/scripts/test_v2_pipeline.py` around lines 1204 - 1208, The
teardown in the finally block currently calls redis_capture.close() and
db_fixture.cleanup() directly so any exception there will replace the original
test failure; change the finally to preserve and re-raise the original
exception: wrap each teardown call (redis_capture.close and db_fixture.cleanup)
in its own try/except, capture and log/suppress teardown exceptions (or
accumulate them) but do not overwrite the primary exception from the
pipeline/contract; ensure that if an original exception exists it is re-raised
after teardown exceptions are handled, and only surface teardown errors when no
primary exception was raised.

Comment thread apps/backend-api/src/modules/media/media.service.ts Outdated
Comment thread apps/backend-api/src/modules/socket/socket.gateway.ts Outdated
"artifactSummary": {
  "title": "Tóm tắt đầu ra",
  "sourceLanguage": "Ngôn ngữ nguồn",
  "chunks": "Chunk",

⚠️ Potential issue | 🟡 Minor

Localize `artifactSummary.chunks` for Vietnamese consistency.

"Chunk" stands out as untranslated in the vi locale. Consider a Vietnamese label (for example, "Phân đoạn").

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/i18n/locales/vi/processing.json` at line 52, The JSON key
artifactSummary.chunks currently uses the English string "Chunk" in the
Vietnamese locale; update the value to an appropriate Vietnamese translation
(e.g., "Phân đoạn") by replacing the value for the "chunks" key in the vi locale
file (the artifactSummary.chunks entry) so it matches the rest of the Vietnamese
translations.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (5)
apps/backend-api/src/modules/socket/socket.gateway.ts (1)

131-147: Consider adding an authentication check for consistency.

`handleLeaveMedia` doesn't verify authentication, unlike `handleJoinMedia`. While leaving a non-joined room is harmless, the inconsistency may confuse clients and maintainers.

🔧 Suggested consistency fix
 @SubscribeMessage('media_leave')
 async handleLeaveMedia(
   @ConnectedSocket() client: Socket,
   @MessageBody() payload: MediaRoomPayload,
 ): Promise<RoomSubscriptionAck> {
+  const userId = this.getAuthenticatedUserId(client);
+  if (!userId) {
+    return { ok: false, error: 'Unauthenticated socket' };
+  }
+
   const mediaId = payload.mediaId?.trim();
   if (!mediaId) {
     return { ok: false, error: 'mediaId is required' };
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/socket/socket.gateway.ts` around lines 131 -
147, handleLeaveMedia lacks the same authentication guard as handleJoinMedia;
update handleLeaveMedia to perform the identical auth check used by
handleJoinMedia (e.g., validate the socket's authenticated user/session stored
on the socket or via the same auth service), and if the check fails return a
consistent unauthorized response (e.g., { ok: false, error: 'unauthorized' })
instead of proceeding to leave the room; keep the existing mediaId trim,
roomName resolution (getMediaRoom), and client.leave behavior when authenticated
so behavior remains the same for valid clients.
apps/backend-api/src/modules/media/media.service.ts (1)

403-424: Race condition is mitigated by idempotent merge operations.

The read-then-write pattern has a theoretical race between `findUnique` and `persistArtifactSummary`. However, since all merge functions use `Math.max` operations, concurrent updates will converge to the correct state. For high-throughput scenarios, consider a database-level `UPDATE ... SET col = GREATEST(...)` or optimistic locking if write amplification becomes an issue.
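The convergence claim can be sketched abstractly: as long as every merge is a pointwise max, concurrent updates commute and replaying one is a no-op. The field names below are illustrative stand-ins, not the real summary schema:

```python
def merge(summary, update):
    # Pointwise max makes the merge commutative and idempotent,
    # so lost-update races still converge to the same final state.
    return {k: max(summary.get(k, 0), update.get(k, 0))
            for k in summary.keys() | update.keys()}


base = {"chunks": 3, "segments": 10}
a = {"chunks": 5}
b = {"segments": 12, "chunks": 4}

# Order of concurrent writers does not matter.
assert merge(merge(base, a), b) == merge(merge(base, b), a) == {"chunks": 5, "segments": 12}
# Replaying an update is a no-op (idempotent).
assert merge(merge(base, a), a) == merge(base, a)
```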

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/media/media.service.ts` around lines 403 - 424,
The updater is given the raw DB value instead of the normalized summary used for
comparison, causing inconsistency: in updateArtifactSummary call updater with
the normalized current summary (use
normalizeArtifactSummary(item.artifactSummary)) so updater receives the same
ProcessedArtifactSummary shape you compare with artifactSummariesEqual; keep the
subsequent equality check and call to persistArtifactSummary unchanged and
ensure the updater function signature/type aligns with ProcessedArtifactSummary
(references: updateArtifactSummary, normalizeArtifactSummary,
artifactSummariesEqual, persistArtifactSummary).
apps/backend-api/src/modules/media/media-artifact-summary.ts (2)

117-128: Validate `finalObjectKey` is non-empty to avoid inconsistent state.

Passing an empty string would set `hasFinal: true` with `finalObjectKey: ''`, which conflicts with `normalizeArtifactSummary` semantics, where empty strings become null.

🛡️ Suggested guard
 export function mergeFinalArtifactSummary(
   current: unknown,
   finalObjectKey: string,
 ): ProcessedArtifactSummary {
   const normalized = normalizeArtifactSummary(current);
+  const trimmedKey = finalObjectKey.trim();
+  if (!trimmedKey) {
+    return normalized;
+  }

   return {
     ...normalized,
     hasFinal: true,
-    finalObjectKey,
+    finalObjectKey: trimmedKey,
   };
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/media/media-artifact-summary.ts` around lines
117 - 128, mergeFinalArtifactSummary currently sets hasFinal: true even if
finalObjectKey is empty, conflicting with normalizeArtifactSummary (which maps
empty strings to null); add a guard in mergeFinalArtifactSummary that validates
finalObjectKey is a non-empty string (e.g., trim and check length) and either
throw a clear error or return normalized without hasFinal if invalid. Reference
the function mergeFinalArtifactSummary and call to normalizeArtifactSummary when
implementing the check so the finalObjectKey matches normalizeArtifactSummary
semantics.

6-8: Consider excluding arrays from the record check.

`isRecord([])` returns true since arrays are objects. While downstream property access gracefully handles this (returning `undefined`), an explicit check improves clarity.

🔧 Suggested refinement
 function isRecord(value: unknown): value is UnknownRecord {
-  return typeof value === 'object' && value !== null;
+  return typeof value === 'object' && value !== null && !Array.isArray(value);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/media/media-artifact-summary.ts` around lines 6
- 8, The isRecord type guard currently treats arrays as records (isRecord([])
=== true); update the function isRecord(value: unknown): value is UnknownRecord
to exclude arrays by adding an Array.isArray check (e.g., return typeof value
=== 'object' && value !== null && !Array.isArray(value)); keep the signature and
UnknownRecord type as-is so callers that rely on the guard behave the same
except for arrays being excluded.
apps/backend-api/src/modules/socket/socket.service.ts (1)

59-68: JSON parse errors are caught by the processing chain, but consider clearer error messages.

The `JSON.parse` at line 60 can throw, but the error is caught by the outer `.catch()` in `messageProcessingChain`. For better diagnostics, consider wrapping the parse specifically.

🔧 Optional: More specific error handling
 private async handleMediaUpdate(message: string): Promise<void> {
-  const parsedPayload: unknown = JSON.parse(message);
+  let parsedPayload: unknown;
+  try {
+    parsedPayload = JSON.parse(message);
+  } catch {
+    this.logger.warn(`Received malformed JSON in media_updates: ${message.slice(0, 100)}`);
+    return;
+  }
   const event = parseMediaEvent(parsedPayload);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/socket/socket.service.ts` around lines 59 - 68,
The JSON.parse call in handleMediaUpdate can throw and should be wrapped in a
small try/catch to provide a clearer diagnostic; surround const parsedPayload:
unknown = JSON.parse(message) with a try/catch, and in the catch log a
descriptive message via this.logger.warn or this.logger.error that includes the
raw message (or a redacted version using redactMediaPayload) and the parse
error, then rethrow the error (so messageProcessingChain can still handle it) or
return early if you prefer to swallow it; keep the rest of handleMediaUpdate
(parseMediaEvent, redactMediaPayload) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 443fff35-c284-4c75-9447-7674a8ad6635

📥 Commits

Reviewing files that changed from the base of the PR and between 46daa0f and f255355.

📒 Files selected for processing (8)
  • apps/backend-api/package.json
  • apps/backend-api/prisma/migrations/20260322125429_add_artifact_summary_cache/migration.sql
  • apps/backend-api/prisma/schema.prisma
  • apps/backend-api/src/modules/media/dto/response.dto.ts
  • apps/backend-api/src/modules/media/media-artifact-summary.ts
  • apps/backend-api/src/modules/media/media.service.ts
  • apps/backend-api/src/modules/socket/socket.gateway.ts
  • apps/backend-api/src/modules/socket/socket.service.ts
✅ Files skipped from review due to trivial changes (1)
  • apps/backend-api/prisma/migrations/20260322125429_add_artifact_summary_cache/migration.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/backend-api/prisma/schema.prisma

@sondoannam sondoannam merged commit 9009cc2 into refactor/processing-pipeline Mar 22, 2026
1 check passed