
feat(mobile-app): ✨ build processing screen #6

Merged
sondoannam merged 37 commits into main from feat/processing-screen
Mar 23, 2026

Conversation

@sondoannam
Contributor

@sondoannam sondoannam commented Mar 11, 2026

Summary by CodeRabbit

  • New Features

    • Real-time processing via WebSocket (live progress, chunk/batch/final events)
    • Device media picker for selecting/uploading audio & video
    • Processing screen with animated progress ring, ETA, pipeline stepper, and subtitle preview
    • Artifacts inventory and one-click download/open of final processed media
    • Player placeholder reachable from processing view
  • Improvements

    • Socket-first sync for instant status updates across sessions
    • Media library permissions & media-library plugin added for smoother uploads

@coderabbitai

coderabbitai Bot commented Mar 11, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b75b8400-23d2-45fc-9ec2-5b853e67522c

📥 Commits

Reviewing files that changed from the base of the PR and between a2eb90d and 952592a.

📒 Files selected for processing (5)
  • .github/instructions/ai-engine.instructions.md
  • apps/ai-engine/src/config.py
  • apps/ai-engine/src/core/audio_inspector.py
  • apps/ai-engine/src/main.py
  • apps/ai-engine/tests/test_prewarm_startup.py

📝 Walkthrough

Walkthrough

Replaces the AI engine V1 pipeline with a V2 async NMT-first producer/consumer pipeline, adds durable artifact uploads/events/DB helpers, introduces backend WebSocket/Redis wiring, extends media artifact/ETA fields and DTOs, and implements socket-first mobile UI and artifact retrieval endpoints.

Changes

Cohort / File(s) Summary
AI Engine: pipeline & orchestration
apps/ai-engine/src/async_pipeline.py, apps/ai-engine/src/pipelines.py, apps/ai-engine/src/core/nmt_translator.py, apps/ai-engine/src/core/semantic_merger.py, apps/ai-engine/src/core/llm_provider.py, apps/ai-engine/src/core/prompts.py
New V2 async producer/consumer pipeline (run_v2_pipeline_async/run_v2_pipeline), NMT translator (CTranslate2/NLLB), language-aware semantic merging, LLM refinement surface, prompt/parse updates.
AI Engine: I/O, schema, events & DB
apps/ai-engine/src/minio_client.py, apps/ai-engine/src/schemas.py, apps/ai-engine/src/events.py, apps/ai-engine/src/db.py
Canonical MinIO key helpers, presigned GETs, translated-batch/final uploads, expanded Pydantic subtitle schemas, Redis publish helpers, and Postgres update helpers (update_media_status/mark_quota_counted).
AI Engine: entrypoint, config & tests/scripts
apps/ai-engine/src/main.py, apps/ai-engine/src/config.py, apps/ai-engine/src/scripts/*, apps/ai-engine/tests/*
Main now calls run_v2_pipeline, added prewarm, new settings (NMT/model prewarm), extensive test harnesses and scripts (convert/eval/test_v2_pipeline) and event-discipline/contract tests.
AI Engine: removed/cleaned V1
apps/ai-engine/src/core/translator_engine.py (deleted), apps/ai-engine/last_llm_response.txt (deleted)
Removed old TranslatorEngine, V1 process_video artifacts, and legacy mapping file.
Backend: DB schema & migrations
apps/backend-api/prisma/schema.prisma, apps/backend-api/prisma/migrations/*
Removed ProcessingMode/processing_mode, added artifact_summary JSONB, current_step, and estimated_time_remaining columns with corresponding migrations.
Backend: media model, DTOs & service
apps/backend-api/src/modules/media/media.service.ts, apps/backend-api/src/modules/media/dto/*.ts, apps/backend-api/src/modules/media/media.controller.ts, apps/backend-api/src/modules/media/media-artifact-summary.ts
Persist and serve artifact summaries, add artifact-list and download endpoints, new DTOs for artifacts/final URL, record helpers to update artifact summary, export MediaService.
Backend: socket & event forwarding
apps/backend-api/src/modules/socket/*, apps/backend-api/src/modules/socket/socket.types.ts, apps/backend-api/src/modules/minio/minio.service.ts
New SocketModule/Gateway/Service; Redis subscriber forwards media_updates to user/media rooms; typed media event parsing, processed-bucket listing and presigned GET generation.
Backend: queue types & wiring
apps/backend-api/src/modules/queue/queue.types.ts, apps/backend-api/src/modules/media/workers/media.processor.ts
Removed processingMode from job payloads; added optional targetLanguage propagated into AI jobs.
Backend: auth & module wiring
apps/backend-api/src/modules/auth/auth.service.ts, apps/backend-api/src/app.module.ts, apps/backend-api/package.json, apps/backend-api/scripts/sync-schema.ts
Added verifyAccessToken, integrated SocketModule into AppModule, added socket deps and schema sync script.
Mobile app: socket, services & types
apps/mobile-app/src/services/socket.service.ts, apps/mobile-app/src/services/media.services.ts, apps/mobile-app/src/types/socket-events.ts, apps/mobile-app/src/types/subtitle.ts, apps/mobile-app/package.json
Client-side SocketService singleton, typed socket events, new subtitle/artifact types, mediaService.getArtifacts and updated confirm/YouTube responses, socket.io-client dependency.
Mobile app: UI, hooks & components
apps/mobile-app/src/app/(app)/processing.tsx, apps/mobile-app/src/app/(app)/media-picker.tsx, apps/mobile-app/src/components/media/*, apps/mobile-app/src/hooks/*, apps/mobile-app/src/i18n/*
Processing and MediaPicker screens, PipelineStepper, ProcessingProgressRing, SubtitlePreview, UploadModals, useSocketSync and artifact/hooks, i18n processing namespace and translations.
Mobile app: navigation/layout & UX adjustments
apps/mobile-app/src/app/(app)/_layout.tsx, apps/mobile-app/src/app/(app)/upload.tsx, apps/mobile-app/src/app/_layout.tsx, apps/mobile-app/src/app/(app)/index.tsx
Replaced old upload UI, added media-picker route, root language hydration, always-route-to-processing from library.
Mobile app: supporting constants/exports
apps/mobile-app/src/constants/*, apps/mobile-app/src/components/index.ts, apps/mobile-app/src/hooks/index.ts
Added PIPELINE_STEPS, endpoints/routes for artifacts, re-exported PipelineStepper and hydrateLanguagePreference/useThrottle.
Repo docs & misc
.github/*, .gsd/*, .gitignore, checkpoint.md, .bg-shell/manifest.json
New copilot/instruction docs, GSD state and milestone plans, updated .gitignore and checkpoint reflecting V2 refactor.

Sequence Diagram(s)

sequenceDiagram
    participant Mobile as Mobile Client
    participant SocketGW as Backend Socket Gateway
    participant SocketSvc as Backend Socket Service
    participant Redis as Redis Pub/Sub
    participant AI as AI Engine Worker
    participant MinIO as MinIO
    participant DB as Postgres

    Mobile->>SocketGW: open WS (Bearer token)
    SocketGW->>SocketSvc: subscribe user/media rooms (on connect)
    AI->>Redis: PUBLISH media_updates (progress/chunk/batch/completed/failed)
    Redis->>SocketSvc: message(media_updates)
    SocketSvc->>DB: optional refresh artifact summary/status
    SocketSvc->>SocketGW: emit event to user_{userId} / media_{mediaId}
    SocketGW->>Mobile: socket event (media_progress / chunk_ready / batch_ready / completed / failed)
    Mobile->>Mobile: update local React Query caches and UI

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through queues and threaded streams,

Chunks and batches stitched like dreams,
Sockets whisper progress bright,
Final.json lands in light,
Rabbity cheers: "Pipeline, take flight!" ✨



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
apps/mobile-app/src/hooks/useMedia.ts (1)

92-105: ⚠️ Potential issue | 🟠 Major

Guard confirmUpload behind successful file reads and PUTs.

This flow confirms the upload even if the local file fetch or the presigned PUT fails. That can create a media record for an object that was never stored.

Proposed fix
       // Step 2 — Upload file blob directly to MinIO (bypasses our API server)
       const fileResponse = await fetch(file.uri);
+      if (!fileResponse.ok) {
+        throw new Error("Failed to read local file for upload");
+      }
       const fileBlob = await fileResponse.blob();
-      await fetch(uploadUrl, {
+      const uploadResponse = await fetch(uploadUrl, {
         method: "PUT",
         body: fileBlob,
         headers: { "Content-Type": file.mimeType },
       });
+      if (!uploadResponse.ok) {
+        throw new Error("Failed to upload file to storage");
+      }
 
       // Step 3 — Confirm upload → creates MediaItem + dispatches queue job
       const title = file.name.replace(/\.[^.]+$/, ""); // strip file extension for title
       return mediaService.confirmUpload({
         title,
-        objectKey: objectKey,
+        objectKey,
         processingMode: "TRANSCRIBE_TRANSLATE",
       });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/hooks/useMedia.ts` around lines 92 - 105, The code calls
mediaService.confirmUpload regardless of whether the local fetch or the
presigned PUT succeeded; ensure confirmUpload is only invoked after successful
file read and a successful PUT response. Specifically, check fileResponse.ok
after fetch(file.uri) and the PUT response.ok after await fetch(uploadUrl, {
method: "PUT", ... }); throw or return early on failure (or propagate the error)
so that mediaService.confirmUpload({ title, objectKey, processingMode:
"TRANSCRIBE_TRANSLATE", ... }) is only executed when fileResponse and the PUT
response are both successful; include the objectKey/title variables in the same
scope so they are unchanged when calling confirmUpload.
apps/mobile-app/src/services/api.ts (1)

35-39: ⚠️ Potential issue | 🟡 Minor

Normalize loopback hosts even for https URLs.

The startsWith("https") branch skips normalizeApiBaseUrl(), so EXPO_PUBLIC_API_URL=https://localhost:3000/api or https://127.0.0.1:3000/api still points at the emulator itself on Android. Now that this constant is exported for sockets too, the same bad host breaks both transports.

Suggested fix
 export const API_BASE_URL = envApiBaseUrl
-  ? envApiBaseUrl.startsWith("https")
-    ? envApiBaseUrl
-    : normalizeApiBaseUrl(envApiBaseUrl)
+  ? normalizeApiBaseUrl(envApiBaseUrl)
   : DEFAULT_API_BASE_URL;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/services/api.ts` around lines 35 - 39, The exported
API_BASE_URL currently bypasses normalizeApiBaseUrl when envApiBaseUrl
startsWith("https"), which leaves loopback hosts like https://localhost or
https://127.0.0.1 un-normalized and breaks Android emulator/socket connectivity;
update the logic so that when envApiBaseUrl is present you always call
normalizeApiBaseUrl(envApiBaseUrl) (falling back to DEFAULT_API_BASE_URL only
when envApiBaseUrl is absent) and keep using the same API_BASE_URL constant for
sockets and HTTP so both transports get the normalized host.
apps/mobile-app/src/stores/media.store.ts (1)

46-60: ⚠️ Potential issue | 🟡 Minor

Update refreshItem() to persist step/ETA fields.

This patch path still overwrites only status, progress, and failReason. Any code path that calls getStatus() will leave currentStep and estimatedTimeRemaining stale, so the new processing screen can show outdated pipeline data.

Suggested fix
       set((state) => ({
         items: state.items.map((item) =>
           item.id === id
             ? {
                 ...item,
                 status: statusRes.status,
                 progress: statusRes.progress,
+                currentStep: statusRes.currentStep,
+                estimatedTimeRemaining: statusRes.estimatedTimeRemaining,
                 failReason: statusRes.failReason,
               }
             : item,
         ),
       }));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/stores/media.store.ts` around lines 46 - 60, The
refreshItem function updates only status, progress, and failReason from
mediaService.getStatus; modify the state update inside refreshItem to also copy
currentStep and estimatedTimeRemaining from the statusRes into the matched item
so those pipeline fields are persisted (i.e., when mapping items in refreshItem,
set currentStep: statusRes.currentStep and estimatedTimeRemaining:
statusRes.estimatedTimeRemaining in addition to status/progress/failReason).
🧹 Nitpick comments (3)
apps/ai-engine/src/main.py (1)

334-340: Throttle chunk-level status pushes before they become control-plane churn.

Each on_chunk callback now does a Postgres update plus Redis publish, and apps/backend-api/src/modules/socket/socket.service.ts:57-66 immediately turns that into another DB read and websocket emit. Coalescing these updates by time or meaningful progress delta is usually enough for a smooth UX without hammering the write/read fan-out path.

Also applies to: 416-422

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/main.py` around lines 334 - 340, The on_chunk flow is
currently calling update_media_status (e.g., update_media_status(media_id,
user_id=user_id, progress=progress, current_step="TRANSCRIBING",
estimated_time_remaining=_eta(progress))) on every chunk which triggers DB
writes and Redis publishes; modify this to throttle/coalesce updates per
media_id by either (a) emitting only when a configurable time interval has
passed (debounce every N seconds) or (b) when progress changes by a meaningful
delta (e.g., >= 1–5%), and fall back to a final forced update at completion;
implement the throttle in the on_chunk handler (or wrap update_media_status)
using a per-media_id last-update timestamp and last-progress value so
intermediate chunk callbacks skip calling update_media_status until the time or
delta threshold is reached.
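The coalescing idea suggested above can be sketched as a small wrapper. This is an illustrative TypeScript sketch (the target file, main.py, is Python); `makeThrottledEmitter` and its thresholds are hypothetical names, not code from this PR.

```typescript
// Per-media coalescing: emit only when enough time has passed or progress
// moved by a meaningful delta; `force` covers the final update at completion.
type Emit = (mediaId: string, progress: number) => void;

function makeThrottledEmitter(
  emit: Emit,
  minIntervalMs = 2000, // assumed debounce window
  minDelta = 2,         // assumed meaningful progress delta (percentage points)
) {
  const last = new Map<string, { at: number; progress: number }>();
  return (mediaId: string, progress: number, force = false): boolean => {
    const prev = last.get(mediaId);
    const now = Date.now();
    const due =
      force ||
      !prev ||
      now - prev.at >= minIntervalMs ||
      Math.abs(progress - prev.progress) >= minDelta;
    if (!due) return false; // skipped: coalesced into a later update
    last.set(mediaId, { at: now, progress });
    emit(mediaId, progress);
    return true;
  };
}
```

The same pattern translates directly to Python with a per-media dict of last-update timestamps wrapping `update_media_status`.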
apps/mobile-app/src/app/(app)/upload.tsx (1)

37-48: Trigger the next action from sheet-close completion, not a fixed 200ms delay.

These flows are now coupled to a magic duration and already drift from the BottomSheet close timing in apps/mobile-app/src/components/BottomSheet.tsx:57-70. If that animation changes, the YouTube modal or media-picker navigation can race the sheet and cause overlap/flicker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `apps/mobile-app/src/app/(app)/upload.tsx` around lines 37 - 48, the handlers
handleSelectYouTube and handleSelectDevice rely on a hardcoded 200ms timeout;
replace that with waiting for the BottomSheet close completion instead. Change
onClose to return a promise or provide a completion callback (or expose a
closeCompleted event in the BottomSheet component referenced in
BottomSheet.tsx), then update handleSelectYouTube to await onClose() (or use
onClose().then(...)) before calling setYtVisible(true), and update
handleSelectDevice to await onClose() before router.push("/media-picker");
ensure the BottomSheet's close flow (close(), closeCompleted promise/event) is
implemented so these handlers no longer use setTimeout.
apps/mobile-app/src/hooks/useSocketSync.ts (1)

35-40: Consider adding type safety to the cache update.

Using any type loses type safety. Consider typing the callback parameter for better maintainability.

Suggested improvement
+import type { MediaItem } from "@/types/media";
+
      // 2. Update the global media list so the Home library updates in background without fetching
-     queryClient.setQueryData(mediaKeys.all, (oldData: any) => {
-       if (!oldData || !Array.isArray(oldData)) return oldData;
-       return oldData.map((item: any) =>
+     queryClient.setQueryData<MediaItem[] | undefined>(mediaKeys.all, (oldData) => {
+       if (!oldData) return oldData;
+       return oldData.map((item) =>
          item.id === media.id ? { ...item, ...media } : item,
        );
      });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/hooks/useSocketSync.ts` around lines 35 - 40, The cache
update using queryClient.setQueryData(mediaKeys.all, ...) currently types the
callback with any and loses type safety; update the callback signature to use
the correct cached type (e.g., Array<MediaType> or MediaType[]), replace
(oldData: any) with a properly typed parameter and adjust the map callback from
(item: any) to (item: MediaType) so comparisons with media.id and the merged
object { ...item, ...media } are type-checked; ensure MediaType defines id and
other fields used and update imports/types where queryClient.setQueryData and
mediaKeys.all are used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ai-engine/src/main.py`:
- Around line 141-151: The Redis publish should be wrapped in its own try/except
so Redis failures are logged separately from DB errors: after the DB commit and
where redis_client.publish("media_updates", ...) is called, move that publish
into a nested try block and catch Exception as e (or pub_exc) and call
logger.error with a distinct message like "Redis publish failed for media
{media_id}, user {user_id}: {e}" while leaving the outer DB exception handling
intact; reference the redis_client.publish call, the logger.error for DB update,
and the media_id/user_id variables to locate where to add the new try/except.

In `@apps/backend-api/prisma/schema.prisma`:
- Around line 199-200: The currentStep column is defined as an unconstrained
String; change it to a Prisma enum and update DB migration and TypeScript DTOs
to use the closed PipelineStage set: add a Prisma enum PipelineStage with values
AUDIO_PREP, INSPECTING, VAD, TRANSCRIBING, MERGING, TRANSLATING, EXPORTING,
change the model field currentStep from String? to PipelineStage? (preserve
`@map`("current_step")), create and apply a migration that converts the existing
TEXT column to the enum type (migrate safely handling existing values),
regenerate the Prisma client, and update any DTO/type declarations so
currentStep is PipelineStage | null instead of string | null.
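The enum change described above might look like the following Prisma sketch; the model name and surrounding fields are placeholders, only `currentStep` and the `PipelineStage` values come from the comment.

```prisma
enum PipelineStage {
  AUDIO_PREP
  INSPECTING
  VAD
  TRANSCRIBING
  MERGING
  TRANSLATING
  EXPORTING
}

model Media {
  // ...existing fields...
  currentStep PipelineStage? @map("current_step")
}
```

A migration would then convert the existing TEXT column to the new enum type, handling any stored values that fall outside the closed set.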

In `@apps/backend-api/src/modules/auth/auth.service.ts`:
- Around line 300-313: verifyAccessToken is verifying with JWT_ACCESS_SECRET
while tokens are signed with JWT_SECRET, causing all WebSocket auth to fail; fix
by making secrets consistent: either change verifyAccessToken to call
this.configService.getOrThrow('JWT_SECRET') so verification uses the same secret
as JwtModule, or update the token creation flow in generateTokens to explicitly
sign access tokens with JWT_ACCESS_SECRET (and ensure JwtModule/secret config
matches whichever choice). Update the JwtModule/secret config and any references
to JWT_SECRET/JWT_ACCESS_SECRET so JwtModule, generateTokens, and
verifyAccessToken all use the same secret symbol.

In `@apps/mobile-app/.gitignore`:
- Line 50: Add a short comment above the /stitch gitignore entry explaining what
the /stitch directory is, which tool or process creates it (e.g., "stitcher"
build step, local dev script, CI artifact, or third-party tool), and why it
should be ignored (generated artifacts, large binaries, or sensitive run-time
files), and if applicable mention the command that produces it (e.g., npm run
stitch or CI job name) so future maintainers know its origin and whether it
should be committed or recreated as part of the build/test workflow.

In `apps/mobile-app/src/app/(app)/media-picker.tsx`:
- Around line 197-206: The CTA currently always calls requestPermission() even
when permissionResponse.canAskAgain is false; change the Pressable's onPress
handler and label logic so it checks permissionResponse.canAskAgain and if true
calls requestPermission(), otherwise opens the app settings (e.g., via
Linking.openSettings or a platform-specific Settings helper) and update the
button text accordingly (e.g., "Grant Permission" vs "Open Settings"); locate
the Pressable and its Text label in media-picker.tsx (the Pressable using
requestPermission and the permissionResponse.canAskAgain check) and implement
the conditional onPress and label change.
- Around line 75-85: The current fallback logic in media-picker.tsx sets
mimeType too coarsely (variables mimeType, asset, ext), causing
.mov/.webm/.aac/.flac/.ogg/etc. to be mislabeled; replace the simple mediaType
overrides with a small, explicit extension-to-MIME mapping (include mp4,
mov->video/quicktime, m4v, webm->video/webm, mp3->audio/mpeg, m4a->audio/mp4,
aac->audio/aac, flac->audio/flac, wav->audio/wav, ogg->audio/ogg,
opus->audio/opus, etc.), prefer using a MIME lookup on asset.filename/asset.uri
when available (or expo FileSystem/getAssetInfoAsync result) and only fall back
to application/octet-stream; update the code around the mimeType/ext logic to
consult this mapping so presign/content-type metadata are correct.

In `apps/mobile-app/src/app/(app)/processing.tsx`:
- Around line 110-128: The component currently falls through to the normal
screen when media is undefined (so getStatusLabel maps to QUEUED); change the
rendering logic in processing.tsx to detect when the route param `id` is missing
or when `useMediaStatus(id ?? null)` returns no `media` and `isLoading` is
false, and render an explicit error/empty state (instead of the queued UI)
showing a helpful message and actions (e.g., back-to-library via `goToLibrary`)
— update the conditional that now only handles `isLoading && !media` to also
handle `!isLoading && !media` (or missing `id`) so the footer/actions and status
label aren’t incorrectly shown as QUEUED. Ensure this uses the same variables
(`media`, `isLoading`, `id`, `goToLibrary`, `goToPlayer`, and any
`getStatusLabel` usage) so the fix is localized.
- Line 110: The media status screen needs a low-frequency fallback refetch while
media is still active: update the component using useMediaStatus(id ?? null) to
trigger periodic refetches (e.g., via the hook's returned refetch function or
the underlying query's refetch) at a low interval when media?.status is not
"COMPLETED" or "FAILED" and not isLoading, and stop/clear the interval when
status reaches a terminal state or on unmount; this complements useSocketSync
(which invalidates cache on socket events) so the UI recovers if the socket
misses events.
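The low-frequency fallback described above reduces to a small predicate that can feed React Query's `refetchInterval` option; `pollInterval`, the interval value, and the status strings mirror the comment but are otherwise hypothetical.

```typescript
// Poll slowly while the job is still active; stop on terminal states
// or when there is no status yet. Socket events remain the primary sync.
const TERMINAL_STATUSES = new Set(["COMPLETED", "FAILED"]);

function pollInterval(status: string | undefined): number | false {
  if (!status) return false; // nothing to poll yet
  return TERMINAL_STATUSES.has(status) ? false : 15_000; // 15s fallback
}
```

With TanStack Query v5 this would plug in roughly as `refetchInterval: (query) => pollInterval(query.state.data?.status)`, so the interval clears itself once the media reaches a terminal state.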

In `@apps/mobile-app/src/components/media/PipelineStepper.tsx`:
- Around line 20-33: The stepper currently marks failed jobs as "completed";
update the StepState union to include "failed" and change getStepState so that
when status === "FAILED" it returns "completed" for stepIndex < activeIndex,
"failed" for stepIndex === activeIndex (the step that errored), and "pending"
for later steps; keep the existing logic for other statuses. Also update the
component rendering that consumes StepState (the PipelineStepper step icon/label
renderer) to treat "failed" with error styling instead of a success checkmark.
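The mapping suggested above is a pure function and easy to sketch; the status strings and `getStepState` signature follow the comment, but this is an illustration, not the component's actual code.

```typescript
type StepState = "completed" | "active" | "failed" | "pending";

// For FAILED jobs: steps before the active one completed, the active step
// errored, and later steps never ran. Other statuses keep the usual mapping.
function getStepState(
  stepIndex: number,
  activeIndex: number,
  status: string,
): StepState {
  if (status === "FAILED") {
    if (stepIndex < activeIndex) return "completed";
    if (stepIndex === activeIndex) return "failed";
    return "pending";
  }
  if (status === "COMPLETED") return "completed";
  if (stepIndex < activeIndex) return "completed";
  if (stepIndex === activeIndex) return "active";
  return "pending";
}
```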

In `@apps/mobile-app/src/i18n/locales/vi/processing.json`:
- Around line 4-6: The translations for the keys "queued", "validating", and
"processing" in processing.json currently use the same string for "queued" and
"validating"; change the "validating" value to a distinct Vietnamese phrase that
reflects validation has started (e.g., "Đang xác thực..." or "Đang kiểm tra...")
so users can distinguish validation from queued and processing states; locate
the "queued", "validating", and "processing" keys in the file and replace only
the "validating" string with the new text.

In `@apps/mobile-app/src/services/socket.service.ts`:
- Around line 28-52: The socket is opened with a possibly stale access token
from getTokens and never refreshed, so on auth rejection it keeps retrying with
the same credential; update the connect flow in the SocketService (the code
around getTokens, this.getSocketUrl, this.socket creation and the
"connect_error" handler) to obtain a fresh token before connecting—reuse the
same refresh logic used in apps/mobile-app/src/services/api.ts (call the refresh
routine or refreshTokens helper prior to creating io(...)), and on
"connect_error" when the server rejects auth attempt, trigger a token refresh
and retry connecting (replace or update this.socket auth payload and reconnect)
instead of only logging the error.
- Around line 25-40: connect() currently only checks this.socket?.connected so
concurrent calls can create multiple sockets; add a guard to return early if a
connection is already in-flight or exists by introducing a boolean (e.g.,
this.connecting) and check if (this.socket || this.connecting) return at the top
of connect(); set this.connecting = true before calling io(...), clear it to
false on successful connection, on "connect" and on error/close handlers, and
ensure disconnect() clears both this.socket and this.connecting so subsequent
connect() calls work correctly.
- Around line 55-60: The concise arrow inside the "media_updated" handler is
returning listener(media) and triggering the lint rule; change the forEach
callback to a block-bodied function so it does not return a value — in the
media_updated event handler (the this.socket.on("media_updated", (media:
MediaItem) => { ... }) block) replace this.listeners.forEach((listener) =>
listener(media)) with a block-bodied callback such as
this.listeners.forEach((listener) => { listener(media); }) to satisfy linting.
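The in-flight guard and fresh-token flow from the two comments above can be sketched together; the constructor injection of `getToken`/`createSocket` is an illustrative device for testability, not the real SocketService shape.

```typescript
// A `connecting` flag closes the race window between the async token fetch
// and socket creation, so concurrent connect() calls open only one socket.
class SocketService {
  private socket: object | null = null;
  private connecting = false;

  constructor(
    private getToken: () => Promise<string>,     // refresh-aware token source
    private createSocket: (token: string) => object, // stands in for io(url, { auth })
  ) {}

  async connect(): Promise<void> {
    if (this.socket || this.connecting) return; // already open or in flight
    this.connecting = true;
    try {
      const token = await this.getToken(); // fetch a fresh token before io(...)
      this.socket = this.createSocket(token);
    } finally {
      this.connecting = false;
    }
  }

  disconnect(): void {
    this.socket = null;
    this.connecting = false; // subsequent connect() calls work again
  }
}
```

A `connect_error` handler for auth rejections would call `disconnect()`, refresh the token, and invoke `connect()` again rather than retrying with the stale credential.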

In `@apps/mobile-app/src/types/media.ts`:
- Around line 25-26: The MediaItem type declares estimatedTimeRemaining as
number | null but the backend list API can omit this field (yielding undefined);
update the MediaItem declaration (and any related client DTO types) so
estimatedTimeRemaining is optional and accepts undefined as well (e.g.,
estimatedTimeRemaining?: number | null) and audit usages of
estimatedTimeRemaining in the codebase to safely handle undefined/null (look for
the MediaItem type and references to estimatedTimeRemaining and align with
MediaListItemDto from the backend).


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8ea4ce7c-1708-49ea-b482-d8b2564b43e1

📥 Commits

Reviewing files that changed from the base of the PR and between 06664fc and d032cb2.

⛔ Files ignored due to path filters (2)
  • apps/backend-api/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
  • apps/mobile-app/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (39)
  • apps/ai-engine/last_llm_response.txt
  • apps/ai-engine/src/main.py
  • apps/backend-api/package.json
  • apps/backend-api/prisma/migrations/20260306075830_add_eta_and_step/migration.sql
  • apps/backend-api/prisma/schema.prisma
  • apps/backend-api/src/app.module.ts
  • apps/backend-api/src/common/constants/enums.ts
  • apps/backend-api/src/common/constants/index.ts
  • apps/backend-api/src/modules/auth/auth.service.ts
  • apps/backend-api/src/modules/media/dto/response.dto.ts
  • apps/backend-api/src/modules/media/media.module.ts
  • apps/backend-api/src/modules/media/media.service.ts
  • apps/backend-api/src/modules/socket/socket.gateway.ts
  • apps/backend-api/src/modules/socket/socket.module.ts
  • apps/backend-api/src/modules/socket/socket.service.ts
  • apps/mobile-app/.gitignore
  • apps/mobile-app/app.json
  • apps/mobile-app/package.json
  • apps/mobile-app/src/app/(app)/_layout.tsx
  • apps/mobile-app/src/app/(app)/media-picker.tsx
  • apps/mobile-app/src/app/(app)/processing.tsx
  • apps/mobile-app/src/app/(app)/upload.tsx
  • apps/mobile-app/src/components/StatusBadge.tsx
  • apps/mobile-app/src/components/index.ts
  • apps/mobile-app/src/components/media/PipelineStepper.tsx
  • apps/mobile-app/src/components/media/YouTubeModal.tsx
  • apps/mobile-app/src/constants/pipeline.ts
  • apps/mobile-app/src/hooks/useMedia.ts
  • apps/mobile-app/src/hooks/useSocketSync.ts
  • apps/mobile-app/src/i18n/i18n.ts
  • apps/mobile-app/src/i18n/locales/en/common.json
  • apps/mobile-app/src/i18n/locales/en/processing.json
  • apps/mobile-app/src/i18n/locales/vi/common.json
  • apps/mobile-app/src/i18n/locales/vi/processing.json
  • apps/mobile-app/src/services/api.ts
  • apps/mobile-app/src/services/media.services.ts
  • apps/mobile-app/src/services/socket.service.ts
  • apps/mobile-app/src/stores/media.store.ts
  • apps/mobile-app/src/types/media.ts
💤 Files with no reviewable changes (3)
  • apps/mobile-app/src/i18n/locales/en/common.json
  • apps/mobile-app/src/i18n/locales/vi/common.json
  • apps/ai-engine/last_llm_response.txt

Comment thread apps/ai-engine/src/main.py Outdated
Comment on lines 141 to 151

# Publish update event to Redis for NestJS WebSockets
if user_id:
logger.debug(f"Publishing media_updated event for {media_id}")
redis_client.publish("media_updates", json.dumps({
"mediaId": media_id,
"userId": user_id
}))

except Exception as e:
logger.error(f"DB update failed for media {media_id}: {e}")

⚠️ Potential issue | 🟠 Major

Separate Redis publish failures from database write failures.

The row is already committed before redis_client.publish(...) runs. If Redis is unavailable, this path logs a DB failure even though persistence succeeded, and the realtime update just disappears. Give publish its own try/except so the failure mode is accurate and debuggable.

Suggested fix
         with conn:
             with conn.cursor() as cur:
                 cur.execute(sql, values)
         conn.close()
-        
-        # Publish update event to Redis for NestJS WebSockets
-        if user_id:
-            logger.debug(f"Publishing media_updated event for {media_id}")
-            redis_client.publish("media_updates", json.dumps({
-                "mediaId": media_id,
-                "userId": user_id
-            }))
+
+        if user_id:
+            try:
+                logger.debug(f"Publishing media_updated event for {media_id}")
+                redis_client.publish(
+                    "media_updates",
+                    json.dumps({"mediaId": media_id, "userId": user_id}),
+                )
+            except Exception as e:
+                logger.error(f"Redis publish failed for media {media_id}: {e}")
🧰 Tools
🪛 Ruff (0.15.5)

[warning] 150-150: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/main.py` around lines 141 - 151, The Redis publish should
be wrapped in its own try/except so Redis failures are logged separately from DB
errors: after the DB commit and where redis_client.publish("media_updates", ...)
is called, move that publish into a nested try block and catch Exception as e
(or pub_exc) and call logger.error with a distinct message like "Redis publish
failed for media {media_id}, user {user_id}: {e}" while leaving the outer DB
exception handling intact; reference the redis_client.publish call, the
logger.error for DB update, and the media_id/user_id variables to locate where
to add the new try/except.

Comment thread apps/backend-api/prisma/schema.prisma Outdated
Comment on lines +199 to +200
currentStep String? @map("current_step") // Active pipeline stage: AUDIO_PREP, VAD, TRANSCRIBING, etc.
estimatedTimeRemaining Int? @map("estimated_time_remaining") // Seconds remaining (ETA), null when idle/done

⚠️ Potential issue | 🟠 Major

❓ Verification inconclusive

Script executed:

#!/bin/bash
rg -n 'enum PipelineStage|currentStep|current_step' \
  apps/backend-api/prisma/schema.prisma \
  apps/backend-api/src \
  apps/mobile-app/src

Repository: KMA-Final-Project/project
Exit code: 0

stdout:

apps/backend-api/prisma/schema.prisma:199:  currentStep            String?     @map("current_step") // Active pipeline stage: AUDIO_PREP, VAD, TRANSCRIBING, etc.
apps/mobile-app/src/types/media.ts:25:  currentStep: string | null; // e.g. 'TRANSCRIBING', null when idle/completed
apps/mobile-app/src/types/media.ts:77:  currentStep: string | null;
apps/mobile-app/src/components/media/PipelineStepper.tsx:5: * Maps the AI engine's `currentStep` string constants to user-friendly labels.
apps/mobile-app/src/components/media/PipelineStepper.tsx:16:  currentStep: string | null;
apps/mobile-app/src/components/media/PipelineStepper.tsx:36:export function PipelineStepper({ currentStep, status }: PipelineStepperProps) {
apps/mobile-app/src/components/media/PipelineStepper.tsx:40:  const activeIndex = currentStep
apps/mobile-app/src/components/media/PipelineStepper.tsx:41:    ? PIPELINE_STEPS.findIndex((s) => s.key === currentStep)
apps/backend-api/src/modules/media/media.service.ts:260:        currentStep: true,
apps/backend-api/src/modules/media/media.service.ts:293:        currentStep: true,
apps/backend-api/src/common/constants/enums.ts:1:export enum PipelineStage {
apps/backend-api/src/modules/media/dto/response.dto.ts:142:  currentStep: string | null;
apps/backend-api/src/modules/media/dto/response.dto.ts:199:  currentStep: string | null;
apps/mobile-app/src/app/(app)/processing.tsx:282:              currentStep={isDone ? "EXPORTING" : (media?.currentStep ?? null)}

Script executed:

cat -n apps/backend-api/src/common/constants/enums.ts | head -30

Repository: KMA-Final-Project/project
Exit code: 0

stdout:

     1	export enum PipelineStage {
     2	  AUDIO_PREP = 'AUDIO_PREP',
     3	  INSPECTING = 'INSPECTING',
     4	  VAD = 'VAD',
     5	  TRANSCRIBING = 'TRANSCRIBING',
     6	  MERGING = 'MERGING',
     7	  TRANSLATING = 'TRANSLATING',
     8	  EXPORTING = 'EXPORTING',
     9	}

Script executed:

rg -n 'currentStep' apps/backend-api/src -A 3 -B 1 | head -50

Repository: KMA-Final-Project/project
Exit code: 0

stdout:

apps/backend-api/src/modules/media/dto/response.dto.ts-141-  })
apps/backend-api/src/modules/media/dto/response.dto.ts:142:  currentStep: string | null;
apps/backend-api/src/modules/media/dto/response.dto.ts-143-
apps/backend-api/src/modules/media/dto/response.dto.ts-144-  @ApiProperty({
apps/backend-api/src/modules/media/dto/response.dto.ts-145-    description:
--
apps/backend-api/src/modules/media/dto/response.dto.ts-198-  })
apps/backend-api/src/modules/media/dto/response.dto.ts:199:  currentStep: string | null;
apps/backend-api/src/modules/media/dto/response.dto.ts-200-
apps/backend-api/src/modules/media/dto/response.dto.ts-201-  @ApiProperty()
apps/backend-api/src/modules/media/dto/response.dto.ts-202-  createdAt: Date;
--
apps/backend-api/src/modules/media/media.service.ts-259-        originType: true,
apps/backend-api/src/modules/media/media.service.ts:260:        currentStep: true,
apps/backend-api/src/modules/media/media.service.ts-261-        estimatedTimeRemaining: true,
apps/backend-api/src/modules/media/media.service.ts-262-        transcriptS3Key: true,
apps/backend-api/src/modules/media/media.service.ts-263-        subtitleS3Key: true,
--
apps/backend-api/src/modules/media/media.service.ts-292-        durationSeconds: true,
apps/backend-api/src/modules/media/media.service.ts:293:        currentStep: true,
apps/backend-api/src/modules/media/media.service.ts-294-        createdAt: true,
apps/backend-api/src/modules/media/media.service.ts-295-      },
apps/backend-api/src/modules/media/media.service.ts-296-      orderBy: { createdAt: 'desc' },

Script executed:

cat -n apps/backend-api/prisma/schema.prisma | grep -A 30 "enum\|MediaItem"

Repository: KMA-Final-Project/project
Exit code: 0

stdout:

    18	enum Role {
    19	  USER
    20	  ADMIN
    21	}
    22	
    23	enum SubscriptionStatus {
    24	  ACTIVE
    25	  EXPIRED
    26	  CANCELLED
    27	}
    28	
    29	enum MediaOriginType {
    30	  LOCAL
    31	  YOUTUBE
    32	}
    33	
    34	enum MediaStatus {
    35	  QUEUED
    36	  VALIDATING
    37	  PROCESSING
    38	  COMPLETED
    39	  FAILED
    40	}
    41	
    42	enum ProcessingMode {
    43	  TRANSCRIBE
    44	  TRANSCRIBE_TRANSLATE
    45	}
    46	
    47	enum OtpType {
    48	  REGISTER
    49	  FORGOT_PASSWORD
    50	}
    51	
    52	// --- MODELS ---
    53	
    54	model User {
    55	  id            String  @id @default(uuid())
    56	  email         String  @unique
    57	  fullName      String  @map("full_name")
    58	  passwordHash  String  @map("password_hash")
    59	  emailVerified Boolean @default(false) @map("email_verified")
    60	
    61	  // Subscription Info
    62	  currentSubscriptionId         String?  @map("current_subscription_id")
    63	  quotaUsageCurrentMonth        Int      @default(0) @map("quota_usage_current_month")
    64	  quotaUsageCurrentMonthSeconds Int      @default(0) @map("quota_usage_current_month_seconds") // Track duration
    65	  lastQuotaResetDate            DateTime @default(now()) @map("last_quota_reset_date")
    66	  role                          Role     @default(USER)
    67	
    68	  // Relationships
    69	  currentSubscription Subscription?    @relation("UserCurrentSub", fields: [currentSubscriptionId], references: [id])
    70	  subscriptions       Subscription[]   @relation("UserHistorySub")
    71	  usageHistories      UsageHistory[]
    72	  mediaItems          MediaItem[]
    73	  vocabularies        UserVocabulary[]
    74	  otps                Otp[]
    75	  refreshTokens       RefreshToken[]
    76	
    77	  createdAt DateTime @default(now()) @map("created_at")
    78	  updatedAt DateTime @updatedAt @map("updated_at")
    79	
    80	  @@map("users")
    81	}
    82	
    83	model SubscriptionPlan {
    84	  id          String  @id // e.g. "FREE", "PRO_MONTHLY"
    85	  code        String  @unique
    86	  name        String
    87	  description String? @db.Text
    88	
    89	  // Marketing & Display
    90	  features  Json?   @map("features") // ["No Ads", "High Quality", "Fast Processing", etc]
    91	  tierLevel Int     @default(1)
    92	  isActive  Boolean @default(true) @map("is_active")
    93	
    94	  variants PlanVariant[]
    95	
    96	  createdAt DateTime @default(now()) @map("created_at")
    97	  updatedAt DateTime @updatedAt @map("updated_at")
    98	
    99	  @@map("subscription_plans")
   100	}
   101	
   102	model PlanVariant {
--
   127	enum BillingCycleType {
   128	  MONTHLY
   129	  SIX_MONTHS
   130	  YEARLY
   131	  LIFETIME
   132	}
   133	
   134	model Subscription {
   135	  id     String @id @default(uuid())
   136	  userId String @map("user_id")
   137	
   138	  variantId String @map("variant_id")
   139	
   140	  startDate DateTime @map("start_date")
   141	  endDate   DateTime @map("end_date")
   142	
   143	  status SubscriptionStatus @default(ACTIVE)
   144	
   145	  // Snapshot
   146	  priceSnapshot               Decimal @map("price_snapshot") @db.Decimal(10, 2)
   147	  maxDurationPerFileSnapshot  Int     @map("max_duration_per_file_snapshot")
   148	  monthlyQuotaSecondsSnapshot Int     @map("monthly_quota_seconds_snapshot")
   149	
   150	  // Relationships
   151	  user    User        @relation("UserHistorySub", fields: [userId], references: [id])
   152	  users   User[]      @relation("UserCurrentSub")
   153	  variant PlanVariant @relation(fields: [variantId], references: [id])
   154	
   155	  usageHistories UsageHistory[]
   156	
   157	  createdAt DateTime @default(now()) @map("created_at")
--
   179	model MediaItem {
   180	  id     String @id @default(uuid())
   181	  userId String @map("user_id")
   182	
   183	  title      String
   184	  originType MediaOriginType @map("origin_type")
   185	  originUrl  String?         @map("origin_url") // Youtube URL or null if LOCAL
   186	
   187	  // Processing Configuration
   188	  processingMode ProcessingMode @default(TRANSCRIBE) @map("processing_mode")
   189	  sourceLanguage String?        @map("source_language") // Auto-detected: "en", "zh", "vi"
   190	
   191	  // Storage Keys (MinIO/S3 Paths)
   192	  audioS3Key      String  @map("audio_s3_key")
   193	  transcriptS3Key String? @map("transcript_s3_key") // Transcription-only JSON path
   194	  subtitleS3Key   String? @map("subtitle_s3_key") // Translated subtitle JSON path
   195	
   196	  durationSeconds        Int         @default(0) @map("duration_seconds")
   197	  status                 MediaStatus @default(QUEUED)
   198	  progress               Float       @default(0) @map("progress") // 0.0 to 1.0
   199	  currentStep            String?     @map("current_step") // Active pipeline stage: AUDIO_PREP, VAD, TRANSCRIBING, etc.
   200	  estimatedTimeRemaining Int?        @map("estimated_time_remaining") // Seconds remaining (ETA), null when idle/done
   201	  failReason             String?     @map("fail_reason") // Human-readable error
   202	
   203	  // Logic flag
   204	  countedInQuota Boolean @default(false) @map("counted_in_quota")
   205	
   206	  // Relationships
   207	  user             User             @relation(fields: [userId], references: [id])
   208	  userVocabularies UserVocabulary[]
   209	
--
   239	  mediaItem  MediaItem? @relation(fields: [mediaItemId], references: [id])
   240	
   241	  @@unique([userId, vocabularyId]) // Prevent duplicate saves per user
   242	  @@map("user_vocabularies")
   243	}
   244	
   245	model Otp {
   246	  id     String  @id @default(uuid())
   247	  email  String // Always required for email delivery
   248	  userId String? @map("user_id") // Optional - null for REGISTER, set for FORGOT_PASSWORD
   249	  code   String // 6-digit OTP code
   250	  type   OtpType
   251	
   252	  expiresAt DateTime @map("expires_at")
   253	  verified  Boolean  @default(false)
   254	  createdAt DateTime @default(now()) @map("created_at")
   255	
   256	  user User? @relation(fields: [userId], references: [id])
   257	
   258	  @@index([email, type])
   259	  @@index([userId, type])
   260	  @@map("otps")
   261	}
   262	
   263	model RefreshToken {
   264	  id         String   @id @default(uuid())
   265	  token      String   @unique
   266	  userId     String   @map("user_id")
   267	  deviceInfo String?  @map("device_info") // User Agent
   268	  ip         String? // Client IP address
   269	  expiresAt  DateTime @map("expires_at")

Script executed:

rg -n 'PipelineStage\|validate.*currentStep\|currentStep.*valid' apps/backend-api/src --type ts -B 2 -A 2

Repository: KMA-Final-Project/project
Exit code: 1


Script executed:

rg -n 'currentStep|PipelineStage' apps/backend-api/src/modules/media/media.service.ts -B 3 -A 5 | head -80

Repository: KMA-Final-Project/project
Exit code: 0

stdout:

257-        durationSeconds: true,
258-        failReason: true,
259-        originType: true,
260:        currentStep: true,
261-        estimatedTimeRemaining: true,
262-        transcriptS3Key: true,
263-        subtitleS3Key: true,
264-        createdAt: true,
265-      },
--
290-        originType: true,
291-        originUrl: true,
292-        durationSeconds: true,
293:        currentStep: true,
294-        createdAt: true,
295-      },
296-      orderBy: { createdAt: 'desc' },
297-    });
298-  }

Script executed:

find apps/backend-api -name "*migration*" -o -name "*prisma*" -type f | grep -E "\.(sql|ts)$" | head -20

Repository: KMA-Final-Project/project
Exit code: 0

stdout:

apps/backend-api/prisma.config.ts
apps/backend-api/prisma/migrations/20260201140422_init/migration.sql
apps/backend-api/prisma/migrations/20260204140449_add_otp_model/migration.sql
apps/backend-api/prisma/migrations/20260205162132_add_refresh_token/migration.sql
apps/backend-api/prisma/migrations/20260208142341_update_subcriptions/migration.sql
apps/backend-api/prisma/migrations/20260211192356_add_processing_fields/migration.sql
apps/backend-api/prisma/migrations/20260306075830_add_eta_and_step/migration.sql
apps/backend-api/src/prisma/prisma.module.ts
apps/backend-api/src/prisma/prisma.service.ts

Script executed:

rg -n 'currentStep|current_step' apps/backend-api/prisma -t sql -t ts -B 2 -A 2 2>/dev/null | head -50

Repository: KMA-Final-Project/project
Exit code: 0

stdout:

apps/backend-api/prisma/migrations/20260306075830_add_eta_and_step/migration.sql-1--- AlterTable
apps/backend-api/prisma/migrations/20260306075830_add_eta_and_step/migration.sql:2:ALTER TABLE "media_items" ADD COLUMN     "current_step" TEXT,
apps/backend-api/prisma/migrations/20260306075830_add_eta_and_step/migration.sql-3-ADD COLUMN     "estimated_time_remaining" INTEGER;

Persist currentStep as a Prisma enum, not free-form text.

The codebase defines PipelineStage as a closed set of values in TypeScript (AUDIO_PREP, INSPECTING, VAD, TRANSCRIBING, MERGING, TRANSLATING, EXPORTING), but the Prisma schema allows any string here. Without type safety at the database layer, invalid values can be stored permanently and sent to clients that only recognize the supported stages. Currently, the migration creates current_step as unconstrained TEXT with no validation preventing bad data.

Suggested fix
 enum MediaStatus {
   QUEUED
   VALIDATING
   PROCESSING
   COMPLETED
   FAILED
 }
+
+enum PipelineStage {
+  AUDIO_PREP
+  INSPECTING
+  VAD
+  TRANSCRIBING
+  MERGING
+  TRANSLATING
+  EXPORTING
+}
@@
-  currentStep            String?     @map("current_step") // Active pipeline stage: AUDIO_PREP, VAD, TRANSCRIBING, etc.
+  currentStep            PipelineStage? @map("current_step") // Active pipeline stage

Update the corresponding migration to use the enum type, and consider updating the DTOs to type currentStep as PipelineStage | null instead of string | null to enforce the contract end-to-end.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/prisma/schema.prisma` around lines 199 - 200, The
currentStep column is defined as an unconstrained String; change it to a Prisma
enum and update DB migration and TypeScript DTOs to use the closed PipelineStage
set: add a Prisma enum PipelineStage with values AUDIO_PREP, INSPECTING, VAD,
TRANSCRIBING, MERGING, TRANSLATING, EXPORTING, change the model field
currentStep from String? to PipelineStage? (preserve `@map("current_step")`),
create and apply a migration that converts the existing TEXT column to the enum
type (migrate safely handling existing values), regenerate the Prisma client,
and update any DTO/type declarations so currentStep is PipelineStage | null
instead of string | null.

Comment thread apps/backend-api/src/modules/auth/auth.service.ts

/my-scripts

/stitch No newline at end of file

⚠️ Potential issue | 🟡 Minor

Clarify the purpose of the /stitch directory.

The addition of /stitch to the gitignore file lacks context. It's unclear what this directory is for or how it relates to the PR's objectives (real-time media processing progress tracking). Please clarify:

  • What generates or uses the /stitch directory?
  • Why should it be excluded from version control?
  • Is this related to a specific tool, build process, or development workflow?

Consider adding a comment above the entry to document its purpose for future maintainers.

📝 Suggested improvement with documentation
 /my-scripts
 
+# stitch - [Add explanation here: e.g., "generated by XYZ tool" or "local development artifacts"]
 /stitch
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/.gitignore` at line 50, Add a short comment above the /stitch
gitignore entry explaining what the /stitch directory is, which tool or process
creates it (e.g., "stitcher" build step, local dev script, CI artifact, or
third-party tool), and why it should be ignored (generated artifacts, large
binaries, or sensitive run-time files), and if applicable mention the command
that produces it (e.g., npm run stitch or CI job name) so future maintainers
know its origin and whether it should be committed or recreated as part of the
build/test workflow.

Comment on lines +75 to +85
// Determine an appropriate mimeType fallback since getAssetInfoAsync doesn't always return it
let mimeType = "application/octet-stream";
if (asset.mediaType === "video") mimeType = "video/mp4";
if (asset.mediaType === "audio") mimeType = "audio/mpeg";

// Extension fallback
const ext = asset.filename.split(".").pop()?.toLowerCase();
if (ext === "mp3") mimeType = "audio/mpeg";
if (ext === "m4a") mimeType = "audio/mp4";
if (ext === "wav") mimeType = "audio/wav";


⚠️ Potential issue | 🟠 Major

Broaden MIME detection before uploading device files.

Lines 77-85 collapse almost every video to video/mp4 and almost every audio file to audio/mpeg. Common library assets like .mov, .aac, .flac, or .ogg will be mislabeled here, which can break presign validation or store the wrong content type metadata.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/app/`(app)/media-picker.tsx around lines 75 - 85, The
current fallback logic in media-picker.tsx sets mimeType too coarsely (variables
mimeType, asset, ext), causing .mov/.webm/.aac/.flac/.ogg/etc. to be mislabeled;
replace the simple mediaType overrides with a small, explicit extension-to-MIME
mapping (include mp4, mov->video/quicktime, m4v, webm->video/webm,
mp3->audio/mpeg, m4a->audio/mp4, aac->audio/aac, flac->audio/flac,
wav->audio/wav, ogg->audio/ogg, opus->audio/opus, etc.), prefer using a MIME
lookup on asset.filename/asset.uri when available (or expo
FileSystem/getAssetInfoAsync result) and only fall back to
application/octet-stream; update the code around the mimeType/ext logic to
consult this mapping so presign/content-type metadata are correct.
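The extension-first lookup the review suggests can be sketched like this. The table covers common library formats only and the function name `guessMimeType` is an assumption; `mediaType` mirrors expo-media-library's `asset.mediaType` values.

```typescript
// Extension-to-MIME table: consulted before any coarse mediaType fallback.
const EXT_TO_MIME: Record<string, string> = {
  mp4: "video/mp4",
  m4v: "video/x-m4v",
  mov: "video/quicktime",
  webm: "video/webm",
  mp3: "audio/mpeg",
  m4a: "audio/mp4",
  aac: "audio/aac",
  flac: "audio/flac",
  wav: "audio/wav",
  ogg: "audio/ogg",
  opus: "audio/opus",
};

function guessMimeType(filename: string, mediaType?: string): string {
  const ext = filename.split(".").pop()?.toLowerCase();
  if (ext && EXT_TO_MIME[ext]) return EXT_TO_MIME[ext];
  // Coarse fallback only when the extension is unknown.
  if (mediaType === "video") return "video/mp4";
  if (mediaType === "audio") return "audio/mpeg";
  return "application/octet-stream";
}
```

With this in place, a `.mov` asset presigns as `video/quicktime` instead of being mislabeled `video/mp4`.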

Comment on lines +4 to +6
"queued": "Đang chờ bắt đầu...",
"validating": "Đang chờ bắt đầu...",
"processing": "Đang tạo phụ đề...",

⚠️ Potential issue | 🟡 Minor

Give validating its own copy.

queued and validating currently render the same text, so users never see that validation has actually started.

Suggested fix
   "status": {
     "queued": "Đang chờ bắt đầu...",
-    "validating": "Đang chờ bắt đầu...",
+    "validating": "Đang kiểm tra tệp...",
     "processing": "Đang tạo phụ đề...",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/i18n/locales/vi/processing.json` around lines 4 - 6, The
translations for the keys "queued", "validating", and "processing" in
processing.json currently use the same string for "queued" and "validating";
change the "validating" value to a distinct Vietnamese phrase that reflects
validation has started (e.g., "Đang xác thực..." or "Đang kiểm tra...") so users
can distinguish validation from queued and processing states; locate the
"queued", "validating", and "processing" keys in the file and replace only the
"validating" string with the new text.

Comment on lines +25 to +40
async connect() {
if (this.socket?.connected) return;

const tokens = await getTokens();
if (!tokens?.accessToken) {
console.warn("[Socket] Cannot connect without access token");
return;
}

const socketUrl = this.getSocketUrl();

this.socket = io(socketUrl, {
auth: { token: `Bearer ${tokens.accessToken}` },
reconnection: true,
transports: ["websocket"],
});

⚠️ Potential issue | 🟠 Major

Guard connect() against duplicate in-flight connections.

Line 26 only checks this.socket?.connected, so a second connect() call before the first handshake finishes will create another client. The older socket then keeps running outside this singleton and every media_updated event can be delivered twice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/services/socket.service.ts` around lines 25 - 40,
connect() currently only checks this.socket?.connected so concurrent calls can
create multiple sockets; add a guard to return early if a connection is already
in-flight or exists by introducing a boolean (e.g., this.connecting) and check
if (this.socket || this.connecting) return at the top of connect(); set
this.connecting = true before calling io(...), clear it to false on successful
connection, on "connect" and on error/close handlers, and ensure disconnect()
clears both this.socket and this.connecting so subsequent connect() calls work
correctly.

Comment on lines +28 to +52
const tokens = await getTokens();
if (!tokens?.accessToken) {
console.warn("[Socket] Cannot connect without access token");
return;
}

const socketUrl = this.getSocketUrl();

this.socket = io(socketUrl, {
auth: { token: `Bearer ${tokens.accessToken}` },
reconnection: true,
transports: ["websocket"],
});

this.socket.on("connect", () => {
console.log("[Socket] Connected to server", this.socket?.id);
});

this.socket.on("disconnect", (reason) => {
console.log("[Socket] Disconnected:", reason);
});

this.socket.on("connect_error", (error) => {
console.error("[Socket] Connection error:", error.message);
});

⚠️ Potential issue | 🟠 Major

Don't open the socket with a stale access token.

This path reads the stored token directly, but unlike apps/mobile-app/src/services/api.ts:126-142 it never refreshes it. Once apps/backend-api/src/modules/socket/socket.gateway.ts:33-66 rejects an expired token, this service just logs connect_error and keeps retrying with the same bad credential, so live processing updates silently stop.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/services/socket.service.ts` around lines 28 - 52, The
socket is opened with a possibly stale access token from getTokens and never
refreshed, so on auth rejection it keeps retrying with the same credential;
update the connect flow in the SocketService (the code around getTokens,
this.getSocketUrl, this.socket creation and the "connect_error" handler) to
obtain a fresh token before connecting—reuse the same refresh logic used in
apps/mobile-app/src/services/api.ts (call the refresh routine or refreshTokens
helper prior to creating io(...)), and on "connect_error" when the server
rejects auth attempt, trigger a token refresh and retry connecting (replace or
update this.socket auth payload and reconnect) instead of only logging the
error.
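The refresh-before-connect step can be isolated into a small helper. `loadTokens` and `refreshTokens` below are injected stand-ins for `getTokens()` and the api.ts refresh routine; the 30-second skew window is an assumption, not a value from this codebase.

```typescript
interface Tokens {
  accessToken: string;
  expiresAt: number; // epoch ms
}

// Returns a token safe to hand to io({ auth: ... }); refreshes when the
// stored one is missing or about to expire.
async function getFreshAccessToken(
  loadTokens: () => Promise<Tokens | null>,
  refreshTokens: () => Promise<Tokens>,
  now: () => number = Date.now,
): Promise<string> {
  const tokens = await loadTokens();
  // Refresh proactively when missing or expiring within 30s.
  if (!tokens || tokens.expiresAt - now() < 30_000) {
    return (await refreshTokens()).accessToken;
  }
  return tokens.accessToken;
}
```

The service would call this before `io(...)`, and its `connect_error` handler (on an auth rejection) would call it again and reconnect with the new token instead of retrying the stale one.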

Comment on lines +55 to +60
this.socket.on("media_updated", (media: MediaItem) => {
console.log(
`[Socket] Media ${media.id} updated to status ${media.status}`,
);
this.listeners.forEach((listener) => listener(media));
});

⚠️ Potential issue | 🟡 Minor

Use a block-bodied callback here.

Biome is right on Line 59: the concise arrow returns listener(media), which trips lint/suspicious/useIterableCallbackReturn and can keep this file failing lint.

♻️ Proposed fix
-      this.listeners.forEach((listener) => listener(media));
+      this.listeners.forEach((listener) => {
+        listener(media);
+      });
🧰 Tools
🪛 Biome (2.4.6)

[error] 59-59: This callback passed to forEach() iterable method should not return a value.

(lint/suspicious/useIterableCallbackReturn)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/services/socket.service.ts` around lines 55 - 60, The
concise arrow inside the "media_updated" handler is returning listener(media)
and triggering the lint rule; change the forEach callback to a block-bodied
function so it does not return a value — in the media_updated event handler (the
this.socket.on("media_updated", (media: MediaItem) => { ... }) block) replace
this.listeners.forEach((listener) => listener(media)) with a block-bodied
callback such as this.listeners.forEach((listener) => { listener(media); }) to
satisfy linting.

Comment thread apps/mobile-app/src/types/media.ts Outdated
sondoannam and others added 24 commits March 12, 2026 22:10
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Phase A (B1) — Backend presigned GET URL infrastructure:
- MinioService: add `bucketProcessed` field + `generatePresignedGetUrl()`
- MediaService: add `getProcessedFileUrl()` method
- MediaController: add `GET :id/download-url` endpoint
- response.dto.ts: add `DownloadUrlResponseDto`

Phase B (A1+A2) — AI Engine presigned URLs + rich Redis payloads:
- config.py: add `MINIO_PUBLIC_ENDPOINT` setting
- minio_client.py: add `get_presigned_url()`, update `upload_chunk()`,
  `upload_translated_batch()`, and `upload_final_result()` to return
  `(s3_key, presigned_url)` tuples
- main.py: remove legacy `{mediaId, userId}` Redis publish from
  `update_media_status()`; add typed publishers (publish_progress,
  publish_chunk_ready, publish_batch_ready, publish_completed,
  publish_failed); wire them throughout both pipeline modes

Phase C (A3) — SocketService refactored to thin event router:
- socket.types.ts: new file with MediaEvent discriminated union
- socket.service.ts: switch on event.type, emit separated event names
  (media_progress, media_chunk_ready, media_batch_ready, media_completed,
  media_failed); remove DB query on every tick; remove MediaService dep
- socket.module.ts: remove MediaModule import

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: sondoannam <80660666+sondoannam@users.noreply.github.com>
Co-authored-by: sondoannam <80660666+sondoannam@users.noreply.github.com>
…ructions-file

Add repository-level Copilot onboarding instructions for the monorepo
- apps/ai-engine/tests/test_streaming_contracts.py
- apps/ai-engine/src/minio_client.py
- apps/ai-engine/src/schemas.py
- apps/ai-engine/ARCHITECTURE_CONTEXT.md
- .gsd/milestones/M001/slices/S02/tasks/T01-SUMMARY.md
- .gsd/milestones/M001/slices/S02/S02-PLAN.md
- .gsd/STATE.md
- apps/ai-engine/src/schemas.py
- apps/ai-engine/src/async_pipeline.py
- apps/mobile-app/src/types/subtitle.ts
- apps/ai-engine/tests/test_streaming_contracts.py
- apps/mobile-app/src/types/subtitle.ts
- apps/mobile-app/src/hooks/useProcessingSubtitles.ts
- apps/ai-engine/ARCHITECTURE_CONTEXT.md
- apps/ai-engine/src/scripts/test_v2_pipeline.py
- apps/ai-engine/tests/test_first_batch_streaming.py
- apps/ai-engine/ARCHITECTURE_CONTEXT.md
- .gsd/KNOWLEDGE.md
- .gsd/DECISIONS.md
- apps/ai-engine/tests/test_event_discipline.py
- .gsd/milestones/M001/slices/S03/tasks/T01-PLAN.md
- apps/ai-engine/src/scripts/test_v2_pipeline.py
- .gsd/KNOWLEDGE.md

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/mobile-app/src/hooks/useMedia.ts (1)

157-170: ⚠️ Potential issue | 🔴 Critical

Add response validation to both fetch calls before proceeding to confirmUpload().

The code lacks HTTP status checks on both fetch operations. fetch() resolves on all HTTP status codes—only network errors throw. Without validating response.ok, a failed local file read (404/403) or failed MinIO PUT will still proceed to confirmUpload(), creating a media record for a file that was never stored.

🛠️ Suggested guard
       // Step 2 — Upload file blob directly to MinIO (bypasses our API server)
       const fileResponse = await fetch(file.uri);
+      if (!fileResponse.ok) {
+        throw new Error(`Failed to read local file: ${fileResponse.status}`);
+      }
       const fileBlob = await fileResponse.blob();
-      await fetch(uploadUrl, {
+      const uploadResponse = await fetch(uploadUrl, {
         method: "PUT",
         body: fileBlob,
         headers: { "Content-Type": file.mimeType },
       });
+      if (!uploadResponse.ok) {
+        throw new Error(`Upload failed: ${uploadResponse.status}`);
+      }

       // Step 3 — Confirm upload → creates MediaItem + dispatches queue job
       const title = file.name.replace(/\.[^.]+$/, ""); // strip file extension for title
       return mediaService.confirmUpload({
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/hooks/useMedia.ts` around lines 157 - 170, Add HTTP
response validation for the two fetch calls before calling
mediaService.confirmUpload: after the initial fetch(file.uri) check
fileResponse.ok (and handle/throw/log a descriptive error referencing file.uri
or file.name) and similarly check the PUT response from fetch(uploadUrl, {
method: "PUT", ... }) for ok before proceeding; if either response is not ok,
abort and do not call confirmUpload (the code around fileResponse, fileBlob,
uploadUrl, and mediaService.confirmUpload should perform these guards and
surface the error).
♻️ Duplicate comments (2)
apps/mobile-app/src/app/(app)/processing.tsx (1)

73-75: ⚠️ Potential issue | 🟠 Major

Handle missing media before rendering the queued state.

If id is absent or the status query finishes without media, this component falls through with status === undefined and shows the normal queued UI with the wrong footer/actions. Return an explicit empty/error state for !id || (!isLoading && !media) instead of using the processing layout.

Also applies to: 115-121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/app/(app)/processing.tsx` around lines 73 - 75, When
rendering the processing UI, guard against missing id or missing media by
returning an explicit empty/error state when id is falsy or when the status
query finished without media (i.e., !id || (!isLoading && !media)); update the
logic around useMediaStatus and useMediaArtifacts so you don't fall through to
the queued UI when status is undefined, and apply the same guard to the other
processing render branch that currently shows the queued footer/actions (the
second block that uses status to decide queued vs processing). Ensure you check
id and media before using status and render a clear empty/error component
instead of the processing layout.
apps/mobile-app/src/hooks/useMedia.ts (1)

60-65: ⚠️ Potential issue | 🟠 Major

Keep a recovery path for missed socket updates.

staleTime: Infinity plus all refetch triggers disabled makes useMediaStatus() and useMediaArtifacts() effectively socket-only. If the app backgrounds, reconnects, or misses an invalidation, the processing screen can freeze on stale data. Keep a low-frequency refetch for non-terminal items, or at least re-enable reconnect/focus refetch.

Also applies to: 85-100

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/hooks/useMedia.ts` around lines 60 - 65,
socketFirstQueryOptions currently disables all refetching (staleTime: Infinity
and all refetch triggers false), making useMediaStatus() and useMediaArtifacts()
rely solely on sockets and risk freezing on missed updates; modify
socketFirstQueryOptions to allow a low-frequency recovery refetch (e.g., set
staleTime to a finite value like 30_000 ms) or re-enable
refetchOnReconnect/refetchOnWindowFocus (or both), and if possible make the
shorter staleTime or re-enable triggers conditional for non-terminal items only
so terminal items remain stable; apply the same change pattern where
socketFirstQueryOptions is reused (also referenced in the other block around
useMediaStatus/useMediaArtifacts).
🧹 Nitpick comments (26)
apps/mobile-app/src/hooks/useLanguagePreference.ts (1)

57-69: Unnecessary i18n in dependency array.

The effect body only calls hydrateLanguagePreference() which uses the module-level i18nInstance, not the i18n from useTranslation(). Having i18n in the dependency array could cause the effect to re-run unnecessarily if the i18n instance reference changes.

♻️ Proposed fix
   useEffect(() => {
     let isMounted = true;

     hydrateLanguagePreference().finally(() => {
       if (isMounted) {
         setIsLoading(false);
       }
     });

     return () => {
       isMounted = false;
     };
-  }, [i18n]);
+  }, []);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/hooks/useLanguagePreference.ts` around lines 57 - 69, The
effect in useEffect currently lists i18n in its dependency array but the effect
only calls hydrateLanguagePreference() (which relies on the module-level
i18nInstance) and therefore should not re-run when the useTranslation() i18n ref
changes; remove i18n from the dependency array and use an empty array [] so
hydrateLanguagePreference() runs once on mount (keep the isMounted cleanup logic
as-is), referencing the useEffect, hydrateLanguagePreference, and i18n symbols
to locate the change.
apps/backend-api/src/modules/socket/socket.types.ts (1)

74-76: isRecord returns true for arrays.

typeof [] === 'object' and [] !== null, so arrays pass this check. While not a bug (subsequent field checks will fail for arrays), consider tightening if you want explicit rejection.

♻️ Optional: exclude arrays explicitly
 function isRecord(value: unknown): value is UnknownRecord {
-  return typeof value === 'object' && value !== null;
+  return typeof value === 'object' && value !== null && !Array.isArray(value);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/socket/socket.types.ts` around lines 74 - 76,
isRecord currently returns true for arrays because typeof [] === 'object';
update the isRecord type guard to explicitly reject arrays by adding an
Array.isArray check (e.g., return typeof value === 'object' && value !== null &&
!Array.isArray(value)) so that isRecord(value: unknown): value is UnknownRecord
only passes for plain objects; adjust any callers relying on arrays being
accepted if needed.
apps/mobile-app/src/hooks/useThrottle.ts (1)

57-57: Consider warning about bypassFn stability in JSDoc.

Including bypassFn in the dependency array is correct, but if callers pass an inline function, it will cause the effect to re-run on every render (potentially resetting the throttle state). The JSDoc should advise callers to memoize bypassFn with useCallback, or you can wrap it internally with useRef.

📝 Update JSDoc to warn about stability
  * @param interval  Minimum ms between updates (e.g. 1500)
  * @param bypassFn  Optional predicate — when true the value is emitted
- *                  immediately regardless of throttle (e.g. terminal progress)
+ *                  immediately regardless of throttle (e.g. terminal progress).
+ *                  Should be memoized with useCallback to avoid resetting throttle state.
  */
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/hooks/useThrottle.ts` at line 57, The JSDoc for
useThrottle should warn that the bypassFn dependency must be stable to avoid
re-running the effect on every render; update the comment for the useThrottle
hook (and mention bypassFn) to advise callers to memoize bypassFn with
useCallback or that the hook could internally stabilize it (e.g., store it in a
ref) to prevent unintended throttle resets. Include a short example suggestion:
"memoize bypassFn with useCallback" and note the current dependency array
([value, interval, bypassFn]) relies on bypassFn stability.
apps/backend-api/src/modules/media/workers/media.processor.ts (1)

117-123: Consider documenting or validating the targetLanguage default behavior.

The targetLanguage field is passed directly from the job payload without validation or a default value here. According to queue.types.ts, it "defaults to 'vi' if omitted", but this default appears to be applied downstream in the AI engine rather than here.

This is acceptable if the AI engine handles the default, but consider adding a brief comment clarifying where the default is applied to avoid confusion during future maintenance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/media/workers/media.processor.ts` around lines
117 - 123, The aiPayload creation is passing job.data.targetLanguage directly
without validation or local default; either validate/normalize it here or add a
clear inline comment stating that the "targetLanguage" default ('vi') is applied
downstream in the AI engine/queue types; update the AiProcessingJobPayload
population around aiPayload (and reference job.data.targetLanguage) to either
set a local default/validation or add the comment indicating where the default
originates (queue.types.ts / AI engine) so future maintainers know why it's
omitted here.
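For reference, a minimal sketch of applying the documented default on the consumer side. The `targetLanguage` field name and the `'vi'` default come from queue.types.ts; the helper function and its supported-language set are illustrative assumptions, not the engine's actual code:

```python
# Hypothetical consumer-side normalization; only the field name
# "targetLanguage" and the 'vi' default are taken from queue.types.ts.
SUPPORTED_LANGUAGES = {"vi", "en", "ja", "ko", "zh"}  # assumed set, for illustration
DEFAULT_TARGET_LANGUAGE = "vi"

def normalize_target_language(payload: dict) -> str:
    """Apply the documented 'vi' default and reject unknown codes."""
    lang = payload.get("targetLanguage") or DEFAULT_TARGET_LANGUAGE
    lang = lang.strip().lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported targetLanguage: {lang!r}")
    return lang
```

Normalizing once at the boundary like this keeps the processor free to pass the payload through untouched.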
apps/ai-engine/src/config.py (1)

119-122: Consider validating NMT_MODEL_DIR exists at startup.

NMT_MODEL_DIR is typed as Path with a default of "temp/models/nllb-200-3.3B-ct2", but unlike TEMP_DIR and OUTPUT_DIR, it's not created by create_dirs(). If this directory doesn't exist (e.g., model not downloaded), the NMT translator will fail at runtime.

Consider either:

  1. Adding validation at startup to check if the model directory exists
  2. Documenting the model download step prominently in the instructions
  3. Adding a clear error message in the translator initialization if the model is missing
💡 Option: Add validation in create_dirs or a separate check
def create_dirs(self):
    """Initialize directories"""
    os.makedirs(self.TEMP_DIR, exist_ok=True)
    os.makedirs(self.OUTPUT_DIR, exist_ok=True)

def validate_nmt_model(self):
    """Check NMT model directory exists"""
    if not self.NMT_MODEL_DIR.exists():
        from loguru import logger
        logger.warning(
            f"NMT model directory not found: {self.NMT_MODEL_DIR}. "
            "Translation will fail until the model is downloaded."
        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/config.py` around lines 119 - 122, Add startup validation
for the NMT model directory so runtime failures are avoided: in config.py update
create_dirs() or add a new validate_nmt_model() method to check Path
NMT_MODEL_DIR.exists() and, if missing, log a clear error/warning (using the
project logger) and either raise a ConfigError or return a failure state so
translator initialization (where the NMT translator is constructed) can surface
a user-friendly message; reference NMT_MODEL_DIR, create_dirs(), and the
translator initialization code path to ensure the check runs before the
translator is instantiated.
apps/mobile-app/src/app/(app)/player.tsx (1)

11-89: Clarify the purpose of this placeholder screen.

Both the back button (Lines 23-28) and the primary action button (Lines 73-78) navigate to the same destination (ROUTES.PROCESSING). This makes the screen effectively a redirect/informational screen.

If this is intentional (e.g., a temporary placeholder for a future player implementation), consider adding a code comment explaining its purpose. Otherwise, users might be confused by a screen that immediately redirects them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/app/(app)/player.tsx` around lines 11 - 89,
PlayerPlaceholderScreen currently has both the back Pressable and the primary
action Pressable calling router.replace to ROUTES.PROCESSING, making the screen
behave like an immediate redirect or purely informational; if this is
intentional (a temporary stub), add a clear code comment above the
PlayerPlaceholderScreen function explaining it's a placeholder for a future
player and why both controls return to ROUTES.PROCESSING, otherwise update one
or both handlers (the two onPress callbacks calling router.replace with
ROUTES.PROCESSING) to perform the correct distinct actions (e.g., navigate back,
open player, or show a toast) so the UI behavior matches user expectations.
apps/mobile-app/src/components/media/UploadModals.tsx (3)

52-55: Remove commented-out navigation code.

The commented router.replace(ROUTES.HOME) should be removed if the current behavior (just closing the modal) is intentional.

Suggested fix
   /** Closes the YouTube modal and goes back to library */
   const handleCloseYT = () => {
     setYtVisible(false);
-    // router.replace(ROUTES.HOME);
   };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/components/media/UploadModals.tsx` around lines 52 - 55,
The commented-out navigation line inside handleCloseYT should be removed to
avoid dead code; edit the handleCloseYT function (which currently calls
setYtVisible(false)) and delete the commented `router.replace(ROUTES.HOME)` so
the handler only closes the modal without unused/commented navigation code.

25-26: Remove commented-out code.

Leftover commented code (sheetVisible state) should be removed if no longer needed.

Suggested fix
-  // const [sheetVisible, setSheetVisible] = useState(true);
   const [ytVisible, setYtVisible] = useState(false);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/components/media/UploadModals.tsx` around lines 25 - 26,
Remove the leftover commented-out state declaration for sheetVisible to keep the
component clean: delete the line "// const [sheetVisible, setSheetVisible] =
useState(true);" in UploadModals.tsx so only the active state declarations like
"const [ytVisible, setYtVisible] = useState(false);" remain; no other changes
are required.

64-67: Avoid as any type assertion.

The as any cast bypasses type checking for route params. Consider defining proper route param types for type safety.

Suggested approach

Define route params in your route types:

// In your route types file
export type ProcessingRouteParams = {
  id: string;
};

Then use typed navigation:

       router.replace({
         pathname: ROUTES.PROCESSING,
         params: { id: newItem.id },
-      } as any);
+      });

This requires ensuring your route configuration properly types the ROUTES.PROCESSING route params.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/components/media/UploadModals.tsx` around lines 64 - 67,
Remove the unsafe "as any" on the router.replace call and instead declare and
use a proper typed route params type for the processing route (e.g., declare a
ProcessingRouteParams = { id: string } in your route types), update your route
configuration to associate ROUTES.PROCESSING with that params type, and then
call router.replace with the correctly typed params object (router.replace({
pathname: ROUTES.PROCESSING, params: { id: newItem.id } })) so TypeScript
enforces that newItem.id matches the expected string id type; keep references to
ROUTES.PROCESSING, router.replace and newItem.id when applying the change.
apps/ai-engine/src/core/smart_aligner.py (3)

329-334: Remove redundant import re.

The re module is already imported at the module level (line 16). This local import is unnecessary.

Suggested fix
     def _split_cjk_words(self, sentence: Sentence):
         """Pillar 3: Split CJK words into characters."""
-        import re
-
         new_words = []
         cjk_pattern = re.compile(r"[\u4e00-\u9fff]")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/smart_aligner.py` around lines 329 - 334, The local
import of the regular expression module inside _split_cjk_words is redundant;
remove the line "import re" from the _split_cjk_words method so the function
uses the module-level re already imported, leaving the cjk_pattern =
re.compile(...) and rest of the function unchanged.

154-156: Unused variable sr from librosa.load.

The sample rate is returned but never used. Use underscore prefix to indicate intentional discard.

Suggested fix
-        audio_full, sr = librosa.load(str(path), sr=16000)
+        audio_full, _sr = librosa.load(str(path), sr=16000)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/smart_aligner.py` around lines 154 - 156, The call to
librosa.load returns a sample rate into variable sr which is never used; replace
the unused variable name sr with _sr (or prefix it with an underscore) where
librosa.load is called so intent to discard is clear — locate the librosa.load
invocation in smart_aligner.py (the line assigning audio_full, sr) and change
the second assigned variable to _sr.

236-244: Variable shadowing: sent reused in nested loop.

The inner loop reuses sent from the outer for sent in res_sentences loop. While functionally correct, this shadows the outer variable and reduces readability.

Suggested fix for clarity
                 if split_sents:
                     previous_text_context += " " + split_sents[-1].text
                     detected_lang = transcription_result["info"].language
                     self._add_phonemes(split_sents, detected_lang)
-                    for sent in split_sents:
-                        sent.detected_lang = detected_lang
+                    for s in split_sents:
+                        s.detected_lang = detected_lang
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/smart_aligner.py` around lines 236 - 244, The inner
loop reuses the variable name "sent" and shadows the outer "sent" from the
surrounding loop; to fix, rename the inner-loop iteration variable (e.g., to
"sub_sent" or "split_sent") in the block that assigns detected_lang and sets
attributes (the for sent in split_sents loop) and update references inside that
loop accordingly, leaving the outer loop variable intact; this preserves
readability while keeping logic in self._add_phonemes(split_sents,
detected_lang), the assignments sent.detected_lang,
sentences.extend(split_sents), and pending_chunk.extend(split_sents unchanged.
apps/ai-engine/src/events.py (1)

46-47: Broad exception catching is intentional but could be more specific.

The static analysis flags except Exception as too broad (Ruff BLE001). While the defensive approach is reasonable for fire-and-forget event publishing (preventing pipeline crashes), consider catching (redis.RedisError, TypeError, ValueError) to be more explicit about expected failures (Redis errors and JSON serialization issues).

♻️ Optional: More specific exception handling
-    except Exception as e:
+    except (redis.RedisError, TypeError, ValueError) as e:
         logger.error(f"Failed to publish progress event for {media_id}: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/events.py` around lines 46 - 47, Replace the broad "except
Exception as e" in the publish/progress event publishing block with a targeted
exception tuple like "except (redis.RedisError, TypeError, ValueError) as e" to
satisfy the linter and explicitly handle Redis and JSON/serialization errors;
also ensure "redis" is imported at top of the module and keep the existing
logger.error call (referencing logger and media_id) unchanged so the error
details are still logged.
apps/ai-engine/src/scripts/test_v2_pipeline.py (1)

786-799: Clarify why module reload is necessary after patching.

The pattern of importlib.reload(async_mod) after patching db_mod and events_mod is necessary because the async_pipeline module imports these at load time. Consider adding a brief comment explaining this for maintainers.

     events_mod.publish_batch_ready = _fake_publish_batch_ready

+    # Reload async_mod so it picks up the patched publish_* functions
+    # (it imports them at module load time)
     async_mod = importlib.import_module("src.async_pipeline")
     async_mod = importlib.reload(async_mod)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/scripts/test_v2_pipeline.py` around lines 786 - 799, Add a
brief inline comment above the importlib.reload(async_mod) call explaining that
reload is necessary because src.async_pipeline (referenced here as async_mod)
imports db_mod and events_mod at module load time, so after patching those
modules we must reload async_mod to pick up the patched objects; mention
importlib.reload(async_mod), db_mod, and events_mod by name so future
maintainers understand the dependency and ordering.
apps/backend-api/src/modules/socket/socket.gateway.ts (2)

131-147: Verify whether unauthenticated sockets should be allowed to leave rooms.

The media_leave handler does not check getAuthenticatedUserId(client) before allowing the socket to leave. While leaving a room is generally safe, an unauthenticated socket shouldn't have been able to join a media room in the first place. Consider adding an auth check for consistency with media_join.

🛡️ Optional: Add authentication check for consistency
 @SubscribeMessage('media_leave')
 async handleLeaveMedia(
   @ConnectedSocket() client: Socket,
   @MessageBody() payload: MediaRoomPayload,
 ): Promise<RoomSubscriptionAck> {
+  const userId = this.getAuthenticatedUserId(client);
+  if (!userId) {
+    return { ok: false, error: 'Unauthenticated socket' };
+  }
+
   const mediaId = payload.mediaId?.trim();
   if (!mediaId) {
     return { ok: false, error: 'mediaId is required' };
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/socket/socket.gateway.ts` around lines 131 -
147, The media_leave handler (handleLeaveMedia) currently lets any socket leave
a room without verifying authentication; mirror the media_join behavior by
calling getAuthenticatedUserId(client) at the start of handleLeaveMedia and
return { ok: false, error: 'unauthenticated' } (or the existing error shape) if
no user id is present, then proceed to compute room via getMediaRoom(mediaId),
await client.leave(roomName), log and return; this keeps join/leave auth
consistent and uses the same error handling flow as media_join.

26-32: Consider restricting CORS origin for production.

Setting origin: '*' allows connections from any domain. For a production deployment, this should be restricted to your trusted frontend domains to prevent unauthorized WebSocket connections.

🔒 Proposed configuration using environment variable
 @WebSocketGateway({
   cors: {
-    origin: '*',
+    origin: process.env.CORS_ORIGINS?.split(',') || '*',
   },
   pingInterval: 10000,
   pingTimeout: 5000,
 })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/socket/socket.gateway.ts` around lines 26 - 32,
The WebSocketGateway decorator currently sets cors.origin: '*' which is insecure
for production; update the WebSocketGateway configuration in socket.gateway.ts
(the `@WebSocketGateway({ ... })` block) to read allowed origins from an
environment variable (e.g., process.env.ALLOWED_WS_ORIGINS) and use that value
(or a parsed array / whitelist check or function that returns true only for
matching origins) instead of '*', falling back to a safe default for
development; ensure the cors.origin field is set to the parsed whitelist or
origin-check function so only trusted frontend domains can connect.
apps/ai-engine/tests/test_first_batch_streaming.py (1)

147-152: Use ClassVar annotation or instance-level tracking for mutable class attribute.

The class-level instances: list["FakeMinioClient"] = [] is a mutable default shared across all instances. While the test at line 248 clears it, this pattern is fragile and can cause test pollution if tests run in an unexpected order or in parallel.

🛠️ Proposed fix using ClassVar annotation
+from typing import ClassVar
+
 class FakeMinioClient:
-    instances: list["FakeMinioClient"] = []
+    instances: ClassVar[list["FakeMinioClient"]] = []
 
     def __init__(self) -> None:
         self.uploads: list[dict] = []
         FakeMinioClient.instances.append(self)

The ClassVar annotation makes the intent explicit and satisfies Ruff's RUF012 rule. Alternatively, consider using a pytest fixture to manage instance tracking per test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/tests/test_first_batch_streaming.py` around lines 147 - 152,
The class attribute instances on FakeMinioClient is a mutable list shared across
instances which can cause test pollution; update the declaration to explicitly
annotate it as a ClassVar (e.g., instances: ClassVar[list["FakeMinioClient"]])
to indicate module-level/stateful intent and satisfy RUF012, or alternatively
remove the shared class list and manage per-test instance tracking (e.g., make
instances an instance attribute or use a pytest fixture) while keeping
references to FakeMinioClient and its instances attribute so the tests still
clear or inspect it appropriately.
apps/ai-engine/src/schemas.py (1)

116-165: Consider consolidating ContextAnalysisResult and ContextAnalysis.

These two models serve similar purposes (context analysis output) with slightly different field structures. Having both may cause confusion for maintainers.

If ContextAnalysisResult is for backward compatibility with existing LLM prompts, consider adding a comment explaining when to use each. Otherwise, consider deprecating one in favor of the other.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/schemas.py` around lines 116 - 165, ContextAnalysisResult
and ContextAnalysis overlap and should be consolidated or clarified: either
deprecate ContextAnalysisResult (add a deprecation comment/DeprecationWarning in
its docstring) and provide a conversion helper (e.g.,
ContextAnalysis.from_result or ContextAnalysisResult.to_context_analysis) that
maps detected_style→detected_style, summary→summary, keywords→keywords, and
detected_pronouns→language_specific["pronouns"]; or keep both but add a clear
comment above ContextAnalysisResult explaining it exists only for backward
compatibility with LLM prompts and show the explicit field mapping to
ContextAnalysis so maintainers know when to use each.
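A minimal sketch of such a conversion helper, using plain dataclasses as stand-ins for the project's Pydantic models; the field mapping follows the names listed above, and everything else is assumed:

```python
# Illustrative only: dataclasses stand in for the real Pydantic models.
from dataclasses import dataclass, field

@dataclass
class ContextAnalysisResult:  # legacy shape, kept for existing LLM prompts
    detected_style: str
    summary: str
    keywords: list[str]
    detected_pronouns: list[str]

@dataclass
class ContextAnalysis:  # canonical shape
    detected_style: str
    summary: str
    keywords: list[str]
    language_specific: dict = field(default_factory=dict)

    @classmethod
    def from_result(cls, result: ContextAnalysisResult) -> "ContextAnalysis":
        """Map the legacy result onto the canonical model."""
        return cls(
            detected_style=result.detected_style,
            summary=result.summary,
            keywords=result.keywords,
            language_specific={"pronouns": result.detected_pronouns},
        )
```

With a single conversion point like `from_result`, the legacy model can be deprecated without touching every prompt-handling call site at once.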
checkpoint.md (1)

447-458: Add language specifier to fenced code block.

The MinIO storage convention code block should have a language specifier for proper rendering.

📝 Proposed fix
-```
+```text
 processed/{mediaId}/
 ├── chunks/                    # Tier 1: raw transcription (from SmartAligner)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@checkpoint.md` around lines 447 - 458, Update the fenced code block showing
the MinIO storage convention to include a language specifier by changing the
opening fence from ``` to ```text so it renders properly; locate the block that
begins with the tree starting "processed/{mediaId}/" (contains "chunks/",
"translated_batches/", "final.json") and add "text" after the opening backticks.
apps/mobile-app/src/services/socket.service.ts (2)

100-105: disconnect() should clear connection state completely.

If the duplicate-connection guard fix is applied, ensure disconnect() also resets the connecting flag.

♻️ Proposed fix
   disconnect() {
     if (this.socket) {
       this.socket.disconnect();
       this.socket = null;
     }
+    this.connecting = false;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/services/socket.service.ts` around lines 100 - 105, The
disconnect() method currently nulls this.socket but doesn't reset the connection
state; update disconnect() (in the same class where this.socket and the
connecting flag are defined) to also set this.connecting = false (and clear any
related connection state such as pending reconnection timers or listeners if
present) so the duplicate-connection guard can work correctly after a
disconnect.

79-97: Consider block-bodied callbacks to avoid lint warnings.

The concise arrow (cb) => cb(event) returns a value, which can trigger the useIterableCallbackReturn lint rule. Block-bodied callbacks are more explicit.

♻️ Proposed fix
     this.socket.on("media_progress", (event: MediaProgressEvent) => {
-      this.progressListeners.forEach((cb) => cb(event));
+      this.progressListeners.forEach((cb) => { cb(event); });
     });

     this.socket.on("media_chunk_ready", (event: MediaChunkReadyEvent) => {
-      this.chunkReadyListeners.forEach((cb) => cb(event));
+      this.chunkReadyListeners.forEach((cb) => { cb(event); });
     });
     // ... apply same pattern to other handlers
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/services/socket.service.ts` around lines 79 - 97, The
inline arrow callbacks in the socket.on handlers (e.g., the handlers for
"media_progress", "media_chunk_ready", "media_batch_ready", "media_completed",
"media_failed") return the result of cb(event) which triggers the lint rule;
change each listener iteration from a concise arrow (cb => cb(event)) to a
block-bodied callback that calls cb(event) without returning a value (e.g., (cb)
=> { cb(event); }) for progressListeners, chunkReadyListeners,
batchReadyListeners, completedListeners, and failedListeners to satisfy
useIterableCallbackReturn.
apps/ai-engine/src/core/llm_provider.py (1)

147-147: Prefer next(iter(...)) over single element slice.

More efficient for extracting the first key.

♻️ Proposed fix
-                first_key = list(parsed[0].keys())[0]
+                first_key = next(iter(parsed[0].keys()))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/llm_provider.py` at line 147, Replace the
single-element list allocation used to get the first key (first_key =
list(parsed[0].keys())[0]) with an iterator-based access to avoid building a new
list; e.g., use next(iter(parsed[0].keys())) or simply next(iter(parsed[0])) in
the same scope where parsed is used (e.g., inside the function or method that
contains first_key) so the first key is extracted without unnecessary
allocation.
apps/ai-engine/src/scripts/eval_nllb.py (2)

88-94: Hardcoded debug batch path may not exist in other environments.

The path references a specific UUID that likely only exists on your development machine. While load_real_sentences handles the missing file gracefully with a warning, consider making this configurable via CLI argument or environment variable for broader usability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/scripts/eval_nllb.py` around lines 88 - 94,
BATCH_JSON_PATH is hardcoded to a single debug UUID path; make this configurable
by adding a CLI argument (e.g., --batch-json) or reading an environment variable
(e.g., BATCH_JSON_PATH) and falling back to the current default value so other
environments can supply their own file; update the code that defines
BATCH_JSON_PATH and the code path that calls load_real_sentences to use the
provided value (or default) and keep the existing warning behavior if the file
is missing.

226-226: Consider adding strict=True to zip() for defensive consistency.

While translate_batch already asserts 1:1 mapping, adding strict=True makes the contract explicit at the call site and aligns with modern Python best practices.

♻️ Proposed fix
-    for idx, (src, tgt) in enumerate(zip(sentences, translations), start=1):
+    for idx, (src, tgt) in enumerate(zip(sentences, translations, strict=True), start=1):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/scripts/eval_nllb.py` at line 226, The for-loop that
iterates over zip(sentences, translations) in the evaluation loop should use
zip(..., strict=True) to enforce the 1:1 mapping at the call site (even though
translate_batch asserts it); update the loop in the function handling evaluation
(the enumerate over sentences and translations) to call zip(sentences,
translations, strict=True) so mismatched lengths raise immediately and make the
contract explicit.
apps/ai-engine/tests/test_streaming_contracts.py (1)

398-402: Consider adding strict=True to the final zip() for defensive testing.

While the test already validates len(matched) == 3, adding strict=True would make the test fail immediately if the segment counts diverge, providing a clearer error message.

♻️ Proposed fix
-    for batch_seg, matched_seg in zip(batch.segments, matched):
+    for batch_seg, matched_seg in zip(batch.segments, matched, strict=True):
         assert batch_seg.segment_index == matched_seg.segment_index
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/tests/test_streaming_contracts.py` around lines 398 - 402, Add
the strict=True parameter to the zip used in the loop that compares
batch.segments and matched so the loop becomes zip(batch.segments, matched,
strict=True); this enforces equal lengths and will fail fast if counts
diverge—update the for loop that currently reads for batch_seg, matched_seg in
zip(batch.segments, matched): to use strict=True on the zip call.
apps/mobile-app/src/types/media.ts (1)

33-54: Extract a shared artifact base interface.

These three DTOs repeat objectKey, url, size, and lastModified. Pulling those into one base shape will reduce drift if the backend contract changes later.

♻️ Suggested refactor
+interface MediaArtifactBase {
+  objectKey: string;
+  url: string;
+  size: number;
+  lastModified: string | null;
+}
+
-export interface MediaChunkArtifact {
+export interface MediaChunkArtifact extends MediaArtifactBase {
   chunkIndex: number;
-  objectKey: string;
-  url: string;
-  size: number;
-  lastModified: string | null;
 }
 
-export interface MediaTranslatedBatchArtifact {
+export interface MediaTranslatedBatchArtifact extends MediaArtifactBase {
   batchIndex: number;
-  objectKey: string;
-  url: string;
-  size: number;
-  lastModified: string | null;
 }
 
-export interface MediaFinalArtifact {
-  objectKey: string;
-  url: string;
-  size: number;
-  lastModified: string | null;
-}
+export interface MediaFinalArtifact extends MediaArtifactBase {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/types/media.ts` around lines 33 - 54, Extract the
repeated fields into a new base interface (e.g., MediaArtifactBase) containing
objectKey, url, size, and lastModified, and have MediaChunkArtifact,
MediaTranslatedBatchArtifact, and MediaFinalArtifact extend that base; update
MediaChunkArtifact to only declare chunkIndex, MediaTranslatedBatchArtifact to
only declare batchIndex, and remove the duplicated fields from those three
interfaces so they inherit the shared shape.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/ai-engine/src/async_pipeline.py`:
- Around line 390-431: The loop that applies translations uses zip(sentences,
final_translations) which can silently drop items if the lengths differ; update
the loop to use zip(..., strict=True) so it raises a ValueError immediately on
length mismatch between sentences and final_translations (reference: the for
loop pairing sentences and final_translations, and variables sentences,
final_translations, and zip), and ensure any surrounding try/except (if present)
lets that error surface or is handled where appropriate so mismatches are caught
early.

In `@apps/ai-engine/src/core/llm_provider.py`:
- Line 60: The assignment to user_msg uses an unnecessary f-string prefix:
change the line that sets user_msg (user_msg = f"Text Samples:\n" +
"\n".join(text_samples)) to use a plain string literal without the f prefix so
it becomes a normal concatenation (i.e., remove the leading "f" from the string
literal assigned to user_msg).
- Line 178: The generate signature uses a default None for system_prompt but
lacks an explicit Optional annotation; update the function signature for
generate in llm_provider.py to annotate system_prompt as Optional[str] (or str |
None for Python 3.10+) so the type reflects the None default, e.g., change def
generate(self, prompt: str, system_prompt: str = None) -> str to use the
explicit Optional/union type and adjust any imports (typing.Optional) if needed.

In `@apps/ai-engine/src/core/nmt_translator.py`:
- Around line 141-153: The code currently falls back to passing raw
source_lang/target_lang strings to NLLB_LANG_MAP, which can silently allow
invalid Flores-200 tokens; update NMTTranslator to normalize and validate
language codes before mapping: implement normalization of source_lang and
target_lang (e.g., lowercasing and stripping regional variants like splitting on
'-' or '_', mapping "en-US" -> "en"), then lookup normalized codes in
NLLB_LANG_MAP (referencing NLLB_LANG_MAP, source_lang, target_lang, src_code,
tgt_code) and if a normalized code is not present raise a clear exception (e.g.,
ValueError) or return a descriptive error instead of passing the raw token to
the model so failures are explicit and early.

In `@apps/ai-engine/src/core/prompts.py`:
- Around line 124-132: The refinement prompt has a contradiction between the
“preserve proper nouns and key terms” rule and the strict ban on source-language
characters; update the prompt template so rule 6 allows unchanged proper
nouns/brand names and key terms when they appear verbatim and use the same
script as the target language while still forbidding other source-language
content. Concretely, modify the Rules block (the prompt string that contains
"Rules:" and the {pronoun_rule} placeholder) to change rule 6 into a conditional
allowance like “6. Output must be PURE target language — except unchanged proper
nouns, brand names, and key terms that appear verbatim in the same script as the
target language; all other source-language characters are prohibited.” Ensure
the change references the existing preservation rule (rule 4) so the two rules
are consistent and update any tests or examples that assert the previous
absolute ban.

In `@apps/ai-engine/src/db.py`:
- Around line 108-119: The try/except in update_media_status is swallowing DB
write failures (the block using _get_db_pool(), pool.getconn(), cur.execute(sql,
values), media_id and logger) so callers may broadcast terminal events despite
Postgres not being updated; change this block to either implement a small retry
loop with backoff around cur.execute(sql, values) (and still putconn in the
finally) or, if retries fail, re-raise the exception after logging so the caller
sees the failure instead of continuing; ensure update_media_status surfaces
errors (raises) so the worker can avoid emitting COMPLETED/FAILED when the DB
write failed.

In `@apps/backend-api/scripts/sync-schema.ts`:
- Around line 5-9: The path resolution for sourcePath and targetPath currently
uses process.cwd(), which is fragile; change the script to compute a scriptDir
anchored to the file's location (e.g., via import.meta.url or
fileURLToPath(import.meta.url) + path.dirname) and then build sourcePath and
targetPath relative to that scriptDir instead of process.cwd(), updating any
references to sourcePath/targetPath to use the new anchored paths.

In `@apps/backend-api/src/modules/media/media-artifact-summary.ts`:
- Around line 91-96: Replace the manual guards that compute
safeChunkIndex/safeBatchIndex with the existing toNonNegativeInteger() helper to
avoid propagating NaN/Infinity into summaries: wherever chunkIndex or batchIndex
is currently normalized via Math.trunc/Math.max (e.g., the computation of
safeChunkIndex and subsequent uses updating chunkCount and latestChunkIndex, and
the analogous block around lines 105-113 for batchIndex), call
toNonNegativeInteger(chunkIndex) / toNonNegativeInteger(batchIndex) and use
those results to compute chunkCount, latestChunkIndex, batchCount, and
latestBatchIndex so non-finite inputs are converted to 0 and cannot corrupt the
summary object.

In `@apps/backend-api/src/modules/socket/socket.service.ts`:
- Around line 59-68: handleMediaUpdate currently calls JSON.parse(message) which
can throw a SyntaxError; wrap the parse in a try/catch inside handleMediaUpdate,
and on catch call this.logger.warn (or error) with a descriptive message
including the redacted payload via redactMediaPayload(message or parsed attempt)
and the error details, then return early so parseMediaEvent is not called;
ensure references to parseMediaEvent and redactMediaPayload remain unchanged.

In `@apps/mobile-app/src/components/media/ProcessingProgressRing.tsx`:
- Around line 30-36: withAlpha currently only appends an alpha hex to
7-character hex strings and will produce invalid colors for other formats;
update withAlpha (used by ProcessingProgressRing) to safely handle formats:
detect and expand 4-char hex (`#RGB` -> `#RRGGBB`) then append/replace alpha for 6-
or 8-char hex, preserve rgb()/rgba()/hsl() strings (or convert using a color
library), and return the original color unchanged if unsupported; alternatively
replace the helper with a small color utility (e.g., tinycolor2 or polished) to
parse any CSS color, set the desired alpha, and return a valid CSS color string.
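A sketch of the safer helper described above, without pulling in a color library: it expands `#RGB`, appends or replaces the alpha byte on `#RRGGBB`/`#RRGGBBAA`, and passes anything else through unchanged. The 0..1 alpha parameter is an assumption about the existing helper's signature.

```typescript
// Sketch: alpha-aware hex helper that never emits an invalid color.
export function withAlpha(color: string, alpha: number): string {
  const clamped = Math.min(1, Math.max(0, alpha));
  const alphaHex = Math.round(clamped * 255).toString(16).padStart(2, "0");
  if (/^#[0-9a-fA-F]{3}$/.test(color)) {
    // Expand #RGB -> #RRGGBB, then append alpha.
    const r = color[1];
    const g = color[2];
    const b = color[3];
    return `#${r}${r}${g}${g}${b}${b}${alphaHex}`;
  }
  if (/^#[0-9a-fA-F]{6}$/.test(color)) return `${color}${alphaHex}`;
  if (/^#[0-9a-fA-F]{8}$/.test(color)) return `${color.slice(0, 7)}${alphaHex}`;
  // rgb()/rgba()/hsl()/named colors: return unchanged rather than corrupt.
  return color;
}
```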

---

Outside diff comments:
In `@apps/mobile-app/src/hooks/useMedia.ts`:
- Around line 157-170: Add HTTP response validation for the two fetch calls
before calling mediaService.confirmUpload: after the initial fetch(file.uri)
check fileResponse.ok (and handle/throw/log a descriptive error referencing
file.uri or file.name) and similarly check the PUT response from
fetch(uploadUrl, { method: "PUT", ... }) for ok before proceeding; if either
response is not ok, abort and do not call confirmUpload (the code around
fileResponse, fileBlob, uploadUrl, and mediaService.confirmUpload should perform
these guards and surface the error).
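The guard can be factored into one helper used after both fetches. The `{ ok, status }` shape matches the Fetch API `Response`; the usage lines referencing `file.uri`, `uploadUrl`, and `mediaService.confirmUpload` are illustrative of the flow described above, not verbatim code.

```typescript
// Sketch: fail fast on a non-2xx response instead of silently confirming
// an upload that never happened.
export function assertOk(
  response: { ok: boolean; status: number },
  label: string,
): void {
  if (!response.ok) {
    throw new Error(`${label} failed with HTTP ${response.status}`);
  }
}

// Illustrative usage in the upload flow:
// const fileResponse = await fetch(file.uri);
// assertOk(fileResponse, "read local file");
// const putResponse = await fetch(uploadUrl, { method: "PUT", body: fileBlob });
// assertOk(putResponse, "presigned PUT upload");
// await mediaService.confirmUpload(...); // only reached when both succeeded
```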

---

Duplicate comments:
In `@apps/mobile-app/src/app/(app)/processing.tsx`:
- Around line 73-75: When rendering the processing UI, guard against missing id
or missing media by returning an explicit empty/error state when id is falsy or
when the status query finished without media (i.e., !id || (!isLoading &&
!media)); update the logic around useMediaStatus and useMediaArtifacts so you
don't fall through to the queued UI when status is undefined, and apply the same
guard to the other processing render branch that currently shows the queued
footer/actions (the second block that uses status to decide queued vs
processing). Ensure you check id and media before using status and render a
clear empty/error component instead of the processing layout.
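The branch logic described above can be isolated as a pure function, which keeps both render sites consistent. The state names and the inputs (`id`, `isLoading`, `media`) follow the comment; the function itself is a sketch, not the component's actual code.

```typescript
// Sketch: decide the processing screen's render branch before touching status.
export type ProcessingRenderState = "missing" | "loading" | "ready";

export function processingRenderState(
  id: string | undefined,
  isLoading: boolean,
  media: unknown,
): ProcessingRenderState {
  if (!id) return "missing"; // no media id in the route params
  if (isLoading) return "loading"; // status query still in flight
  if (!media) return "missing"; // query finished but the item doesn't exist
  return "ready"; // safe to branch on status (queued vs processing)
}
```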

In `@apps/mobile-app/src/hooks/useMedia.ts`:
- Around line 60-65: socketFirstQueryOptions currently disables all refetching
(staleTime: Infinity and all refetch triggers false), making useMediaStatus()
and useMediaArtifacts() rely solely on sockets and risk freezing on missed
updates; modify socketFirstQueryOptions to allow a low-frequency recovery
refetch (e.g., set staleTime to a finite value like 30_000 ms) or re-enable
refetchOnReconnect/refetchOnWindowFocus (or both), and if possible make the
shorter staleTime or re-enable triggers conditional for non-terminal items only
so terminal items remain stable; apply the same change pattern where
socketFirstQueryOptions is reused (also referenced in the other block around
useMediaStatus/useMediaArtifacts).
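One way to apply the conditional pattern is to derive the options from the item's status. The 30-second `staleTime` and the terminal-status set below are assumptions taken from this comment, not values from the codebase.

```typescript
// Sketch: socket-first query options with a low-frequency recovery refetch
// for non-terminal media, so a missed socket event cannot freeze the UI.
const TERMINAL_STATUSES = new Set(["COMPLETED", "FAILED"]);

export function socketFirstQueryOptions(status: string | undefined) {
  const isTerminal = status !== undefined && TERMINAL_STATUSES.has(status);
  return {
    // Terminal items never change again; non-terminal ones get a safety net.
    staleTime: isTerminal ? Infinity : 30_000,
    refetchOnReconnect: !isTerminal,
    refetchOnWindowFocus: false,
    refetchInterval: false as const,
  };
}
```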

---

Nitpick comments:
In `@apps/ai-engine/src/config.py`:
- Around line 119-122: Add startup validation for the NMT model directory so
runtime failures are avoided: in config.py update create_dirs() or add a new
validate_nmt_model() method to check Path NMT_MODEL_DIR.exists() and, if
missing, log a clear error/warning (using the project logger) and either raise a
ConfigError or return a failure state so translator initialization (where the
NMT translator is constructed) can surface a user-friendly message; reference
NMT_MODEL_DIR, create_dirs(), and the translator initialization code path to
ensure the check runs before the translator is instantiated.

In `@apps/ai-engine/src/core/llm_provider.py`:
- Line 147: Replace the single-element list allocation used to get the first key
(first_key = list(parsed[0].keys())[0]) with an iterator-based access to avoid
building a new list; e.g., use next(iter(parsed[0].keys())) or simply
next(iter(parsed[0])) in the same scope where parsed is used (e.g., inside the
function or method that contains first_key) so the first key is extracted
without unnecessary allocation.

In `@apps/ai-engine/src/core/smart_aligner.py`:
- Around line 329-334: The local import of the regular expression module inside
_split_cjk_words is redundant; remove the line "import re" from the
_split_cjk_words method so the function uses the module-level re already
imported, leaving the cjk_pattern = re.compile(...) and rest of the function
unchanged.
- Around line 154-156: The call to librosa.load returns a sample rate into
variable sr which is never used; replace the unused variable name sr with _sr
(or prefix it with an underscore) where librosa.load is called so intent to
discard is clear — locate the librosa.load invocation in smart_aligner.py (the
line assigning audio_full, sr) and change the second assigned variable to _sr.
- Around line 236-244: The inner loop reuses the variable name "sent" and
shadows the outer "sent" from the surrounding loop; to fix, rename the
inner-loop iteration variable (e.g., to "sub_sent" or "split_sent") in the block
that assigns detected_lang and sets attributes (the for sent in split_sents
loop) and update references inside that loop accordingly, leaving the outer loop
variable intact; this preserves readability while keeping logic in
self._add_phonemes(split_sents, detected_lang), the assignments
sent.detected_lang, sentences.extend(split_sents), and
pending_chunk.extend(split_sents unchanged.

In `@apps/ai-engine/src/events.py`:
- Around line 46-47: Replace the broad "except Exception as e" in the
publish/progress event publishing block with a targeted exception tuple like
"except (redis.RedisError, TypeError, ValueError) as e" to satisfy the linter
and explicitly handle Redis and JSON/serialization errors; also ensure "redis"
is imported at top of the module and keep the existing logger.error call
(referencing logger and media_id) unchanged so the error details are still
logged.

In `@apps/ai-engine/src/schemas.py`:
- Around line 116-165: ContextAnalysisResult and ContextAnalysis overlap and
should be consolidated or clarified: either deprecate ContextAnalysisResult (add
a deprecation comment/DeprecationWarning in its docstring) and provide a
conversion helper (e.g., ContextAnalysis.from_result or
ContextAnalysisResult.to_context_analysis) that maps
detected_style→detected_style, summary→summary, keywords→keywords, and
detected_pronouns→language_specific["pronouns"]; or keep both but add a clear
comment above ContextAnalysisResult explaining it exists only for backward
compatibility with LLM prompts and show the explicit field mapping to
ContextAnalysis so maintainers know when to use each.

In `@apps/ai-engine/src/scripts/eval_nllb.py`:
- Around line 88-94: BATCH_JSON_PATH is hardcoded to a single debug UUID path;
make this configurable by adding a CLI argument (e.g., --batch-json) or reading
an environment variable (e.g., BATCH_JSON_PATH) and falling back to the current
default value so other environments can supply their own file; update the code
that defines BATCH_JSON_PATH and the code path that calls load_real_sentences to
use the provided value (or default) and keep the existing warning behavior if
the file is missing.
- Line 226: The for-loop that iterates over zip(sentences, translations) in the
evaluation loop should use zip(..., strict=True) to enforce the 1:1 mapping at
the call site (even though translate_batch asserts it); update the loop in the
function handling evaluation (the enumerate over sentences and translations) to
call zip(sentences, translations, strict=True) so mismatched lengths raise
immediately and make the contract explicit.

In `@apps/ai-engine/src/scripts/test_v2_pipeline.py`:
- Around line 786-799: Add a brief inline comment above the
importlib.reload(async_mod) call explaining that reload is necessary because
src.async_pipeline (referenced here as async_mod) imports db_mod and events_mod
at module load time, so after patching those modules we must reload async_mod to
pick up the patched objects; mention importlib.reload(async_mod), db_mod, and
events_mod by name so future maintainers understand the dependency and ordering.

In `@apps/ai-engine/tests/test_first_batch_streaming.py`:
- Around line 147-152: The class attribute instances on FakeMinioClient is a
mutable list shared across instances which can cause test pollution; update the
declaration to explicitly annotate it as a ClassVar (e.g., instances:
ClassVar[list["FakeMinioClient"]]) to indicate module-level/stateful intent and
satisfy RUF012, or alternatively remove the shared class list and manage
per-test instance tracking (e.g., make instances an instance attribute or use a
pytest fixture) while keeping references to FakeMinioClient and its instances
attribute so the tests still clear or inspect it appropriately.

In `@apps/ai-engine/tests/test_streaming_contracts.py`:
- Around line 398-402: Add the strict=True parameter to the zip used in the loop
that compares batch.segments and matched so the loop becomes zip(batch.segments,
matched, strict=True); this enforces equal lengths and will fail fast if counts
diverge—update the for loop that currently reads for batch_seg, matched_seg in
zip(batch.segments, matched): to use strict=True on the zip call.

In `@apps/backend-api/src/modules/media/workers/media.processor.ts`:
- Around line 117-123: The aiPayload creation is passing job.data.targetLanguage
directly without validation or local default; either validate/normalize it here
or add a clear inline comment stating that the "targetLanguage" default ('vi')
is applied downstream in the AI engine/queue types; update the
AiProcessingJobPayload population around aiPayload (and reference
job.data.targetLanguage) to either set a local default/validation or add the
comment indicating where the default originates (queue.types.ts / AI engine) so
future maintainers know why it's omitted here.

In `@apps/backend-api/src/modules/socket/socket.gateway.ts`:
- Around line 131-147: The media_leave handler (handleLeaveMedia) currently lets
any socket leave a room without verifying authentication; mirror the media_join
behavior by calling getAuthenticatedUserId(client) at the start of
handleLeaveMedia and return { ok: false, error: 'unauthenticated' } (or the
existing error shape) if no user id is present, then proceed to compute room via
getMediaRoom(mediaId), await client.leave(roomName), log and return; this keeps
join/leave auth consistent and uses the same error handling flow as media_join.
- Around line 26-32: The WebSocketGateway decorator currently sets cors.origin:
'*' which is insecure for production; update the WebSocketGateway configuration
in socket.gateway.ts (the `@WebSocketGateway`({ ... }) block) to read allowed
origins from an environment variable (e.g., process.env.ALLOWED_WS_ORIGINS) and
use that value (or a parsed array / whitelist check or function that returns
true only for matching origins) instead of '*', falling back to a safe default
for development; ensure the cors.origin field is set to the parsed whitelist or
origin-check function so only trusted frontend domains can connect.

In `@apps/backend-api/src/modules/socket/socket.types.ts`:
- Around line 74-76: isRecord currently returns true for arrays because typeof
[] === 'object'; update the isRecord type guard to explicitly reject arrays by
adding an Array.isArray check (e.g., return typeof value === 'object' && value
!== null && !Array.isArray(value)) so that isRecord(value: unknown): value is
UnknownRecord only passes for plain objects; adjust any callers relying on
arrays being accepted if needed.
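The fixed guard is a one-liner; this sketch mirrors the `UnknownRecord` alias named in the comment.

```typescript
// Sketch: type guard that accepts only plain objects, rejecting arrays
// (typeof [] === "object") and null.
type UnknownRecord = Record<string, unknown>;

export function isRecord(value: unknown): value is UnknownRecord {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}
```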

In `@apps/mobile-app/src/app/(app)/player.tsx`:
- Around line 11-89: PlayerPlaceholderScreen currently has both the back
Pressable and the primary action Pressable calling router.replace to
ROUTES.PROCESSING, making the screen behave like an immediate redirect or purely
informational; if this is intentional (a temporary stub), add a clear code
comment above the PlayerPlaceholderScreen function explaining it's a placeholder
for a future player and why both controls return to ROUTES.PROCESSING, otherwise
update one or both handlers (the two onPress callbacks calling router.replace
with ROUTES.PROCESSING) to perform the correct distinct actions (e.g., navigate
back, open player, or show a toast) so the UI behavior matches user
expectations.

In `@apps/mobile-app/src/components/media/UploadModals.tsx`:
- Around line 52-55: The commented-out navigation line inside handleCloseYT
should be removed to avoid dead code; edit the handleCloseYT function (which
currently calls setYtVisible(false)) and delete the commented
`router.replace(ROUTES.HOME)` so the handler only closes the modal without
unused/commented navigation code.
- Around line 25-26: Remove the leftover commented-out state declaration for
sheetVisible to keep the component clean: delete the line "// const
[sheetVisible, setSheetVisible] = useState(true);" in UploadModals.tsx so only
the active state declarations like "const [ytVisible, setYtVisible] =
useState(false);" remain; no other changes are required.
- Around line 64-67: Remove the unsafe "as any" on the router.replace call and
instead declare and use a proper typed route params type for the processing
route (e.g., declare a ProcessingRouteParams = { id: string } in your route
types), update your route configuration to associate ROUTES.PROCESSING with that
params type, and then call router.replace with the correctly typed params object
(router.replace({ pathname: ROUTES.PROCESSING, params: { id: newItem.id } })) so
TypeScript enforces that newItem.id matches the expected string id type; keep
references to ROUTES.PROCESSING, router.replace and newItem.id when applying the
change.
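The typed-params idea can be sketched as a small href builder; the route string and `ROUTES` shape here are stand-ins for the app's actual route constants, and the object form matches expo-router's `router.replace({ pathname, params })` call shown in the comment.

```typescript
// Sketch: a typed href factory so router.replace no longer needs `as any`.
export const ROUTES = { PROCESSING: "/(app)/processing" } as const;

export interface ProcessingRouteParams {
  id: string;
}

export function processingHref(params: ProcessingRouteParams) {
  return { pathname: ROUTES.PROCESSING, params };
}

// router.replace(processingHref({ id: newItem.id })); // id is now checked
```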

In `@apps/mobile-app/src/hooks/useLanguagePreference.ts`:
- Around line 57-69: The effect in useEffect currently lists i18n in its
dependency array but the effect only calls hydrateLanguagePreference() (which
relies on the module-level i18nInstance) and therefore should not re-run when
the useTranslation() i18n ref changes; remove i18n from the dependency array and
use an empty array [] so hydrateLanguagePreference() runs once on mount (keep
the isMounted cleanup logic as-is), referencing the useEffect,
hydrateLanguagePreference, and i18n symbols to locate the change.

In `@apps/mobile-app/src/hooks/useThrottle.ts`:
- Line 57: The JSDoc for useThrottle should warn that the bypassFn dependency
must be stable to avoid re-running the effect on every render; update the
comment for the useThrottle hook (and mention bypassFn) to advise callers to
memoize bypassFn with useCallback or that the hook could internally stabilize it
(e.g., store it in a ref) to prevent unintended throttle resets. Include a short
example suggestion: "memoize bypassFn with useCallback" and note the current
dependency array ([value, interval, bypassFn]) relies on bypassFn stability.

In `@apps/mobile-app/src/services/socket.service.ts`:
- Around line 100-105: The disconnect() method currently nulls this.socket but
doesn't reset the connection state; update disconnect() (in the same class where
this.socket and the connecting flag are defined) to also set this.connecting =
false (and clear any related connection state such as pending reconnection
timers or listeners if present) so the duplicate-connection guard can work
correctly after a disconnect.
- Around line 79-97: The inline arrow callbacks in the socket.on handlers (e.g.,
the handlers for "media_progress", "media_chunk_ready", "media_batch_ready",
"media_completed", "media_failed") return the result of cb(event) which triggers
the lint rule; change each listener iteration from a concise arrow (cb =>
cb(event)) to a block-bodied callback that calls cb(event) without returning a
value (e.g., (cb) => { cb(event); }) for progressListeners, chunkReadyListeners,
batchReadyListeners, completedListeners, and failedListeners to satisfy
useIterableCallbackReturn.

In `@apps/mobile-app/src/types/media.ts`:
- Around line 33-54: Extract the repeated fields into a new base interface
(e.g., MediaArtifactBase) containing objectKey, url, size, and lastModified, and
have MediaChunkArtifact, MediaTranslatedBatchArtifact, and MediaFinalArtifact
extend that base; update MediaChunkArtifact to only declare chunkIndex,
MediaTranslatedBatchArtifact to only declare batchIndex, and remove the
duplicated fields from those three interfaces so they inherit the shared shape.

In `@checkpoint.md`:
- Around line 447-458: Update the fenced code block showing the MinIO storage
convention to include a language specifier by changing the opening fence from
``` to ```text so it renders properly; locate the block that begins with the
tree starting "processed/{mediaId}/" (contains "chunks/", "translated_batches/",
"final.json") and add "text" after the opening backticks.

Comment on lines +390 to +431
            # ── Step B: Context analysis (once per job, on first chunk) ──
            if not context_analyzed:
                context_analyzed = True
                try:
                    sample = texts[:10]
                    ctx = llm.analyze_context(sample, tgt)
                    context_result = ctx
                    logger.info(
                        f"📊 Context analysis: style={ctx.detected_style}, "
                        f"pronouns={ctx.detected_pronouns}"
                    )
                except Exception as e:
                    logger.warning(f"Context analysis failed (continuing without): {e}")
                    context_result = None

            # ── Step C: LLM refinement (optional, best-effort) ──
            final_translations = nmt_translations
            if context_result is not None and src != tgt:
                try:
                    t1 = _time.time()
                    refined = llm.refine_batch(
                        texts, nmt_translations, context_result, tgt
                    )
                    if refined is not None:
                        final_translations = refined
                        logger.info(
                            f"✨ LLM refinement batch {batch_index}: "
                            f"{len(sentences)} segments in {_time.time() - t1:.2f}s"
                        )
                    else:
                        logger.debug(
                            f"⏭️ Refinement returned None for batch {batch_index} "
                            f"— using NMT output"
                        )
                except Exception as e:
                    logger.warning(
                        f"Refinement failed for batch {batch_index} — using NMT: {e}"
                    )

            # Apply translations to sentences
            for s, t in zip(sentences, final_translations):
                s.translation = t

⚠️ Potential issue | 🟡 Minor

Add strict=True to zip() to catch length mismatches early.

The zip(sentences, final_translations) at line 430 should use strict=True to fail fast if the NMT or LLM refinement returns a different number of translations than input sentences.

🐛 Proposed fix
             # Apply translations to sentences
-            for s, t in zip(sentences, final_translations):
+            for s, t in zip(sentences, final_translations, strict=True):
                 s.translation = t

The broad Exception catches at lines 401 and 424 are acceptable here since they're for optional best-effort operations (context analysis, LLM refinement) with proper logging and fallback to NMT output.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
            # ── Step B: Context analysis (once per job, on first chunk) ──
            if not context_analyzed:
                context_analyzed = True
                try:
                    sample = texts[:10]
                    ctx = llm.analyze_context(sample, tgt)
                    context_result = ctx
                    logger.info(
                        f"📊 Context analysis: style={ctx.detected_style}, "
                        f"pronouns={ctx.detected_pronouns}"
                    )
                except Exception as e:
                    logger.warning(f"Context analysis failed (continuing without): {e}")
                    context_result = None

            # ── Step C: LLM refinement (optional, best-effort) ──
            final_translations = nmt_translations
            if context_result is not None and src != tgt:
                try:
                    t1 = _time.time()
                    refined = llm.refine_batch(
                        texts, nmt_translations, context_result, tgt
                    )
                    if refined is not None:
                        final_translations = refined
                        logger.info(
                            f"✨ LLM refinement batch {batch_index}: "
                            f"{len(sentences)} segments in {_time.time() - t1:.2f}s"
                        )
                    else:
                        logger.debug(
                            f"⏭️ Refinement returned None for batch {batch_index} "
                            f"— using NMT output"
                        )
                except Exception as e:
                    logger.warning(
                        f"Refinement failed for batch {batch_index} — using NMT: {e}"
                    )

            # Apply translations to sentences
            for s, t in zip(sentences, final_translations, strict=True):
                s.translation = t
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 401-401: Do not catch blind exception: Exception

(BLE001)


[warning] 424-424: Do not catch blind exception: Exception

(BLE001)


[warning] 430-430: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/async_pipeline.py` around lines 390 - 431, The loop that
applies translations uses zip(sentences, final_translations) which can silently
drop items if the lengths differ; update the loop to use zip(..., strict=True)
so it raises a ValueError immediately on length mismatch between sentences and
final_translations (reference: the for loop pairing sentences and
final_translations, and variables sentences, final_translations, and zip), and
ensure any surrounding try/except (if present) lets that error surface or is
handled where appropriate so mismatches are caught early.

)

user_msg = f"Text Samples:\n" + "\n".join(text_samples)

⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefix from string without placeholders.

This f-string has no interpolation expressions.

🔧 Proposed fix
-        user_msg = f"Text Samples:\n" + "\n".join(text_samples)
+        user_msg = "Text Samples:\n" + "\n".join(text_samples)
🧰 Tools
🪛 Ruff (0.15.6)

[error] 60-60: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/llm_provider.py` at line 60, The assignment to
user_msg uses an unnecessary f-string prefix: change the line that sets user_msg
(user_msg = f"Text Samples:\n" + "\n".join(text_samples)) to use a plain string
literal without the f prefix so it becomes a normal concatenation (i.e., remove
the leading "f" from the string literal assigned to user_msg).

texts: List[str],
context: ContextAnalysisResult
) -> List[str]:
def generate(self, prompt: str, system_prompt: str = None) -> str:

⚠️ Potential issue | 🟡 Minor

Use explicit Optional or | None type annotation.

PEP 484 prohibits implicit Optional when default is None.

🔧 Proposed fix
-    def generate(self, prompt: str, system_prompt: str = None) -> str:
+    def generate(self, prompt: str, system_prompt: str | None = None) -> str:
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 178-178: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/llm_provider.py` at line 178, The generate signature
uses a default None for system_prompt but lacks an explicit Optional annotation;
update the function signature for generate in llm_provider.py to annotate
system_prompt as Optional[str] (or str | None for Python 3.10+) so the type
reflects the None default, e.g., change def generate(self, prompt: str,
system_prompt: str = None) -> str to use the explicit Optional/union type and
adjust any imports (typing.Optional) if needed.

Comment on lines +141 to +153
if source_lang not in NLLB_LANG_MAP:
logger.warning(
f"NMTTranslator: source_lang '{source_lang}' not in NLLB_LANG_MAP — "
"using raw string as language token"
)
if target_lang not in NLLB_LANG_MAP:
logger.warning(
f"NMTTranslator: target_lang '{target_lang}' not in NLLB_LANG_MAP — "
"using raw string as language token"
)

src_code: str = NLLB_LANG_MAP.get(source_lang, source_lang)
tgt_code: str = NLLB_LANG_MAP.get(target_lang, target_lang)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "nmt_translator.py" -type f

Repository: KMA-Final-Project/project

Length of output: 112


🏁 Script executed:

cat -n ./apps/ai-engine/src/core/nmt_translator.py | head -200

Repository: KMA-Final-Project/project

Length of output: 8368


🏁 Script executed:

grep -r "translate_batch\|NMTTranslator" --include="*.py" | head -50

Repository: KMA-Final-Project/project

Length of output: 3647


🏁 Script executed:

grep -B5 -A5 "nmt.translate_batch" ./apps/ai-engine/src/async_pipeline.py

Repository: KMA-Final-Project/project

Length of output: 526


🏁 Script executed:

grep -B20 "nmt.translate_batch" ./apps/ai-engine/src/async_pipeline.py | grep -E "(src|tgt)\s*=" | head -10

Repository: KMA-Final-Project/project

Length of output: 158


🏁 Script executed:

grep -B50 "nmt.translate_batch" ./apps/ai-engine/src/async_pipeline.py | head -80

Repository: KMA-Final-Project/project

Length of output: 2228


🏁 Script executed:

grep -B100 "def _source_lang\|def _translate_and_upload" ./apps/ai-engine/src/async_pipeline.py | grep -E "(_source_lang|target_lang|i18n)" | head -20

Repository: KMA-Final-Project/project

Length of output: 231


🏁 Script executed:

grep -A10 "def _source_lang" ./apps/ai-engine/src/async_pipeline.py

Repository: KMA-Final-Project/project

Length of output: 477


🏁 Script executed:

grep -B30 "def _source_lang" ./apps/ai-engine/src/async_pipeline.py | head -50

Repository: KMA-Final-Project/project

Length of output: 1197


🏁 Script executed:

grep -B100 "source_lang_holder\|async def consumer" ./apps/ai-engine/src/async_pipeline.py | grep -E "(source_lang_holder|target_lang|def.*process|mobile|i18n)" | head -20

Repository: KMA-Final-Project/project

Length of output: 292


🏁 Script executed:

grep -n "source_lang_holder\|target_lang" ./apps/ai-engine/src/async_pipeline.py | head -30

Repository: KMA-Final-Project/project

Length of output: 560


🏁 Script executed:

sed -n '100,250p' ./apps/ai-engine/src/async_pipeline.py

Repository: KMA-Final-Project/project

Length of output: 5433


🏁 Script executed:

grep -n "detected_source_lang\|_detect_source_language" ./apps/ai-engine/src/async_pipeline.py

Repository: KMA-Final-Project/project

Length of output: 535


🏁 Script executed:

sed -n '70,95p' ./apps/ai-engine/src/async_pipeline.py

Repository: KMA-Final-Project/project

Length of output: 1035


🏁 Script executed:

grep -r "detected_lang\|i18n\.language\|Whisper" ./apps/ai-engine/src --include="*.py" | grep -E "(detected_lang|i18n|mobile)" | head -20

Repository: KMA-Final-Project/project

Length of output: 915


🏁 Script executed:

grep -B5 -A5 "transcription_result\[\"info\"\].language" ./apps/ai-engine/src/core/smart_aligner.py

Repository: KMA-Final-Project/project

Length of output: 1081


Fail fast on unsupported language codes.

Lines 152-153 silently accept any unknown language code by passing it raw to the NLLB model. If an upstream caller passes an unsupported locale (e.g., "en-US", "fr-CA" from user configuration), the model receives an invalid Flores-200 token instead of failing explicitly. Add language-code normalization and validation before tokenization to catch invalid codes early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/nmt_translator.py` around lines 141 - 153, The code
currently falls back to passing raw source_lang/target_lang strings to
NLLB_LANG_MAP, which can silently allow invalid Flores-200 tokens; update
NMTTranslator to normalize and validate language codes before mapping: implement
normalization of source_lang and target_lang (e.g., lowercasing and stripping
regional variants like splitting on '-' or '_', mapping "en-US" -> "en"), then
lookup normalized codes in NLLB_LANG_MAP (referencing NLLB_LANG_MAP,
source_lang, target_lang, src_code, tgt_code) and if a normalized code is not
present raise a clear exception (e.g., ValueError) or return a descriptive error
instead of passing the raw token to the model so failures are explicit and
early.
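A minimal sketch of the suggested normalization (the mapping entries here are an illustrative subset, not the project's actual NLLB_LANG_MAP, and normalize_lang is a hypothetical helper name):

```python
# Illustrative subset — the real NLLB_LANG_MAP lives in nmt_translator.py.
NLLB_LANG_MAP = {"en": "eng_Latn", "vi": "vie_Latn", "fr": "fra_Latn"}

def normalize_lang(code: str) -> str:
    """Lowercase, strip regional variants ('en-US' -> 'en'), and fail fast."""
    base = code.strip().lower().replace("_", "-").split("-")[0]
    try:
        return NLLB_LANG_MAP[base]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None

assert normalize_lang("en-US") == "eng_Latn"
assert normalize_lang("fr_CA") == "fra_Latn"
```

Raising at the lookup site keeps an invalid Flores-200 token from ever reaching the tokenizer.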

Comment on lines 124 to +132
Rules:
1. Translate the meaning of sentences naturally.
2. DO NOT return the original text. You MUST translate it to {target_lang}.
3. Maintain the approximate length but prioritize meaning.
4. Use the "Context-First" approach:
- Apply the specific Tone/Style.
- {pronoun_enforcement}
5. Do not merge sentences unless necessary for grammar.
6. STRICTLY Return a JSON list. Do not wrap it in a dictionary key like "translations".
7. **LANGUAGE ENFORCEMENT**: The output must be PURE {target_lang}. Do NOT include any source language characters (e.g. No Chinese chars in Vietnamese output).
1. Return EXACTLY {count} refined translations as a JSON array of strings.
2. NEVER merge or split lines — one input line = one output line.
3. If the DRAFT is already good, return it unchanged.
4. Preserve numbers, proper nouns, and key terms from the keyword list.
5. Fix awkward phrasing, wrong pronouns, or unnatural word order.
{pronoun_rule}
6. Output must be PURE target language — no source-language characters.
7. Return ONLY a JSON array. No explanation, no wrapping object.

⚠️ Potential issue | 🟠 Major

Remove the proper-noun conflict in the refinement rules.

Line 128 says to preserve proper nouns and key terms, but Line 131 bans source-language characters outright. For same-script pairs and brand names, that contradiction pushes the model to rewrite or drop names. Relax the ban so unchanged names/brands are still allowed when appropriate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/core/prompts.py` around lines 124 - 132, The refinement
prompt has a contradiction between the “preserve proper nouns and key terms”
rule and the strict ban on source-language characters; update the prompt
template so rule 6 allows unchanged proper nouns/brand names and key terms when
they appear verbatim and use the same script as the target language while still
forbidding other source-language content. Concretely, modify the Rules block
(the prompt string that contains "Rules:" and the {pronoun_rule} placeholder) to
change rule 6 into a conditional allowance like “6. Output must be PURE target
language — except unchanged proper nouns, brand names, and key terms that appear
verbatim in the same script as the target language; all other source-language
characters are prohibited.” Ensure the change references the existing
preservation rule (rule 4) so the two rules are consistent and update any tests
or examples that assert the previous absolute ban.

Comment thread apps/ai-engine/src/db.py
Comment on lines +108 to +119
try:
pool = _get_db_pool()
conn = pool.getconn()
try:
with conn:
with conn.cursor() as cur:
cur.execute(sql, values)
finally:
pool.putconn(conn)

except Exception as e:
logger.error(f"DB update failed for media {media_id}: {e}")

⚠️ Potential issue | 🟠 Major

Do not silently continue after a status write failure.

If update_media_status() only logs here, the worker can still emit COMPLETED/FAILED while Postgres keeps the row in its previous state. After a refresh or missed socket event, clients fall back to stale status indefinitely. Bubble this up or add a retry path before broadcasting terminal events.

🧰 Tools
🪛 Ruff (0.15.6)

[warning] 118-118: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/ai-engine/src/db.py` around lines 108 - 119, The try/except in
update_media_status is swallowing DB write failures (the block using
_get_db_pool(), pool.getconn(), cur.execute(sql, values), media_id and logger)
so callers may broadcast terminal events despite Postgres not being updated;
change this block to either implement a small retry loop with backoff around
cur.execute(sql, values) (and still putconn in the finally) or, if retries fail,
re-raise the exception after logging so the caller sees the failure instead of
continuing; ensure update_media_status surfaces errors (raises) so the worker
can avoid emitting COMPLETED/FAILED when the DB write failed.
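One way to sketch the retry-then-reraise path (execute_with_retry and flaky_write are hypothetical names for illustration, not the project's code):

```python
import time

def execute_with_retry(write, attempts: int = 3, base_delay: float = 0.5):
    """Run a DB write, retrying with exponential backoff.

    Re-raises the last exception so the caller knows the status was never
    persisted and can skip broadcasting terminal COMPLETED/FAILED events.
    """
    for attempt in range(attempts):
        try:
            return write()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Simulated flaky write: fails twice, then succeeds on the third call.
calls = {"n": 0}

def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

assert execute_with_retry(flaky_write, base_delay=0.0) == "ok"
assert calls["n"] == 3
```

The key property is that after the final attempt the exception propagates instead of being logged and swallowed, so the worker cannot report a terminal state the database never saw.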

Comment on lines +5 to +9
const sourcePath = path.resolve(process.cwd(), 'prisma/schema.prisma');
const targetPath = path.resolve(
process.cwd(),
'../../../final_project/schema.prisma',
);

⚠️ Potential issue | 🟡 Minor

Fragile path resolution depends on working directory.

The script uses process.cwd() which assumes it's always run from apps/backend-api/. If invoked from a different directory (e.g., monorepo root), both source and target paths will resolve incorrectly.

Consider using import.meta.dirname (Node 20.11+) or path.dirname(fileURLToPath(import.meta.url)) to anchor paths relative to the script's location rather than the working directory.

🛠️ Proposed fix using import.meta
 import { copyFile, mkdir } from 'node:fs/promises';
 import path from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));

 async function main() {
-  const sourcePath = path.resolve(process.cwd(), 'prisma/schema.prisma');
+  const sourcePath = path.resolve(__dirname, '../prisma/schema.prisma');
   const targetPath = path.resolve(
-    process.cwd(),
-    '../../../final_project/schema.prisma',
+    __dirname,
+    '../../../../final_project/schema.prisma',
   );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/scripts/sync-schema.ts` around lines 5 - 9, The path
resolution for sourcePath and targetPath currently uses process.cwd(), which is
fragile; change the script to compute a scriptDir anchored to the file's
location (e.g., via import.meta.url or fileURLToPath(import.meta.url) +
path.dirname) and then build sourcePath and targetPath relative to that
scriptDir instead of process.cwd(), updating any references to
sourcePath/targetPath to use the new anchored paths.

Comment on lines +91 to +96
const safeChunkIndex = Math.max(0, Math.trunc(chunkIndex));

return {
...normalized,
chunkCount: Math.max(normalized.chunkCount, safeChunkIndex + 1),
latestChunkIndex: maxNullable(normalized.latestChunkIndex, safeChunkIndex),

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's find the file and examine the merge functions
fd media-artifact-summary.ts

Repository: KMA-Final-Project/project

Length of output: 129


🏁 Script executed:

# Also search for the toNonNegativeInteger function
rg "toNonNegativeInteger" --type ts

Repository: KMA-Final-Project/project

Length of output: 419


🏁 Script executed:

# Read the file to examine the implementation
cat -n apps/backend-api/src/modules/media/media-artifact-summary.ts | head -120

Repository: KMA-Final-Project/project

Length of output: 4175


🏁 Script executed:

# Quick verification of JavaScript Math behavior
node << 'EOF'
console.log('Math.trunc(NaN):', Math.trunc(NaN));
console.log('Math.trunc(Infinity):', Math.trunc(Infinity));
console.log('Math.max(0, NaN):', Math.max(0, NaN));
console.log('Math.max(0, Infinity):', Math.max(0, Infinity));
console.log('Number.isFinite(NaN):', Number.isFinite(NaN));
console.log('Number.isFinite(Infinity):', Number.isFinite(Infinity));
EOF

Repository: KMA-Final-Project/project

Length of output: 235


Replace unsafe manual guards with toNonNegativeInteger() in merge functions.

Math.trunc(NaN) returns NaN, and Math.max(0, NaN) returns NaN. Non-finite chunkIndex/batchIndex parameters will propagate invalid values into chunkCount/latestChunkIndex and corrupt the summary object. The helper toNonNegativeInteger() already exists and properly guards against non-finite numbers using Number.isFinite().

Suggested fix
 export function mergeChunkArtifactSummary(
   current: unknown,
   chunkIndex: number,
 ): ProcessedArtifactSummary {
   const normalized = normalizeArtifactSummary(current);
-  const safeChunkIndex = Math.max(0, Math.trunc(chunkIndex));
+  const safeChunkIndex = toNonNegativeInteger(chunkIndex);

   return {
     ...normalized,
     chunkCount: Math.max(normalized.chunkCount, safeChunkIndex + 1),
     latestChunkIndex: maxNullable(normalized.latestChunkIndex, safeChunkIndex),
@@
 export function mergeBatchArtifactSummary(
   current: unknown,
   batchIndex: number,
 ): ProcessedArtifactSummary {
   const normalized = normalizeArtifactSummary(current);
-  const safeBatchIndex = Math.max(0, Math.trunc(batchIndex));
+  const safeBatchIndex = toNonNegativeInteger(batchIndex);

   return {
     ...normalized,
     translatedBatchCount: Math.max(
       normalized.translatedBatchCount,

Also applies to: 105-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/media/media-artifact-summary.ts` around lines 91
- 96, Replace the manual guards that compute safeChunkIndex/safeBatchIndex with
the existing toNonNegativeInteger() helper to avoid propagating NaN/Infinity
into summaries: wherever chunkIndex or batchIndex is currently normalized via
Math.trunc/Math.max (e.g., the computation of safeChunkIndex and subsequent uses
updating chunkCount and latestChunkIndex, and the analogous block around lines
105-113 for batchIndex), call toNonNegativeInteger(chunkIndex) /
toNonNegativeInteger(batchIndex) and use those results to compute chunkCount,
latestChunkIndex, batchCount, and latestBatchIndex so non-finite inputs are
converted to 0 and cannot corrupt the summary object.

Comment on lines +59 to +68
private async handleMediaUpdate(message: string): Promise<void> {
const parsedPayload: unknown = JSON.parse(message);
const event = parseMediaEvent(parsedPayload);

if (!event) {
this.logger.warn(
`Received invalid media_updates payload: ${JSON.stringify(redactMediaPayload(parsedPayload))}`,
);
return;
}

⚠️ Potential issue | 🟡 Minor

Wrap JSON.parse in try-catch for clearer error handling.

JSON.parse(message) on line 60 can throw a SyntaxError if the Redis message contains malformed JSON. While the outer promise chain catches this, the error message will be generic. Consider wrapping the parse in a try-catch to log the specific parsing failure with the redacted payload.

🛡️ Proposed fix
 private async handleMediaUpdate(message: string): Promise<void> {
-  const parsedPayload: unknown = JSON.parse(message);
+  let parsedPayload: unknown;
+  try {
+    parsedPayload = JSON.parse(message);
+  } catch {
+    this.logger.warn(`Received malformed JSON on media_updates channel`);
+    return;
+  }
   const event = parseMediaEvent(parsedPayload);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/backend-api/src/modules/socket/socket.service.ts` around lines 59 - 68,
handleMediaUpdate currently calls JSON.parse(message) which can throw a
SyntaxError; wrap the parse in a try/catch inside handleMediaUpdate, and on
catch call this.logger.warn (or error) with a descriptive message including the
redacted payload via redactMediaPayload(message or parsed attempt) and the error
details, then return early so parseMediaEvent is not called; ensure references
to parseMediaEvent and redactMediaPayload remain unchanged.

Comment on lines +30 to +36
function withAlpha(color: string, alphaHex: string): string {
if (color.startsWith("#") && color.length === 7) {
return `${color}${alphaHex}`;
}

return color;
}

⚠️ Potential issue | 🟡 Minor

withAlpha only handles 7-character hex colors.

The helper assumes #RRGGBB format, but theme colors could be rgb(), rgba(), hsl(), shorthand hex (#RGB), or 8-character hex with existing alpha. This could produce invalid color strings like rgb(255, 0, 0)16 or #FFF16.

Consider using a color library or extending the helper to handle other formats gracefully.

🛠️ Safer fallback approach
 function withAlpha(color: string, alphaHex: string): string {
   if (color.startsWith("#") && color.length === 7) {
     return `${color}${alphaHex}`;
   }
-
-  return color;
+  // For non-hex formats, return original color unchanged
+  // (animations will still work, just without alpha modification)
+  console.warn(`withAlpha: unsupported color format "${color}"`);
+  return color;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/mobile-app/src/components/media/ProcessingProgressRing.tsx` around lines
30 - 36, withAlpha currently only appends an alpha hex to 7-character hex
strings and will produce invalid colors for other formats; update withAlpha
(used by ProcessingProgressRing) to safely handle formats: detect and expand
4-char hex (`#RGB` -> `#RRGGBB`) then append/replace alpha for 6- or 8-char hex,
preserve rgb()/rgba()/hsl() strings (or convert using a color library), and
return the original color unchanged if unsupported; alternatively replace the
helper with a small color utility (e.g., tinycolor2 or polished) to parse any
CSS color, set the desired alpha, and return a valid CSS color string.
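The expand-then-append logic is straightforward to sketch; shown here in Python for brevity (the real helper is TypeScript, and this is an assumed analogue, not the component's code):

```python
def with_alpha(color: str, alpha_hex: str) -> str:
    """Append or replace an alpha channel on hex colors; pass others through."""
    if not color.startswith("#"):
        return color  # rgb()/rgba()/hsl(): leave unchanged rather than corrupt
    body = color[1:]
    if len(body) == 3:                      # shorthand #RGB -> #RRGGBB
        body = "".join(ch * 2 for ch in body)
    if len(body) == 6:                      # #RRGGBB: append the alpha byte
        return f"#{body}{alpha_hex}"
    if len(body) == 8:                      # #RRGGBBAA: replace existing alpha
        return f"#{body[:6]}{alpha_hex}"
    return color                            # anything else: unchanged

assert with_alpha("#FFF", "16") == "#FFFFFF16"
assert with_alpha("#FF000080", "16") == "#FF000016"
assert with_alpha("rgb(255, 0, 0)", "16") == "rgb(255, 0, 0)"
```

This covers the shorthand and pre-alpha cases without a dependency; a color library remains the safer choice once rgb()/hsl() inputs must actually gain alpha rather than pass through.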

@sondoannam sondoannam changed the title Feat/processing screen feat(mobile-app): ✨ build processing screen Mar 23, 2026
@sondoannam sondoannam merged commit 1f34088 into main Mar 23, 2026
1 check was pending