
Zero token counts for empty transcripts and README alignment #43

Merged
gistrec merged 4 commits into main from codex/add-llm_tokens_by_model-column-to-transcriptionhistory
Mar 12, 2026
Conversation

@gistrec gistrec commented Jan 20, 2026

Motivation

  • Avoid counting tokens for the user-facing fallback text when a transcription is empty, and record zero token counts for empty transcriptions instead.
  • Keep the stored text user-friendly while basing token accounting on the raw recognition output.
  • Fix the README schema formatting so the llm_tokens_by_model column aligns with the other columns for readability.

Description

  • Update utils/tokens.py so tokens_by_model returns zeros for every model when text.strip() is empty, keeping the LLM_TOKEN_MODELS-based mapping.
  • Change schedulers/transcription.py to compute token_counts = tokens_by_model(raw_text) from the raw parse_text(result) output, and only then replace empty text with the friendly fallback string before persisting results.
  • Pass llm_tokens_by_model=token_counts to update_transcription on both the success and failure paths.
  • Adjust README.md spacing so the llm_tokens_by_model JSON column aligns with the other schema columns.
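The empty-text guard described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the model list is assumed, and a whitespace split stands in for whatever per-model tokenizer the project really uses.

```python
LLM_TOKEN_MODELS = ["gpt-4o", "gpt-3.5-turbo"]  # assumed model list

def tokens_by_model(text: str) -> dict[str, int]:
    """Return a token count for each model; all zeros for empty input."""
    # Empty or whitespace-only text maps every model to zero, so the
    # user-facing fallback string is never counted.
    if not text.strip():
        return {model: 0 for model in LLM_TOKEN_MODELS}
    # Stand-in tokenizer: whitespace split. The real implementation
    # presumably uses a per-model tokenizer library instead.
    n_tokens = len(text.split())
    return {model: n_tokens for model in LLM_TOKEN_MODELS}
```

Keeping the guard inside tokens_by_model means every caller gets the zero-count behavior without repeating the check.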

Testing

  • No automated tests were run for this change.
  • The modified files were checked by local static inspection and manual review before the changes were committed.

Codex Task

Comment thread: utils/tokens.py

```python
text = parse_text(result)
if not text.strip():
token_counts = tokens_by_model(text)
```

Bug: When an S3 upload fails during transcription, the call to update_transcription omits the llm_tokens_by_model parameter, preventing the token counts from being saved for the failed task.
Severity: MEDIUM

Suggested Fix

Modify the update_transcription call within the if s3_uri is None: block to include the llm_tokens_by_model=token_counts argument. This ensures token counts are persisted consistently across all failure scenarios, matching the other error-handling paths.
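The corrected ordering can be sketched as below. Everything here is a stub standing in for the real code in schedulers/transcription.py: FALLBACK_TEXT, the store dict, and the simplified tokens_by_model are all assumptions made so the fragment is self-contained.

```python
FALLBACK_TEXT = "Speech not recognized"  # assumed user-facing fallback

def tokens_by_model(text):
    # Stub matching the behavior described above: zeros for empty text.
    return {"gpt-4o": 0 if not text.strip() else len(text.split())}

def persist_result(raw_text, s3_uri, store):
    """Stub of the scheduler's persistence step (store is a plain dict)."""
    # Count tokens on the raw recognition output *before* substituting
    # the fallback string, so empty transcripts record zero tokens.
    token_counts = tokens_by_model(raw_text)
    text = raw_text if raw_text.strip() else FALLBACK_TEXT
    if s3_uri is None:
        # Suggested fix: pass token counts on the S3-failure path too,
        # instead of dropping them with the 'failed' status.
        store.update(status="failed", text=text,
                     llm_tokens_by_model=token_counts)
    else:
        store.update(status="done", text=text, s3_uri=s3_uri,
                     llm_tokens_by_model=token_counts)
    return store
```

With this shape, both branches persist the same token_counts value, so an S3 failure no longer loses the accounting data.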

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: schedulers/transcription.py#L90

Potential issue: In the transcription scheduler, token counts are calculated and stored
in the `token_counts` variable. If the subsequent S3 upload fails, `s3_uri` will be
`None`, triggering a failure path. In this specific failure case, the call to
`update_transcription` on line 102 omits the `llm_tokens_by_model=token_counts`
argument. This contradicts the logic in other success and failure paths where the token
counts are correctly passed. As a result, when a transcription fails due to an S3 upload
issue, the token count information for that task is lost instead of being persisted with
the 'failed' status.

@gistrec gistrec merged commit 5b896d6 into main Mar 12, 2026
1 check passed
@gistrec gistrec deleted the codex/add-llm_tokens_by_model-column-to-transcriptionhistory branch March 12, 2026 22:45
