Add title-based deduplication to create_issue safe outputs (MCP + apply-time) #32655

Merged
pelikhan merged 18 commits into main from copilot/feature-title-based-deduplication
May 16, 2026

Contributor

Copilot AI commented May 16, 2026

Agents were emitting duplicate create_issue messages with the same title in a single run, and prompt-level controls were not reliable. This change adds handler-level title deduplication so that duplicate messages are dropped before any issue is created.

  • Config + schema

    • Adds safe-outputs.create-issue.deduplicate-by-title.
    • Supports:
      • true → exact normalized title match
      • integer N (0–100) → Levenshtein distance threshold
    • Updates:
      • pkg/parser/schemas/main_workflow_schema.json
      • actions/setup/js/types/safe-outputs-config.d.ts
  • Dedup engine (new focused modules)

    • Adds actions/setup/js/levenshtein_distance.cjs (with full unit tests).
    • Adds actions/setup/js/issue_title_dedup.cjs for:
      • config parsing/validation
      • title normalization
      • nearest duplicate matching by threshold
  • MCP pre-check (immediate feedback)

    • Adds a dedicated create_issue MCP handler in safe_outputs_handlers.cjs.
    • Applies within-run dedup at tool-call time.
    • On duplicate:
      • records a dropped entry in NDJSON with duplicate metadata
      • returns duplicate_dropped to the agent immediately
    • Wires handler via safe_outputs_tools_loader.cjs.
  • Apply-time enforcement (authoritative safety net)

    • Extends create_issue.cjs to enforce dedup before issue creation:
      • Within-run: against titles already accepted in current run
      • Repo-level: against open + recently-closed issues
    • Drops duplicates with structured result fields:
      • dropped_duplicate
      • dedup_source (mcp-precheck / within-run / repo-level)
      • matched title + distance
    • Logs dropped duplicates explicitly.
  • Step summary rendering

    • Extends safe_output_summary.cjs to render dropped duplicate outcomes with clear status and dedup details (matched title, distance, source).
  • Tests

    • Adds full suite for Levenshtein implementation.
    • Adds focused cases for:
      • boolean and numeric dedup config
      • MCP duplicate drop behavior
      • apply-time within-run and repo-level drops
      • summary rendering of dropped duplicates
safe-outputs:
  create-issue:
    title-prefix: "[AI] "
    deduplicate-by-title: 1   # true for exact match, or distance threshold
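
The dedup engine described above (title normalization, Levenshtein distance, nearest-match lookup by threshold) can be sketched roughly as follows. This is a minimal illustration, not the code in levenshtein_distance.cjs or issue_title_dedup.cjs: the function names, the normalization rules (trim, lowercase, collapse whitespace), and the result shape are all assumptions.

```javascript
// Hypothetical helper names; the real modules in the PR may differ.

// Assumed normalization: trim, lowercase, collapse runs of whitespace.
function normalizeTitle(title) {
  return String(title).trim().toLowerCase().replace(/\s+/g, " ");
}

// Two-row dynamic-programming Levenshtein distance.
function levenshteinDistance(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the closest candidate within `threshold` edits, or null if none.
function findNearestDuplicate(title, candidates, threshold) {
  const needle = normalizeTitle(title);
  let best = null;
  for (const candidate of candidates) {
    const distance = levenshteinDistance(needle, normalizeTitle(candidate));
    if (distance <= threshold && (best === null || distance < best.distance)) {
      best = { matchedTitle: candidate, distance };
    }
  }
  return best;
}
```

Under these assumptions, a threshold of 1 (as in the config above) would treat "[AI] Fix login bug" and "[AI] Fix login bugs" as duplicates, while a threshold of 0 would only match titles that are identical after normalization.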

Copilot AI and others added 2 commits May 16, 2026 16:41
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Add title-based deduplication for create-issue handler Add title-based deduplication to create_issue safe outputs (MCP + apply-time) May 16, 2026
Copilot AI requested a review from pelikhan May 16, 2026 16:46
@pelikhan pelikhan marked this pull request as ready for review May 16, 2026 16:49
Copilot AI review requested due to automatic review settings May 16, 2026 16:49
Contributor

Copilot AI left a comment

Pull request overview

Adds title-based deduplication for create_issue safe outputs so duplicate tool calls (same/near-same title) are dropped before creating duplicate issues, with enforcement both at MCP tool-call time and at apply time.

Changes:

  • Introduces deduplicate-by-title configuration (boolean exact-match or integer Levenshtein threshold) in schema + TS types.
  • Adds a small dedup engine (levenshtein_distance.cjs, issue_title_dedup.cjs) and wires an MCP create_issue handler for within-run dedup feedback.
  • Extends apply-time create_issue to enforce within-run and repo-level dedup, and updates step summary rendering + tests.
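
The two accepted forms of deduplicate-by-title (boolean exact match, or an integer Levenshtein threshold from 0 to 100) could be resolved to a single numeric threshold along these lines. The function name and return shape are hypothetical, and treating `false` or a missing value as "disabled" is an assumption rather than something the PR states.

```javascript
// Hypothetical parser for deduplicate-by-title; returns null when dedup is off.
function parseDedupConfig(value) {
  if (value === undefined || value === null || value === false) {
    return null; // assumption: not configured means dedup stays disabled
  }
  if (value === true) {
    return { threshold: 0 }; // exact normalized match is distance 0
  }
  if (Number.isInteger(value) && value >= 0 && value <= 100) {
    return { threshold: value }; // Levenshtein distance threshold
  }
  throw new Error(
    `deduplicate-by-title must be true or an integer from 0 to 100, got ${JSON.stringify(value)}`
  );
}
```

Mapping `true` to a threshold of 0 lets the matching code use a single distance comparison for both modes.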
Show a summary per file
| File | Description |
| --- | --- |
| pkg/parser/schemas/main_workflow_schema.json | Adds deduplicate-by-title to workflow schema for create-issue. |
| actions/setup/js/types/safe-outputs-config.d.ts | Exposes deduplicate-by-title in safe-outputs TS types. |
| actions/setup/js/safe_outputs_tools_loader.cjs | Wires create_issue to a dedicated MCP handler. |
| actions/setup/js/safe_outputs_tools_loader.test.cjs | Adds coverage for create_issue handler attachment. |
| actions/setup/js/safe_outputs_handlers.cjs | Implements MCP pre-check within-run title dedup for create_issue. |
| actions/setup/js/safe_outputs_handlers.test.cjs | Adds tests for MCP pre-check dedup behavior and config validation. |
| actions/setup/js/levenshtein_distance.cjs | Adds Levenshtein distance utility. |
| actions/setup/js/levenshtein_distance.test.cjs | Adds unit tests for Levenshtein implementation. |
| actions/setup/js/issue_title_dedup.cjs | Adds config parsing, normalization, and duplicate matching helper. |
| actions/setup/js/create_issue.cjs | Enforces dedup at apply-time (within-run + repo-level) with structured dropped results. |
| actions/setup/js/create_issue.test.cjs | Adds tests for apply-time dedup behavior and config validation. |
| actions/setup/js/safe_output_summary.cjs | Renders “Duplicate Dropped” outcomes with details in step summary. |
| actions/setup/js/safe_output_summary.test.cjs | Adds summary test coverage for dropped duplicate outcomes. |
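
The within-run check that both the MCP pre-check and the apply-time handler perform could look something like the sketch below. It is simplified to exact normalized matching and is not the code in safe_outputs_handlers.cjs or create_issue.cjs. `dropped_duplicate` and `dedup_source` are field names from the PR description; `matched_title`, `distance`, and the factory function are hypothetical.

```javascript
// Hypothetical within-run tracker, simplified to exact normalized matches.
function createWithinRunDedup(source) {
  const accepted = new Map(); // normalized title -> first accepted title
  const normalize = (t) => String(t).trim().toLowerCase().replace(/\s+/g, " ");

  return function check(title) {
    const key = normalize(title);
    if (accepted.has(key)) {
      // Structured drop result; matched_title and distance are assumed names.
      return {
        dropped_duplicate: true,
        dedup_source: source, // e.g. "mcp-precheck" or "within-run"
        matched_title: accepted.get(key),
        distance: 0,
      };
    }
    accepted.set(key, title);
    return { dropped_duplicate: false };
  };
}
```

A caller would create one tracker per run, for example `createWithinRunDedup("mcp-precheck")`, and pass every incoming title through it before recording or creating the issue.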

Copilot's findings

  • Files reviewed: 12/13 changed files
  • Comments generated: 2

Comment on lines +407 to +410
/**
* Search for existing issues that are potential title-duplicates.
* Includes all open issues and recently closed issues.
*
Comment thread actions/setup/js/create_issue.cjs Outdated
Comment on lines +418 to +422
const [openIssues, recentlyClosedIssues] = await Promise.all([
githubClient.rest.search.issuesAndPullRequests({
q: `repo:${owner}/${repo} is:issue is:open`,
per_page: 100,
sort: "updated",
@pelikhan
Collaborator

@github-actions
Contributor

@copilot please summarize the remaining blockers for this PR and the next concrete step to unblock it.

Generated by 👨‍🍳 PR Sous Chef

Copilot AI and others added 3 commits May 16, 2026 17:15
… test

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 16, 2026

@copilot review all comments.

Fix https://github.com/github/gh-aw/actions/runs/25967498263/job/76333469009#step:7:1

Fixed in f33d579.

  • Addressed the review feedback in create_issue.cjs by adding bounded pagination, truncation warnings, and per-repo candidate caching for title dedup search.
  • Added/updated tests for pagination, cache behavior, and truncation warning coverage.
  • Fixed the failing step-7 test by making the delay assertion deterministic with fake timers in expired_entity_cleanup_helpers.test.cjs.
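
Bounded pagination with a truncation warning and a per-repo candidate cache might be structured as in this sketch. It is not the code from f33d579: `fetchPage` is a hypothetical stand-in for a single Search API page request, and the `MAX_PAGES` bound and result shape are assumptions.

```javascript
// Hypothetical sketch: fetchPage(repoKey, page) stands in for one Search API
// call and resolves to an array of issue titles for that page.
const MAX_PAGES = 3; // assumed bound, not taken from the PR
const PER_PAGE = 100;
const candidateCache = new Map(); // "owner/repo" -> { titles, truncated }

async function loadCandidateTitles(repoKey, fetchPage, warn = console.warn) {
  const cached = candidateCache.get(repoKey);
  if (cached) return cached; // reuse candidates for later issues in this run

  const titles = [];
  for (let page = 1; page <= MAX_PAGES; page++) {
    const items = await fetchPage(repoKey, page);
    titles.push(...items);
    if (items.length < PER_PAGE) {
      // A short page means there are no more results.
      const complete = { titles, truncated: false };
      candidateCache.set(repoKey, complete);
      return complete;
    }
  }

  // Every page was full, so more results may exist beyond the bound.
  warn(`Title dedup candidates for ${repoKey} truncated after ${MAX_PAGES} pages`);
  const partial = { titles, truncated: true };
  candidateCache.set(repoKey, partial);
  return partial;
}
```

Caching per repository means that several create_issue messages targeting the same repository in one run share a single set of search requests.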

Validation run:

  • cd actions/setup/js && npm run lint:cjs
  • cd actions/setup/js && npm run test:js -- create_issue.test.cjs expired_entity_cleanup_helpers.test.cjs --no-file-parallelism
  • cd actions/setup/js && npm run test:js -- --no-file-parallelism --shard=3/4 (the prior step-7 failure is resolved; this local rerun surfaced an unrelated flaky failure in push_repo_memory.test.cjs).

@pelikhan
Collaborator

@copilot

  • check rate limit level before trying deduplication, avoid search if level is low
  • deduplicate issues within the same run

Copilot AI and others added 4 commits May 16, 2026 17:42
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 16, 2026

@copilot

  • check rate limit level before trying deduplication, avoid search if level is low...

Implemented in 1349800.

  • Added a search-rate-limit precheck before repo-level title dedup; when search quota is low, repo-level search is skipped instead of calling Search API.
  • Strengthened same-run dedup by recording repo-level dropped titles in the within-run set, so repeated duplicates in the same run are dropped without extra search calls.
  • Added tests covering low-rate-limit skip behavior and repeated repo-level duplicates within the same run.
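
A rough sketch of the rate-limit precheck and the strengthened same-run tracking follows. It assumes an Octokit-style client whose `rest.rateLimit.get()` response exposes `data.resources.search.remaining`. The cutoff value, the function names, the result fields, and the `searchRepoTitles` callback are hypothetical, and matching is simplified to exact comparison of trimmed, lowercased titles.

```javascript
// Hypothetical cutoff; the PR does not state the actual value.
const MIN_SEARCH_REMAINING = 5;

// `client` is assumed to be an Octokit-style client.
async function canUseSearch(client) {
  try {
    const { data } = await client.rest.rateLimit.get();
    return data.resources.search.remaining >= MIN_SEARCH_REMAINING;
  } catch {
    return false; // assumption: if quota cannot be read, skip the search
  }
}

// seenTitles: Set of normalized titles already accepted or dropped in this run.
// searchRepoTitles: hypothetical async callback returning existing issue titles.
async function checkTitle(client, title, seenTitles, searchRepoTitles) {
  const key = title.trim().toLowerCase();
  if (seenTitles.has(key)) {
    return { dropped: true, source: "within-run" };
  }
  if (!(await canUseSearch(client))) {
    seenTitles.add(key);
    return { dropped: false, skippedRepoSearch: true };
  }
  const existing = await searchRepoTitles();
  const hit = existing.some((t) => t.trim().toLowerCase() === key);
  // Remember the title either way, so a repeat needs no further search.
  seenTitles.add(key);
  return hit ? { dropped: true, source: "repo-level" } : { dropped: false };
}
```

Because a title dropped at repo level is also added to the within-run set, a second message with the same title is dropped without another Search API call.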

Contributor Author

Copilot AI commented May 16, 2026

@copilot enable deduplicate on 50% of AW using create-issue

Implemented in 34afb06.

This enables create_issue title deduplication by default for a deterministic 50% rollout of workflows when deduplicate-by-title is omitted, while preserving explicit config overrides. The same rollout resolution is now used in both MCP pre-check and apply-time handling, with added test coverage for default/override rollout behavior.

This reverts commit 34afb06.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@github-actions
Contributor

@copilot review all comments and address the unresolved review feedback, especially the pagination/truncation and search-rate-limit concerns.
Please also add or verify a regression test for those edge cases.

Generated by 👨‍🍳 PR Sous Chef

Copilot AI and others added 2 commits May 16, 2026 18:11
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 16, 2026

@copilot revert previous commit

Done in fd7d430: reverted the previous commit as requested.

@pelikhan
Collaborator

@copilot revert automatic rollout 34afb06

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 16, 2026

@copilot revert automatic rollout 34afb06

Done in f3844a0: automatic rollout is reverted (default rollout percent is back to 0, so dedup is no longer implicitly enabled when deduplicate-by-title is omitted).

@github-actions github-actions Bot mentioned this pull request May 16, 2026
@github-actions
Contributor

@copilot review all comments and address the unresolved review feedback, especially pagination/truncation and search-rate-limit concerns. Please refresh the branch and rerun checks after the fixes.

Generated by 👨‍🍳 PR Sous Chef

@pelikhan
Collaborator

@copilot Remove all automatic rollout code. This should not have been implemented.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 16, 2026

@copilot Remove all automatic rollout code. This should not have been implemented.

Done in 735b89e.

Removed all automatic rollout code for create_issue title deduplication. Dedup now only applies when deduplicate-by-title is explicitly configured, and rollout-specific tests/env usage were removed accordingly.

@pelikhan
Collaborator

@copilot Enable the deduplicate flag on selected agentic-workflows in this repository, then recompile, merge into main, and recompile again.

Contributor Author

Copilot AI commented May 16, 2026

@copilot Enable the deduplicate flag on selected agentic-workflows in this repository, then recompile, merge into main, and recompile again.

I can do this, but I need two specifics before I apply changes:

  1. Which workflows are “selected” (please provide the list, or confirm “all workflows that use safe-outputs.create-issue”)?
  2. For “merge into main,” should I merge origin/main into this PR branch after the first recompile? (I can’t directly merge this PR into main from here.)

@pelikhan pelikhan merged commit 4550a5b into main May 16, 2026
18 of 19 checks passed
@pelikhan pelikhan deleted the copilot/feature-title-based-deduplication branch May 16, 2026 19:35
github-actions Bot added a commit that referenced this pull request May 17, 2026
Adds reference documentation for the new safe-outputs.create-issue.deduplicate-by-title
field (true for exact match, integer for Levenshtein edit distance) introduced in #32655.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Feature: title-based deduplication for create-issue safe-output handler

3 participants