feat(vision-metrics): split img_edit_score #651

Open
davidberenstein1957 wants to merge 2 commits into feat/vlm-pr-4b-vie-score from feat/vlm-pr-4c-img-edit-score


davidberenstein1957 (Member) commented Apr 28, 2026

Summary

Splits img_edit_score into its own stacked PR, adds ImageEditScoreMetric, and wires up the ImgEdit benchmark entry, with regression coverage for score clamping.

This PR also carries benchmark-paper alignment cleanup from the umbrella work while preserving compatibility:

  • keeps text_to_image task_type literal behavior
  • introduces TASK_TYPE_* constants for readability
  • removes private-reference style notes
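
For reviewers, here is a minimal sketch of how TASK_TYPE_* constants can improve readability without changing the text_to_image literal. The constant names, the "image_edit" value, and the describe_benchmark helper are hypothetical illustrations, not code taken from this PR.

```python
from typing import Final

# Hypothetical constants following the TASK_TYPE_* pattern named in the summary.
# Only the "text_to_image" literal is confirmed by the PR description.
TASK_TYPE_TEXT_TO_IMAGE: Final = "text_to_image"
TASK_TYPE_IMAGE_EDIT: Final = "image_edit"  # assumed value for illustration


def describe_benchmark(task_type: str) -> str:
    """Toy consumer: works with either the constant or the raw string literal."""
    if task_type == TASK_TYPE_TEXT_TO_IMAGE:
        return "generates an image from a prompt"
    if task_type == TASK_TYPE_IMAGE_EDIT:
        return "edits a source image according to an instruction"
    raise ValueError(f"unknown task_type: {task_type!r}")
```

Because the constant is a plain string, existing callers that pass "text_to_image" directly keep working unchanged.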

Stack Position

Files

  • src/pruna/evaluation/metrics/metric_img_edit_score.py
  • src/pruna/evaluation/benchmarks.py
  • tests/evaluation/test_vision_metrics.py

Test Plan

uv run pytest tests/evaluation/test_vision_metrics.py -k img_edit_score

Review Focus

  • ImgEdit score clamping behavior
  • Benchmark metadata/docs alignment without task_type breaking changes
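
As a rough illustration of the clamping behavior under review, the sketch below clamps a raw judge score into a fixed range and pairs it with regression-style checks. The clamp_score helper, the 0 to 10 bounds, and the choice to map NaN to the lower bound are all assumptions made for this example; the actual range and edge-case handling in ImageEditScoreMetric may differ.

```python
import math


def clamp_score(raw: float, low: float = 0.0, high: float = 10.0) -> float:
    """Clamp a raw score into [low, high] (hypothetical helper, assumed bounds)."""
    # NaN compares false against everything, so min/max alone would not
    # handle it predictably; map it to the lower bound (an assumption here).
    if math.isnan(raw):
        return low
    return max(low, min(high, raw))


def test_clamp_score_regression() -> None:
    """Regression-style checks: out-of-range values must land on the bounds."""
    assert clamp_score(12.5) == 10.0   # above the range
    assert clamp_score(-3.0) == 0.0    # below the range
    assert clamp_score(7.25) == 7.25   # in-range values pass through
    assert clamp_score(float("nan")) == 0.0
    assert clamp_score(float("inf")) == 10.0


test_clamp_score_regression()
```

Checks like these are cheap to keep in the suite and catch a judge model returning scores outside the expected scale.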

Review Flow (Order)

Review the stack in this exact order:

  1. feat(vendor): add LLM2Vec embedding model #637 vendor
  2. feat(infrastructure): add VLM base classes and utilities #638 infrastructure
  3. feat(text-metrics): split qa_accuracy #645 qa_accuracy
  4. feat(text-metrics): split oneig_alignment #646 oneig_alignment
  5. feat(text-metrics): split text_score pair #647 text_score pair
  6. feat(text-metrics): split oneig_reasoning #648 oneig_reasoning
  7. feat(vision-metrics): split vqa #649 vqa
  8. feat(vision-metrics): split vie_score #650 vie_score
  9. feat(vision-metrics): split img_edit_score #651 img_edit_score
  10. feat(e2e-tests): stacked e2e after split metrics #641 e2e tests

This PR's position in the flow: 9/10

@github-actions

This PR has been inactive for 10 days and is now marked as stale.

github-actions bot added the stale label May 19, 2026
davidberenstein1957 force-pushed the feat/vlm-pr-4b-vie-score branch from 693f888 to 01406d1 June 2, 2026 17:30
davidberenstein1957 force-pushed the feat/vlm-pr-4c-img-edit-score branch from f4a489b to 2713f3d June 2, 2026 17:30
github-actions bot removed the stale label Jun 19, 2026
@github-actions

This PR has been inactive for 10 days and is now marked as stale. It will be closed in 7 days if there is no further activity.

github-actions bot added the stale label Jun 30, 2026
davidberenstein1957 force-pushed the feat/vlm-pr-4b-vie-score branch from 52f8cbc to eecabcf July 2, 2026 13:25
Co-authored-by: Cursor <cursoragent@cursor.com>
davidberenstein1957 force-pushed the feat/vlm-pr-4b-vie-score branch from eecabcf to 0beeaba July 2, 2026 13:51
davidberenstein1957 force-pushed the feat/vlm-pr-4c-img-edit-score branch from 70dcdbf to 42d9254 July 2, 2026 13:51
- sync OneIG subset dataset loaders for benchmark registration
- ruff check/format on changed VLM src files
github-actions bot removed the stale label Jul 3, 2026