feat(infrastructure): add VLM base classes and utilities #638
davidberenstein1957 wants to merge 8 commits into
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 21212de.
    top = getattr(tok, "top_logprobs", None) or []
    for t in top:
        token_str = (getattr(t, "token", "") or "").lower()
        lp = float(getattr(t, "logprob", -1e9) or -1e9)
Logprob zero treated as missing due to falsy check
Medium Severity
The expression float(getattr(t, "logprob", -1e9) or -1e9) uses the or operator to provide a fallback, but 0.0 is falsy in Python. A logprob of 0.0 means P = exp(0) = 1.0 (100% probability), yet 0.0 or -1e9 evaluates to -1e9, turning that into P ≈ 0. This silently corrupts probability scoring whenever a token has logprob exactly zero.
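One way to avoid the falsy-zero trap is to test for None explicitly instead of relying on `or`. The sketch below is illustrative only: `read_logprob` and `MISSING_LOGPROB` are hypothetical names, not identifiers from this PR, and `SimpleNamespace` stands in for whatever token object the API client returns.

```python
import math
from types import SimpleNamespace

# Sentinel for "no logprob available"; mirrors the -1e9 used in the reviewed line.
MISSING_LOGPROB = -1e9


def read_logprob(entry, default=MISSING_LOGPROB):
    """Return entry.logprob as a float, falling back only when it is absent or None."""
    value = getattr(entry, "logprob", None)
    # An explicit None check keeps a legitimate 0.0 (probability 1.0) intact.
    return default if value is None else float(value)


certain = SimpleNamespace(token="yes", logprob=0.0)
missing = SimpleNamespace(token="no", logprob=None)

p_certain = math.exp(read_logprob(certain))  # logprob 0.0 is preserved
p_missing = math.exp(read_logprob(missing))  # fallback applies only to None
```

With this shape, a token whose logprob is exactly zero keeps its full probability, while a missing or None logprob still maps to the sentinel.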
    self.pooling_mode = pooling_mode
    self.skip_instruction = skip_instruction
    self.max_length = max_length
    self.doc_max_length = 512
Constructor ignores doc_max_length parameter, hardcodes 512
Medium Severity
LLM2Vec.__init__ accepts a doc_max_length parameter (line 79) but line 88 assigns self.doc_max_length = 512 instead of self.doc_max_length = doc_max_length. The parameter value is silently discarded, so any doc_max_length loaded from llm2vec_config.json via from_pretrained or passed explicitly has no effect on document truncation behavior.
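The fix described above amounts to storing the argument rather than a literal. The following is a minimal sketch under stated assumptions: `EncoderSettingsSketch`, its default values, and `from_config` are hypothetical stand-ins for illustration, not the actual `LLM2Vec` class or its `from_pretrained` logic.

```python
class EncoderSettingsSketch:
    """Hypothetical stand-in for the constructor under review, not the real LLM2Vec."""

    def __init__(self, pooling_mode="mean", skip_instruction=True,
                 max_length=512, doc_max_length=400):
        # Default values here are assumptions chosen for the example.
        self.pooling_mode = pooling_mode
        self.skip_instruction = skip_instruction
        self.max_length = max_length
        # Store the caller's value instead of a hardcoded 512.
        self.doc_max_length = doc_max_length

    @classmethod
    def from_config(cls, config):
        """Build from a dict, such as one parsed from a JSON config file."""
        allowed = {"pooling_mode", "skip_instruction", "max_length", "doc_max_length"}
        return cls(**{k: v for k, v in config.items() if k in allowed})


explicit = EncoderSettingsSketch(doc_max_length=256)
loaded = EncoderSettingsSketch.from_config({"doc_max_length": 128, "unrelated": 1})
```

A regression test along these lines, asserting that the stored attribute equals the value passed in, would catch the parameter being silently discarded.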
    "peft>=0.18.0,<0.19.0",
    "trl<=0.21.0",
    "termcolor==2.3.0",
    "realesrgan",
Heavy realesrgan moved from optional to core dependencies
Medium Severity
realesrgan was previously under the optional upscale extra but is now a core dependency in dependencies. This forces all users to install a heavy GPU-oriented package (with native compilation requirements) even if they never use upscaling. The upscale optional extra was simultaneously removed.
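If the package goes back under an optional extra, the code that uses it typically imports it lazily and reports which extra is missing. The sketch below shows that general pattern and is not code from this PR: `require_optional` and the `distribution` placeholder are hypothetical, and only top-level module names are supported.

```python
import importlib
import importlib.util


def require_optional(module_name, extra, distribution="your-package"):
    """Import a top-level optional module, or raise ImportError naming the extra to install."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is not installed. "
            f"Install it with: pip install \"{distribution}[{extra}]\""
        )
    return importlib.import_module(module_name)


# Works for anything already installed; the standard library serves as a demo here.
json_module = require_optional("json", "demo")
```

In the upscaling code path, a call such as `require_optional("realesrgan", "upscale")` would then defer the heavy import until a user actually requests upscaling.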
Force-pushed: f89b047 to fb6d967
Force-pushed: 21212de to 7054e53
Keep PR #638 focused on VLM infrastructure by removing exports for downstream metric classes and restoring Rapidata export from the base branch. Co-authored-by: Cursor <cursoragent@cursor.com>
This PR has been inactive for 10 days and is now marked as stale.
Logprob None check, shared OneIG grid helpers, pyproject extras restore, temporary CI on feat/vlm-pr-* bases, and clearer LiteLLM documentation. Co-authored-by: Cursor <cursoragent@cursor.com>
Review follow-up
Stack rebased and pushed.
This PR has been inactive for 10 days and is now marked as stale. It will be closed in 7 days if there is no further activity.
- Add BaseVLM abstract interface
- Add LitellmVLM for API-based inference (OpenAI, Anthropic, etc.)
- Add TransformersVLM for local Hugging Face models
- Add StatefulVLMMeanScoresMetric base class for judge metrics
- Add vlm_utils.py with image/batch utilities
- Add pyproject.toml dependency pins (peft, litellm)
- Add unit tests for infrastructure
Keep PR #638 focused on VLM infrastructure by removing exports for downstream metric classes and restoring Rapidata export from the base branch. Co-authored-by: Cursor <cursoragent@cursor.com>
Logprob None check, shared OneIG grid helpers, pyproject extras restore, temporary CI on feat/vlm-pr-* bases, and clearer LiteLLM documentation. Co-authored-by: Cursor <cursoragent@cursor.com>
Drop the broken Intel uv index (aligned with main), fix QAAccuracy keyword-only aggregation syntax, pass single/y_gt call types correctly for OneIG alignment, and expose metric_units on results. Co-authored-by: Cursor <cursoragent@cursor.com>
Replace forward-import VLM test module on pre-e2e branches with infrastructure-only tests; propagate docstring and conftest fixes. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Remove verify helper and duplicate infra test template from scripts/; tests live under tests/evaluation/ only. Co-authored-by: Cursor <cursoragent@cursor.com>
Match AlgorithmTag numpydoc pattern so docstring checks pass on Python 3.11. Co-authored-by: Cursor <cursoragent@cursor.com>
Force-pushed: 4eb78b7 to a22c48b


Summary
Adds the VLM inference infrastructure used by all downstream VLM judge metrics:
- BaseVLM
- LitellmVLM
- TransformersVLM
- StatefulVLMMeanScoresMetric

Stack Position
- feat/vlm-pr-1-vendor
- feat/vlm-pr-3a-qa-accuracy
- feat/vlm-pr-5-e2e-tests
- feat/metrics-vlm-support

Files
- src/pruna/evaluation/metrics/vlm_base.py
- src/pruna/evaluation/metrics/vlm_utils.py
- tests/evaluation/test_vlm_base_infrastructure.py
- src/pruna/evaluation/metrics/utils.py
- src/pruna/evaluation/metrics/__init__.py
- pyproject.toml

Alignment Notes
This PR is intentionally based on feat/vlm-pr-1-vendor so reviewers only see the infrastructure delta.

Test Plan
Review Focus
Review Flow (Order)
Review the stack in this exact order:
This PR in the flow (2/10)