
[Enhancement] Adding aggregator metrics to platform for generic or task specific usage #66

Merged
vipul-mittal merged 27 commits into main from scratch/aggregator_metrics on Dec 11, 2025

Conversation

@omkar-anustoop-ai
Collaborator

@omkar-anustoop-ai omkar-anustoop-ai commented Nov 26, 2025

Summary

This PR adds support for aggregator metrics and a set of unit metrics, as agreed in the finalised design. The implementation provides industry-standard evaluation metrics with a clean, extensible architecture.

Features implemented:

Aggregator Metrics (Industry Standard)

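  • Accuracy: Calculates overall accuracy (correct predictions / total predictions)
  • Precision: Measures the proportion of positive predictions that are correct (true positives / predicted positives)
  • Recall: Measures the proportion of actual positives that are predicted correctly (true positives / actual positives)
  • F1 Score: Harmonic mean of precision and recall (2 × precision × recall / (precision + recall))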
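For reference, the four aggregator metrics reduce to simple counts over predicted and golden labels. The sketch below is a standalone illustration for the binary case only, not the code merged in this PR: the helper name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
from typing import Sequence


def binary_metrics(predicted: Sequence[bool], golden: Sequence[bool]) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration only. When a denominator is zero,
    the metric is reported as 0.0 (an assumed convention).
    """
    if len(predicted) != len(golden):
        raise ValueError("predicted and golden must have the same length")

    pairs = list(zip(predicted, golden))
    tp = sum(1 for p, g in pairs if p and g)        # predicted positive, actually positive
    fp = sum(1 for p, g in pairs if p and not g)    # predicted positive, actually negative
    fn = sum(1 for p, g in pairs if not p and g)    # predicted negative, actually positive
    correct = sum(1 for p, g in pairs if p == g)    # any agreement counts for accuracy
    total = len(pairs)

    accuracy = correct / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `binary_metrics([True, True, False, False, True], [True, False, False, True, True])` has 2 true positives, 1 false positive, 1 false negative and 1 true negative, giving an accuracy of 0.6 and a precision, recall and F1 of 2/3 each.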

Unit Metrics (Validators)

  • ExactMatch: Validates exact string match between predicted and golden values with configurable case sensitivity and whitespace normalization
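A minimal sketch of how such a validator could apply its two options before comparing. The function name `exact_match`, its keyword arguments and their defaults are hypothetical; the PR's ExactMatch class may expose these options differently.

```python
import re


def exact_match(
    predicted: str,
    golden: str,
    case_sensitive: bool = True,        # assumed default
    normalize_whitespace: bool = True,  # assumed default
) -> bool:
    """Return True if predicted equals golden after optional normalization.

    Hypothetical stand-in for the ExactMatch validator.
    """
    if normalize_whitespace:
        # Collapse internal runs of whitespace to a single space and trim the ends.
        predicted = re.sub(r"\s+", " ", predicted).strip()
        golden = re.sub(r"\s+", " ", golden).strip()
    if not case_sensitive:
        # casefold() is a more aggressive lower() intended for caseless matching.
        predicted = predicted.casefold()
        golden = golden.casefold()
    return predicted == golden
```

With these defaults, `exact_match("  New   York ", "New York")` is true, while `exact_match("paris", "Paris")` only matches when `case_sensitive=False` is passed.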

Supporting Infrastructure

  • BaseAggregatorMetric: Abstract base class for all aggregator metrics
  • BaseUnitMetric: Abstract base class for all unit metrics with batch evaluation support
  • AggregatorMetricRegistry: Decorator-based registry for metric discovery and instantiation
  • UnitMetricResult: Dataclass that holds unit metric results and is consumed by aggregator metrics, as specified in the design
  • Metadata System: Structured metadata for all metrics (name, description, range, etc.)
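To make the relationship between these pieces concrete, here is a hedged sketch of a decorator-based registry. Only the class names come from this PR; the method names (`register`, `get`, `list_metrics`, `calculate`), the `UnitMetricResult` fields and the example `AccuracyMetric` class are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class UnitMetricResult:
    # Hypothetical fields; the real dataclass may carry different data.
    correct: bool
    predicted: Any = None
    golden: Any = None


class BaseAggregatorMetric(ABC):
    @abstractmethod
    def calculate(self, results: list[UnitMetricResult]) -> dict[str, float]:
        """Aggregate unit metric results into summary scores."""


class AggregatorMetricRegistry:
    # Maps a metric name to its class; populated by the decorator below.
    _metrics: dict[str, type[BaseAggregatorMetric]] = {}

    @classmethod
    def register(
        cls, name: str
    ) -> Callable[[type[BaseAggregatorMetric]], type[BaseAggregatorMetric]]:
        def decorator(metric_cls: type[BaseAggregatorMetric]) -> type[BaseAggregatorMetric]:
            if name in cls._metrics:
                raise ValueError(f"metric {name!r} is already registered")
            cls._metrics[name] = metric_cls
            return metric_cls  # return the class unchanged so it stays usable directly

        return decorator

    @classmethod
    def get(cls, name: str, **kwargs: Any) -> BaseAggregatorMetric:
        """Instantiate a registered metric by name, forwarding any kwargs."""
        try:
            metric_cls = cls._metrics[name]
        except KeyError:
            raise KeyError(f"unknown metric {name!r}; available: {sorted(cls._metrics)}") from None
        return metric_cls(**kwargs)

    @classmethod
    def list_metrics(cls) -> list[str]:
        return sorted(cls._metrics)


@AggregatorMetricRegistry.register("accuracy")
class AccuracyMetric(BaseAggregatorMetric):
    def calculate(self, results: list[UnitMetricResult]) -> dict[str, float]:
        total = len(results)
        correct = sum(r.correct for r in results)
        return {"accuracy": correct / total if total else 0.0}
```

Usage: `AggregatorMetricRegistry.get("accuracy").calculate([UnitMetricResult(True), UnitMetricResult(False)])` returns `{"accuracy": 0.5}`. Registering through a decorator keeps discovery next to each metric's definition, so adding a new metric does not require editing a central lookup table.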

Performance impact (if any):

N/A - New feature with no impact on existing functionality.

How to Test the feature

  1. Run unit tests: poetry run pytest tests/core/eval/metrics/ -v
  2. Expected: 144 tests passing (103 aggregator metrics + 41 unit metrics)
  3. Check that all metrics are properly registered and discoverable via the registry

Screenshots (if applicable)

N/A

Checklist

  • Lint fixes and unit testing done
  • End to end task testing
  • Documentation updated

Notes

Corresponding unit tests and documentation have been added for reference and usage purposes. The architecture supports easy addition of new metrics following the established patterns.

@omkar-anustoop-ai omkar-anustoop-ai requested a review from a team as a code owner November 26, 2025 09:46
@omkar-anustoop-ai omkar-anustoop-ai marked this pull request as draft November 26, 2025 09:47
@omkar-anustoop-ai omkar-anustoop-ai marked this pull request as ready for review December 1, 2025 10:50
@bidyapati-p bidyapati-p requested review from a team, psriramsnc and zephyrzilla December 2, 2025 08:41
Collaborator

@psriramsnc psriramsnc left a comment

Should we follow the structure /sygra/core/eval/metrics instead?
Please also look into the other comments.

@zephyrzilla
Member

As it is a public facing repository, let's refrain from mentioning internal references (the PR description points to an internal Lucidchart document). Please remove it.

vipul-mittal and others added 2 commits December 3, 2025 19:52
…decoupling for generalization, pydantic metadata base class for easier re-usability, extendibility and separation of concern
psriramsnc previously approved these changes Dec 4, 2025
Collaborator

@psriramsnc psriramsnc left a comment

LGTM 🚀

@omkar-anustoop-ai omkar-anustoop-ai requested review from a team and psriramsnc December 5, 2025 11:17
@psriramsnc psriramsnc added the enhancement New feature or request label Dec 8, 2025
bidyapati-p previously approved these changes Dec 8, 2025
Collaborator

@psriramsnc psriramsnc left a comment

LGTM 🚀

@vipul-mittal vipul-mittal dismissed zephyrzilla’s stale review December 11, 2025 05:29

The requested changes have already been addressed.

@vipul-mittal vipul-mittal merged commit 4271c36 into main Dec 11, 2025
3 checks passed
@vipul-mittal vipul-mittal deleted the scratch/aggregator_metrics branch December 11, 2025 05:30

Labels

enhancement New feature or request

5 participants