[Enhancement] Adding aggregator metrics to platform for generic or task specific usage by omkar-anustoop-ai · Pull Request #66 · ServiceNow/SyGra

omkar-anustoop-ai · 2025-11-26T09:46:25Z

Summary

This PR adds support for aggregator metrics and some unit metrics as discussed in the finalised design. The implementation provides industry-standard evaluation metrics with a clean, extensible architecture.

Explain the features implemented:

Aggregator Metrics (Industry Standard)

Accuracy: Overall accuracy, correct predictions / total predictions
Precision: Proportion of positive predictions that are correct, TP / (TP + FP)
Recall: Proportion of actual positives that are predicted correctly, TP / (TP + FN)
F1 Score: Harmonic mean of precision and recall, 2 * precision * recall / (precision + recall)
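
For reference, a minimal sketch of how these four metrics relate to confusion-matrix counts for a single positive class. The names `Counts`, `safe_div`, `count_outcomes` and `summarize` are hypothetical and are not the classes added in this PR; returning 0.0 on a zero denominator is an assumption of this sketch, not a statement about the PR's behaviour.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one positive class (hypothetical helper)."""
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0


def safe_div(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def count_outcomes(predicted, golden, positive) -> Counts:
    """Tally TP/FP/FN/TN by comparing each prediction with its golden label."""
    if len(predicted) != len(golden):
        raise ValueError("predicted and golden must have the same length")
    counts = Counts()
    for pred, gold in zip(predicted, golden):
        if pred == positive and gold == positive:
            counts.tp += 1
        elif pred == positive:
            counts.fp += 1
        elif gold == positive:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


def summarize(predicted, golden, positive) -> dict:
    """Compute accuracy, precision, recall and F1 from the tallied counts."""
    c = count_outcomes(predicted, golden, positive)
    total = c.tp + c.fp + c.fn + c.tn
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    print(summarize(["yes", "yes", "yes", "no"], ["yes", "yes", "no", "no"], "yes"))
```

F1 is derived from the already computed precision and recall rather than recounted, which mirrors the commit note about reusing the precision and recall logic.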

Unit Metrics (Validators)

ExactMatch: Validates exact string match between predicted and golden values with configurable case sensitivity and whitespace normalization
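
To illustrate what configurable case sensitivity and whitespace normalization could mean in practice, here is a small sketch. The function `exact_match` and its keyword names are assumptions made for illustration; the actual ExactMatch class in this PR may expose different options and defaults.

```python
def exact_match(
    predicted: str,
    golden: str,
    *,
    case_sensitive: bool = True,
    normalize_whitespace: bool = False,
) -> bool:
    """Return True when predicted equals golden under the chosen options."""
    if normalize_whitespace:
        # Collapse runs of whitespace and trim both ends.
        predicted = " ".join(predicted.split())
        golden = " ".join(golden.split())
    if not case_sensitive:
        # casefold() is a more aggressive lower() meant for caseless comparison.
        predicted = predicted.casefold()
        golden = golden.casefold()
    return predicted == golden


if __name__ == "__main__":
    print(exact_match("  New   York ", "new york",
                      case_sensitive=False, normalize_whitespace=True))
```

Normalizing before comparing keeps the comparison itself a plain string equality, so each option can be tested in isolation.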

Supporting Infrastructure

BaseAggregatorMetric: Abstract base class for all aggregator metrics
BaseUnitMetric: Abstract base class for all unit metrics with batch evaluation support
AggregatorMetricRegistry: Decorator-based registry for metric discovery and instantiation
UnitMetricResult: Dataclass consumed by aggregator metrics as per design
Metadata System: Structured metadata for all metrics (name, description, range, etc.)
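
The pieces above can be sketched roughly as follows: a result dataclass produced by unit metrics, an abstract aggregator base class, and a decorator that registers implementations by name. Every identifier here (`UnitResult`, `AggregatorMetric`, `register_metric`, `create_metric`) is hypothetical and chosen for illustration; the real class and decorator names in the PR differ and were renamed during review.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class UnitResult:
    """Outcome of one unit-metric evaluation (hypothetical shape)."""
    correct: bool
    golden: str
    predicted: str


class AggregatorMetric(ABC):
    """Base class: turns many unit results into summary numbers."""

    @abstractmethod
    def calculate(self, results: list[UnitResult]) -> dict[str, float]:
        ...


# Maps a metric name to the class that implements it.
_REGISTRY: dict[str, type[AggregatorMetric]] = {}


def register_metric(name: str):
    """Class decorator that records the class in the registry under `name`."""
    def decorator(cls: type[AggregatorMetric]) -> type[AggregatorMetric]:
        if name in _REGISTRY:
            raise ValueError(f"metric {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator


def create_metric(name: str, **kwargs) -> AggregatorMetric:
    """Look up a registered metric by name and instantiate it."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown metric {name!r}; known: {sorted(_REGISTRY)}")
    return _REGISTRY[name](**kwargs)


@register_metric("accuracy")
class Accuracy(AggregatorMetric):
    def calculate(self, results: list[UnitResult]) -> dict[str, float]:
        if not results:
            return {"accuracy": 0.0}  # assumed convention for empty input
        return {"accuracy": sum(r.correct for r in results) / len(results)}


if __name__ == "__main__":
    data = [UnitResult(True, "a", "a"), UnitResult(False, "a", "b")]
    print(create_metric("accuracy").calculate(data))
```

With this pattern, adding a metric means writing one decorated subclass; callers discover it by name without importing the class directly.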

Performance impact (if any):

N/A - New feature with no impact on existing functionality.

How to Test the feature

Run unit tests: poetry run pytest tests/core/eval/metrics/ -v
Expected: 144 tests passing (103 aggregator metrics + 41 unit metrics)
Check all metrics are properly registered and discoverable via registry

Screenshots (if applicable)

N/A

Checklist

Lint fixes and unit testing done
End to end task testing
Documentation updated

Notes

Corresponding unit tests and documentation have been added for reference and usage purposes. The architecture supports easy addition of new metrics following the established patterns.

psriramsnc

Should we follow the structure /sygra/core/eval/metrics instead?
Please look into other comments.

docs/metrics/README.md

sygra/core/eval/metrics/aggregator_metrics/base_aggregator_metric.py

sygra/core/metrics/aggregator_metrics/base_aggregator_metric.py

sygra/core/eval/metrics/aggregator_metrics/f1_score.py

sygra/core/eval/metrics/unit_metrics/unit_metric_result.py

zephyrzilla · 2025-12-02T17:56:41Z

As it is a public-facing repository, let's refrain from mentioning internal references (the PR description points to an internal Lucidchart document). Please remove it.

sygra/core/eval/metrics/aggregator_metrics/f1_score.py

sygra/core/eval/metrics/aggregator_metrics/base_aggregator_metric.py

psriramsnc

LGTM 🚀

sygra/core/eval/metrics/aggregator_metrics/base_aggregator_metric.py

sygra/core/eval/metrics/unit_metrics/base_unit_metric.py

psriramsnc

LGTM 🚀

Already taken care of the requested changes

Adding unit_metric_result class since it is required in aggregator_metrics for downstream tasks

9d33e30

omkar-anustoop-ai requested a review from a team as a code owner November 26, 2025 09:46

omkar-anustoop-ai marked this pull request as draft November 26, 2025 09:47

omkar-anustoop-ai and others added 11 commits November 26, 2025 15:36

Fix linting errors for unit metric result

5c1f91b

Adding support for metric registry, base class for aggregator metrics, accuracy

1323ac4

Modified existing base classes to make it more generic and extensible

c094c46

Adding support for precision and recall metrics

dd7faf4

Adding support for f1 score computation using existing logic for precision and recall

d27dbe9

Merge branch 'main' into scratch/aggregator_metrics

d3363b7

Fixing division bug in base aggregator metric, and adding relevant unit tests

69d0a11

Changes to avoid default instantiation of class level metrics

65da4b0

Adding documentation for aggregator metrics

59e6b89

Modifying documentation for end-user usage ease and dev understanding

b990b56

Merge branch 'main' into scratch/aggregator_metrics

a6f6224

omkar-anustoop-ai marked this pull request as ready for review December 1, 2025 10:50

omkar-anustoop-ai added 2 commits December 1, 2025 16:42

Fixing linting errors

10d00f6

Fixed unit test error leading to make test fail

31f1d6a

bidyapati-p reviewed Dec 2, 2025

sygra/core/metrics/aggregator_metrics/f1_score.py (outdated, resolved)

sygra/core/eval/metrics/unit_metrics/unit_metric_result.py (resolved)

bidyapati-p requested review from a team, psriramsnc and zephyrzilla December 2, 2025 08:41

psriramsnc reviewed Dec 2, 2025

omkar-anustoop-ai and others added 4 commits December 2, 2025 18:28

Moved metric to eval folder, changed decorator name, review comment fixes

8390abe

Moved unit tests and docs to eval parent folder for consistency

a13d1df

Fixed sys path for sygra import in unit test cases

55444cb

Merge branch 'main' into scratch/aggregator_metrics

669420d

zephyrzilla previously requested changes Dec 2, 2025

sygra/core/eval/metrics/aggregator_metrics/f1_score.py (outdated, resolved)

sygra/core/eval/metrics/aggregator_metrics/base_aggregator_metric.py (resolved)

vipul-mittal and others added 2 commits December 3, 2025 19:52

Merge branch 'main' into scratch/aggregator_metrics

bfc311d

Review comment fixes: Added config class for common init, class name decoupling for generalization, pydantic metadata base class for easier re-usability, extendibility and separation of concern

27e38e5

omkar-anustoop-ai requested review from bidyapati-p and zephyrzilla December 3, 2025 15:49

psriramsnc previously approved these changes Dec 4, 2025

Review fixes: Class naming consistency, method naming as per convention, in-class pydantic config validation

1780afd

omkar-anustoop-ai dismissed psriramsnc’s stale review via 1780afd December 5, 2025 10:57

Merge branch 'main' into scratch/aggregator_metrics

8cc0a6c

omkar-anustoop-ai requested review from a team and psriramsnc December 5, 2025 11:17

psriramsnc assigned omkar-anustoop-ai Dec 8, 2025

psriramsnc added the enhancement New feature or request label Dec 8, 2025

Merge branch 'main' into scratch/aggregator_metrics

c7d0881

github-code-quality bot found potential problems Dec 8, 2025

sygra/core/eval/metrics/aggregator_metrics/base_aggregator_metric.py (fixed)

sygra/core/eval/metrics/unit_metrics/base_unit_metric.py (fixed)

bidyapati-p previously approved these changes Dec 8, 2025

vipul-mittal and others added 3 commits December 9, 2025 10:39

Merge branch 'main' into scratch/aggregator_metrics

1b8a8c9

Merge branch 'main' into scratch/aggregator_metrics

d011666

Refactoring as per github code suggest

d868c11

omkar-anustoop-ai dismissed bidyapati-p’s stale review via d868c11 December 10, 2025 06:08

omkar-anustoop-ai requested review from a team, bidyapati-p, psriramsnc and zephyrzilla and removed request for bidyapati-p, psriramsnc and zephyrzilla December 10, 2025 06:10

psriramsnc approved these changes Dec 11, 2025

bidyapati-p approved these changes Dec 11, 2025

vipul-mittal merged commit 4271c36 into main Dec 11, 2025
3 checks passed

vipul-mittal deleted the scratch/aggregator_metrics branch December 11, 2025 05:30

vipul-mittal merged 27 commits into main from scratch/aggregator_metrics

5 participants
