Quality tagging updates and tools restructure#10

Merged
vipul-mittal merged 13 commits into main from scratch/data_quality
Sep 4, 2025

@amitsnow (Collaborator) commented Sep 1, 2025

Summary

This PR adds the missing data_quality tasks, such as metadata_tagging and llm_based data quality.
It also restructures and updates the tools implementation.

Related Issue(s):

Impacted Features:

  • Quality tagging
  • Tool execution from the grasp library

How to Test

Steps for reviewers to verify functionality:

  1. Run any task with the additional argument "--quality True". This should add LLM-based quality metrics, category, and instruction tags.
  2. Run ./run_tools.sh to execute any tool configuration (data_quality or oasst_mapper).
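
For reviewers wondering how a value-style flag such as "--quality True" can be read as a boolean, here is a minimal sketch using Python's argparse. It is an illustration only and does not show this repository's actual argument handling: `str2bool`, `build_parser`, and `parse_quality` are hypothetical names, and the default of `False` is an assumption.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert a command-line string such as "True" or "false" to a bool."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real task runner's options are not shown in this PR.
    parser = argparse.ArgumentParser(description="Hypothetical task runner")
    parser.add_argument(
        "--quality",
        type=str2bool,
        default=False,  # assumption: quality tagging is off unless requested
        help="Enable quality tagging, e.g. --quality True",
    )
    return parser


def parse_quality(argv: list[str]) -> bool:
    """Return the parsed value of --quality for the given argument list."""
    return build_parser().parse_args(argv).quality


if __name__ == "__main__":
    print(parse_quality(["--quality", "True"]))
```

An explicit converter is used because `type=bool` would treat any non-empty string, including "False", as true.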

Screenshots (if applicable)

N/A

Checklist

  • Lint fixes and unit testing done
  • End to end task testing
  • Documentation updated

Notes

Persistence of instag metadata will be handled in a separate PR, since it is a feature addition.

@amitsnow amitsnow marked this pull request as ready for review September 2, 2025 07:13
@amitsnow amitsnow changed the title Scratch/data quality Quality tagging updates and tools restructure Sep 2, 2025
@amitsnow amitsnow requested a review from a team September 2, 2025 08:03
@vipul-mittal vipul-mittal enabled auto-merge (squash) September 2, 2025 08:13
@psriramsnc (Collaborator) left a comment

LGTM

@vipul-mittal vipul-mittal merged commit 1accf35 into main Sep 4, 2025
1 check passed
@vipul-mittal vipul-mittal deleted the scratch/data_quality branch September 4, 2025 08:36

Development

Successfully merging this pull request may close these issues.

llm_based and metadata_tagging pipelines missing

4 participants