FEAT: Backfill class-level metadata for all remote seed datasets #1780

Merged
romanlutz merged 9 commits into microsoft:main from romanlutz:romanlutz/backfill-dataset-metadata on Jun 2, 2026

@romanlutz

What

Adds class-level tags, size, and modalities to every remote seed dataset loader so they participate in SeedDatasetFilter discovery (e.g. SeedDatasetFilter(tags={"default"})). Before this change, only 5 of ~33 remote loaders declared metadata, so the others were silently skipped by metadata-driven filtering.
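
A minimal sketch of why loaders without metadata drop out of tag-based discovery. `LoaderMetadata`, `TagFilter`, and the `registry` dict are hypothetical stand-ins, not PyRIT's classes, and the "all requested tags must be present" matching rule is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoaderMetadata:
    """Hypothetical stand-in for a loader's class-level metadata."""

    tags: frozenset[str]
    size: str
    modalities: tuple[str, ...]


@dataclass(frozen=True)
class TagFilter:
    """Hypothetical stand-in for a tag-based dataset filter."""

    tags: frozenset[str]

    def matches(self, metadata: Optional[LoaderMetadata]) -> bool:
        # A loader that declares no metadata can never match a tag filter,
        # so it is skipped without any error being raised.
        if metadata is None:
            return False
        # Assumption: every requested tag must be present on the loader.
        return self.tags <= metadata.tags


# Loader name -> metadata; None models a loader that was never backfilled.
registry: dict[str, Optional[LoaderMetadata]] = {
    "xstest": LoaderMetadata(frozenset({"default", "safety", "refusal"}), "medium", ("text",)),
    "sgxstest": LoaderMetadata(frozenset({"safety", "multilingual"}), "medium", ("text",)),
    "untagged": None,
}

default_filter = TagFilter(tags=frozenset({"default"}))
selected = sorted(name for name, meta in registry.items() if default_filter.matches(meta))
print(selected)
```

The `untagged` entry is excluded for the same reason as `sgxstest`, but only the latter is a deliberate choice; the backfill turns the former into an explicit decision too.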

This is the follow-up to the review discussion on #1757 (cc @jsong468), where the reviewer asked why some loaders declared these fields and others didn't. Answer: most predate the metadata schema and just hadn't been backfilled.

How

  1. Pinned a canonical, advisory tag vocabulary as RECOMMENDED_TAGS in pyrit/datasets/seed_datasets/seed_metadata.py. Users can still set custom tags: the metadata parser does not enforce the vocabulary, but a new parametrized coverage test does enforce it for the remote loaders.
  2. Documented the 5-condition rule for the special default tag inline in seed_metadata.py:
    1. Ungated — no HF token, API key, auth, or signup.
    2. Citable — peer-reviewed paper / established benchmark.
    3. Single-call: await loader.fetch_dataset_async() works with no manual setup.
    4. Size >= medium (>=100 prompts).
    5. Broadly applicable — not narrowly scoped to a vertical (medical, legal, cybersecurity). Cross-cutting axes like privacy, bias, multimodal, multilingual, refusal, and jailbreak DO count.
  3. Walked every remote loader and assigned size / tags / modalities based on the loader's docstring, tests, and upstream dataset card. Added inline # N prompts comments next to each size so reviewers can verify the bucket choice locally.
  4. Renamed the non-canonical multilingual_culture tag on _SGXSTestDataset to multilingual and dropped its default tag (SGXSTest is gated on HF).
  5. Marked _ORBenchBaseDataset as should_register = False (it has no usable dataset_name) and explicitly opted the three OR-Bench leaf classes back in.
  6. Added TestRemoteLoaderMetadataCoverage — a parametrized test that walks every concrete _RemoteDatasetLoader subclass via auto-registration and asserts: metadata is present, tags/size/modalities are non-empty, size is in SeedDatasetSizeCategory, and tags is a subset of RECOMMENDED_TAGS (catches future typos like multilingual_culture).
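
Steps 5 and 6 above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `LoaderSketch`, `REGISTRY`, and `metadata_problems` are hypothetical names, the tag and size sets are illustrative subsets taken from the backfill table, and registration via `__init_subclass__` is one possible mechanism rather than PyRIT's confirmed one.

```python
from typing import ClassVar

# Illustrative subsets only; the real vocabulary and size enum may differ.
RECOMMENDED_TAGS = frozenset(
    {"default", "safety", "jailbreak", "refusal", "bias", "multilingual", "multimodal"}
)
SIZE_CATEGORIES = frozenset({"small", "medium", "large", "huge"})

REGISTRY: list[type] = []


class LoaderSketch:
    """Hypothetical base class; subclasses register unless the flag is False."""

    should_register: ClassVar[bool] = True
    tags: ClassVar[frozenset[str]] = frozenset()
    size: ClassVar[str] = ""
    modalities: ClassVar[tuple[str, ...]] = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # The flag is inherited, so children of an opted-out base stay out
        # unless they set should_register = True themselves.
        if cls.should_register:
            REGISTRY.append(cls)


class ORBenchBaseSketch(LoaderSketch):
    should_register = False  # not a usable loader on its own


class ORBenchHardSketch(ORBenchBaseSketch):
    should_register = True  # leaf class opts back in explicitly
    tags = frozenset({"default", "safety", "refusal"})
    size = "large"
    modalities = ("text",)


class TypoSketch(LoaderSketch):
    tags = frozenset({"safety", "multilingual_culture"})  # not in the vocabulary
    size = "medium"
    modalities = ("text",)


def metadata_problems(cls: type) -> list[str]:
    """Return human-readable problems with a loader's class-level metadata."""
    problems = []
    if not cls.tags:
        problems.append("tags is empty")
    if not cls.modalities:
        problems.append("modalities is empty")
    if cls.size not in SIZE_CATEGORIES:
        problems.append(f"unknown size: {cls.size!r}")
    unknown = cls.tags - RECOMMENDED_TAGS
    if unknown:
        problems.append(f"unknown tags: {sorted(unknown)}")
    return problems


report = {cls.__name__: metadata_problems(cls) for cls in REGISTRY}
print(report)
```

In a test suite the same check would typically be driven by `pytest.mark.parametrize` over the registry, so each loader fails independently with its own name in the test ID.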

Class-level harm_categories is intentionally deferred — per-row SeedPrompt.harm_categories already labels individual prompts; picking a "broadest" class-level summary is a judgment call better made by domain owners in a focused follow-up.

Backfill table

| Loader | size | modalities | tags | default? |
| --- | --- | --- | --- | --- |
| _AegisContentSafetyDataset | huge | (text,) | {default, safety} | YES |
| _AyaRedteamingDataset | medium | (text,) | {safety, multilingual} | no (per-language ~few hundred) |
| _BabelscapeAlertDataset | huge | (text,) | {default, safety, jailbreak} | YES |
| _BeaverTailsDataset | huge | (text,) | {default, safety} | YES |
| _CBTBenchDataset | medium | (text,) | {safety, medical} | no (vertical) |
| _CCPSensitivePromptsDataset | small | (text,) | {safety, multilingual} | no |
| _DarkBenchDataset | medium | (text,) | {default, safety} | YES |
| _EquityMedQADataset | medium | (text,) | {safety, bias, medical} | no (vertical) |
| _ForbiddenQuestionsDataset | medium | (text,) | {default, safety, jailbreak} | YES |
| _HarmBenchMultimodalDataset | small | (text, image) | {safety, jailbreak, multimodal} | no |
| _HarmfulQADataset | large | (text,) | {default, safety, jailbreak} | YES |
| _JBBBehaviorsDataset | small | (text,) | {safety, jailbreak} | no |
| _LibrAIDoNotAnswerDataset | medium | (text,) | {default, safety, refusal} | YES |
| _LLMLatentAdversarialTrainingDataset | large | (text,) | {default, safety, jailbreak} | YES |
| _MedSafetyBenchDataset | large | (text,) | {safety, medical} | no (vertical) |
| _MLCommonsAILuminateDataset | large | (text,) | {default, safety} | YES |
| _MultilingualVulnerabilityDataset | medium | (text,) | {default, safety, multilingual} | YES |
| _ORBench80KDataset | huge | (text,) | {default, safety, refusal} | YES |
| _ORBenchHardDataset | large | (text,) | {default, safety, refusal} | YES |
| _ORBenchToxicDataset | large | (text,) | {default, safety, refusal} | YES |
| _PKUSafeRLHFDataset | huge | (text,) | {default, safety} | YES |
| _PromptIntelDataset | medium | (text,) | {safety, jailbreak, cybersecurity} | no (API key) |
| _RedTeamSocialBiasDataset | small | (text,) | {safety, bias, multiturn} | no |
| _SaladBenchDataset | huge | (text,) | {default, safety, jailbreak} | YES |
| _SimpleSafetyTestsDataset | small | (text,) | {safety} | no |
| _SorryBenchDataset | large | (text,) | {safety, jailbreak, synthetic} | no (gated) |
| _SOSBenchDataset | large | (text,) | {safety, medical, cybersecurity} | no (vertical) |
| _TDC23RedteamingDataset | small | (text,) | {safety, jailbreak} | no |
| _ToxicChatDataset | huge | (text,) | {default, safety, multiturn} | YES |
| _TransphobiaAwarenessDataset | medium | (text,) | {default, safety, bias} | YES |
| _VLGuardDataset | large | (text, image) | {safety, multimodal} | no (gated) |
| _VLSUMultimodalDataset | large | (text, image) | {default, safety, multimodal} | YES |
| _XSTestDataset | medium | (text,) | {default, safety, refusal} | YES |
| _SGXSTestDataset (fixed) | medium | (text,) | {safety, multilingual} | no (gated); was {default, safety, multilingual_culture} |

Already-tagged (unchanged): _HarmBenchDataset, _ComicJailbreakDataset, _VisualLeakBenchDataset.
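
The default? column follows the five-condition rule listed under "How". A rough sketch of that rule as a predicate, assuming a hypothetical `DatasetFacts` record and treating any vertical tag as disqualifying (an approximation of condition 5); the example values are invented and do not describe real datasets.

```python
from dataclasses import dataclass

# Verticals named in the rule; the real list may be longer.
VERTICAL_TAGS = frozenset({"medical", "legal", "cybersecurity"})
# Lower bound of the "medium" size bucket, per condition 4.
MEDIUM_MIN_PROMPTS = 100


@dataclass(frozen=True)
class DatasetFacts:
    """Hypothetical record of the facts a reviewer checks for one loader."""

    ungated: bool       # no HF token, API key, auth, or signup
    citable: bool       # peer-reviewed paper or established benchmark
    single_call: bool   # one awaited fetch call works with no manual setup
    prompt_count: int
    tags: frozenset[str]


def qualifies_for_default(facts: DatasetFacts) -> bool:
    """Return True only when all five conditions hold."""
    return (
        facts.ungated
        and facts.citable
        and facts.single_call
        and facts.prompt_count >= MEDIUM_MIN_PROMPTS
        and not (facts.tags & VERTICAL_TAGS)
    )


# Invented examples, one per common outcome in the table.
broad = DatasetFacts(True, True, True, 450, frozenset({"safety", "refusal"}))
gated = DatasetFacts(False, True, True, 450, frozenset({"safety", "multilingual"}))
vertical = DatasetFacts(True, True, True, 450, frozenset({"safety", "medical"}))
too_small = DatasetFacts(True, True, True, 99, frozenset({"safety", "jailbreak"}))

for name, facts in [("broad", broad), ("gated", gated), ("vertical", vertical), ("too_small", too_small)]:
    print(name, qualifies_for_default(facts))
```

Cross-cutting tags such as `multilingual` or `jailbreak` do not count as verticals here, matching the rule's wording.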

No-op

  • No public API changes
  • No runtime behavior changes
  • No changes to per-row SeedPrompt.harm_categories
  • The 4 already-tagged loaders are not migrated to immutable frozenset / tuple style (cosmetic — out of scope)

Discussion link

#1757

Adds class-level `tags`, `size`, and `modalities` to all remote seed
dataset loaders so they participate in `SeedDatasetFilter` discovery.

Pins a recommended tag vocabulary and the 5-condition rule for the special
`default` tag in `seed_metadata.py` as a soft contract, and enforces it
via a new parametrized coverage test in `test_seed_dataset_provider.py`.

Also renames `_SGXSTestDataset`'s non-canonical `multilingual_culture` tag
to `multilingual` and drops `default` (the dataset is gated), and gates
`_ORBenchBaseDataset` from auto-registration since it is not a usable
loader on its own.

No runtime behavior or public API changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@behnam-o behnam-o left a comment

couple of minor comments, but also looks good as is.

Comment thread pyrit/datasets/seed_datasets/remote/harmbench_multimodal_dataset.py Outdated
Comment thread pyrit/datasets/seed_datasets/remote/forbidden_questions_dataset.py Outdated
romanlutz and others added 3 commits May 22, 2026 13:20
…uckets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…taset-metadata

# Conflicts:
#	pyrit/datasets/seed_datasets/remote/comic_jailbreak_dataset.py
#	pyrit/datasets/seed_datasets/remote/harmbench_multimodal_dataset.py
#	pyrit/datasets/seed_datasets/remote/visual_leak_bench_dataset.py
#	pyrit/datasets/seed_datasets/remote/vlsu_multimodal_dataset.py
romanlutz and others added 5 commits May 29, 2026 20:44
…threats

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…taset-metadata

# Conflicts:
#	pyrit/models/__init__.py
@romanlutz romanlutz enabled auto-merge June 1, 2026 23:53
@romanlutz romanlutz added this pull request to the merge queue Jun 1, 2026
Merged via the queue into microsoft:main with commit 648faa9 Jun 2, 2026
80 of 81 checks passed
@romanlutz romanlutz deleted the romanlutz/backfill-dataset-metadata branch June 2, 2026 00:14
romanlutz added a commit to romanlutz/PyRIT that referenced this pull request Jun 2, 2026
Brings in 3 new commits from main:

- 648faa9 FEAT: Backfill class-level metadata for all remote seed datasets (microsoft#1780)
- 092126d MAINT: Migrate AddImage/AddTextImage converter deprecations to print_deprecation_message (microsoft#1875)
- 376e000 FEAT: Add TatweelConverter for Arabic kashida insertion (microsoft#1869)

Conflict resolution (11 files): took main's version everywhere
(`git checkout --theirs`), then re-ran `ruff check --fix` to
re-apply the PEP 604 sweep to main's new code (~36 violations
auto-fixed). Same hand-fix for the runtime `Optional[dict]` in
`pyrit/models/message_piece.py` PlainSerializer `return_type`
that ruff can't auto-rewrite.

Verification:
- ruff check pyrit/ tests/ doc/ - clean
- ruff format --check - clean
- pytest tests/unit -n 4 - 8977 passed, 5 skipped, 0 failures

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
romanlutz added a commit to romanlutz/PyRIT that referenced this pull request Jun 3, 2026
The merge from origin/main brought in TestDarkBenchDataset.
test_fetch_dataset_with_custom_config (added in microsoft#1780), which
constructed _DarkBenchDataset(split="test") and asserted the kwarg
was forwarded to _fetch_from_huggingface_async. That contract no
longer holds: this PR deprecates the dead split kwarg and hardcodes
split="train" at the DarkBench call site, since upstream
apart/darkbench publishes only the "train" split.

Drop the deprecated kwarg from the constructor call and assert the
hardcoded "train" literal that actually flows through.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
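
The test change described in that commit can be sketched like this. `DarkBenchLoaderSketch` is a hypothetical stand-in and the method signatures are assumptions; only the dataset name `apart/darkbench` and the hardcoded `"train"` split come from the commit message. The sketch mocks the fetch helper and asserts on the literal that actually reaches it.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class DarkBenchLoaderSketch:
    """Hypothetical stand-in for a loader whose split is hardcoded."""

    async def _fetch_from_huggingface_async(self, *, dataset_name: str, split: str) -> list[dict]:
        raise NotImplementedError("network access is mocked in the check below")

    async def fetch_dataset_async(self) -> list[dict]:
        # Upstream publishes only "train", so the split is a literal here
        # rather than a constructor kwarg.
        return await self._fetch_from_huggingface_async(
            dataset_name="apart/darkbench", split="train"
        )


async def check_split_is_hardcoded() -> str:
    loader = DarkBenchLoaderSketch()
    mock_fetch = AsyncMock(return_value=[])
    # Replace the fetch helper on this instance only for the duration of the call.
    with patch.object(loader, "_fetch_from_huggingface_async", mock_fetch):
        await loader.fetch_dataset_async()
    mock_fetch.assert_awaited_once()
    # Return the split value that was forwarded to the helper.
    return mock_fetch.await_args.kwargs["split"]


observed_split = asyncio.run(check_split_is_hardcoded())
print(observed_split)
```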

3 participants