TEST: Add unit tests for remote dataset loaders by romanlutz · Pull Request #1604 · microsoft/PyRIT

romanlutz · 2026-04-12T01:17:19Z

Summary

Adds unit tests for 20 previously untested remote dataset loader files.

New Test Files

Each test mocks HTTP/HuggingFace calls and tests dataset fetching, filtering, and validation:

aegis_ai_content_safety, aya_redteaming, ccp_sensitive_prompts, darkbench,
equitymedqa, forbidden_questions, harmbench, harmbench_multimodal,
jbb_behaviors, librai_do_not_answer, llm_latent_adversarial_training,
medsafetybench, mlcommons_ailuminate, multilingual_vulnerability,
pku_safe_rlhf, red_team_social_bias, sorry_bench, sosbench,
tdc23_redteaming, xstest
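The mocking approach described above (patch the network call, then assert on fetching, filtering, and validation) can be sketched with the standard library alone. This is a minimal illustration, not code from the PR: `ToyUrlLoader`, its category set, and the stand-in `SeedPrompt` dataclass are all hypothetical, and only the `_fetch_from_url` method name is borrowed from the review summary.

```python
from __future__ import annotations

from dataclasses import dataclass
from unittest.mock import patch


@dataclass
class SeedPrompt:
    # Hypothetical stand-in for PyRIT's SeedPrompt; only the fields used below.
    value: str
    harm_categories: list[str]


class ToyUrlLoader:
    """Hypothetical URL-backed loader; not PyRIT's real class."""

    VALID_CATEGORIES = {"violence", "privacy", "fraud"}

    def _fetch_from_url(self, url: str) -> list[dict]:
        # A real loader would perform an HTTP request here.
        raise RuntimeError("network access is not allowed in unit tests")

    def fetch_dataset(self, categories: list[str] | None = None) -> list[SeedPrompt]:
        # Validate the filter before touching the (mocked) network.
        if categories is not None:
            invalid = set(categories) - self.VALID_CATEGORIES
            if invalid:
                raise ValueError(f"Invalid categories: {sorted(invalid)}")
        rows = self._fetch_from_url("https://example.com/data.json")
        prompts = [SeedPrompt(value=r["prompt"], harm_categories=[r["category"]]) for r in rows]
        if categories is not None:
            prompts = [p for p in prompts if p.harm_categories[0] in categories]
        return prompts


FAKE_ROWS = [
    {"prompt": "p1", "category": "violence"},
    {"prompt": "p2", "category": "privacy"},
]


def test_fetch_filters_by_category() -> None:
    # Replace the network call with canned rows for the duration of the test.
    with patch.object(ToyUrlLoader, "_fetch_from_url", return_value=FAKE_ROWS) as mock_fetch:
        prompts = ToyUrlLoader().fetch_dataset(categories=["privacy"])
    mock_fetch.assert_called_once()
    assert [p.value for p in prompts] == ["p2"]


def test_invalid_category_raises() -> None:
    try:
        ToyUrlLoader().fetch_dataset(categories=["not-a-category"])
    except ValueError as exc:
        assert "not-a-category" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```

Because `patch.object` swaps the method only inside the `with` block, the unpatched loader still refuses to hit the network, so a forgotten mock fails loudly instead of silently downloading data.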

Testing

All 69 tests pass locally.

Adds tests for 20 previously untested remote dataset loaders:
- aegis_ai, aya_redteaming, ccp_sensitive_prompts
- darkbench, equitymedqa, forbidden_questions
- harmbench, harmbench_multimodal, jbb_behaviors
- librai_do_not_answer, llm_latent_adversarial_training
- medsafetybench, mlcommons_ailuminate
- multilingual_vulnerability, pku_safe_rlhf
- red_team_social_bias, sorry_bench, sosbench
- tdc23_redteaming, xstest

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds pytest unit coverage for previously untested remote seed dataset loaders by mocking network/HuggingFace fetches and asserting parsing, filtering, and validation behavior.

Changes:

Added new unit test modules for multiple remote dataset loaders.
Mocked _fetch_from_url / _fetch_from_huggingface (and image download helpers where applicable) to avoid live network calls.
Added assertions for filtering, validation/error paths, and dataset_name properties.
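For the HuggingFace-backed loaders, a useful assertion beyond the returned data is that the loader forwards its arguments to the fetch helper unchanged. The sketch below shows that pattern with `assert_called_once_with`, plus a `dataset_name` property check. `ToyHfLoader`, its keyword arguments, and the dataset id `org/toy-dataset` are hypothetical; only the `_fetch_from_huggingface` name comes from the summary above, and its real signature may differ.

```python
from __future__ import annotations

from unittest.mock import patch


class ToyHfLoader:
    """Hypothetical HuggingFace-backed loader; names and arguments are assumptions."""

    def __init__(self, config: str = "default", split: str = "train") -> None:
        self.config = config
        self.split = split

    @property
    def dataset_name(self) -> str:
        return "toy_hf_dataset"

    def _fetch_from_huggingface(self, *, dataset_name: str, config: str, split: str) -> list[dict]:
        # A real loader would download from the HuggingFace Hub here.
        raise RuntimeError("network access is not allowed in unit tests")

    def fetch_dataset(self) -> list[str]:
        rows = self._fetch_from_huggingface(
            dataset_name="org/toy-dataset", config=self.config, split=self.split
        )
        return [row["text"] for row in rows]


def test_config_and_split_are_passed_through() -> None:
    fake_rows = [{"text": "hello"}]
    with patch.object(ToyHfLoader, "_fetch_from_huggingface", return_value=fake_rows) as mock_hf:
        result = ToyHfLoader(config="en", split="test").fetch_dataset()
    # The loader must forward its constructor arguments unchanged.
    mock_hf.assert_called_once_with(dataset_name="org/toy-dataset", config="en", split="test")
    assert result == ["hello"]


def test_dataset_name() -> None:
    assert ToyHfLoader().dataset_name == "toy_hf_dataset"
```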

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.


File	Description
tests/unit/datasets/test_xstest_dataset.py	Tests URL-backed XSTest loader mapping to `SeedPrompt` and `dataset_name`.
tests/unit/datasets/test_tdc23_redteaming_dataset.py	Tests HF-backed TDC23 loader fetch + `dataset_name`.
tests/unit/datasets/test_sosbench_dataset.py	Tests HF-backed SOSBench loader mapping goal/subject to seed fields.
tests/unit/datasets/test_sorry_bench_dataset.py	Tests SorryBench filtering (style/category) + validation errors + `dataset_name`.
tests/unit/datasets/test_red_team_social_bias_dataset.py	Tests parsing single/multi-turn prompts and prompt grouping + `dataset_name`.
tests/unit/datasets/test_pku_safe_rlhf_dataset.py	Tests include/exclude safe prompts and harm-category filtering + `dataset_name`.
tests/unit/datasets/test_multilingual_vulnerability_dataset.py	Tests URL-backed multilingual loader mapping + `dataset_name`.
tests/unit/datasets/test_mlcommons_ailuminate_dataset.py	Tests URL-backed AILuminate loader hazard→category mapping + `dataset_name`.
tests/unit/datasets/test_medsafetybench_dataset.py	Tests subset handling, key validation, and `dataset_name`.
tests/unit/datasets/test_llm_latent_adversarial_training_dataset.py	Tests HF-backed LAT loader fetch + `dataset_name`.
tests/unit/datasets/test_librai_do_not_answer_dataset.py	Tests HF-backed LibrAI mapping multiple harm fields + `dataset_name`.
tests/unit/datasets/test_jbb_behaviors_dataset.py	Tests HF-backed JBB parsing, empty-dataset error, and category mapping helpers.
tests/unit/datasets/test_harmbench_multimodal_dataset.py	Tests multimodal filtering, image-download behavior, and error handling + `dataset_name`.
tests/unit/datasets/test_harmbench_dataset.py	Tests URL-backed HarmBench mapping and missing-key validation + `dataset_name`.
tests/unit/datasets/test_forbidden_questions_dataset.py	Tests HF-backed Forbidden Questions mapping + `dataset_name`.
tests/unit/datasets/test_equitymedqa_dataset.py	Tests subset selection (single/multiple) + `dataset_name` + invalid subset.
tests/unit/datasets/test_darkbench_dataset.py	Tests HF-backed DarkBench mapping and config/split passthrough + `dataset_name`.
tests/unit/datasets/test_ccp_sensitive_prompts_dataset.py	Tests HF-backed CCP prompts mapping + `dataset_name`.
tests/unit/datasets/test_aya_redteaming_dataset.py	Tests URL-backed Aya filtering by category/scope and language mapping + `dataset_name`.
tests/unit/datasets/test_aegis_ai_content_safety_dataset.py	Tests unsafe filtering, category filtering, and invalid categories + `dataset_name`.
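The multimodal HarmBench tests in the table above also mock an image-download helper. One way to exercise both the success and failure paths in a single test is a `side_effect` list, where each call returns the next item and exception instances are raised instead of returned. This is a sketch under stated assumptions: `ToyMultimodalLoader`, `_download_image`, and the skip-on-failure policy are all hypothetical and are not taken from PyRIT.

```python
from __future__ import annotations

from unittest.mock import patch


class ToyMultimodalLoader:
    """Hypothetical multimodal loader; the skip-on-failure policy is an assumption."""

    def _fetch_from_url(self, url: str) -> list[dict]:
        raise RuntimeError("network access is not allowed in unit tests")

    def _download_image(self, image_url: str) -> str:
        raise RuntimeError("network access is not allowed in unit tests")

    def fetch_dataset(self) -> list[tuple[str, str]]:
        pairs = []
        for row in self._fetch_from_url("https://example.com/multimodal.csv"):
            try:
                local_path = self._download_image(row["image_url"])
            except OSError:
                # Assumed policy: skip rows whose image cannot be downloaded.
                continue
            pairs.append((row["behavior"], local_path))
        return pairs


ROWS = [
    {"behavior": "describe image a", "image_url": "https://example.com/a.png"},
    {"behavior": "describe image b", "image_url": "https://example.com/b.png"},
]


def test_failed_download_is_skipped() -> None:
    with patch.object(ToyMultimodalLoader, "_fetch_from_url", return_value=ROWS):
        # First call succeeds, second call raises OSError.
        with patch.object(
            ToyMultimodalLoader, "_download_image", side_effect=["/tmp/a.png", OSError("404")]
        ) as mock_download:
            pairs = ToyMultimodalLoader().fetch_dataset()
    assert mock_download.call_count == 2
    assert pairs == [("describe image a", "/tmp/a.png")]
```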

- xstest: use dict mapping instead of index-based assertions
- sorry_bench: use dict mapping instead of index-based assertions
- red_team_social_bias: rename test to reflect actual behavior
- jbb_behaviors: add comment explaining why Exception (not ValueError)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
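The "dict mapping instead of index-based assertions" change makes a test independent of the order in which a loader emits its prompts. A minimal sketch of the idea follows; the `SeedPrompt` dataclass and `loader_output` function are hypothetical stand-ins, not the PR's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SeedPrompt:
    # Hypothetical stand-in with just the fields this example needs.
    value: str
    harm_categories: list[str]


def loader_output() -> list[SeedPrompt]:
    # Pretend this came from a loader whose row order is not guaranteed.
    return [
        SeedPrompt(value="unsafe example prompt", harm_categories=["crime"]),
        SeedPrompt(value="benign example prompt", harm_categories=[]),
    ]


def test_with_dict_mapping() -> None:
    prompts = loader_output()
    # Fragile alternative: assert prompts[0].harm_categories == ["crime"]
    # Key by prompt text instead, so the assertions survive reordering.
    by_value = {p.value: p for p in prompts}
    assert set(by_value) == {"unsafe example prompt", "benign example prompt"}
    assert by_value["unsafe example prompt"].harm_categories == ["crime"]
    assert by_value["benign example prompt"].harm_categories == []
```

Keying by `value` assumes prompt texts are unique within the fixture; if they are not, the dict silently keeps the last duplicate, so a length check against the original list is a cheap safeguard.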

Merge 8 simple 2-test dataset files into test_simple_remote_datasets.py with parametrized test_dataset_name and test_fetch_dataset tests.

Loaders with non-trivial logic keep their own files: aegis, aya, darkbench, equitymedqa, harmbench, harmbench_multimodal, jbb_behaviors, medsafetybench, pku_safe_rlhf, red_team_social_bias, sorry_bench.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
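Consolidating the simple loaders works because each one needs the same two checks, so a single table of cases can drive both tests. Under pytest, a list like `CASES` below would feed `@pytest.mark.parametrize`; to keep the sketch dependency-free it uses the standard library's `unittest` `subTest` instead. Every loader class here is hypothetical, and only the test names `test_dataset_name` and `test_fetch_dataset` come from the commit message.

```python
from __future__ import annotations

import unittest
from unittest.mock import patch


class _ToyLoaderBase:
    """Hypothetical base class; every name here is illustrative, not PyRIT's API."""

    NAME = ""

    @property
    def dataset_name(self) -> str:
        return self.NAME

    def _fetch_from_url(self, url: str) -> list[dict]:
        raise RuntimeError("network access is not allowed in unit tests")

    def fetch_dataset(self) -> list[str]:
        return [row["prompt"] for row in self._fetch_from_url("https://example.com/data")]


class ToyLoaderA(_ToyLoaderBase):
    NAME = "toy_a"


class ToyLoaderB(_ToyLoaderBase):
    NAME = "toy_b"


# One row per simple loader: (loader class, expected dataset_name).
CASES = [(ToyLoaderA, "toy_a"), (ToyLoaderB, "toy_b")]


class TestSimpleRemoteDatasets(unittest.TestCase):
    def test_dataset_name(self) -> None:
        for loader_cls, expected in CASES:
            with self.subTest(loader=loader_cls.__name__):
                self.assertEqual(loader_cls().dataset_name, expected)

    def test_fetch_dataset(self) -> None:
        for loader_cls, _ in CASES:
            with self.subTest(loader=loader_cls.__name__):
                # Patch the subclass so each case is isolated from the others.
                with patch.object(loader_cls, "_fetch_from_url", return_value=[{"prompt": "p"}]):
                    self.assertEqual(loader_cls().fetch_dataset(), ["p"])
```

Adding another simple loader then means appending one tuple to `CASES` rather than writing a new two-test file.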

romanlutz and others added 3 commits April 11, 2026 18:05

Add missing remote dataset loader tests

d9579ef

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge branch 'main' into test/datasets-coverage

f2fa8a8

rlundeen2 requested a review from Copilot April 13, 2026 17:19

rlundeen2 self-assigned this Apr 13, 2026

Copilot AI reviewed Apr 13, 2026


Comment thread tests/unit/datasets/test_xstest_dataset.py Outdated

Comment thread tests/unit/datasets/test_jbb_behaviors_dataset.py

Comment thread tests/unit/datasets/test_red_team_social_bias_dataset.py Outdated

Comment thread tests/unit/datasets/test_sorry_bench_dataset.py Outdated

Copilot started reviewing on behalf of rlundeen2 April 13, 2026 17:29

romanlutz and others added 2 commits April 13, 2026 10:33

Merge branch 'main' into test/datasets-coverage

ddd840d

rlundeen2 reviewed Apr 13, 2026


Comment thread tests/unit/datasets/test_multilingual_vulnerability_dataset.py Outdated

rlundeen2 approved these changes Apr 13, 2026


romanlutz merged commit 6099098 into microsoft:main Apr 13, 2026
35 checks passed

romanlutz deleted the test/datasets-coverage branch April 13, 2026 18:06

romanlutz merged 6 commits into microsoft:main from romanlutz:test/datasets-coverage


3 participants
