TEST: Add unit tests for remote dataset loaders #1604
Merged
Adds tests for 20 previously untested remote dataset loaders:
- aegis_ai, aya_redteaming, ccp_sensitive_prompts
- darkbench, equitymedqa, forbidden_questions
- harmbench, harmbench_multimodal, jbb_behaviors
- librai_do_not_answer, llm_latent_adversarial_training
- medsafetybench, mlcommons_ailuminate
- multilingual_vulnerability, pku_safe_rlhf
- red_team_social_bias, sorry_bench, sosbench
- tdc23_redteaming, xstest

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds pytest unit coverage for previously untested remote seed dataset loaders by mocking network/HuggingFace fetches and asserting parsing, filtering, and validation behavior.
Changes:
- Added new unit test modules for multiple remote dataset loaders.
- Mocked `_fetch_from_url`/`_fetch_from_huggingface` (and image download helpers where applicable) to avoid live network calls.
- Added assertions for filtering, validation/error paths, and `dataset_name` properties.
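The mocking approach in the list above can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual test code: `ToyUrlLoader`, its row format, and the placeholder URL are hypothetical stand-ins, and only the `_fetch_from_url` method name comes from the description. `patch.object` swaps the fetch helper for canned rows, so the test never touches the network.

```python
from unittest.mock import patch


class ToyUrlLoader:
    """Hypothetical stand-in for a URL-backed dataset loader."""

    source_url = "https://example.invalid/toy.csv"  # placeholder, never contacted

    def _fetch_from_url(self, url):
        # A real loader would download and parse the file here.
        raise RuntimeError("network access is not allowed in unit tests")

    def fetch_dataset(self):
        rows = self._fetch_from_url(self.source_url)
        # Map raw rows to the prompt text the caller cares about.
        return [row["prompt"] for row in rows]


def test_fetch_dataset_uses_mocked_rows():
    fake_rows = [{"prompt": "first"}, {"prompt": "second"}]
    # Replace the network helper on the class for the duration of the block.
    with patch.object(ToyUrlLoader, "_fetch_from_url", return_value=fake_rows) as mock_fetch:
        prompts = ToyUrlLoader().fetch_dataset()

    assert prompts == ["first", "second"]
    mock_fetch.assert_called_once_with(ToyUrlLoader.source_url)
```

Patching on the class rather than on an instance means the test does not need a reference to the loader object before it is constructed.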
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/unit/datasets/test_xstest_dataset.py | Tests URL-backed XSTest loader mapping to SeedPrompt and dataset_name. |
| tests/unit/datasets/test_tdc23_redteaming_dataset.py | Tests HF-backed TDC23 loader fetch + dataset_name. |
| tests/unit/datasets/test_sosbench_dataset.py | Tests HF-backed SOSBench loader mapping goal/subject to seed fields. |
| tests/unit/datasets/test_sorry_bench_dataset.py | Tests SorryBench filtering (style/category) + validation errors + dataset_name. |
| tests/unit/datasets/test_red_team_social_bias_dataset.py | Tests parsing single/multi-turn prompts and prompt grouping + dataset_name. |
| tests/unit/datasets/test_pku_safe_rlhf_dataset.py | Tests include/exclude safe prompts and harm-category filtering + dataset_name. |
| tests/unit/datasets/test_multilingual_vulnerability_dataset.py | Tests URL-backed multilingual loader mapping + dataset_name. |
| tests/unit/datasets/test_mlcommons_ailuminate_dataset.py | Tests URL-backed AILuminate loader hazard→category mapping + dataset_name. |
| tests/unit/datasets/test_medsafetybench_dataset.py | Tests subset handling, key validation, and dataset_name. |
| tests/unit/datasets/test_llm_latent_adversarial_training_dataset.py | Tests HF-backed LAT loader fetch + dataset_name. |
| tests/unit/datasets/test_librai_do_not_answer_dataset.py | Tests HF-backed LibrAI mapping multiple harm fields + dataset_name. |
| tests/unit/datasets/test_jbb_behaviors_dataset.py | Tests HF-backed JBB parsing, empty-dataset error, and category mapping helpers. |
| tests/unit/datasets/test_harmbench_multimodal_dataset.py | Tests multimodal filtering, image-download behavior, and error handling + dataset_name. |
| tests/unit/datasets/test_harmbench_dataset.py | Tests URL-backed HarmBench mapping and missing-key validation + dataset_name. |
| tests/unit/datasets/test_forbidden_questions_dataset.py | Tests HF-backed Forbidden Questions mapping + dataset_name. |
| tests/unit/datasets/test_equitymedqa_dataset.py | Tests subset selection (single/multiple) + dataset_name + invalid subset. |
| tests/unit/datasets/test_darkbench_dataset.py | Tests HF-backed DarkBench mapping and config/split passthrough + dataset_name. |
| tests/unit/datasets/test_ccp_sensitive_prompts_dataset.py | Tests HF-backed CCP prompts mapping + dataset_name. |
| tests/unit/datasets/test_aya_redteaming_dataset.py | Tests URL-backed Aya filtering by category/scope and language mapping + dataset_name. |
| tests/unit/datasets/test_aegis_ai_content_safety_dataset.py | Tests unsafe filtering, category filtering, and invalid categories + dataset_name. |
- xstest: use dict mapping instead of index-based assertions
- sorry_bench: use dict mapping instead of index-based assertions
- red_team_social_bias: rename test to reflect actual behavior
- jbb_behaviors: add comment explaining why Exception (not ValueError)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
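To illustrate the switch from index-based assertions to a dict mapping mentioned in this commit, here is a small sketch. `ToyPrompt` and `load_toy_prompts` are hypothetical names invented for the example; the real tests work with the loaders' own prompt objects. Keying the results by prompt text keeps the assertions valid even if the loader returns rows in a different order.

```python
from dataclasses import dataclass


@dataclass
class ToyPrompt:
    """Hypothetical stand-in for a seed prompt object."""

    value: str
    category: str


def load_toy_prompts():
    # The order of the returned list is an implementation detail and may change.
    return [
        ToyPrompt(value="alpha prompt", category="safe"),
        ToyPrompt(value="beta prompt", category="unsafe"),
    ]


def test_categories_by_prompt_value():
    prompts = load_toy_prompts()
    # Key on the prompt text so the assertions do not depend on list order.
    by_value = {p.value: p for p in prompts}

    assert len(by_value) == 2
    assert by_value["alpha prompt"].category == "safe"
    assert by_value["beta prompt"].category == "unsafe"
```

An index-based version such as `prompts[0].category == "safe"` would pass today but fail as soon as the row order changed, without any real regression in the loader.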
rlundeen2 reviewed Apr 13, 2026
rlundeen2 approved these changes Apr 13, 2026
Merge 8 simple 2-test dataset files into test_simple_remote_datasets.py with parametrized test_dataset_name and test_fetch_dataset tests.

Loaders with non-trivial logic keep their own files: aegis, aya, darkbench, equitymedqa, harmbench, harmbench_multimodal, jbb_behaviors, medsafetybench, pku_safe_rlhf, red_team_social_bias, sorry_bench.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
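A consolidation like the one in this commit might look roughly like the sketch below. It assumes pytest's `parametrize` marker; the two loader classes and the `SIMPLE_LOADERS` table are hypothetical and are not the contents of test_simple_remote_datasets.py. Only the test names `test_dataset_name` and `test_fetch_dataset` are taken from the commit message.

```python
import pytest


class ToyLoaderA:
    """Hypothetical simple loader; not a class from the PR."""

    dataset_name = "toy_a"

    def fetch_dataset(self):
        return ["a1", "a2"]


class ToyLoaderB:
    """Second hypothetical loader with the same two-test shape."""

    dataset_name = "toy_b"

    def fetch_dataset(self):
        return ["b1"]


# One row per loader: (class, expected dataset_name, expected prompt count).
SIMPLE_LOADERS = [
    (ToyLoaderA, "toy_a", 2),
    (ToyLoaderB, "toy_b", 1),
]
LOADER_IDS = ["toy_a", "toy_b"]


@pytest.mark.parametrize("loader_cls, expected_name, expected_count", SIMPLE_LOADERS, ids=LOADER_IDS)
def test_dataset_name(loader_cls, expected_name, expected_count):
    assert loader_cls().dataset_name == expected_name


@pytest.mark.parametrize("loader_cls, expected_name, expected_count", SIMPLE_LOADERS, ids=LOADER_IDS)
def test_fetch_dataset(loader_cls, expected_name, expected_count):
    assert len(loader_cls().fetch_dataset()) == expected_count
```

With this layout, covering another simple loader means adding one row to the table instead of a new file, while each case still shows up under its own id in the pytest report.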
Summary
Adds unit tests for 20 previously untested remote dataset loader files.
New Test Files
Each test mocks HTTP/HuggingFace calls and tests dataset fetching, filtering, and validation:
aegis_ai_content_safety, aya_redteaming, ccp_sensitive_prompts, darkbench,
equitymedqa, forbidden_questions, harmbench, harmbench_multimodal,
jbb_behaviors, librai_do_not_answer, llm_latent_adversarial_training,
medsafetybench, mlcommons_ailuminate, multilingual_vulnerability,
pku_safe_rlhf, red_team_social_bias, sorry_bench, sosbench,
tdc23_redteaming, xstest
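As a rough sketch of the filtering and validation checks described above, consider the example below. `ToyCategoryLoader`, its category names, and its row format are assumptions made for illustration; only the `_fetch_from_huggingface` method name appears in the PR text. The invalid-input case uses `pytest.raises`.

```python
from unittest.mock import patch

import pytest


class ToyCategoryLoader:
    """Hypothetical HuggingFace-backed loader with a category filter."""

    VALID_CATEGORIES = {"hate", "violence"}

    def __init__(self, categories=None):
        if categories is not None:
            invalid = set(categories) - self.VALID_CATEGORIES
            if invalid:
                raise ValueError(f"Invalid categories: {sorted(invalid)}")
        self.categories = categories

    def _fetch_from_huggingface(self):
        # A real loader would download the dataset here.
        raise RuntimeError("network access is not allowed in unit tests")

    def fetch_dataset(self):
        rows = self._fetch_from_huggingface()
        if self.categories is not None:
            rows = [r for r in rows if r["category"] in self.categories]
        return [r["prompt"] for r in rows]


FAKE_ROWS = [
    {"prompt": "p1", "category": "hate"},
    {"prompt": "p2", "category": "violence"},
]


def test_filters_by_category():
    with patch.object(ToyCategoryLoader, "_fetch_from_huggingface", return_value=FAKE_ROWS):
        prompts = ToyCategoryLoader(categories=["violence"]).fetch_dataset()
    assert prompts == ["p2"]


def test_invalid_category_raises():
    # Validation happens in the constructor, before any fetch is attempted.
    with pytest.raises(ValueError, match="Invalid categories"):
        ToyCategoryLoader(categories=["not_a_category"])
```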
Testing
All 69 tests pass locally.