MAINT: Deprecate dead split kwarg on 8 single-split HF dataset loaders #1901
Merged
romanlutz merged 3 commits on Jun 3, 2026
These loaders expose a `split: str` constructor kwarg that forwards to
_fetch_from_huggingface, but the upstream HuggingFace dataset publishes
only one split per (loader, config). The kwarg has exactly one valid
value and is a misleading API surface that suggests users can pick a
split when they can't.
Per the public-API constraint (the kwarg is constructor-visible), this
change deprecates rather than deletes:
- constructor signature changes from `split: str = "<value>"` to
`split: str | None = None`
- DeprecationWarning (stacklevel=2) is emitted when a non-None value
is passed; target removal is v0.16.0, matching the existing cohort
in seed_dataset_provider.py and prepended_conversation_config.py
- self.split is dropped; the (single) literal split name is hardcoded
at the _fetch_from_huggingface(..., split="<value>") call site
- constructor docstrings now mark `split` as deprecated
Loaders affected (with the hardcoded literal):
- _CBTBenchDataset -> split="train" (38 configs, each single-split)
- _DarkBenchDataset -> split="train"
- _ForbiddenQuestionsDataset -> config="default", split="train" (also
fixes a misforwarding: the kwarg was passed as config= rather than
split=, but TrustAIRLab/forbidden_question_set only has one of each,
so it never did anything useful)
- _HarmfulQADataset -> split="train"
- _HiXSTestDataset -> split="train" (gated)
- _ORBench{80K,Hard,Toxic}Dataset -> split="train" (one edit on the
shared _ORBenchBaseDataset)
- _SGXSTestDataset -> split="train" (gated)
- _SimpleSafetyTestsDataset -> split="test" (this dataset is the odd
one out: its single published split is "test", not "train")
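The `_ForbiddenQuestionsDataset` misforwarding is easiest to see as the keyword arguments each version hands to the fetch helper. This is a hypothetical reconstruction from the description above, not the actual diff: both helper functions and the `dataset_name` key are invented, and any other arguments of the old call are omitted.

```python
from typing import Any

HF_REPO = "TrustAIRLab/forbidden_question_set"


def old_fetch_kwargs(split: str) -> dict[str, Any]:
    """Before (hypothetical): the caller's split value was forwarded as config=."""
    # Other keyword arguments of the old call are omitted in this sketch.
    return {"dataset_name": HF_REPO, "config": split}


def new_fetch_kwargs() -> dict[str, Any]:
    """After: config and split are hardcoded to the only values upstream publishes."""
    return {"dataset_name": HF_REPO, "config": "default", "split": "train"}
```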
Multi-split loaders (BeaverTails, SaladBench, ToxicChat, JBBBehaviors)
are out of scope - their split kwarg has multiple valid values and is
doing real work. Whether they should be refactored into sibling
subclasses per the "each distinct upstream artifact is its own
registered subclass" rule is a separate design discussion.
Each affected test file gets a test_split_kwarg_emits_deprecation_warning
case asserting the warning fires; existing tests that previously
forwarded a non-default split= are updated to drop the kwarg (the
hardcoded literal is still asserted on the _fetch_from_huggingface call
kwargs).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…taset-split-args

# Conflicts:
#   pyrit/datasets/seed_datasets/remote/or_bench_dataset.py
#   tests/unit/datasets/test_hixstest_dataset.py
#   tests/unit/datasets/test_sgxstest_dataset.py
The merge from origin/main brought in TestDarkBenchDataset.test_fetch_dataset_with_custom_config (added in microsoft#1780), which constructed _DarkBenchDataset(split="test") and asserted the kwarg was forwarded to _fetch_from_huggingface_async. That contract no longer holds: this PR deprecates the dead split kwarg and hardcodes split="train" at the DarkBench call site, since upstream apart/darkbench publishes only the "train" split. Drop the deprecated kwarg from the constructor call and assert the hardcoded "train" literal that actually flows through.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
hannahwestra25 approved these changes Jun 3, 2026
Description
Eight `_*Dataset` loaders under `pyrit/datasets/seed_datasets/remote/` expose a `split: str` constructor kwarg that forwards to `_fetch_from_huggingface`, even though the upstream HuggingFace dataset publishes only one split. The kwarg has exactly one valid value, never does anything useful, and is a misleading API surface that suggests users can pick a split when they can't.

Because the kwarg is part of the public constructor signature, this PR deprecates rather than deletes it (target removal: v0.16.0, matching the existing cohort in `seed_dataset_provider.py` and `prepended_conversation_config.py`):

- `split: str = "<value>"` → `split: str | None = None`
- `DeprecationWarning` (`stacklevel=2`) emitted when a non-None value is passed
- `self.split` dropped; the single literal split name is hardcoded at the `_fetch_from_huggingface(..., split="<value>")` call site
- constructor docstrings mark `split` as deprecated

Loaders affected (with the hardcoded literal):
| Loader | Upstream HF dataset | Hardcoded literal |
|---|---|---|
| `_CBTBenchDataset` | `Psychotherapy-LLM/CBT-Bench` (38 configs, each single-split) | `"train"` |
| `_DarkBenchDataset` | `apart/darkbench` | `"train"` |
| `_ForbiddenQuestionsDataset` | `TrustAIRLab/forbidden_question_set` | `config="default"`, `split="train"` |
| `_HarmfulQADataset` | `declare-lab/HarmfulQA` | `"train"` |
| `_HiXSTestDataset` | `walledai/HiXSTest` (gated) | `"train"` |
| `_ORBench{80K,Hard,Toxic}Dataset` | `bench-llm/OR-Bench` (one base-class edit) | `"train"` |
| `_SGXSTestDataset` | `walledai/SGXSTest` (gated) | `"train"` |
| `_SimpleSafetyTestsDataset` | `Bertievidgen/SimpleSafetyTests` | `"test"` (note: not `"train"`) |

`_ForbiddenQuestionsDataset` is a special case: the kwarg was actually misforwarded to HuggingFace as `config=` rather than `split=`, but `TrustAIRLab/forbidden_question_set` only has one config (`"default"`) and one split (`"train"`), so the bug was harmless. The deprecation message explains both issues.

Out of scope: multi-split loaders where the `split` kwarg has more than one valid value (`_BeaverTailsDataset`, `_SaladBenchDataset`, `_ToxicChatDataset`, `_JBBBehaviorsDataset`). Whether those should be refactored into sibling subclasses per the "each distinct upstream artifact is its own registered subclass" rule is a separate design discussion.

Tests and Documentation
- Each affected test file gets a `test_split_kwarg_emits_deprecation_warning` case asserting the warning fires; existing tests that previously forwarded a non-default `split=` are updated to drop the kwarg (the hardcoded literal is still asserted on the `_fetch_from_huggingface` call kwargs).
- `tests/end_to_end/test_all_datasets.py` was run scoped to the 10 affected providers (8 loader entries + 2 extra OR-Bench siblings), and all 10 successfully fetched real data from HuggingFace, including the 2 gated datasets (HiXSTest, SGXSTest) and the large OR-Bench 80K.
- A search of `pyrit/`, `tests/`, and `doc/` confirmed that no other call sites pass `split=` to these loaders; only the new deprecation tests do.
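One way to reproduce the call-site check is to parse source files with Python's `ast` module instead of grepping, so only real calls with a literal `split=` keyword are reported. This is a hypothetical helper, not the method the PR used; it matches class names exactly (the OR-Bench row is expanded to its three siblings) and does not see `split` passed through `**kwargs`.

```python
import ast

# Class names from the table above, with _ORBench{80K,Hard,Toxic}Dataset expanded.
AFFECTED = {
    "_CBTBenchDataset",
    "_DarkBenchDataset",
    "_ForbiddenQuestionsDataset",
    "_HarmfulQADataset",
    "_HiXSTestDataset",
    "_ORBench80KDataset",
    "_ORBenchHardDataset",
    "_ORBenchToxicDataset",
    "_SGXSTestDataset",
    "_SimpleSafetyTestsDataset",
}


def find_split_call_sites(source: str, filename: str = "<string>") -> list[tuple[str, int]]:
    """Return (class_name, line) for each call to an affected loader that passes split=."""
    hits: list[tuple[str, int]] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `_Foo(...)` and `module._Foo(...)`; other callables yield None.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        # kw.arg is None for `**kwargs`, so only explicit split= keywords match.
        if name in AFFECTED and any(kw.arg == "split" for kw in node.keywords):
            hits.append((name, node.lineno))
    return hits
```

To scan the directories listed above, the helper could be fed each file from, for example, `pathlib.Path("pyrit").rglob("*.py")` via `read_text()`.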