fix: validate split_overlap is less than split_length in DocumentSplitter #11625
Merged
anakin87 merged 2 commits on Jun 15, 2026
…tter DocumentSplitter validated split_length > 0 and split_overlap >= 0, but not that split_overlap < split_length. When split_overlap >= split_length the window step (split_length - split_overlap) becomes zero or negative, which crashed at run time with an opaque error from more-itertools' `windowed`. Validate it at initialization with a clear ValueError, consistent with the adjacent checks. Adds a regression test and a release note.
anakin87
reviewed
Jun 15, 2026
…ea8234c2c322cc.yaml
Related Issues
None; self-found while reviewing `DocumentSplitter` validation.
Proposed Changes
`DocumentSplitter.__init__` validates `split_length > 0` and `split_overlap >= 0`, but not that `split_overlap < split_length`. When `split_overlap >= split_length`, the window step (`split_length - split_overlap`) becomes zero or negative, and `_concatenate_units` calls `more_itertools.windowed(..., step=step)`, which raises an opaque `ValueError: step must be >= 1` deep in the call stack at run time, long after the misconfiguration was introduced.
This adds an early, clear validation at initialization, consistent with the two checks right above it.
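A minimal sketch of the added check, shown next to the two existing ones. `SplitterParams`, `init_error`, the default values, and the error-message wording are illustrative assumptions, not the merged code; the real checks live in `DocumentSplitter.__init__`.

```python
from typing import Optional


class SplitterParams:
    """Hypothetical stand-in for the parameter checks in DocumentSplitter.__init__."""

    def __init__(self, split_length: int = 200, split_overlap: int = 0) -> None:
        # Existing checks described in the PR (message wording is assumed).
        if split_length <= 0:
            raise ValueError("split_length must be greater than 0.")
        if split_overlap < 0:
            raise ValueError("split_overlap must be greater than or equal to 0.")
        # New check: the window step is split_length - split_overlap,
        # so the overlap must be strictly smaller than the length.
        if split_overlap >= split_length:
            raise ValueError(
                f"split_overlap ({split_overlap}) must be less than "
                f"split_length ({split_length})."
            )
        self.split_length = split_length
        self.split_overlap = split_overlap
        self.step = split_length - split_overlap  # always >= 1 at this point


def init_error(split_length: int, split_overlap: int) -> Optional[str]:
    """Return the ValueError message raised at init, or None if the config is valid."""
    try:
        SplitterParams(split_length=split_length, split_overlap=split_overlap)
    except ValueError as exc:
        return str(exc)
    return None
```

With this shape, `init_error(10, 10)` and `init_error(10, 15)` both return a message at construction time, while `init_error(10, 3)` returns `None` and leaves a step of 7.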
How did you test it?
- Added a regression test (`test_split_overlap_not_less_than_split_length`) covering `split_overlap == split_length` and `split_overlap > split_length`. Verified it fails before this change (no error at init) and passes after.
- `hatch run test:unit test/components/preprocessors/test_document_splitter.py` → 55 passed.
- `hatch run test:types` (module) and `hatch run fmt` → clean.
Notes for the reviewer
Behavior change is limited to rejecting a configuration that was already broken (it crashed at run time); valid configs (`split_overlap < split_length`) are unaffected. Release note added under `releasenotes/notes/`.
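For illustration, the regression test described under "How did you test it?" could look roughly like the sketch below. It uses the standard-library `unittest` and a hypothetical `build_splitter` stand-in so it runs without Haystack installed; the actual test in `test_document_splitter.py` presumably constructs `DocumentSplitter` directly, and its exact form is not shown on this page.

```python
import unittest


def build_splitter(split_length: int, split_overlap: int) -> tuple:
    """Hypothetical stand-in for DocumentSplitter(split_length=..., split_overlap=...)."""
    if split_overlap >= split_length:
        raise ValueError("split_overlap must be less than split_length.")
    return (split_length, split_overlap)


class TestSplitOverlapValidation(unittest.TestCase):
    def test_split_overlap_not_less_than_split_length(self):
        # Both the equal case and the greater-than case must fail at init.
        for split_length, split_overlap in [(10, 10), (10, 15)]:
            with self.subTest(split_length=split_length, split_overlap=split_overlap):
                with self.assertRaises(ValueError):
                    build_splitter(split_length, split_overlap)

    def test_valid_overlap_is_accepted(self):
        # A valid config (overlap < length) must be unaffected.
        self.assertEqual(build_splitter(10, 9), (10, 9))


def run_suite() -> unittest.TestResult:
    """Run the two tests above and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSplitOverlapValidation)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```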