fix: validate split_overlap is less than split_length in DocumentSplitter #11625

Merged
anakin87 merged 2 commits into deepset-ai:main from i-anubhav-anand:fix/document-splitter-overlap-validation
Jun 15, 2026
@i-anubhav-anand

Contributor

Related Issues

None — self-found while reviewing DocumentSplitter validation.

Proposed Changes

DocumentSplitter.__init__ validates split_length > 0 and split_overlap >= 0, but not that split_overlap < split_length. When split_overlap >= split_length, the window step (split_length - split_overlap) becomes zero or negative, and _concatenate_units calls more_itertools.windowed(..., step=step), which raises an opaque ValueError: step must be >= 1 deep in the call stack at run time — long after the misconfiguration was introduced.
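The step arithmetic described above can be illustrated with a small sketch. `window_step` and `sliding_windows` are hypothetical names introduced only for this illustration; the generator is a pure-Python stand-in for the windowing call, not the library's implementation, and it reuses the error text quoted in the description.

```python
from typing import Iterator, List, Sequence


def window_step(split_length: int, split_overlap: int) -> int:
    """Distance between the starts of two consecutive windows."""
    return split_length - split_overlap


def sliding_windows(
    units: Sequence[str], split_length: int, split_overlap: int
) -> Iterator[List[str]]:
    """Hypothetical stand-in for the windowing step (illustration only)."""
    step = window_step(split_length, split_overlap)
    if step < 1:
        # A zero or negative step cannot advance the window. Because this is
        # a generator, the error only surfaces once iteration starts, which
        # mirrors the "fails late, at run time" behavior described above.
        raise ValueError("step must be >= 1")
    for start in range(0, len(units), step):
        yield list(units[start:start + split_length])
        if start + split_length >= len(units):
            break  # the last window already reached the end of the input


words = ["a", "b", "c", "d", "e"]
print(list(sliding_windows(words, split_length=3, split_overlap=1)))
# -> [['a', 'b', 'c'], ['c', 'd', 'e']]

for overlap in (3, 5):  # overlap == length, then overlap > length
    try:
        list(sliding_windows(words, split_length=3, split_overlap=overlap))
    except ValueError as exc:
        print(f"split_overlap={overlap}: {exc}")
```

With `split_length=3`, an overlap of 3 gives a step of 0 and an overlap of 5 gives a step of -2; both are rejected only when the windows are actually consumed.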

This adds an early, clear validation at initialization, consistent with the two checks right above it:

if split_overlap >= split_length:
    raise ValueError("split_overlap must be less than split_length.")
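For context, here is a minimal sketch of how the three checks might sit together in a constructor. `MiniSplitter` is a hypothetical stand-in, not the real component; the messages of the first two checks are assumptions, and only the third message is taken from the snippet above.

```python
class MiniSplitter:
    """Hypothetical stand-in that models only the argument validation."""

    def __init__(self, split_length: int, split_overlap: int) -> None:
        # The two pre-existing checks; their exact wording is assumed here.
        if split_length <= 0:
            raise ValueError("split_length must be greater than 0.")
        if split_overlap < 0:
            raise ValueError("split_overlap must be greater than or equal to 0.")
        # The new check: guarantees split_length - split_overlap >= 1.
        if split_overlap >= split_length:
            raise ValueError("split_overlap must be less than split_length.")
        self.split_length = split_length
        self.split_overlap = split_overlap


ok = MiniSplitter(split_length=10, split_overlap=3)
print(ok.split_length - ok.split_overlap)  # -> 7

try:
    MiniSplitter(split_length=10, split_overlap=10)
except ValueError as exc:
    print(exc)  # -> split_overlap must be less than split_length.
```

Placing the check last means it only runs once both values are known to be individually valid, so the error message always describes the relationship between them.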

How did you test it?

  • Added a regression test (test_split_overlap_not_less_than_split_length) covering split_overlap == split_length and split_overlap > split_length. Verified it fails before this change (no error at init) and passes after.
  • hatch run test:unit test/components/preprocessors/test_document_splitter.py → 55 passed.
  • hatch run test:types (module) and hatch run fmt → clean.
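The regression test itself is not shown in this conversation. A rough sketch of what such a test could look like follows, written with the standard library's `unittest` so it runs on its own; `check_overlap` is a hypothetical helper standing in for the constructor check, and only the test name and the two covered cases come from the description above.

```python
import unittest


def check_overlap(split_length: int, split_overlap: int) -> None:
    """Hypothetical stand-in for the constructor's new validation."""
    if split_overlap >= split_length:
        raise ValueError("split_overlap must be less than split_length.")


class TestOverlapValidation(unittest.TestCase):
    def test_split_overlap_not_less_than_split_length(self) -> None:
        # Covers both cases named above: overlap == length and overlap > length.
        for length, overlap in [(10, 10), (10, 15)]:
            with self.subTest(split_length=length, split_overlap=overlap):
                with self.assertRaisesRegex(
                    ValueError, "split_overlap must be less than split_length"
                ):
                    check_overlap(length, overlap)

    def test_valid_config_is_accepted(self) -> None:
        # A valid configuration must not raise.
        check_overlap(10, 3)


# Run the suite explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOverlapValidation)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Removing the check from `check_overlap` makes the first test fail, which matches the "fails before, passes after" verification described in the list above.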

Notes for the reviewer

Behavior change is limited to rejecting a configuration that was already broken (it crashed at run time); valid configs (split_overlap < split_length) are unaffected. Release note added under releasenotes/notes/.

…tter

DocumentSplitter validated split_length > 0 and split_overlap >= 0, but not
that split_overlap < split_length. When split_overlap >= split_length the
window step (split_length - split_overlap) becomes zero or negative, which
crashed at run time with an opaque error from more-itertools' `windowed`.

Validate it at initialization with a clear ValueError, consistent with the
adjacent checks. Adds a regression test and a release note.
@i-anubhav-anand i-anubhav-anand requested a review from a team as a code owner June 14, 2026 19:22
@i-anubhav-anand i-anubhav-anand requested review from anakin87 and removed request for a team June 14, 2026 19:22
@vercel

vercel Bot commented Jun 14, 2026

@i-anubhav-anand is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLAassistant commented Jun 14, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions Bot added the topic:tests and type:documentation labels Jun 14, 2026
@github-actions

github-actions Bot commented Jun 15, 2026

Contributor

Coverage report

Columns: File, Statements, Missing, Coverage, Coverage (new stmts), Lines missing
  haystack/components/preprocessors/document_splitter.py
Project Total

This report was generated by python-coverage-comment-action

Comment thread releasenotes/notes/fix-document-splitter-overlap-validation-92ea8234c2c322cc.yaml Outdated

@anakin87 anakin87 left a comment

Member

Makes sense. Thank you!

@anakin87 anakin87 enabled auto-merge (squash) June 15, 2026 08:44
@anakin87 anakin87 merged commit 3611929 into deepset-ai:main Jun 15, 2026
32 of 35 checks passed
Labels

topic:tests, type:documentation
