fix: avoid NaN cosine scores for zero-norm embeddings in InMemoryDocumentStore by i-anubhav-anand · Pull Request #11628 · deepset-ai/haystack

i-anubhav-anand · 2026-06-14T19:56:29Z

Related Issues

Self-found bug (no existing issue). InMemoryDocumentStore.embedding_retrieval returns NaN similarity scores when a document (or the query) has a zero-norm embedding, silently corrupting ranking.

Proposed Changes:

For cosine similarity, embeddings are normalized by their L2 norm:

query_embedding /= np.linalg.norm(x=query_embedding, axis=1, keepdims=True)
document_embeddings /= np.linalg.norm(x=document_embeddings, axis=1, keepdims=True)

A zero-norm vector (e.g. a zero embedding, which some models emit for empty/whitespace input) makes this divide by zero, producing NaN scores (NumPy also emits RuntimeWarning: invalid value encountered in divide). NaN compares false against every value, so it sorts unpredictably and silently corrupts the ranking.
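To illustrate why a NaN score is a ranking problem and not just a cosmetic one, here is a small standard-library sketch. The `is_descending` helper and the sample scores are made up for illustration and are not part of Haystack:

```python
import math

# Hypothetical similarity scores; the middle one came from a zero-norm embedding.
scores = [0.9, float("nan"), 0.4]

# Every ordering comparison against NaN is False, and NaN is not equal to itself.
nan = float("nan")
comparisons = [nan < 0.5, nan > 0.5, nan == nan]


def is_descending(values):
    """Return True if each value is >= the one that follows it."""
    return all(a >= b for a, b in zip(values, values[1:]))


# Sorting does not raise, but the result cannot satisfy the ordering invariant.
ranked = sorted(scores, reverse=True)
has_nan = any(math.isnan(s) for s in ranked)

# With 0.0 in place of NaN the ranking is well defined.
clean_ranked = sorted([0.9, 0.0, 0.4], reverse=True)
```

Because no comparison involving NaN holds, the sorted list can never be verified as ordered, which is why the final position of the affected document depends on input order instead of relevance.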

This guards the normalization so zero-norm vectors stay zero (denominator forced to 1.0), giving such documents a cosine score of 0.0 instead of NaN. Non-zero embeddings are unaffected.
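The guard described above can be sketched roughly as follows. This is a minimal illustration of the technique, not the code merged in this PR; `normalize_rows` is a hypothetical helper name, and the real change lives inside `embedding_retrieval`:

```python
import numpy as np


def normalize_rows(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize each row; rows with zero norm are left as all zeros."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Force the denominator to 1.0 where the norm is zero, so 0 / 1 stays 0
    # instead of 0 / 0 becoming NaN.
    safe_norms = np.where(norms == 0.0, 1.0, norms)
    return embeddings / safe_norms


query = normalize_rows(np.array([[1.0, 0.0, 0.0]]))
docs = normalize_rows(
    np.array(
        [
            [0.0, 0.0, 0.0],  # zero-norm embedding
            [2.0, 0.0, 0.0],  # ordinary embedding
        ]
    )
)

# Cosine similarity is the dot product of the normalized vectors.
scores = (query @ docs.T).flatten()
```

Replacing the zero denominator with 1.0 leaves the zero vector unchanged, so its dot product with any query is 0.0, while rows with a non-zero norm are divided exactly as before.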

Reproduction (before the fix):

from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

store = InMemoryDocumentStore(embedding_similarity_function="cosine")
store.write_documents([Document(content="zero", embedding=[0.0, 0.0, 0.0])])
print(store.embedding_retrieval(query_embedding=[1.0, 0.0, 0.0])[0].score)  # nan

How did you test it?

Added test_embedding_retrieval_with_zero_vector_does_not_produce_nan in test/document_stores/test_in_memory.py: a zero-embedding document no longer yields a NaN score (it gets 0.0) while a normal document is unaffected. It fails on main (NaN) and passes with this change. Ran hatch run test:unit test/document_stores/test_in_memory.py (148 passed, 4 skipped), hatch run fmt (clean), hatch run test:types haystack/document_stores/in_memory/document_store.py (mypy clean), and added a release note.

Notes for the reviewer

Behavior for non-zero embeddings is unchanged; only the zero-norm edge case is guarded.

Checklist

I have read the contributors guidelines and the code of conduct.
I have added unit tests and updated the docstrings.
I've used a conventional commit type for my PR title (fix:).
I have added a release note file.
I have run pre-commit hooks / hatch run fmt and fixed any issue.

This PR was generated with the help of an AI assistant. I have reviewed the changes, reproduced the bug, and run the relevant tests locally.

…mentStore embedding_retrieval normalized embeddings by their L2 norm for cosine similarity, dividing by zero when a document or the query had a zero-norm embedding and producing NaN scores that silently corrupt ranking. Guard the normalization so zero-norm vectors stay zero (score 0.0) instead of NaN.

vercel · 2026-06-14T19:56:34Z

@i-anubhav-anand is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

vercel · 2026-06-16T08:37:32Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
haystack-docs	Ignored	Preview	Jun 16, 2026 9:07am

julian-risch

Looks good to me! Thank you for your contribution @i-anubhav-anand !


github-actions · 2026-06-16T09:11:39Z

Coverage report

File	Statements	Missing	Coverage	Coverage (new stmts)	Lines missing
haystack/document_stores/in_memory
document_store.py					801
Project Total

This report was generated by python-coverage-comment-action

i-anubhav-anand requested a review from a team as a code owner June 14, 2026 19:56

i-anubhav-anand requested review from julian-risch and removed request for a team June 14, 2026 19:56

github-actions Bot added the topic:tests label Jun 14, 2026

julian-risch enabled auto-merge (squash) June 16, 2026 08:37

julian-risch approved these changes Jun 16, 2026

View reviewed changes

github-actions Bot added the type:documentation Improvements on the docs label Jun 16, 2026

julian-risch and others added 2 commits June 16, 2026 10:40

Fix formatting of release note

0b7ec4d

Updated formatting for code snippets in release notes.

fix: guard against None score in zero-vector test to satisfy mypy

c757a64

Document.score is Optional[float], so math.isnan() needs a None guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
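As a sketch of the None guard this commit message refers to: `math.isnan()` accepts a float, so passing an `Optional[float]` fails type checking and would raise a `TypeError` at runtime for `None`. The helper name `is_nan_score` is hypothetical and is not taken from the test file:

```python
import math
from typing import Optional


def is_nan_score(score: Optional[float]) -> bool:
    """Return True only when a score is present and is NaN."""
    # Check for None first so math.isnan() always receives a float.
    return score is not None and math.isnan(score)


missing = is_nan_score(None)
broken = is_nan_score(float("nan"))
valid = is_nan_score(0.0)
```

Narrowing on `score is not None` before the call is what lets mypy accept the expression, since the right-hand side is only evaluated with a plain `float`.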

julian-risch merged commit dd05a63 into deepset-ai:main Jun 16, 2026
23 checks passed

julian-risch merged 3 commits into deepset-ai:main from i-anubhav-anand:fix/in-memory-cosine-zero-vector-nan
