Releases · deepset-ai/haystack

17 Jun 14:32

github-actions

v2.30.2-rc1

5d3c996

v2.30.2-rc1 Pre-release

🐛 Bug Fixes

Fixed the Agent exiting prematurely under the default exit_conditions=["text"]. The agent now only stops when the last message is an assistant message with non-empty text (or when no tool invoker is configured). Previously, if the LLM produced an invalid tool call that was discarded, the resulting assistant message with empty text and no tool calls would trigger an exit, preventing the agent from recovering. The agent now continues looping so the model can recover on the next iteration.
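
The exit rule described above can be sketched in plain Python. This is only an illustration of the decision, not the Agent's actual code: `Msg` and `should_exit_on_text` are hypothetical names, and the real Agent operates on ChatMessage objects.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message."""
    role: str
    text: str = ""
    tool_calls: list = field(default_factory=list)


def should_exit_on_text(last: Msg, has_tool_invoker: bool) -> bool:
    """Decide whether the loop stops under exit_conditions=["text"]."""
    if not has_tool_invoker:
        # Without a tool invoker there is nothing left to loop on.
        return True
    # Stop only on an assistant message that actually carries text.
    return last.role == "assistant" and bool(last.text)


# A discarded invalid tool call leaves an empty assistant message: keep looping.
print(should_exit_on_text(Msg(role="assistant"), has_tool_invoker=True))  # False
print(should_exit_on_text(Msg(role="assistant", text="Done."), has_tool_invoker=True))  # True
```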

09 Jun 13:26

github-actions

v2.30.1

47a19c2

v2.30.1 Latest

⚡️ Enhancement Notes

AzureOpenAIChatGenerator now accepts a Secret for the azure_endpoint and api_version parameters in addition to a plain string. This makes it possible to resolve these values from environment variables at runtime, for example with Secret.from_env_var("AZURE_OPENAI_ENDPOINT"), so the same serialized pipeline can switch between environments (e.g. dev and prod) by changing environment variables instead of the pipeline definition.

09 Jun 12:45

github-actions

v2.30.1-rc1

df454b5

v2.30.1-rc1 Pre-release

03 Jun 10:21

github-actions

v2.30.0

cdd105e

⭐️ Highlights

🐍 Syntax-aware Python code splitting with `PythonCodeSplitter`

The new PythonCodeSplitter is a syntax-aware splitter for Python source files, built for code-RAG and code-search pipelines where naive line-based splitting tends to cut through functions and lose structural context. It parses sources with the ast module and greedily merges units, such as module docstring, import blocks, top-level functions, class headers, methods, and nested classes, into chunks of roughly max_effective_lines, keeping whole functions and methods together. For functions that exceed oversized_factor * max_effective_lines, it falls back to a line-based secondary split with overlap.

Two options make the resulting chunks more useful downstream: strip_docstrings=True moves docstrings into chunk metadata, and preserve_class_definition=True prepends the enclosing class signature to chunks whose members live in a later chunk. Each chunk also carries rich metadata including start_line, end_line, unit_kinds, include_classes, decorators, docstrings, source_id, and split_id.

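```
from haystack import Document
from haystack.components.preprocessors import PythonCodeSplitter

# Any Document whose content is Python source code works as input.
doc = Document(content="def greet(name):\n    return f'Hello, {name}!'\n")

splitter = PythonCodeSplitter(
    max_effective_lines=80,
    strip_docstrings=True,
    preserve_class_definition=True,
)
result = splitter.run(documents=[doc])
```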
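
To make the idea of syntax-aware splitting concrete, here is a minimal sketch that uses only the standard-library ast module. It is not the PythonCodeSplitter implementation: `split_top_level` is a hypothetical helper that handles top-level units only, counts raw lines instead of effective lines, and leaves an oversized unit as its own chunk rather than applying a secondary split.

```python
import ast


def split_top_level(source: str, max_lines: int = 20) -> list[tuple[int, int]]:
    """Return (start_line, end_line) chunks that never cut through a top-level unit."""
    tree = ast.parse(source)
    chunks: list[tuple[int, int]] = []
    start = end = None
    for node in tree.body:
        # Decorators sit above the node's own lineno, so include them.
        decorators = getattr(node, "decorator_list", [])
        first = min([node.lineno] + [d.lineno for d in decorators])
        last = node.end_lineno
        if start is None:
            start, end = first, last
        elif last - start + 1 <= max_lines:
            end = last  # the unit still fits: merge it into the open chunk
        else:
            chunks.append((start, end))
            start, end = first, last
    if start is not None:
        chunks.append((start, end))
    return chunks


code = '''import os

def a():
    return 1

def b():
    return 2
'''
print(split_top_level(code, max_lines=4))  # [(1, 4), (6, 7)]
```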

💬 Pass a plain string to any `ChatGenerator`

All Haystack ChatGenerator components now accept a plain string for the messages parameter in addition to a list of ChatMessage objects. The string is automatically wrapped in a ChatMessage with the user role. This makes switching from a Generator to a ChatGenerator a one-line change. The change applies to AzureOpenAIChatGenerator, AzureOpenAIResponsesChatGenerator, FallbackChatGenerator, HuggingFaceAPIChatGenerator, HuggingFaceLocalChatGenerator, OpenAIChatGenerator, and OpenAIResponsesChatGenerator, and will soon be rolled out to the ChatGenerators in Haystack Core Integrations.

```
from haystack.components.generators.chat import OpenAIChatGenerator

generator = OpenAIChatGenerator()

# passing a string is equivalent to passing [ChatMessage.from_user("...")]
response = generator.run("What's Natural Language Processing?")
print(response["replies"][0].text)
```

⬆️ Upgrade Notes

DALLEImageGenerator has been updated to account for OpenAI's retirement of the DALL-E models. The default model is now gpt-image-2 (previously dall-e-3). To migrate:
- Update model value: besides gpt-image-2, gpt-image-1 and gpt-image-1-mini are also supported.
- Update quality value: the new accepted values are auto, high, medium, or low (previously standard or hd).
- Update size value: the new accepted values are 1024x1024, 1024x1536, 1536x1024, or auto. gpt-image-2 also supports arbitrary sizes.
- The response_format parameter is now ignored. The component always returns base64-encoded JSON.

🚀 New Features

Introduced the PythonCodeSplitter component, a syntax-aware splitter for Python source files:
- Parses sources with the ast module and merges units (module docstring, import blocks, top-level functions, class headers, methods, nested classes, and remaining statements) greedily into chunks of roughly max_effective_lines.
- Keeps whole functions and methods together; falls back to a line-based secondary split (using DocumentSplitter) with overlap only for functions whose effective length exceeds oversized_factor * max_effective_lines.
- Optionally strips docstrings into chunk metadata via strip_docstrings=True, and prepends the enclosing class signature to chunks whose members live in a later chunk via preserve_class_definition=True.
- Emits per-chunk metadata including start_line, end_line, unit_kinds, include_classes, decorators, docstrings, source_id, and split_id.

All Haystack ChatGenerator components now also accept a plain string for the messages parameter in addition to a list of ChatMessage objects. The string is automatically converted into a list containing a ChatMessage with the user role. This is done to simplify switching from Generators to ChatGenerators; Generators might be removed in Haystack 3.0.

This applies to AzureOpenAIChatGenerator, AzureOpenAIResponsesChatGenerator, FallbackChatGenerator, HuggingFaceAPIChatGenerator, HuggingFaceLocalChatGenerator, OpenAIChatGenerator, and OpenAIResponsesChatGenerator.

The same change will soon be applied to the ChatGenerators available in Haystack Core Integrations.

Example:
```
from haystack.components.generators.chat import OpenAIChatGenerator

generator = OpenAIChatGenerator()

# passing a string is equivalent to passing [ChatMessage.from_user("...")]
response = generator.run("What's Natural Language Processing?")
print(response["replies"][0].text)
```

⚡️ Enhancement Notes

Added run_async to TextEmbeddingRetriever, MultiQueryEmbeddingRetriever, and MultiQueryTextRetriever. These components now execute natively as coroutines in AsyncPipeline, delegating to each wrapped component's run_async when available and falling back to a thread executor otherwise.
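
The delegation pattern described above (use the wrapped component's run_async when it exists, otherwise run the synchronous method in a thread) can be sketched with asyncio alone. `run_component_async`, `SyncOnly`, and `WithAsync` are hypothetical names for illustration, not Haystack APIs.

```python
import asyncio
from functools import partial


async def run_component_async(component, **inputs):
    """Await run_async when the component has one, else run `run` in a worker thread."""
    run_async = getattr(component, "run_async", None)
    if run_async is not None:
        return await run_async(**inputs)
    loop = asyncio.get_running_loop()
    # None selects the event loop's default thread pool executor.
    return await loop.run_in_executor(None, partial(component.run, **inputs))


class SyncOnly:
    def run(self, query: str) -> dict:
        return {"documents": [f"sync:{query}"]}


class WithAsync(SyncOnly):
    async def run_async(self, query: str) -> dict:
        return {"documents": [f"async:{query}"]}


sync_result = asyncio.run(run_component_async(SyncOnly(), query="q"))
async_result = asyncio.run(run_component_async(WithAsync(), query="q"))
print(sync_result, async_result)
```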
Fix grammar in the AzureOpenAIGenerator and AzureOpenAIChatGenerator docstring code examples ("<this a model name..." → "<this is a model name...") so that copy-pasted snippets read correctly.
Update ToolsType to improve type checking for the tools parameter. Any class that inherits from either Tool or Toolset is now accepted in any sequence (list, tuple, etc.).
Pipeline.draw() and Pipeline.show() now validate the Mermaid server response before writing it to disk. The response body is checked against the expected output format (PNG, JPEG, WebP, SVG, or PDF) via its magic-byte signature, and the Content-Type header is checked as well. If the response is empty or does not match the requested format, a PipelineDrawingError is raised and no file is written. This prevents a misconfigured or untrusted server_url from causing arbitrary content (for example an HTML error page) to be saved verbatim to the output path.
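
As a rough illustration of a magic-byte check, the sketch below compares a response body against well-known leading signatures. `matches_format` is a hypothetical helper, it covers the body check only (not the Content-Type header), and the exact rules Haystack applies may differ.

```python
def matches_format(body: bytes, fmt: str) -> bool:
    """Check a response body against the leading signature of the requested format."""
    if not body:
        return False
    if fmt == "png":
        return body.startswith(b"\x89PNG\r\n\x1a\n")
    if fmt == "jpeg":
        return body.startswith(b"\xff\xd8\xff")
    if fmt == "pdf":
        return body.startswith(b"%PDF-")
    if fmt == "webp":
        # RIFF container: "RIFF", a 4-byte size field, then "WEBP".
        return body[:4] == b"RIFF" and body[8:12] == b"WEBP"
    if fmt == "svg":
        # SVG is text, so look for the root element near the start.
        return b"<svg" in body[:1024].lower()
    raise ValueError(f"Unsupported format: {fmt}")


print(matches_format(b"\x89PNG\r\n\x1a\n...", "png"))   # True
print(matches_format(b"<html>error</html>", "png"))      # False
```

A caller would raise an error and skip writing the file whenever this check returns False.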

🐛 Bug Fixes

Prevent Document.from_dict() from mutating the input dictionary during deserialization.
Prevent DocumentLanguageClassifier from crashing on documents whose content is None. Such documents are now marked as unmatched and a warning is logged.
Fixed a bug where Agent would not exit when the model emitted multiple tool calls in a single turn and the configured exit-condition tool was not the first one in the list. Previously, only the first tool call in each assistant message was checked against exit_conditions, so a reply like [search, finish] (with exit_conditions=["finish"]) would silently fail to stop the loop and keep iterating until max_agent_steps was reached. Since parallel tool calls are now the norm for frontier models, this could quietly turn a single successful turn into dozens of wasted LLM calls. The Agent now inspects every tool call in the message, so the exit condition is honored regardless of ordering.
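
The difference between the old and new checks can be shown with tool names alone. Both functions are hypothetical simplifications; the real Agent inspects tool call objects, not plain strings.

```python
def first_only(tool_call_names: list[str], exit_conditions: list[str]) -> bool:
    """Old behaviour as described above: only the first tool call is inspected."""
    return bool(tool_call_names) and tool_call_names[0] in exit_conditions


def any_call(tool_call_names: list[str], exit_conditions: list[str]) -> bool:
    """New behaviour: every tool call in the message is inspected."""
    return any(name in exit_conditions for name in tool_call_names)


calls = ["search", "finish"]
print(first_only(calls, ["finish"]))  # False: the loop would keep running
print(any_call(calls, ["finish"]))    # True: the exit condition is honored
```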
Fix AnswerBuilder.run() mutating the meta dict of input Document objects. source_index (and referenced when reference_pattern is set) are now only added to the document copies inside GeneratedAnswer.documents, not to the originals.
Fixed DocumentJoiner in concatenate mode so that documents with a score of exactly 0.0 are no longer treated as unscored during deduplication. Previously a truthiness check coerced score=0.0 to -inf, which could cause a worse, negatively-scored duplicate to be kept instead of the 0.0-scored document. The merge mode was updated to the same explicit is not None check for consistency; its observable behavior is unchanged.
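
The pitfall is easy to reproduce in isolation. The two key functions below are illustrative, not the DocumentJoiner code: one uses a truthiness check, the other the explicit is not None check.

```python
import math


def key_truthy(score):
    # Buggy pattern: 0.0 is falsy, so it is treated like a missing score.
    return score if score else -math.inf


def key_explicit(score):
    # Fixed pattern: only None counts as "no score".
    return score if score is not None else -math.inf


scores = [0.0, -0.5]  # scores of two duplicates of the same document
print(max(scores, key=key_truthy))    # -0.5 (the worse duplicate wins)
print(max(scores, key=key_explicit))  # 0.0
```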
Fixed in-place mutation of ExtractedAnswer.meta in ExtractiveReader._add_answer_page_number when the answer's meta was None. Now uses dataclasses.replace to avoid triggering the dataclass mutation warning.
Fixed ExtractiveReader raising ValueError when the number of valid answer spans for a sequence was smaller than answers_per_seq (for example with short documents or when answers_per_seq exceeded the number of upper-triangular, non-masked (start, end) token pairs). _postprocess now filters the per-sequence probabilities by the same validity mask it already applied to the start/end token indices, so the three structures always have matching lengths.
HierarchicalDocumentSplitter no longer mutates the metadata of the input Document. _add_meta_data now returns a new Document with a copied meta dict via dataclasses.replace instead of writing __block_size, __parent_id, __children_ids and __level onto the caller's Document.
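
The non-mutating pattern mentioned here, dataclasses.replace with a copied meta dict, looks roughly like this. `Doc` and `with_level` are hypothetical stand-ins rather than Haystack classes.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Doc:
    """Hypothetical stand-in for a document with a metadata dict."""
    content: str
    meta: dict = field(default_factory=dict)


def with_level(doc: Doc, level: int) -> Doc:
    """Return a new Doc carrying the extra key; the input stays untouched."""
    return replace(doc, meta={**doc.meta, "__level": level})


original = Doc(content="text", meta={"source": "a.md"})
annotated = with_level(original, level=1)
print(original.meta)   # {'source': 'a.md'}
print(annotated.meta)  # {'source': 'a.md', '__level': 1}
```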
Fixed a bug in LLMMetadataExtractor.run_async where the asyncio.Semaphore intended to bound concurrent LLM calls to max_workers was acquired once around the outer gather(...) call instead of inside each task. As a result, max_workers had no effect in run_async and all LLM requests for a batch were issued simultaneously. The semaphore is now acquired per task, so max_workers correctly caps in-flight requests.
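
A small asyncio sketch shows why the semaphore has to be acquired inside each task. `run_bounded` and `call_llm` are hypothetical names, and the sleep stands in for the LLM request.

```python
import asyncio


async def run_bounded(n_tasks: int, max_workers: int) -> int:
    """Run n_tasks fake LLM calls and return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_workers)
    in_flight = 0
    peak = 0

    async def call_llm() -> None:
        nonlocal in_flight, peak
        # Acquire inside each task so at most max_workers run at once.
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for the network request
            in_flight -= 1

    await asyncio.gather(*(call_llm() for _ in range(n_tasks)))
    return peak


peak = asyncio.run(run_bounded(n_tasks=10, max_workers=3))
print(peak)  # 3
```

Wrapping the whole gather call in a single `async with semaphore:` would acquire one slot once and let all ten calls start together, which is the behaviour the fix removes.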
expand_page_range() previously raised an unhelpful ValueError: too many values to unpack when a page range string contained more than one hyphen (e.g. "10-20-30"). The parser now validates the format and raises a clear ValueError with an explanatory message for invalid inputs.
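
A validating parser for a single page token might look like the sketch below. `parse_page_token` is a hypothetical helper written for illustration; the signature and error text of the real expand_page_range() may differ.

```python
def parse_page_token(token: str) -> list[int]:
    """Expand "5" to [5] and "10-12" to [10, 11, 12]; reject anything else."""
    parts = token.split("-")
    if len(parts) > 2 or not all(p.strip().isdecimal() for p in parts):
        raise ValueError(
            f"Invalid page range {token!r}: expected 'N' or 'START-END' with integers."
        )
    if len(parts) == 1:
        return [int(parts[0])]
    start, end = int(parts[0]), int(parts[1])
    if start > end:
        raise ValueError(f"Invalid page range {token!r}: start exceeds end.")
    return list(range(start, end + 1))


print(parse_page_token("10-12"))  # [10, 11, 12]
```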
LLMMetadataExtractor now raises a clear ValueError when the prompt contains no template variables. Previously this case raised an unhelpful IndexError: list index out of range. The error message now consistently expl...

Contributors

julian-risch, davidsbatista, and 12 other contributors

02 Jun 12:23

github-actions

v2.30.0-rc1

b24edf7

v2.30.0-rc1 Pre-release

12 May 14:25

github-actions

v2.29.0

63929e0

⭐️ Highlights

🔍 Combine Retrievers with `MultiRetriever` and `TextEmbeddingRetriever`

Two new retriever components make it easier to build hybrid search pipelines. MultiRetriever runs multiple text retrievers in parallel and merges their results into a single deduplicated list, ranked by reciprocal rank fusion by default. You can selectively enable or disable individual retrievers at runtime using the active_retrievers parameter. This is useful when you want to skip the embedding retriever for short or keyword-only queries, for example.

TextEmbeddingRetriever wraps an embedding-based retriever together with a text embedder into a single component, making it compatible with MultiRetriever by implementing the TextRetriever protocol. Here's how to combine BM25 and embedding retrieval in a single component:

```
from haystack.components.retrievers import MultiRetriever, TextEmbeddingRetriever
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Shared document store; index your documents (with embeddings) before querying.
doc_store = InMemoryDocumentStore()

retriever = MultiRetriever(
    retrievers={
        "bm25": InMemoryBM25Retriever(document_store=doc_store),
        "embedding": TextEmbeddingRetriever(
            retriever=InMemoryEmbeddingRetriever(document_store=doc_store),
            text_embedder=SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"),
        ),
    },
    top_k=3,
)

# Run all retrievers
result = retriever.run(query="green energy sources")

# Run only the BM25 retriever
result = retriever.run(query="green energy sources", active_retrievers=["bm25"])
```

⬆️ Upgrade Notes

LLM.run and LLM.run_async no longer accept messages and streaming_callback as positional arguments — they must now be passed as keyword arguments. Update any direct calls accordingly:
```
# Before
llm.run([message], my_callback)

# After
llm.run(messages=[message], streaming_callback=my_callback)
```

🚀 New Features

Add run_async to CacheChecker, enabling it to be used in AsyncPipeline without blocking the event loop.

⚡️ Enhancement Notes

Document the input ordering behavior of auto-promoted lazy variadic sockets in Pipeline.connect(). When multiple senders are connected to the same list-typed receiver socket, ordering depends on the pipeline class. With Pipeline, items are ordered alphabetically by sender component name (because Pipeline.run() schedules components in alphabetical order for deterministic execution), not by the order of connect() calls. With AsyncPipeline, no ordering is guaranteed, since components in different branches may run in parallel. The docstrings now point users to a dedicated joiner component when they need explicit ordering.
Add join_mode parameter to the experimental MultiRetriever component, supporting "reciprocal_rank_fusion" (default) and "concatenate". Reciprocal Rank Fusion merges the ranked result lists from all retrievers into a single deduplicated list ordered by RRF score. The underlying RRF logic is extracted into a shared utility _reciprocal_rank_fusion in haystack.utils.misc, which is now also used by DocumentJoiner.
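
For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch over lists of document ids. It assumes the commonly used constant k=60 and the textbook score 1 / (k + rank); the constants and any score scaling inside Haystack's _reciprocal_rank_fusion may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids into one deduplicated, RRF-ordered list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Higher fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25 = ["a", "b", "c"]
embedding = ["b", "d", "a"]
print(reciprocal_rank_fusion([bm25, embedding]))  # ['b', 'a', 'd', 'c']
```

Documents that appear in both lists ("a" and "b") rise above documents found by only one retriever.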
LLM now supports two usage modes:
1. Template-variable mode: provide a user_prompt with Jinja2 variables (e.g. {{ query }}). Those variables become pipeline inputs and messages is optional. The rendered user_prompt is always appended after any messages provided at runtime.
2. Pass-through mode: omit user_prompt or provide one with no template variables. messages becomes a required input, allowing a fully constructed list of ChatMessages to be passed from upstream.

🐛 Bug Fixes

Fixed a bug in NamedEntityExtractor where the spaCy/Thinc device state was not correctly restored after execution, potentially affecting the device configuration of other spaCy components in the same process.
Preserve resumable snapshots when some inputs or outputs are non-serializable. Haystack now omits only the failing top-level fields (for example non-serializable callbacks or runtime objects) instead of replacing the whole payload with an empty dictionary. This applies both to agent sub-component inputs (chat_generator and tool_invoker) and to pipeline-level inputs, original_input_data, and pipeline_outputs captured by _create_pipeline_snapshot. When every field fails to serialize, the snapshot still stores a structurally valid empty payload ({"serialization_schema": {"type": "object", "properties": {}}, "serialized_data": {}}) so that resuming the snapshot does not raise DeserializationError — for example when resuming from a ToolBreakpoint where the sub-component's inputs are not strictly required.
Fixed tools_strict=True in OpenAIChatGenerator to recursively apply additionalProperties: false and required to all nested objects in tool parameter schemas. Previously only the top-level object was transformed, causing OpenAI's strict mode to reject tools with nested parameters.
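
The recursive transformation can be sketched as a small schema walker. `make_strict` is a hypothetical helper, not the OpenAIChatGenerator code, and it only handles plain nested object schemas.

```python
import copy


def make_strict(schema: dict) -> dict:
    """Return a copy where every object schema forbids extra keys and requires all properties."""
    schema = copy.deepcopy(schema)

    def visit(node):
        if isinstance(node, dict):
            if node.get("type") == "object" and "properties" in node:
                node["additionalProperties"] = False
                node["required"] = list(node["properties"])
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(schema)
    return schema


params = {
    "type": "object",
    "properties": {
        "filters": {
            "type": "object",
            "properties": {"year": {"type": "integer"}},
        }
    },
}
strict = make_strict(params)
print(strict["properties"]["filters"]["additionalProperties"])  # False
print(strict["properties"]["filters"]["required"])              # ['year']
```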

💙 Big thank you to everyone who contributed to this release!

@Aftabbs, @albertodiazdurana, @anakin87, @ArkaD171717, @bilgeyucel, @bogdankostic, @davidsbatista, @FuturMix, @julian-risch, @kacperlukawski, @ritikraj2425, @saivedant169, @shaun0927, @sjrl, @SyedShahmeerAli12

Contributors

kacperlukawski, julian-risch, and 13 other contributors

12 May 13:08

github-actions

v2.29.0-rc2

42ad2af

v2.29.0-rc2 Pre-release

11 May 14:37

github-actions

v2.29.0-rc1

6e19535

v2.29.0-rc1 Pre-release

20 Apr 15:02

github-actions

v2.28.0

66af637

Upgrade Notes

As part of the migration from requests to httpx, request_with_retry and async_request_with_retry (in haystack.utils.requests_utils) no longer raise requests.exceptions.RequestException on failure; they now raise httpx.HTTPError instead. This also affects HuggingFaceTEIRanker, which relies on these utilities. Users catching requests.exceptions.RequestException should update their code to catch httpx.HTTPError.
The LLM component now requires user_prompt to be provided at initialization and it must contain at least one Jinja2 template variable (e.g. {{ variable_name }}). This ensures the component always exposes at least one required input socket, which is necessary for correct pipeline scheduling.

required_variables now defaults to "*" (all variables in user_prompt are required), and passing an empty list raises a ValueError.

If you are affected: update any code that instantiates LLM without a user_prompt, or with a user_prompt that has no template variables, to include at least one variable.

Before:
```
llm = LLM(chat_generator=OpenAIChatGenerator(), system_prompt="You are helpful.")
```
After:
```
llm = LLM(
    chat_generator=OpenAIChatGenerator(),
    system_prompt="You are helpful.",
    user_prompt='{% message role="user" %}{{ query }}{% endmessage %}',
)
```
Agent.run() and Agent.run_async() now require messages as an explicit argument (no longer optional). If you were relying on the default None value in Haystack version 2.26 or 2.27, pass an empty list instead:
```
agent.run(messages=[], ...)
```
LLM.run() and LLM.run_async() are unaffected — they still accept None and default to an empty list internally.

New Features

Tools and components can now declare a State (or State | None) parameter in their signature to receive the live agent State object at invocation time — no extra wiring needed.

For function-based tools created with @tool or create_tool_from_function, add a state parameter annotated as State:
```
from haystack.components.agents import State
from haystack.tools import tool

@tool
def my_tool(query: str, state: State) -> str:
    """Search using context from agent state."""
    history = state.get("history")
    ...
```
For component-based tools created with ComponentTool, declare a State input socket on the component's run method:
```
from haystack import component
from haystack.components.agents import State
from haystack.tools import ComponentTool

@component
class MyComponent:
    @component.output_types(result=str)
    def run(self, query: str, state: State) -> dict:
        history = state.get("history")
        ...

tool = ComponentTool(component=MyComponent())
```
In both cases ToolInvoker automatically injects the runtime State object before calling the tool, and State/Optional[State] parameters are excluded from the LLM-facing schema so the model is not asked to supply them.

This is an alternative to the existing inputs_from_state and outputs_to_state options on Tool and ComponentTool, which map individual state keys to specific tool parameters and outputs declaratively. Injecting the full State object is more flexible and useful when a tool needs to read from or write to multiple keys, but it couples the tool implementation directly to State.

Enhancement Notes

Clarify in the Markdown-producing converter documentation that DocumentCleaner with its default settings can flatten Markdown output, and update the example pipelines for PaddleOCRVLDocumentConverter, MistralOCRDocumentConverter, AzureDocumentIntelligenceConverter, and MarkItDownConverter to avoid routing Markdown content through the default cleaner configuration.
Made _create_agent_snapshot robust towards serialization errors. If serializing agent component inputs fails, a warning is logged and an empty dictionary is used as a fallback, preventing the serialization error from masking the real pipeline runtime error.
Standardize HTTP request handling in Haystack by adopting httpx for both synchronous and asynchronous requests, replacing requests. Error reporting for failed requests has also been improved: exceptions now include additional details alongside the reason field.
Add run_async method to LLMMetadataExtractor. ChatGenerator requests now run concurrently using the existing max_workers init parameter.
MarkdownHeaderSplitter now accepts a header_split_levels parameter (list of integers 1–6, default all levels) to control which header depths create split boundaries. For example, header_split_levels=[1, 2] splits only on # and ## headers, merging content under deeper headers into the preceding chunk.
MarkdownHeaderSplitter now ignores # lines that appear inside fenced code blocks (triple-backtick or triple-tilde), preventing Python comments and other hash-prefixed lines in code from being misidentified as Markdown headers.
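
The two MarkdownHeaderSplitter behaviours above (choosing split levels and skipping fenced code) can be illustrated with a small header scanner. `find_headers` is a hypothetical helper that only detects split boundaries; it does not build chunks, and the real component's parsing rules may be stricter.

```python
import re

HEADER = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = re.compile(r"^\s*(`{3,}|~{3,})")


def find_headers(markdown: str, split_levels=(1, 2, 3, 4, 5, 6)) -> list[tuple[int, str]]:
    """Return (level, title) for headers outside fenced code blocks."""
    headers = []
    open_fence = None  # fence character of the block we are inside, if any
    for line in markdown.splitlines():
        fence = FENCE.match(line)
        if fence:
            char = fence.group(1)[0]
            if open_fence is None:
                open_fence = char
            elif open_fence == char:
                open_fence = None
            continue
        if open_fence is not None:
            continue  # hash-prefixed lines inside code are not headers
        match = HEADER.match(line)
        if match and len(match.group(1)) in split_levels:
            headers.append((len(match.group(1)), match.group(2).strip()))
    return headers


sample = "\n".join([
    "# Title",
    "~~~",
    "# just a Python comment",
    "~~~",
    "## Section",
    "### Detail",
])
print(find_headers(sample, split_levels=(1, 2)))  # [(1, 'Title'), (2, 'Section')]
```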
Expand the PaddleOCRVLDocumentConverter documentation with more detailed guidance on advanced parameters, common usage scenarios, and a more realistic configuration example for layout-heavy documents.

Bug Fixes

Fix ToolInvoker._merge_tool_outputs silently appending None to list-typed state when a tool's outputs_to_state source key is absent from the tool result. This is a common scenario with PipelineTool wrapping a pipeline that has conditional branches where not all outputs are always produced even if defined in outputs_to_state. The mapping is now skipped entirely when the source key is not present in the result dict.
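
In simplified form, the fix amounts to checking for the source key before merging. The sketch below uses a hypothetical flat mapping of state key to source key; the real outputs_to_state configuration is richer and `merge_tool_outputs` is not the ToolInvoker method.

```python
def merge_tool_outputs(result: dict, outputs_to_state: dict, state: dict) -> None:
    """Copy mapped tool outputs into list-typed state, skipping absent source keys."""
    for state_key, source_key in outputs_to_state.items():
        if source_key not in result:
            continue  # the branch did not run: do not append None
        state.setdefault(state_key, []).append(result[source_key])


state = {"documents": []}
# The wrapped pipeline produced "answer" but not "docs" on this run.
merge_tool_outputs({"answer": "x"}, {"documents": "docs", "answers": "answer"}, state)
print(state)  # {'documents': [], 'answers': ['x']}
```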

In the chunks produced by MarkdownHeaderSplitter, a child header could lose its direct parent header in the metadata. Consider the following code:

```
from haystack.components.preprocessors import MarkdownHeaderSplitter
from haystack import Document

text = """
# header 1
intro text

## header 1.1
text 1

## header 1.2
text 2

### header 1.2.1
text 3

### header 1.2.2
text 4
"""

document = Document(content=text)

splitter = MarkdownHeaderSplitter(
    keep_headers=True,
    secondary_split="word",
)
result = splitter.run(documents=[document])["documents"]

for doc in result:
    print(f"Header: {doc.meta['header']}, parent headers: {doc.meta['parent_headers']}")
```

We would have expected this output:

```
Header: header 1, parent headers: []
Header: header 1.1, parent headers: ['header 1']
Header: header 1.2, parent headers: ['header 1']
Header: header 1.2.1, parent headers: ['header 1', 'header 1.2']
Header: header 1.2.2, parent headers: ['header 1', 'header 1.2']
```

But instead we actually got:

```
Header: header 1, parent headers: []
Header: header 1.1, parent headers: []
Header: header 1.2, parent headers: ['header 1']
Header: header 1.2.1, parent headers: ['header 1']
Header: header 1.2.2, parent headers: ['header 1', 'header 1.2']
```

The error happened when a parent header had its own content chunk before the first child header.

This has been fixed: child chunks now keep their full list of parent headers even when a parent header has its own content chunk before the first child header.

Reverts the change that made Agent messages optional as it caused issues with pipeline execution. As a consequence, the LLM component now defaults to an empty messages list unless provided at runtime.