@gnodet gnodet commented Jan 23, 2026

Implements CAMEL-22851.

Overview

This PR implements a native tool-search-tool feature for the camel-langchain4j-tools component that allows LLMs to discover and access tools dynamically without consuming the entire context window with tool definitions.

Problem Statement

When working with LLMs that support function calling, exposing all available tools in every request can:

  • Consume significant context window space, reducing space for actual conversation
  • Overwhelm the LLM with too many options, potentially degrading performance
  • Limit scalability when dealing with hundreds or thousands of tools

Solution

This PR introduces an exposed parameter (default: true) that allows tools to be marked as "searchable" rather than immediately exposed to the LLM. A native toolSearchTool is automatically provided when searchable tools exist, enabling the LLM to discover tools on-demand based on tags.

Key Features

1. Exposed Parameter

  • New exposed boolean parameter on LangChain4jToolsEndpoint (default: true)
  • Tools with exposed=false are added to a searchable registry
  • Maintains backward compatibility - existing code works unchanged

2. Native Tool Search Tool

  • Automatically exposed when searchable tools exist
  • Searches by tags with support for comma-separated lists
  • Returns formatted tool descriptions for LLM consumption
  • Prevents duplicate results when tools have multiple matching tags
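
The tag search described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual ToolSearchTool: the class name TagSearchSketch, its methods, and the in-memory map are all assumptions made for the example. It shows a comma-separated tag query and uses a LinkedHashSet so that a tool matching several tags is returned only once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the real ToolSearchTool from this PR.
class TagSearchSketch {
    // tag -> names of searchable tools registered under that tag
    private final Map<String, List<String>> toolsByTag = new LinkedHashMap<>();

    // Registers a tool under every tag in a comma-separated list.
    void register(String toolName, String commaSeparatedTags) {
        for (String tag : commaSeparatedTags.split(",")) {
            String key = tag.trim();
            if (!key.isEmpty()) {
                toolsByTag.computeIfAbsent(key, k -> new ArrayList<>()).add(toolName);
            }
        }
    }

    // Returns matching tool names in first-match order, each at most once.
    List<String> search(String commaSeparatedTags) {
        Set<String> matches = new LinkedHashSet<>();
        if (commaSeparatedTags != null) {
            for (String tag : commaSeparatedTags.split(",")) {
                matches.addAll(toolsByTag.getOrDefault(tag.trim(), List.of()));
            }
        }
        return new ArrayList<>(matches);
    }

    public static void main(String[] args) {
        TagSearchSketch registry = new TagSearchSketch();
        registry.register("queryBySSN", "users");
        registry.register("sendEmail", "users,email");
        // sendEmail matches both tags but is listed once.
        System.out.println(registry.search("users, email")); // prints [queryBySSN, sendEmail]
    }
}
```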

3. Proper Resource Management

  • Endpoints only remove their own tools on shutdown (fixes a potential memory leak)
  • Separate caches for exposed and searchable tools
  • Clean separation of concerns
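
As a rough illustration of the "separate caches" idea, the sketch below keeps exposed and searchable tools in two independent tag-indexed maps. The class ToolCacheSketch and all of its method names are hypothetical; the real CamelToolExecutorCache may be organized differently.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; structure and names are assumptions, not CamelToolExecutorCache itself.
class ToolCacheSketch {
    // tag -> tool names, kept separately for exposed and searchable tools
    private final Map<String, Set<String>> exposed = new ConcurrentHashMap<>();
    private final Map<String, Set<String>> searchable = new ConcurrentHashMap<>();

    // exposed=true tools land here and are sent to the LLM directly.
    void put(String tag, String tool) {
        add(exposed, tag, tool);
    }

    // exposed=false tools land here and are only reachable through a search.
    void putSearchable(String tag, String tool) {
        add(searchable, tag, tool);
    }

    Set<String> exposedFor(String tag) {
        return exposed.getOrDefault(tag, Set.of());
    }

    Set<String> searchableFor(String tag) {
        return searchable.getOrDefault(tag, Set.of());
    }

    // True when at least one searchable tool is registered.
    boolean hasSearchableTools() {
        return !searchable.isEmpty();
    }

    private static void add(Map<String, Set<String>> cache, String tag, String tool) {
        cache.computeIfAbsent(tag, k -> ConcurrentHashMap.newKeySet()).add(tool);
    }
}
```

Keeping the two maps apart means a lookup for the tools to send with a request never has to filter out searchable entries.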

Usage Example

// Exposed tool - immediately available to LLM
from("langchain4j-tools:queryById?tags=users&description=Query user by ID&parameter.userId=integer")
    .to("sql:SELECT name FROM users WHERE id = :#userId");

// Searchable tool - discoverable via toolSearchTool
from("langchain4j-tools:queryBySSN?tags=users&description=Query user by SSN&parameter.ssn=string&exposed=false")
    .to("sql:SELECT name FROM users WHERE ssn = :#ssn");

// Another searchable tool with different tags
from("langchain4j-tools:sendEmail?tags=users,email&description=Send email&parameter.email=string&parameter.message=string&exposed=false")
    .to("smtp://mailserver");

Benefits

  • Reduced Context Usage: Only expose commonly used tools initially
  • Scalability: Support hundreds or thousands of tools without overwhelming the LLM
  • Dynamic Discovery: LLM discovers tools as needed based on conversation context
  • Better Organization: Tag-based tool grouping for easier discovery

This commit implements a native tool-search-tool feature for the camel-langchain4j-tools component that allows LLMs to discover and access tools dynamically without consuming the entire context window.

Key changes:
- Added 'exposed' parameter to LangChain4jToolsEndpoint (default: true)
- Tools with exposed=false are added to a searchable registry instead of being immediately exposed to the LLM
- Created ToolSearchTool class that provides search functionality for discovering non-exposed tools
- Modified CamelToolExecutorCache to maintain separate caches for exposed and searchable tools
- Updated LangChain4jToolsProducer to automatically expose the toolSearchTool when searchable tools exist
- Added comprehensive integration tests demonstrating the feature
- Updated component documentation with usage examples and benefits
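
The producer-side behaviour in the list above (expose the toolSearchTool only when searchable tools exist) might look something like this. It is a simplified sketch that works on plain tool names; ToolListSketch and toolsForRequest are hypothetical names, and the real LangChain4jToolsProducer works with tool specifications rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how the tool list for one LLM request could be assembled.
class ToolListSketch {
    static final String SEARCH_TOOL_NAME = "toolSearchTool";

    // Exposed tools are always sent; the search tool is appended only
    // when there is at least one searchable tool to discover.
    static List<String> toolsForRequest(List<String> exposedTools, List<String> searchableTools) {
        List<String> result = new ArrayList<>(exposedTools);
        if (!searchableTools.isEmpty()) {
            result.add(SEARCH_TOOL_NAME);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tools = toolsForRequest(List.of("queryById"), List.of("queryBySSN", "sendEmail"));
        System.out.println(tools); // prints [queryById, toolSearchTool]
    }
}
```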

Benefits:
- Reduced context usage by only exposing commonly used tools initially
- Scalability to support hundreds or thousands of tools without overwhelming the LLM
- Dynamic discovery allowing the LLM to find tools as needed based on conversation
- Better organization through tag-based tool grouping

This commit addresses all Priority 1, 2, and 3 enhancements from the code review:

Priority 1 (Critical):
- Fixed tool search logic to search all searchable tools, not just those filtered by producer tags
- Added duplicate prevention using LinkedHashSet in search results
- Fixed doStop() memory leak by only removing tools registered by the specific endpoint
- Added remove() and removeSearchable() methods to CamelToolExecutorCache for proper cleanup
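
One way to read the doStop() fix is sketched below: each endpoint remembers the single tool it registered and, on stop, removes only that entry from a shared tag-indexed registry, leaving tools registered by other endpoints under the same tag in place. EndpointLifecycleSketch, start(), and stop() are hypothetical stand-ins; the actual endpoint and the remove()/removeSearchable() signatures may differ.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an endpoint that cleans up only its own registration.
class EndpointLifecycleSketch {
    private final Map<String, Set<String>> registry; // shared: tag -> tool names
    private final String toolName;
    private final String[] tags;

    EndpointLifecycleSketch(Map<String, Set<String>> registry, String toolName, String... tags) {
        this.registry = registry;
        this.toolName = toolName;
        this.tags = tags;
    }

    // Registers this endpoint's tool under each of its tags.
    void start() {
        for (String tag : tags) {
            registry.computeIfAbsent(tag, k -> ConcurrentHashMap.newKeySet()).add(toolName);
        }
    }

    // Removes only this endpoint's tool; a tag entry is dropped once it is empty,
    // so stopped endpoints leave nothing behind.
    void stop() {
        for (String tag : tags) {
            registry.computeIfPresent(tag, (k, tools) -> {
                tools.remove(toolName);
                return tools.isEmpty() ? null : tools;
            });
        }
    }
}
```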

Priority 2 (Important):
- Added comprehensive unit tests:
  * ToolSearchToolTest - tests search functionality with various tag combinations
  * ToolSearchToolFormatTest - tests output formatting for LLMs
  * CamelToolExecutorCacheTest - tests cache management operations
- Completed Javadoc for isExposed() method
- Added debug logging for tool registration and removal in LangChain4jToolsEndpoint
- Added debug logging and null safety checks in handleToolSearchToolInvocation()
- Enhanced parameter descriptions with examples in createToolSearchToolSpecification()
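
To make the formatting and null-safety items concrete, here is a small sketch of turning search results into text for the LLM. The output layout, the fallback strings, and the ToolFormatSketch class are all assumptions for illustration; the PR does not show the format the component actually produces.

```java
import java.util.Map;

// Hypothetical sketch; the real output format of the search tool is not shown in this PR.
class ToolFormatSketch {
    // descriptionsByName: tool name -> description (description may be null)
    static String format(Map<String, String> descriptionsByName) {
        // Null or empty input yields a readable message instead of an exception.
        if (descriptionsByName == null || descriptionsByName.isEmpty()) {
            return "No tools found for the given tags.";
        }
        StringBuilder sb = new StringBuilder();
        sb.append("Found ").append(descriptionsByName.size()).append(" tool(s):\n");
        for (Map.Entry<String, String> entry : descriptionsByName.entrySet()) {
            String description = entry.getValue() == null ? "(no description)" : entry.getValue();
            sb.append("- ").append(entry.getKey()).append(": ").append(description).append('\n');
        }
        return sb.toString();
    }
}
```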

Priority 3 (Nice to have):
- Enhanced documentation with Best Practices section covering:
  * Tag strategy and naming conventions
  * Performance considerations
  * LLM guidance recommendations
  * Tool description best practices
  * Testing recommendations
- Added Limitations section to documentation
- Improved error messages and null handling

Technical improvements:
- Added static LOG field to LangChain4jToolsEndpoint following Camel conventions
- Improved Javadoc comments with detailed explanations
- Enhanced search algorithm to be more intuitive and user-friendly
- Better separation of concerns between exposed and searchable tools

All tests pass (24/24) and the build succeeds.

@github-actions

🌟 Thank you for your contribution to the Apache Camel project! 🌟

🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

  • First-time contributors require MANUAL approval for the GitHub Actions to run

  • You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot.

  • You can label PRs using build-all, build-dependents, skip-tests and test-dependents to fine-tune the checks executed by this PR.

  • Build and test logs are available in the Summary page. Only Apache Camel committers have access to the summary.

  • ⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

Croway commented Jan 23, 2026

@gnodet For the integration test could you use https://github.com/apache/camel/blob/main/components/camel-ai/camel-langchain4j-tools/src/test/java/org/apache/camel/component/langchain4j/tools/integration/LangChain4jToolIT.java#L30 ?

And then disable the test on the CI.

With the Ollama test-infra extension, you can run the test by following https://github.com/apache/camel/blob/main/components/camel-ai/camel-langchain4j-tools/test-execution.md

In your case: mvn verify -Dollama.endpoint=http://localhost:11434/ -Dollama.model=llama3.1:latest -Dollama.instance.type=remote

From time to time I run the langchain4j tests locally using mvn verify -Dollama.endpoint=http://localhost:11434/ -Dollama.model=granite4:3b -Dollama.instance.type=remote

@orpiske orpiske left a comment

Nice! Looking forward to having this one in the code.
