Semantic code intelligence for any codebase — parse, embed, search, inject.
Code search tools force a tradeoff. Grep finds exact strings. Language servers require a running daemon and IDE integration. Cloud AI search sends your code to a third party. None of those work well inside a Rust library or a local tool.
semtree uses tree-sitter to parse your codebase into structured chunks (functions, structs, methods), embeds them locally via fastembed, and stores them in an HNSW vector index — all on-device, no API key required, no daemon.
$ semtree index ./my-project
⠿ [========================================] 87/87 files (3s)
Done (incremental). Indexed 312 chunks → .semtree/
$ semtree search "how is authentication handled"
1. [Function] validate_token (score: 0.921)
src/auth/jwt.rs:14
pub fn validate_token(token: &str, secret: &[u8]) -> Result<Claims> {
2. [Function] middleware (score: 0.887)
src/auth/middleware.rs:28
pub async fn middleware(req: Request, next: Next) -> Response {
$ semtree stats
=== Index: .semtree ===
Chunks : 312
Files : 87
Size : 1.8 MB
By language:
rust: 200 (64%)
typescript: 80 (26%)
go: 32 (10%)
No daemon. No Python. Embeddings run on CPU via ONNX, cached after first use. Supports OpenAI and Ollama as drop-in embedding backends when you need higher quality.
CLI:
cargo install semtree-cliLibrary:
[dependencies]
semtree-rag = "0.1"
semtree-embed = "0.1"
semtree-store = "0.1"semtree init # create .semtree.toml
semtree index ./my-project # index (incremental by default)
semtree index ./my-project --full # force full re-scan
semtree search "error handling strategy" -k 5 # semantic search
semtree context "authentication flow" # RAG context block for LLMs
semtree stats # chunks, languages, index size
semtree analyze # complexity metrics, largest functionsAll commands accept --config <path> to point to a custom .semtree.toml.
Re-running semtree index only processes files whose content has changed. A manifest (manifest.json) is stored alongside the index to track per-file hashes. Pass --full to force a complete re-scan.
semtree init creates a .semtree.toml in the current directory:
[embed]
backend = "fastembed" # fastembed | openai | ollama
# model = "text-embedding-3-small"
# url = "http://localhost:11434" # ollama only
# api_key = "sk-..." # or set OPENAI_API_KEY
[store]
backend = "usearch" # usearch | qdrant
# url = "http://localhost:6333"
# collection = "semtree"
index_dir = ".semtree"| Backend | Default model | Notes |
|---|---|---|
fastembed (default) |
AllMiniLML6V2 (384-dim) |
On-device, no key needed |
openai |
text-embedding-3-small |
Set OPENAI_API_KEY or embed.api_key |
ollama |
nomic-embed-text |
Requires local Ollama server |
| Backend | Notes |
|---|---|
usearch (default) |
In-process HNSW, saved to disk |
qdrant |
Remote Qdrant server — set QDRANT_URL or store.url |
use std::sync::Arc;
use semtree_embed::fastembed::FastEmbedder;
use semtree_store::usearch::UsearchStore;
use semtree_rag::{ChunkRegistry, FileManifest, Indexer, SearchEngine};
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let embedder = Arc::new(FastEmbedder::new()?);
let store = Arc::new(UsearchStore::new(384)?);
let indexer = Indexer::new(embedder.clone(), store.clone());
let mut registry = ChunkRegistry::default();
let mut manifest = FileManifest::default();
let n = indexer
.index_dir("./src".as_ref(), &mut registry, Some(&mut manifest), |done, total| {
eprint!("\r{done}/{total}");
})
.await?;
println!("\nIndexed {n} chunks");
let engine = SearchEngine::new(embedder, store);
let hits = engine.search("error handling", 5).await?;
for hit in &hits {
if let Some(chunk) = registry.get(&hit.id) {
println!("{} — {}:{} (score: {:.3})",
chunk.name.as_deref().unwrap_or("?"),
chunk.path.display(),
chunk.span.start_line + 1,
hit.score);
}
}
Ok(())
}Each crate is independently published to crates.io — use only what you need.
semtree-core # shared types: Language, Span, Chunk, ChunkKind
semtree-parse # tree-sitter parsing + chunk extraction
semtree-embed # Embedder trait + fastembed / OpenAI / Ollama backends
semtree-store # VectorStore trait + usearch / Qdrant backends
semtree-rag # index, search, LLM context, incremental manifest
semtree-analyze # complexity metrics, large-function detection
semtree-cli # CLI binary (semtree)
| Language | Parse | Extract |
|---|---|---|
| Rust | ✅ | ✅ functions, structs, enums, traits, impls, modules |
| Python | ✅ | ✅ functions, classes, decorators |
| TypeScript | ✅ | ✅ functions, classes, interfaces, enums, type aliases, exports |
| JavaScript | ✅ | ✅ functions, classes, generators, exports |
| Go | ✅ | ✅ functions, methods, structs, interfaces |
Plain text files (.md, .json, .toml, .yaml, …) are chunked into overlapping 40-line windows.
Custom embedder:
use async_trait::async_trait;
use semtree_embed::{Embedder, Embedding, EmbedError};
struct MyEmbedder;
#[async_trait]
impl Embedder for MyEmbedder {
async fn embed(&self, texts: &[&str]) -> Result<Vec<Embedding>, EmbedError> {
todo!() // call your API or local model
}
}Custom vector store:
use semtree_store::{VectorStore, Hit, StoreError};
use semtree_embed::Embedding;
struct MyStore;
#[async_trait]
impl VectorStore for MyStore {
async fn insert(&self, id: &str, emb: &Embedding) -> Result<(), StoreError> { todo!() }
async fn search(&self, query: &Embedding, top_k: usize) -> Result<Vec<Hit>, StoreError> { todo!() }
async fn delete(&self, id: &str) -> Result<(), StoreError> { todo!() }
fn save(&self, _path: &std::path::Path) -> Result<(), StoreError> { Ok(()) }
fn load(&mut self, _path: &std::path::Path) -> Result<(), StoreError> { Ok(()) }
fn len(&self) -> usize { 0 }
}MIT — see LICENSE