REST API to index and query documents in a vector database: k-Nearest Neighbor (kNN) search over embeddings. The API is containerized with Docker (or Podman).
Task objective: develop a REST API that allows users to index and query their documents within a vector database. The API is containerized in a Docker container.
- Chunk: A piece of text with an associated embedding and metadata.
- Document: Made of multiple chunks and metadata.
- Library: Made of a list of documents and metadata.
- CRUD libraries — create, read, update, delete.
- CRUD documents and chunks within a library.
- Index the contents of a library.
- k-Nearest Neighbor vector search over the selected library with a given embedding query.
```mermaid
flowchart LR
  subgraph Client
    A[HTTP Client]
  end
  subgraph API
    B[FastAPI routes]
  end
  subgraph Services
    C[LibraryService]
    D[DocumentService]
    E[ChunkService]
    F[SearchService]
    G[IndexService]
  end
  subgraph Data
    H[(Repositories)]
    I[(Index Registry)]
  end
  A --> B
  B --> C
  B --> D
  B --> E
  B --> F
  B --> G
  C --> H
  D --> H
  E --> H
  F --> H
  F --> I
  G --> H
  G --> I
```
- Request → API (routes) → Services (business logic) → Repositories (in-memory store) and Index Registry (kNN indexes).
- Reads use a shared read lock; writes use an exclusive write lock so there are no data races.
- Pydantic models with a fixed schema: no user-defined metadata fields.
- Chunk: `id`, `text`, `embedding` (list of floats), `created_at`, `name` (optional), `document_id`.
- Document: `id`, `name`, `created_at`, `chunk_ids`, `library_id`.
- Library: `id`, `name`, `created_at`, `document_ids`.
- All IDs are UUIDs generated by the application.
- Brute-force: Linear scan; compare query to every vector. Build: O(n·d), Query: O(n·d), Space: O(n·d). Baseline, exact k-NN.
- KD-Tree: Tree over vectors (median split). Build: O(n log n · d), Query: O(log n) typical, Space: O(n·d). Exact k-NN; good for moderate dimensions.
- IVF: K-means clusters; search in nearest cluster(s). Build: O(n·d·C·I), Query: O(C·d + m·d), Space: O(n·d). Approximate k-NN for larger n.
Only numpy is used for math (norms, etc.). No chroma-db, pinecone, FAISS, etc.
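For illustration, the brute-force baseline can be written with numpy alone (Euclidean distance; the function name is illustrative):

```python
import numpy as np

def brute_force_knn(vectors: np.ndarray, query: np.ndarray, k: int):
    """Exact k-NN by linear scan: O(n*d) per query.

    vectors: (n, d) array of stored embeddings
    query:   (d,) query embedding
    Returns the k nearest (index, distance) pairs, closest first.
    """
    dists = np.linalg.norm(vectors - query, axis=1)  # distance to every vector
    top = np.argsort(dists)[:k]                      # indices of k smallest
    return [(int(i), float(dists[i])) for i in top]
```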
- Reader-writer lock: Reads (get, list, search) hold a shared read lock; writes (create, update, delete, build index) hold an exclusive write lock.
- Design: one lock around repositories and index registry; simple and correct for a single-process, in-memory API.
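A reader-writer lock of this kind can be sketched with the standard library (a simple sketch without writer priority; the real implementation may differ):

```python
import threading
from contextlib import contextmanager

class RWLock:
    """Shared reads, exclusive writes: a writer waits until all readers drain."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer:          # block while a write is in progress
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()  # wake a waiting writer

    @contextmanager
    def write(self):
        with self._cond:
            while self._writer or self._readers:  # wait for exclusivity
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()
```

Reads (`get`, `list`, `search`) enter `with lock.read():`; writes (`create`, `update`, `delete`, index builds) enter `with lock.write():`.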
- Services implement the logic: Library, Document, Chunk, Search, Index.
- Repositories do CRUD only (in-memory); Services orchestrate them and keep relationships consistent (e.g. `document_ids` on Library, `chunk_ids` on Document, cascade deletes).
- FastAPI routes call Services; routes are thin (parse request, call service, map to HTTP status).
- Status codes: 200, 201, 204, 404, 409, 422 via `fastapi.status` (no hardcoded numbers).
- REST: `POST`/`GET`/`PUT`/`DELETE` for libraries, documents, chunks; `POST /libraries/{id}/index` to build an index; `POST /libraries/{id}/search` with body `{"embedding": [...], "k": N}` for k-NN search.
- Dockerfile (multi-stage): build with `uv`, runtime with uvicorn. Works with Docker and Podman.
- docker-compose (or podman-compose): one command to run the API (and optional UI). No need to run anything on the host except the containers.
Prerequisites: Docker or Podman, and (optional) uv for local dev/tests.
```shell
# From project root
docker compose up
# or: podman compose up
```
- API: http://localhost:8000
- Interactive docs: http://localhost:8000/docs
Optional: copy `env.example` to `.env` and set `COHERE_API_KEY` if you use Cohere for embeddings. The assignment suggested a Cohere API key for creating test embeddings, but manually created chunks suffice to test the system.
Tests (in container):
```shell
docker build --target test -t vector-db-api-test .
docker run --rm vector-db-api-test
# or with podman
```
- No chroma-db, pinecone, FAISS, or similar; indexing algorithms are implemented with numpy only.
- No document processing pipeline required; manually created chunks are enough to test the system.
- API backend: Python + FastAPI + Pydantic
- Dependency management: uv (`pyproject.toml`, `uv sync`)
- Containers: Dockerfile + docker-compose (Podman-compatible)
The following were not required by the task; they are implemented and documented below with Mermaid diagrams.
Embeddings are resolved by URI. The registry chooses the backend; invalid or missing HF_TOKEN does not break the flow (fallback to unauthenticated Hub access).
```mermaid
flowchart LR
  subgraph Input
    T[Text chunks]
    Q[Query text]
    Im[Images]
  end
  subgraph Registry["get_embedder(uri)"]
    R[Embedder Registry]
  end
  subgraph Backends
    C[Cohere API\ncohere://]
    S[Sentence Transformers\nembedding_transformer://...]
    I[CLIP / ViT\nembedding_image://...]
  end
  subgraph Output
    V[Vectors]
  end
  T --> R
  Q --> R
  Im --> R
  R --> C
  R --> S
  R --> I
  C --> V
  S --> V
  I --> V
```
- `cohere://` — Remote API; requires `COHERE_API_KEY`. Used for text (indexing and search).
- `embedding_transformer://MODEL` — Sentence Transformers (e.g. `all-MiniLM-L6-v2`); runs in-process. Text only. Optional `HF_TOKEN` for Hub rate limits.
- `embedding_image://MODEL` — CLIP/ViT for image bytes; text embedders do not support images.
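The registry dispatch can be sketched as a scheme-to-factory table (hypothetical names; the backends are stubbed here so the sketch stays self-contained and runnable):

```python
# Hypothetical registry mapping URI scheme -> backend factory. The real
# backends (Cohere client, SentenceTransformer, CLIP) register the same way.
_BACKENDS = {}

def register(scheme: str):
    def decorator(factory):
        _BACKENDS[scheme] = factory
        return factory
    return decorator

@register("embedding_transformer")
def _make_transformer(model: str):
    # Placeholder: the real code would load a Sentence Transformers model here.
    return {"backend": "sentence_transformers", "model": model}

@register("cohere")
def _make_cohere(model: str):
    # Placeholder: the real code would create a Cohere API client here.
    return {"backend": "cohere", "model": model or "default"}

def get_embedder(uri: str):
    """Resolve an embedder from a URI such as
    'embedding_transformer://all-MiniLM-L6-v2'."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in _BACKENDS:
        raise ValueError(f"unknown embedder URI: {uri!r}")
    return _BACKENDS[scheme](rest)
```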
The React UI runs in its own container (port 80 → 3000). The browser loads the UI and then sends requests to the API (port 8000). CORS is enabled on the API so the browser allows cross-origin requests.
```mermaid
flowchart TB
  subgraph Browser
    UI[React UI\nlocalhost:3000]
  end
  subgraph API_Server["API (localhost:8000)"]
    CORS[CORS middleware]
    Lib[POST/GET /libraries]
    Ingest[POST /libraries/.../ingest-pdf]
    Index[POST /libraries/.../index]
    Search[POST /libraries/.../search/by-query]
  end
  UI -->|fetch, JSON/form-data| CORS
  CORS --> Lib
  CORS --> Ingest
  CORS --> Index
  CORS --> Search
```
- UI (React, nginx in container): lists libraries, opens library detail, uploads PDF, builds index, runs search by text.
- API (FastAPI): serves REST endpoints; `allow_origins=["*"]` so any origin (e.g. `http://localhost:3000`) can call the API.
- `VITE_API_URL` is set at UI build time so the frontend knows the API base URL (e.g. `http://localhost:8000`).
The client sends text; the server embeds it with the chosen embedder, runs k-NN, then enriches results with chunk text and name for display.
```mermaid
sequenceDiagram
  participant Client
  participant API
  participant Registry
  participant Embedder
  participant Search
  participant ChunkRepo
  Client->>API: POST /libraries/{id}/search/by-query<br/>{ query, k, embedder }
  API->>Registry: get_embedder(embedder)
  Registry-->>API: embedder instance
  API->>Embedder: embed_queries([query])
  Embedder-->>API: query_embedding
  API->>Search: search(library_id, query_embedding, k)
  Search-->>API: [(chunk_id, distance), ...]
  loop For each result
    API->>ChunkRepo: get(chunk_id)
    ChunkRepo-->>API: chunk (text, name)
  end
  API-->>Client: { results: [{ chunk_id, distance, text, name }, ...] }
```
- The same embedder (and therefore the same embedding dimension) must be used at query time as at indexing time.
- Chunk text and name are attached so the UI can show the matching snippet without extra calls.
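The sequence above can be sketched as one orchestration function (all collaborators are injected and hypothetical; the real wiring lives in the route and services):

```python
def search_by_query(library_id, query, k, embedder_uri, *,
                    get_embedder, search_service, chunk_repo):
    """Sketch of the search-by-query flow: embed, search, enrich."""
    embedder = get_embedder(embedder_uri)                 # Registry step
    query_embedding = embedder.embed_queries([query])[0]  # Embedder step
    hits = search_service.search(library_id, query_embedding, k)
    results = []
    for chunk_id, distance in hits:                       # enrichment loop
        chunk = chunk_repo.get(chunk_id)                  # fetch text/name
        results.append({"chunk_id": chunk_id, "distance": distance,
                        "text": chunk.text, "name": chunk.name})
    return {"results": results}
```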
Three k-NN indexes are implemented (numpy only). Each library has one index; build it via `POST /libraries/{id}/index` with `algorithm`: `brute_force`, `kd_tree`, or `ivf`.
```mermaid
flowchart TB
  subgraph Build["Build index"]
    V[(chunk_id, embedding)]
    V --> BF
    V --> KD
    V --> IVF
    BF["Brute-force<br/>Store all vectors"]
    KD["KD-Tree<br/>Median split, tree"]
    IVF["IVF<br/>K-means clusters"]
  end
  subgraph Query["Query: k-NN"]
    Q["query_embedding + k"]
    Q --> BFQ["Brute: scan all<br/>O(n·d)"]
    Q --> KDQ["KD-Tree: traverse tree<br/>O(log n) typical"]
    Q --> IVFQ["IVF: nearest clusters<br/>O(C·d + m·d) approx"]
    BFQ --> R[(chunk_id, distance)]
    KDQ --> R
    IVFQ --> R
  end
  BF --> BFQ
  KD --> KDQ
  IVF --> IVFQ
```
| Algorithm | Build | Query | Space | Type |
|---|---|---|---|---|
| Brute-force | O(n·d) | O(n·d) | O(n·d) | Exact |
| KD-Tree | O(n log n·d) | O(log n) typ. | O(n·d) | Exact |
| IVF | O(n·d·C·I) | O(C·d + m·d) | O(n·d) | Approx. |
n = vectors, d = dimension, C = centroids, I = k-means iterations, m = points in probed clusters.
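As a concrete instance of the IVF row, a simplified numpy-only version (plain k-means with a fixed iteration count; class and parameter names are illustrative, not the project's API):

```python
import numpy as np

class IVFIndex:
    """Approximate k-NN: cluster with k-means, then scan only probed clusters."""

    def __init__(self, n_clusters=4, n_probe=1, n_iters=10, seed=0):
        self.n_clusters, self.n_probe, self.n_iters = n_clusters, n_probe, n_iters
        self.rng = np.random.default_rng(seed)

    def build(self, vectors: np.ndarray):
        """Plain k-means over (n, d) vectors: O(n*d*C*I). Requires n >= C."""
        self.vectors = vectors
        init = self.rng.choice(len(vectors), self.n_clusters, replace=False)
        self.centroids = vectors[init]
        for _ in range(self.n_iters):
            # assign each vector to its nearest centroid, then recenter
            dists = np.linalg.norm(vectors[:, None] - self.centroids[None], axis=2)
            assign = np.argmin(dists, axis=1)
            for c in range(self.n_clusters):
                if np.any(assign == c):
                    self.centroids[c] = vectors[assign == c].mean(axis=0)
        self.assign = assign
        return self

    def query(self, q: np.ndarray, k: int):
        """Probe the n_probe nearest clusters, exact scan inside: O(C*d + m*d)."""
        probed = np.argsort(np.linalg.norm(self.centroids - q, axis=1))[: self.n_probe]
        cand = np.flatnonzero(np.isin(self.assign, probed))
        d = np.linalg.norm(self.vectors[cand] - q, axis=1)
        top = np.argsort(d)[:k]
        return [(int(cand[i]), float(d[i])) for i in top]
```

The approximation comes from `n_probe`: vectors whose cluster is not probed are never compared, which trades recall for query speed.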
- PDF ingest: `POST /libraries/{id}/ingest-pdf` — upload a PDF; the server extracts text (pypdf), chunks it, embeds it (Cohere or Sentence Transformers), and creates one document and its chunks.
- Scripts: `smoke_test.py`, `index_file.py`, `index_pdf.py`, `search_query.py`, `inspect_library.py` for CLI testing (see `scripts/README.md`).