Document intelligence for messy onboarding, due diligence, and operations workflows.
Docness is an agentic document-understanding platform that turns uploaded documents into structured, auditable workspace knowledge. Users define a goal, upload source material, and chat with an AI agent that reads, extracts, asks follow-up questions, tracks completion, and stores every fact in an append-only session log.
This repository is built as a full-stack product, not a toy prompt wrapper: a React workspace UI, a FastAPI SaaS API, a transport-agnostic agent harness, AWS-style persistence adapters, local cloud emulation, streaming turns, tenant-aware auth, and contract-tested API behavior.
- Goal-first document onboarding. Create a workspace around a real business goal, not a generic chat thread. Docness converts that goal into trackable extraction items and categories.
- Agentic extraction loop. The harness lets the model decide when to answer, when to call tools, and when to persist facts through `store_result`.
- Documents become memory. Uploads are ingested, normalized, extracted, and written as events so every later turn can reason over accumulated workspace context.
- Auditable by design. Session history is event-sourced through JSONL logs, with DynamoDB-style records acting as read-optimized projections.
- Streaming UX. The frontend consumes server-sent events for live thinking and assistant deltas during workspace turns.
- Cloud-shaped, laptop-friendly. The backend targets AWS services like S3, DynamoDB, Cognito, and Bedrock, while local development can run against Floci and mock models.
- Kernel architecture. The core harness is independent of HTTP, auth, tenancy, and persistence, so it can be tested and reused without the web stack.
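
The event-sourcing idea in the list above can be illustrated with a minimal sketch. It assumes a hypothetical `SessionLog` class and a hypothetical `project_facts` helper; the real classes in `backend/docness/harness/` may look quite different, and the `item_id` and `seq` fields are illustrative assumptions rather than the repository's schema.

```python
import json
from pathlib import Path


class SessionLog:
    """Hypothetical append-only JSONL log (illustration only)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def append(self, event_type: str, payload: dict) -> dict:
        # Events are only ever appended; existing lines are never rewritten.
        event = {"seq": len(self.read()) + 1, "type": event_type, "payload": payload}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")
        return event

    def read(self) -> list[dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]


def project_facts(events: list[dict]) -> dict[str, dict]:
    """Fold item_extracted events into a latest-value-per-item read model."""
    facts: dict[str, dict] = {}
    for event in events:
        if event["type"] == "item_extracted":
            # "item_id" is an assumed field name, not confirmed by the source.
            facts[event["payload"]["item_id"]] = event["payload"]
    return facts
```

The log stays the source of truth, and the projection can be rebuilt from it at any time. Re-reading the file to compute `seq` keeps the sketch short but would be too slow for large logs.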
- Create a workspace with a goal such as onboarding a customer, reviewing a vendor packet, or extracting project requirements.
- Upload documents including PDFs, Markdown, text, CSV, XML/draw.io diagrams, images, Word documents, and Excel workbooks.
- Docness ingests and extracts facts against the workspace goal, recording `item_extracted` events with source and confidence.
- Chat with the workspace agent to resolve gaps, ask questions, and collect missing information.
- Track completion through tri-state goal items: `not_started`, `partial`, and `complete`.
- Finish onboarding when the workspace has enough verified information.
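
One way the tri-state tracking above could work is sketched below. The rule used here (a goal item is `complete` once every required field has been extracted) is an assumption for illustration; `GoalItem`, `required_fields`, and `item_status` are hypothetical names, and the source does not specify how Docness actually derives a status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalItem:
    """Hypothetical goal item with the fields it needs before it is done."""

    item_id: str
    required_fields: tuple[str, ...]


def item_status(item: GoalItem, extracted_fields: set[str]) -> str:
    """Return not_started, partial, or complete for one goal item."""
    found = [name for name in item.required_fields if name in extracted_fields]
    if not found:
        return "not_started"
    if len(found) < len(item.required_fields):
        return "partial"
    return "complete"
```

For example, an item requiring `legal_name` and `tax_id` would report `partial` after only `legal_name` has been extracted.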
Docness is organized around one important rule: the harness is the kernel. The model-calling loop lives below the API and UI, while transport, auth, tenancy, and persistence wrap it.
flowchart TB
subgraph UI[React Frontend]
Workspaces[Workspace UI]
Chat[Streaming onboarding chat]
AuthUI[Cognito auth screens]
end
subgraph API[FastAPI Service]
REST[REST routes]
SSE[SSE streaming]
Auth[Cognito middleware]
Services[Application services]
end
subgraph Harness[Agent Harness Kernel]
Loop[HarnessLoop]
Context[ContextManager]
Tools[ToolRegistry]
Log[SessionLog]
end
subgraph Adapters[Adapters]
Model[Bedrock or MockModel]
Readers[PDF, text, image, DOCX, XLSX readers]
StoreResult[store_result tool]
end
subgraph Infra[Infrastructure]
S3[S3 document and event storage]
DDB[DynamoDB workspace/session state]
Cognito[Cognito JWTs]
Bedrock[Amazon Bedrock]
Floci[Floci local AWS emulator]
end
Workspaces --> REST
Chat --> SSE
AuthUI --> Auth
REST --> Auth --> Services
SSE --> Services
Services --> Loop
Loop --> Context
Loop --> Tools
Tools --> StoreResult --> Log
Services --> Log
Loop --> Model --> Bedrock
Services --> Readers
Services --> DDB
Services --> S3
Auth --> Cognito
S3 -. local .-> Floci
DDB -. local .-> Floci
| Layer | Responsibility | Key modules |
|---|---|---|
| Transport | HTTP, REST routes, SSE, OpenAPI, CORS | backend/docness/api/main.py, backend/docness/api/routers/ |
| Auth and tenancy | Cognito JWT verification, tenant context, request identity | backend/docness/api/middleware/, backend/docness/api/dependencies.py |
| Services | Workspace, message, document ingest, session use cases | backend/docness/api/services/ |
| Harness kernel | Deterministic model loop, tools, context, hooks, permissions | backend/docness/harness/ |
| Adapters | Bedrock, DynamoDB, S3, document readers | backend/docness/aws/, backend/docness/harness/builtins/ |
sequenceDiagram
participant User
participant UI as React UI
participant API as FastAPI
participant Ingest as Document Ingest
participant Harness as HarnessLoop
participant Model as Bedrock/Mock Model
participant Log as SessionLog JSONL
participant DB as DynamoDB Projection
User->>UI: Create workspace with goal
UI->>API: POST /api/v1/workspaces
API->>DB: Store workspace and goal items
User->>UI: Upload documents
UI->>API: POST /workspaces/{id}/documents
API->>Ingest: Normalize and extract
Ingest->>Model: Extract facts by goal item
Ingest->>Log: Append document_ingested and item_extracted events
Ingest->>DB: Refresh goal statuses
User->>UI: Ask a question or provide missing data
UI->>API: GET /workspaces/{id}/stream?text=...
API->>Harness: Run turn with workspace memory
Harness->>Model: Decide answer or tool use
Harness->>Log: Append user, assistant, thinking, and tool events
API-->>UI: Stream SSE events
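
The final streaming step of the flow above can be sketched as a synchronous turn wrapped by an async generator that emits server-sent event frames. This is an illustration under assumptions: `sync_turn` is a placeholder for the real harness turn, and the event names `thinking`, `assistant_delta`, and `done` are hypothetical rather than the API's documented event types.

```python
import asyncio
import json
from typing import AsyncIterator, Iterator


def sse_frame(event: str, data: dict) -> str:
    """Encode one server-sent event frame: an event name plus JSON data."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def sync_turn(text: str) -> Iterator[tuple[str, dict]]:
    """Placeholder for a blocking harness turn that yields events."""
    yield "thinking", {"text": "reading workspace memory"}
    for word in text.split():
        yield "assistant_delta", {"text": word}
    yield "done", {}


async def stream_turn(text: str) -> AsyncIterator[str]:
    """Advance the blocking generator on a worker thread, one step at a time."""
    iterator = sync_turn(text)
    sentinel = object()
    while True:
        # next(iterator, sentinel) returns the sentinel once the turn is over.
        item = await asyncio.to_thread(next, iterator, sentinel)
        if item is sentinel:
            break
        event, data = item
        yield sse_frame(event, data)
```

In a FastAPI route, an async generator like `stream_turn` could be passed to `StreamingResponse` with `media_type="text/event-stream"`; whether the repository does it this way is not stated in this README.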
| Area | Stack |
|---|---|
| Frontend | React 19, React Router 7, TypeScript, Tailwind CSS, Radix UI, shadcn-style components, Framer Motion |
| Backend API | FastAPI, Pydantic v2, Uvicorn, python-jose |
| Agent harness | Custom synchronous loop with async streaming wrapper, tool registry, permission checks, context compaction |
| Models | Amazon Bedrock adapter plus mock model for local/test runs |
| Documents | PyMuPDF, python-docx, openpyxl, image/base64 normalization, draw.io XML parsing |
| Persistence | DynamoDB-style workspace/session state, S3-style document storage, JSONL event logs |
| Local cloud | Floci via Docker Compose |
| Quality | pytest, pytest-asyncio, moto, schemathesis, ruff, mypy, Biome, TypeScript |
.
├── backend/
│ ├── docness/
│ │ ├── api/ # FastAPI app, routers, schemas, services, middleware
│ │ ├── aws/ # DynamoDB, S3, Bedrock adapters
│ │ ├── config/ # Base instructions and goal templates
│ │ ├── harness/ # Agent loop, tools, context, persistence, builtins
│ │ └── skills/ # Domain skills used by the harness
│ └── tests/ # API, AWS, CLI, harness, and OpenAPI contract tests
├── frontend/
│ └── app/ # React Router app, auth, workspace UI, API client
├── docs/ # Architecture, API MVP notes, information-flow diagrams
├── sample/ # Demo goals and onboarding documents
├── docker-compose.yml # Floci local AWS emulator
└── Makefile # Common dev commands
- Python 3.11+
- `uv`
- Bun
- Docker Desktop or another Docker runtime
- AWS credentials for real Bedrock runs, or mock/local settings for development
make dev

This starts:
- Floci on `localhost:4566`
- The FastAPI backend through `uv run --project backend docness-api`
- The React frontend through `bun run dev`

The frontend defaults to http://localhost:5173; the API defaults to http://localhost:8000.
make floci
make backend
make frontend

The current API settings class reads these names from .env or the process environment:
API_HOST=127.0.0.1
API_PORT=8000
AWS_REGION=us-east-1
USE_LOCALSTACK=true
AWS_ENDPOINT_URL=http://localhost:4566
USE_MOCK_MODEL=true
DATA_DIR=.docness-data
DEV_AUTH_BYPASS=true

For real model calls, set USE_MOCK_MODEL=false and configure the Bedrock model/region credentials.
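
As a rough illustration of how the backend variable names above might be read, the sketch below uses only the standard library. It is not the repository's settings class (which may well use a Pydantic-based loader), and the fallback defaults are assumptions. In particular, this sketch defaults `DEV_AUTH_BYPASS` to false so that a missing variable never disables auth.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings holder; defaults are illustrative assumptions."""

    api_host: str
    api_port: int
    aws_region: str
    use_localstack: bool
    aws_endpoint_url: str
    use_mock_model: bool
    data_dir: str
    dev_auth_bypass: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        return cls(
            api_host=env.get("API_HOST", "127.0.0.1"),
            api_port=int(env.get("API_PORT", "8000")),
            aws_region=env.get("AWS_REGION", "us-east-1"),
            use_localstack=_as_bool(env.get("USE_LOCALSTACK", "true")),
            aws_endpoint_url=env.get("AWS_ENDPOINT_URL", "http://localhost:4566"),
            use_mock_model=_as_bool(env.get("USE_MOCK_MODEL", "true")),
            data_dir=env.get("DATA_DIR", ".docness-data"),
            # Assumed safe default: auth bypass must be switched on explicitly.
            dev_auth_bypass=_as_bool(env.get("DEV_AUTH_BYPASS", "false")),
        )
```

Passing a plain dictionary to `Settings.from_env` makes the parsing easy to test without touching the real process environment.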
VITE_API_URL=http://localhost:8000

| Endpoint | Purpose |
|---|---|
| `GET /healthz` | Process liveness |
| `GET /readyz` | Backend dependency readiness |
| `POST /api/v1/workspaces` | Create a goal-oriented workspace |
| `GET /api/v1/workspaces` | List tenant workspaces |
| `GET /api/v1/workspaces/goal-templates` | List bundled goal templates |
| `POST /api/v1/workspaces/{workspace_id}/documents` | Upload and ingest documents |
| `GET /api/v1/workspaces/{workspace_id}/messages` | Read paginated workspace chat history |
| `POST /api/v1/workspaces/{workspace_id}/messages` | Run a request/response workspace turn |
| `GET /api/v1/workspaces/{workspace_id}/stream` | Stream a workspace turn over SSE |
| `POST /api/v1/workspaces/{workspace_id}/onboarding/complete` | Mark onboarding complete |

Legacy /api/v1/sessions routes remain available while workspace flows evolve.
Backend checks:
uv run --project backend --extra dev pytest
uv run --project backend --extra dev ruff check backend
uv run --project backend --extra dev mypy backend/docness

Frontend checks:
cd frontend
bun run lint
bun run typecheck

Formatting helpers:
make format
make lint

- Harness first. `HarnessLoop` is the only component that calls the model.
- Sync core, async edges. The loop stays simple and testable; FastAPI wraps it with async boundaries and streaming.
- Tenant isolation. API calls flow through tenant context and tenant-scoped storage keys.
- Append-only truth. JSONL event logs preserve user messages, assistant responses, tool activity, and extracted facts.
- Pluggable adapters. Models, storage, tools, and cloud services can be swapped for local or production implementations.
- Fail closed. Unknown tools, denied permissions, invalid tenants, and auth failures return explicit errors.
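
The tenant-isolation and fail-closed principles above can be combined in a small sketch of a storage-key builder. The key layout (`tenants/<tenant_id>/...`), the tenant ID pattern, and the names `tenant_key` and `InvalidTenantError` are hypothetical and are not taken from the repository.

```python
import re

# Assumed tenant ID format: lowercase letters, digits, hyphens, max 63 chars.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


class InvalidTenantError(ValueError):
    """Raised instead of silently falling back to a shared namespace."""


def tenant_key(tenant_id: str, *parts: str) -> str:
    """Build a storage key that always lives under one tenant's prefix."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise InvalidTenantError(f"invalid tenant id: {tenant_id!r}")
    for part in parts:
        # Reject segments that could escape the tenant prefix.
        if not part or "/" in part or part in {".", ".."}:
            raise ValueError(f"invalid key segment: {part!r}")
    return "/".join(("tenants", tenant_id, *parts))
```

Any identifier that does not match the expected shape is rejected outright, so a malformed or hostile tenant ID never reaches S3 or DynamoDB key construction.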
Docness is an active MVP with a working harness, FastAPI service, React workspace UI, local cloud emulator path, document ingestion, streaming workspace chat, and tests around core backend behavior. The architecture is intentionally ready for production concerns such as tenant isolation, event sourcing, and cloud-native adapters while still being easy to run locally.