zeeshanejaz/docness

Docness

Document intelligence for messy onboarding, due diligence, and operations workflows.

Docness is an agentic document-understanding platform that turns uploaded documents into structured, auditable workspace knowledge. Users define a goal, upload source material, and chat with an AI agent that reads, extracts, asks follow-up questions, tracks completion, and stores every fact in an append-only session log.

This repository is built as a full-stack product, not a toy prompt wrapper: a React workspace UI, a FastAPI SaaS API, a transport-agnostic agent harness, AWS-style persistence adapters, local cloud emulation, streaming turns, tenant-aware auth, and contract-tested API behavior.


Why It Is Cool

  • Goal-first document onboarding. Create a workspace around a real business goal, not a generic chat thread. Docness converts that goal into trackable extraction items and categories.
  • Agentic extraction loop. The harness lets the model decide when to answer, when to call tools, and when to persist facts through store_result.
  • Documents become memory. Uploads are ingested, normalized, extracted, and written as events so every later turn can reason over accumulated workspace context.
  • Auditable by design. Session history is event-sourced through JSONL logs, with DynamoDB-style records acting as read-optimized projections.
  • Streaming UX. The frontend consumes server-sent events for live thinking and assistant deltas during workspace turns.
  • Cloud-shaped, laptop-friendly. The backend targets AWS services like S3, DynamoDB, Cognito, and Bedrock, while local development can run against Floci and mock models.
  • Kernel architecture. The core harness is independent of HTTP, auth, tenancy, and persistence, so it can be tested and reused without the web stack.

Product Flow

  1. Create a workspace with a goal such as onboarding a customer, reviewing a vendor packet, or extracting project requirements.
  2. Upload documents including PDFs, Markdown, text, CSV, XML/draw.io diagrams, images, Word documents, and Excel workbooks.
  3. Docness ingests and extracts facts against the workspace goal, recording item_extracted events with source and confidence.
  4. Chat with the workspace agent to resolve gaps, ask questions, and collect missing information.
  5. Track completion through tri-state goal items: not_started, partial, and complete.
  6. Finish onboarding when the workspace has enough verified information.
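The tri-state tracking in step 5 could be derived from extraction events roughly as follows. The sketch assumes that each goal item declares a set of required fields and that each item_extracted event names the item and field it filled; both are illustrative assumptions, since this README only names the event type and the three status values.

```python
def goal_status(required: set[str], extracted: set[str]) -> str:
    """Map field coverage for one goal item onto the three status values."""
    found = required & extracted
    if not found:
        return "not_started"
    if found == required:
        return "complete"
    return "partial"


def statuses_from_events(goal_items: dict[str, set[str]],
                         events: list[dict]) -> dict[str, str]:
    """Fold item_extracted events into a status per goal item."""
    extracted: dict[str, set[str]] = {item: set() for item in goal_items}
    for event in events:
        if event.get("type") == "item_extracted" and event.get("item") in extracted:
            extracted[event["item"]].add(event["field"])
    return {item: goal_status(fields, extracted[item])
            for item, fields in goal_items.items()}


# Hypothetical goal items and events for illustration only.
goal_items = {
    "company_profile": {"legal_name", "address"},
    "billing_contact": {"email"},
    "tax_details": {"tax_id"},
}
events = [
    {"type": "item_extracted", "item": "company_profile", "field": "legal_name"},
    {"type": "item_extracted", "item": "billing_contact", "field": "email"},
    {"type": "document_ingested", "document": "packet.pdf"},
]
statuses = statuses_from_events(goal_items, events)
```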

Architecture

Docness is organized around one important rule: the harness is the kernel. The model-calling loop lives below the API and UI, while transport, auth, tenancy, and persistence wrap it.

flowchart TB
    subgraph UI[React Frontend]
        Workspaces[Workspace UI]
        Chat[Streaming onboarding chat]
        AuthUI[Cognito auth screens]
    end

    subgraph API[FastAPI Service]
        REST[REST routes]
        SSE[SSE streaming]
        Auth[Cognito middleware]
        Services[Application services]
    end

    subgraph Harness[Agent Harness Kernel]
        Loop[HarnessLoop]
        Context[ContextManager]
        Tools[ToolRegistry]
        Log[SessionLog]
    end

    subgraph Adapters[Adapters]
        Model[Bedrock or MockModel]
        Readers[PDF, text, image, DOCX, XLSX readers]
        StoreResult[store_result tool]
    end

    subgraph Infra[Infrastructure]
        S3[S3 document and event storage]
        DDB[DynamoDB workspace/session state]
        Cognito[Cognito JWTs]
        Bedrock[Amazon Bedrock]
        Floci[Floci local AWS emulator]
    end

    Workspaces --> REST
    Chat --> SSE
    AuthUI --> Auth
    REST --> Auth --> Services
    SSE --> Services
    Services --> Loop
    Loop --> Context
    Loop --> Tools
    Tools --> StoreResult --> Log
    Services --> Log
    Loop --> Model --> Bedrock
    Services --> Readers
    Services --> DDB
    Services --> S3
    Auth --> Cognito
    S3 -. local .-> Floci
    DDB -. local .-> Floci

Backend Layers

| Layer | Responsibility | Key modules |
| --- | --- | --- |
| Transport | HTTP, REST routes, SSE, OpenAPI, CORS | backend/docness/api/main.py, backend/docness/api/routers/ |
| Auth and tenancy | Cognito JWT verification, tenant context, request identity | backend/docness/api/middleware/, backend/docness/api/dependencies.py |
| Services | Workspace, message, document ingest, session use cases | backend/docness/api/services/ |
| Harness kernel | Deterministic model loop, tools, context, hooks, permissions | backend/docness/harness/ |
| Adapters | Bedrock, DynamoDB, S3, document readers | backend/docness/aws/, backend/docness/harness/builtins/ |

Data Flow

sequenceDiagram
    participant User
    participant UI as React UI
    participant API as FastAPI
    participant Ingest as Document Ingest
    participant Harness as HarnessLoop
    participant Model as Bedrock/Mock Model
    participant Log as SessionLog JSONL
    participant DB as DynamoDB Projection

    User->>UI: Create workspace with goal
    UI->>API: POST /api/v1/workspaces
    API->>DB: Store workspace and goal items

    User->>UI: Upload documents
    UI->>API: POST /api/v1/workspaces/{id}/documents
    API->>Ingest: Normalize and extract
    Ingest->>Model: Extract facts by goal item
    Ingest->>Log: Append document_ingested and item_extracted events
    Ingest->>DB: Refresh goal statuses

    User->>UI: Ask a question or provide missing data
    UI->>API: GET /api/v1/workspaces/{id}/stream?text=...
    API->>Harness: Run turn with workspace memory
    Harness->>Model: Decide answer or tool use
    Harness->>Log: Append user, assistant, thinking, and tool events
    API-->>UI: Stream SSE events
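
The relationship between the JSONL session log and its read-optimized projection can be illustrated with a small sketch. JsonlLog and project_facts are hypothetical names, the event fields are assumed for illustration, and a plain dict stands in for the DynamoDB-style projection. The point is that the log is only ever appended to, and the projection can be rebuilt by replaying it.

```python
import json
import tempfile
from pathlib import Path


class JsonlLog:
    """Append-only event log: one JSON object per line, never rewritten."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def append(self, event: dict) -> None:
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")

    def replay(self) -> list[dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]


def project_facts(events: list[dict]) -> dict[str, dict]:
    """Rebuild the latest fact per item; later events supersede earlier ones."""
    facts: dict[str, dict] = {}
    for event in events:
        if event.get("type") == "item_extracted":
            facts[event["item"]] = {
                "value": event["value"],
                "source": event["source"],
                "confidence": event["confidence"],
            }
    return facts


with tempfile.TemporaryDirectory() as tmp:
    log = JsonlLog(Path(tmp) / "session.jsonl")
    log.append({"type": "document_ingested", "document": "packet.pdf"})
    log.append({"type": "item_extracted", "item": "legal_name",
                "value": "Acme Ltd", "source": "packet.pdf", "confidence": 0.7})
    log.append({"type": "item_extracted", "item": "legal_name",
                "value": "Acme Limited", "source": "chat", "confidence": 0.9})
    events = log.replay()
    facts = project_facts(events)
```

The superseded value stays in the log even though the projection shows only the latest one, which is what makes the history auditable.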

Tech Stack

| Area | Stack |
| --- | --- |
| Frontend | React 19, React Router 7, TypeScript, Tailwind CSS, Radix UI, shadcn-style components, Framer Motion |
| Backend API | FastAPI, Pydantic v2, Uvicorn, python-jose |
| Agent harness | Custom synchronous loop with async streaming wrapper, tool registry, permission checks, context compaction |
| Models | Amazon Bedrock adapter plus mock model for local/test runs |
| Documents | PyMuPDF, python-docx, openpyxl, image/base64 normalization, draw.io XML parsing |
| Persistence | DynamoDB-style workspace/session state, S3-style document storage, JSONL event logs |
| Local cloud | Floci via Docker Compose |
| Quality | pytest, pytest-asyncio, moto, schemathesis, ruff, mypy, Biome, TypeScript |

Repository Layout

.
├── backend/
│   ├── docness/
│   │   ├── api/              # FastAPI app, routers, schemas, services, middleware
│   │   ├── aws/              # DynamoDB, S3, Bedrock adapters
│   │   ├── config/           # Base instructions and goal templates
│   │   ├── harness/          # Agent loop, tools, context, persistence, builtins
│   │   └── skills/           # Domain skills used by the harness
│   └── tests/                # API, AWS, CLI, harness, and OpenAPI contract tests
├── frontend/
│   └── app/                  # React Router app, auth, workspace UI, API client
├── docs/                     # Architecture, API MVP notes, information-flow diagrams
├── sample/                   # Demo goals and onboarding documents
├── docker-compose.yml        # Floci local AWS emulator
└── Makefile                  # Common dev commands

Quick Start

Prerequisites

  • Python 3.11+
  • uv
  • Bun
  • Docker Desktop or another Docker runtime
  • AWS credentials for real Bedrock runs, or mock/local settings for development

Run the Full Local Stack

make dev

This starts:

  • Floci on localhost:4566
  • The FastAPI backend through uv run --project backend docness-api
  • The React frontend through bun run dev

The frontend defaults to http://localhost:5173; the API defaults to http://localhost:8000.

Run Services Separately

make floci
make backend
make frontend

Useful Backend Environment Variables

The current API settings class reads these names from .env or the process environment:

API_HOST=127.0.0.1
API_PORT=8000
AWS_REGION=us-east-1
USE_LOCALSTACK=true
AWS_ENDPOINT_URL=http://localhost:4566
USE_MOCK_MODEL=true
DATA_DIR=.docness-data
DEV_AUTH_BYPASS=true

For real model calls, set USE_MOCK_MODEL=false and configure the Bedrock model/region credentials.

Frontend API URL

VITE_API_URL=http://localhost:8000

API Highlights

| Endpoint | Purpose |
| --- | --- |
| GET /healthz | Process liveness |
| GET /readyz | Backend dependency readiness |
| POST /api/v1/workspaces | Create a goal-oriented workspace |
| GET /api/v1/workspaces | List tenant workspaces |
| GET /api/v1/workspaces/goal-templates | List bundled goal templates |
| POST /api/v1/workspaces/{workspace_id}/documents | Upload and ingest documents |
| GET /api/v1/workspaces/{workspace_id}/messages | Read paginated workspace chat history |
| POST /api/v1/workspaces/{workspace_id}/messages | Run a request/response workspace turn |
| GET /api/v1/workspaces/{workspace_id}/stream | Stream a workspace turn over SSE |
| POST /api/v1/workspaces/{workspace_id}/onboarding/complete | Mark onboarding complete |

Legacy /api/v1/sessions routes remain available while workspace flows evolve.
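
As a rough client-side illustration of the streaming endpoint, the sketch below builds the stream URL and parses a server-sent events body. The text query parameter appears in the data-flow diagram; the event names thinking and assistant_delta, the JSON payload shape, and the helper names are assumptions rather than the documented contract. No request is sent, and a real call would also need whatever authentication the API requires.

```python
import json
from urllib.parse import quote, urlencode


def stream_url(base_url: str, workspace_id: str, text: str) -> str:
    """Build the GET URL for a streamed workspace turn."""
    path = "/api/v1/workspaces/" + quote(workspace_id, safe="") + "/stream"
    query = urlencode({"text": text})
    return base_url.rstrip("/") + path + "?" + query


def parse_sse(body: str) -> list[tuple[str, dict]]:
    """Split an SSE body into (event name, JSON payload) pairs.

    Frames are separated by a blank line; "event:" names the frame and
    "data:" lines carry the payload, per the SSE wire format.
    """
    events: list[tuple[str, dict]] = []
    for frame in body.split("\n\n"):
        name, data_lines = "message", []
        for line in frame.split("\n"):
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append((name, json.loads("\n".join(data_lines))))
    return events


url = stream_url("http://localhost:8000", "ws-123", "What is missing?")

# Hypothetical stream body with assumed event names.
sample = (
    'event: thinking\ndata: {"text": "Checking goal items"}\n\n'
    'event: assistant_delta\ndata: {"text": "The tax ID is"}\n\n'
)
parsed = parse_sse(sample)
```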

Testing and Quality

Backend checks:

uv run --project backend --extra dev pytest
uv run --project backend --extra dev ruff check backend
uv run --project backend --extra dev mypy backend/docness

Frontend checks:

cd frontend
bun run lint
bun run typecheck

Formatting helpers:

make format
make lint

Design Principles

  • Harness first. HarnessLoop is the only component that calls the model.
  • Sync core, async edges. The loop stays simple and testable; FastAPI wraps it with async boundaries and streaming.
  • Tenant isolation. API calls flow through tenant context and tenant-scoped storage keys.
  • Append-only truth. JSONL event logs preserve user messages, assistant responses, tool activity, and extracted facts.
  • Pluggable adapters. Models, storage, tools, and cloud services can be swapped for local or production implementations.
  • Fail closed. Unknown tools, denied permissions, invalid tenants, and auth failures return explicit errors.

Documentation

Current Status

Docness is an active MVP with a working harness, FastAPI service, React workspace UI, local cloud emulator path, document ingestion, streaming workspace chat, and tests around core backend behavior. The architecture already accounts for production concerns such as tenant isolation, event sourcing, and cloud-native adapters, while remaining easy to run locally.
