Skip to content

Latest commit

 

History

History

README.md

DocRAG

Ask your documents. Get cited answers.

Upload .docx files, ask questions in natural language, and get answers with citations that navigate to the exact paragraph, comment, or tracked change in the source document. Powered by SuperDoc.

How it Works

  1. Upload .docx files through the UI
  2. Extract text, comments, and tracked changes using the SuperDoc SDK
  3. Chunk each paragraph with its stable node ID, embed with OpenAI
  4. Store chunks with embeddings in PostgreSQL + pgvector, files in R2
  5. Query — ask a question, relevant chunks are retrieved via vector similarity search
  6. Answer — Claude generates a response with [cite:ID] references
  7. Navigate — click a citation to scroll to the source in the SuperDoc viewer

Architecture

apps/
  api/              Cloudflare Worker — query, documents, file serving
  web/              React frontend — document viewer + chat sidebar
  ingest/   Docker service — automated extraction for VM deployment
packages/
  shared/           SuperDoc extraction, chunking, embedding client
docs/               Sample .docx files

Stack: Cloudflare Workers, PostgreSQL + pgvector, Cloudflare R2, React, SuperDoc, Claude, OpenAI embeddings

Quick Start

Prerequisites

  • Bun v1.1+
  • Wrangler (installed automatically)
  • Cloudflare account (free tier works)
  • PostgreSQL database with pgvector
  • OpenAI API key (for embeddings)
  • Anthropic API key (for Claude)

Setup

bun install

# Create the database schema (run against your Neon database)
psql $DATABASE_URL -f apps/api/schema.sql

# Create Cloudflare R2 bucket
cd apps/api
npx wrangler r2 bucket create rag-demo-docs

# Add secrets for local dev
cat > .dev.vars << EOF
DATABASE_URL=postgresql://...your-connection-string...
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
INGEST_SERVICE_URL=http://localhost:4000
EOF
cd ../..

Run

# Terminal 1: API (Cloudflare Worker via wrangler dev)
bun run dev:api

# Terminal 2: Frontend (Vite)
bun run dev:web

Open http://localhost:3000

Sample Questions

Try these across the sample documents:

  • "What are the biggest risks to the beta launch?"
  • "Can healthcare companies join the beta?"
  • "How much will the NLQ feature cost to run?"
  • "What do customers think about anomaly detection?"
  • "What was decided about pricing?"

Deploy

Worker + Frontend

# Deploy API Worker
cd apps/api
wrangler secret put DATABASE_URL
wrangler secret put OPENAI_API_KEY
wrangler secret put ANTHROPIC_API_KEY
wrangler deploy

# Deploy frontend to Cloudflare Pages
cd ../..
bun run deploy:web

Ingest Service (VM)

For automated ingestion, deploy the Docker service to a VM:

docker build -f apps/ingest/Dockerfile -t docrag-ingest .
docker run -d \
  -e API_URL=https://docrag-api.<account>.workers.dev \
  -e OPENAI_API_KEY=sk-... \
  -e AUTH_TOKEN=your-shared-secret \
  -p 4000:4000 \
  docrag-ingest

License

MIT


Built by SuperDoc