Arti Assistant is an AI-powered educational agent that helps educators access and use a knowledge base of documents on sexual consent, sexual violence, and digital safety. It is built on a Retrieval-Augmented Generation (RAG) system that uses Gemini and Supabase for its backend, with Gemini function calling planned for orchestrating the backend functions.
Arti Assistant is built as a pnpm monorepo, encompassing a Next.js frontend, Supabase Edge Functions for backend logic, and a data preparation package.
- Frontend (`apps/arti-assistant`): A Next.js application that provides a chat interface for educators to interact with the AI agent.
- Backend (Supabase Edge Functions):
  - `search-knowledge-base`: Performs a vector search against the Supabase `chunks` table to retrieve relevant document passages based on a user's query.
  - `generate-learning-objectives`: (Planned) Uses the Gemini API to create learning objectives from retrieved content.
  - `save-content`: (Planned) Allows educators to save generated or custom content to a `saved_content` table.
- Data Preparation (`packages/data-prep`): A Node.js script that chunks source `.txt` documents, generates embeddings using the Gemini API, and stores them in the Supabase `chunks` table.
- Database (Supabase + pgvector): PostgreSQL with the `pgvector` extension stores document chunks and their high-dimensional embeddings, enabling efficient semantic search.
For a detailed explanation of the monorepo and pnpm decisions, please refer to approach.md.
- Interactive chat interface for querying educational documents.
- Retrieval of relevant content chunks based on semantic similarity.
- Source attribution for retrieved information.
- (Planned) Generation of learning objectives from knowledge base content.
- (Planned) Ability to save educator-created content.
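Source attribution can be handled when the retrieved passages are assembled for the model. The sketch below assumes each retrieved row carries the `document_name`, `section`, and `chunk_text` columns returned by the `match_chunks` SQL function from the setup steps; `RetrievedChunk`, `buildGroundedContext`, and the `[n]` citation format are hypothetical names chosen for illustration, not part of the project's code.

```typescript
// Hypothetical shape of a retrieved passage; field names mirror the
// columns returned by the match_chunks SQL function.
interface RetrievedChunk {
  id: string;
  document_name: string;
  section: string | null;
  chunk_text: string;
  similarity: number;
}

interface GroundedContext {
  context: string; // numbered passages to place in the model prompt
  sources: string[]; // deduplicated source labels, in first-seen order
}

// Formats a source label from a chunk's document name and optional section.
function sourceLabel(chunk: RetrievedChunk): string {
  return chunk.section
    ? `${chunk.document_name} (${chunk.section})`
    : chunk.document_name;
}

// Builds a numbered context block plus a deduplicated source list so every
// passage shown to the model can be traced back to its document.
function buildGroundedContext(chunks: RetrievedChunk[]): GroundedContext {
  const sources: string[] = [];
  const lines: string[] = [];
  for (const chunk of chunks) {
    const label = sourceLabel(chunk);
    let index = sources.indexOf(label);
    if (index === -1) {
      sources.push(label);
      index = sources.length - 1;
    }
    // [n] refers to the n-th (1-based) entry in the sources list.
    lines.push(`[${index + 1}] ${chunk.chunk_text}`);
  }
  return { context: lines.join("\n\n"), sources };
}
```

Deduplicating by label keeps the source list short when several chunks come from the same section of one document.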
Follow these steps to get your Arti Assistant project up and running locally and deployed to Supabase.
You need the following API keys and URLs. Create a file named `.env.local` inside the `apps/arti-assistant/` directory and populate it as follows (refer to `env.local.example`):

```bash
# Supabase
NEXT_PUBLIC_SUPABASE_URL="YOUR_SUPABASE_PROJECT_URL"
NEXT_PUBLIC_SUPABASE_ANON_KEY="YOUR_SUPABASE_ANON_KEY"
APP_SERVICE_ROLE_KEY="YOUR_SECRET_KEY"

# Gemini
GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
```
- Supabase Project URL & Anon Key: Found in your Supabase project settings > API.
- Secret Key (`APP_SERVICE_ROLE_KEY`): Found in your Supabase project settings > API. Use the new `sb_secret_...` key.
- Gemini API Key: Obtainable from Google AI Studio.
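Missing or placeholder values are easier to catch at startup than at the first failed request. The following is a minimal sketch, assuming the four variable names from the `.env.local` template above; `findMissingEnv` and `loadConfig` are hypothetical helpers, not part of the project. Next.js only inlines `NEXT_PUBLIC_`-prefixed variables into browser bundles, so a loader like this belongs in server-side code, where `APP_SERVICE_ROLE_KEY` and `GEMINI_API_KEY` stay private.

```typescript
// Variable names taken from the .env.local template.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "APP_SERVICE_ROLE_KEY",
  "GEMINI_API_KEY",
] as const;

type RequiredEnvVar = (typeof REQUIRED_ENV_VARS)[number];
type AppConfig = Record<RequiredEnvVar, string>;

// Returns the required variables that are missing, blank, or still set to a
// "YOUR_..." placeholder from the template.
function findMissingEnv(env: Record<string, string | undefined>): RequiredEnvVar[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name]?.trim();
    return !value || value.startsWith("YOUR_");
  });
}

// Hypothetical loader: throws one error listing every missing variable,
// otherwise returns the trimmed values.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = findMissingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as AppConfig;
  for (const name of REQUIRED_ENV_VARS) {
    config[name] = env[name]!.trim();
  }
  return config;
}
```

In a server route this would be called as `loadConfig(process.env)`.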
- Create Supabase Project: Log in to your Supabase account and create a new project.
- Enable `pgvector` Extension: Navigate to the SQL Editor in your Supabase dashboard and run:

  ```sql
  CREATE EXTENSION IF NOT EXISTS vector;
  ```

- Create `chunks` Table: In the SQL Editor, run the following to create the table for your document chunks:

  ```sql
  CREATE TABLE chunks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_name TEXT NOT NULL,
    section TEXT,
    chunk_text TEXT NOT NULL,
    embedding vector(768) NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
  );

  CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
  ```
- Create `match_chunks` Function: This PostgreSQL function is crucial for efficient vector search. Run this in your SQL Editor:

  ```sql
  CREATE OR REPLACE FUNCTION match_chunks (
    query_embedding vector(768),
    match_threshold float,
    match_count int
  )
  RETURNS TABLE (
    id UUID,
    document_name TEXT,
    section TEXT,
    chunk_text TEXT,
    metadata JSONB,
    similarity float
  )
  LANGUAGE plpgsql
  AS $$
  BEGIN
    RETURN QUERY
    SELECT
      chunks.id,
      chunks.document_name,
      chunks.section,
      chunks.chunk_text,
      chunks.metadata,
      1 - (chunks.embedding <=> query_embedding) AS similarity
    FROM chunks
    WHERE 1 - (chunks.embedding <=> query_embedding) > match_threshold
    -- Ordering by the raw cosine distance (ascending) gives the same order as
    -- similarity DESC, but lets pgvector use the ivfflat index.
    ORDER BY chunks.embedding <=> query_embedding
    LIMIT match_count;
  END;
  $$;
  ```
- Create `saved_content` Table: For the `save-content` function, create this table in your SQL Editor:

  ```sql
  CREATE TABLE saved_content (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
  );
  ```
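To reason about what `match_chunks` returns (for example, when picking a `match_threshold`), an in-memory TypeScript equivalent can help. This is a sketch for testing only, not a replacement for the SQL function; `StoredChunk` and `matchChunks` are hypothetical names. pgvector's `<=>` operator is cosine distance, so `1 - (a <=> b)` is cosine similarity. This version scans every chunk, so its results are exact, whereas the approximate `ivfflat` index may return slightly different rows.

```typescript
// Cosine similarity, i.e. what 1 - (a <=> b) computes in pgvector.
// Zero-length vectors are treated as similarity 0 (an assumption of this sketch).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory row with the columns match_chunks reads.
interface StoredChunk {
  id: string;
  document_name: string;
  chunk_text: string;
  embedding: number[];
}

// Mirrors match_chunks: keep rows whose similarity is strictly above the
// threshold, sort by similarity (highest first), return at most matchCount.
function matchChunks(
  queryEmbedding: number[],
  chunks: StoredChunk[],
  matchThreshold: number,
  matchCount: number,
) {
  return chunks
    .map((c) => ({
      id: c.id,
      document_name: c.document_name,
      chunk_text: c.chunk_text,
      similarity: cosineSimilarity(queryEmbedding, c.embedding),
    }))
    .filter((row) => row.similarity > matchThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount);
}
```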
From the project root:

```bash
pnpm install
```

- Place Documents: Ensure your 14 educational `.txt` documents are placed in the `packages/data-prep/knowledge-feed/` directory.
- Run Seeding Script: Execute the data preparation script to populate your Supabase `chunks` table:

  ```bash
  pnpm db:seed
  ```

  This process chunks your documents, generates embeddings, and inserts them into Supabase. It might take some time.
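The seeding step splits text into chunks and must produce embeddings that fit the `vector(768)` column. The sketch below uses a word-window splitter with overlap as an assumed strategy; the actual splitter in `packages/data-prep` may differ. `chunkWords`, `toChunkRow`, and the `chunk_index` metadata key are hypothetical. Whichever Gemini embedding model the script uses must return 768-dimensional vectors, or the insert will fail.

```typescript
// Assumed chunking strategy: windows of `size` words, each overlapping the
// previous window by `overlap` words.
function chunkWords(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("Require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window reached the end
  }
  return chunks;
}

const EMBEDDING_DIM = 768; // must match vector(768) in the chunks table

// Row shape for inserting into the chunks table (id and created_at use defaults).
interface ChunkInsert {
  document_name: string;
  section: string | null;
  chunk_text: string;
  embedding: number[];
  metadata: Record<string, unknown>;
}

// Builds a chunks row, rejecting embeddings that would violate vector(768).
function toChunkRow(
  documentName: string,
  section: string | null,
  chunkText: string,
  embedding: number[],
  index: number,
): ChunkInsert {
  if (embedding.length !== EMBEDDING_DIM) {
    throw new Error(`Expected ${EMBEDDING_DIM} dimensions, got ${embedding.length}`);
  }
  return {
    document_name: documentName,
    section,
    chunk_text: chunkText,
    embedding,
    metadata: { chunk_index: index }, // hypothetical metadata key
  };
}
```

Overlap keeps a sentence that straddles a window boundary retrievable from either neighbouring chunk, at the cost of storing some text twice.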
You need the Supabase CLI installed and configured. From the project root:

- Log in to the Supabase CLI:

  ```bash
  supabase login
  ```

- Link your project (you can find your project ID in your Supabase project settings > General > Project ID):

  ```bash
  supabase link --project-ref your-supabase-project-id
  ```

- Set Environment Variables for Functions: Before deploying, upload your secrets so the Edge Functions can access them:

  ```bash
  supabase secrets set --env-file apps/arti-assistant/.env.local
  ```

- Deploy Functions: Deploy each function one by one:

  ```bash
  supabase functions deploy search-knowledge-base --no-verify-jwt
  supabase functions deploy generate-learning-objectives --no-verify-jwt
  supabase functions deploy save-content --no-verify-jwt
  ```
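Once deployed, Supabase serves each Edge Function at `<project URL>/functions/v1/<function-name>`. The sketch below builds a POST request for one of them. The request body (`query`, `match_count`) is a hypothetical payload, since the functions' actual input format is not documented here, and `buildFunctionRequest` is an illustrative helper. Sending the anon key as a bearer token is an assumption kept so the call still works if JWT verification is later enabled; with `--no-verify-jwt` the gateway does not check it.

```typescript
// Plain request description, usable with any fetch implementation.
interface FunctionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds a POST request for a deployed Supabase Edge Function.
function buildFunctionRequest(
  supabaseUrl: string,
  anonKey: string,
  functionName: string,
  payload: unknown,
): FunctionRequest {
  const base = supabaseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${base}/functions/v1/${encodeURIComponent(functionName)}`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${anonKey}`,
      },
      body: JSON.stringify(payload),
    },
  };
}

// Hypothetical payload for search-knowledge-base.
const searchRequest = buildFunctionRequest(
  "https://your-project-ref.supabase.co/",
  "YOUR_SUPABASE_ANON_KEY",
  "search-knowledge-base",
  { query: "How can I explain digital consent to teenagers?", match_count: 5 },
);
// With a global fetch (Node 18+, browsers, Deno) the call would be:
// const res = await fetch(searchRequest.url, searchRequest.init);
```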
From the project root:

```bash
pnpm dev
```

This starts the Next.js development server. Open http://localhost:3000 (or the port shown in your console) in your browser to interact with the chat interface.
This project has a security policy in the `security/SECURITY.md` file. It explains how to report vulnerabilities and lists general security best practices for the project.
- Implement the logic within the `generate-learning-objectives` and `save-content` functions to fully use their capabilities.
- Integrate Gemini's generative capabilities into the frontend to provide more conversational and contextually rich responses using the retrieved chunks.
- Enhance the UI for source display, potentially linking to specific sections or documents.
- Add user authentication (e.g., Supabase Auth).
- Improve error handling and user feedback.
- Implement Gemini function calling in the main API handler to orchestrate calls to different Supabase Edge Functions based on user intent.
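The planned function-calling orchestration could route each Gemini `functionCall` to the matching Edge Function. The sketch below follows the general shape of Gemini function declarations (`name`, `description`, and a `parameters` schema) and of a `functionCall` response part (`name`, `args`); the declared names, parameters, and the `routeFunctionCall` helper are hypothetical, not the project's design. Checking the name against the declared list guards against the model requesting a function it was never offered.

```typescript
// Hypothetical declarations, one per Edge Function.
const functionDeclarations = [
  {
    name: "search_knowledge_base",
    description: "Search the educational knowledge base for passages relevant to a question.",
    parameters: {
      type: "OBJECT",
      properties: { query: { type: "STRING", description: "The educator's question." } },
      required: ["query"],
    },
  },
  {
    name: "generate_learning_objectives",
    description: "Draft learning objectives from knowledge base content.",
    parameters: {
      type: "OBJECT",
      properties: { topic: { type: "STRING", description: "Topic to write objectives for." } },
      required: ["topic"],
    },
  },
  {
    name: "save_content",
    description: "Save a piece of educator-created content.",
    parameters: {
      type: "OBJECT",
      properties: {
        title: { type: "STRING", description: "Title of the content." },
        content: { type: "STRING", description: "Body of the content." },
      },
      required: ["title", "content"],
    },
  },
];

// Shape of a functionCall part in a Gemini response.
interface FunctionCall {
  name: string;
  args?: Record<string, unknown>;
}

// Maps each declared function to the Edge Function that would implement it.
const EDGE_FUNCTIONS: Record<string, string> = {
  search_knowledge_base: "search-knowledge-base",
  generate_learning_objectives: "generate-learning-objectives",
  save_content: "save-content",
};

// Resolves a model-issued call to an Edge Function name and payload, rejecting
// undeclared names and calls missing required arguments.
function routeFunctionCall(call: FunctionCall): { edgeFunction: string; payload: Record<string, unknown> } {
  const declaration = functionDeclarations.find((d) => d.name === call.name);
  const edgeFunction = EDGE_FUNCTIONS[call.name];
  if (!declaration || !edgeFunction) throw new Error(`Unknown function: ${call.name}`);
  const payload = call.args ?? {};
  for (const key of declaration.parameters.required) {
    if (!(key in payload)) throw new Error(`Missing argument "${key}" for ${call.name}`);
  }
  return { edgeFunction, payload };
}
```

The resolved `edgeFunction` and `payload` could then be sent to the deployed function, and its response returned to Gemini as the function result.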