Pinkish-Warrior/arti-assistant

Arti Assistant AI Agent

Project Overview

Arti Assistant is an AI-powered educational agent that helps educators search and use a knowledge base of documents on sexual consent, sexual violence, and digital safety. It is built on a Retrieval-Augmented Generation (RAG) pipeline that uses the Gemini API for embeddings and Supabase for the backend; Gemini function calling is planned (see Next Steps / Future Improvements).

Architecture Overview

Arti Assistant is built as a pnpm monorepo, encompassing a Next.js frontend, Supabase Edge Functions for backend logic, and a data preparation package.

  • Frontend (apps/arti-assistant): A Next.js application that provides a chat interface for educators to interact with the AI agent.
  • Backend (Supabase Edge Functions):
    • search-knowledge-base: Performs a vector search against the Supabase chunks table to retrieve relevant document passages based on a user's query.
    • generate-learning-objectives: (Planned) Utilizes the Gemini API to create learning objectives from retrieved content.
    • save-content: (Planned) Allows educators to save generated or custom content to a saved_content table.
  • Data Preparation (packages/data-prep): A Node.js script responsible for chunking source .txt documents, generating embeddings using the Gemini API, and storing them in the Supabase chunks table.
  • Database (Supabase + pgvector): PostgreSQL with the pgvector extension is used to store document chunks and their high-dimensional embeddings, enabling efficient semantic search.

For a detailed explanation of the monorepo and pnpm decisions, please refer to approach.md.

Features

  • Interactive chat interface for querying educational documents.
  • Retrieval of relevant content chunks based on semantic similarity.
  • Source attribution for retrieved information.
  • (Planned) Generation of learning objectives from knowledge base content.
  • (Planned) Ability to save educator-created content.
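
As an illustration of the source attribution feature, the following is a minimal sketch of how retrieved chunks could be turned into a deduplicated source list. The column names follow the match_chunks function defined in the setup section; the MatchedChunk interface and formatSources helper are hypothetical and not taken from the repository.

```typescript
// Shape of one retrieved chunk (column names follow match_chunks).
interface MatchedChunk {
  document_name: string;
  section: string | null;
  chunk_text: string;
  similarity: number;
}

// Hypothetical helper: one attribution line per distinct document/section
// pair, kept in the order the chunks were returned.
function formatSources(chunks: MatchedChunk[]): string[] {
  const seen = new Set<string>();
  const sources: string[] = [];
  for (const chunk of chunks) {
    const label = chunk.section
      ? `${chunk.document_name} (${chunk.section})`
      : chunk.document_name;
    if (!seen.has(label)) {
      seen.add(label);
      sources.push(label);
    }
  }
  return sources;
}
```

Deduplicating on the combined label keeps the list short when several chunks come from the same section of one document.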

Setup Checklist

Follow these steps to get your Arti Assistant project up and running locally and deployed to Supabase.

1. Keys & Credentials

You need the following API keys and URLs. Create a file named .env.local inside the apps/arti-assistant/ directory and populate it as follows (refer to env.local.example):

# Supabase
NEXT_PUBLIC_SUPABASE_URL="YOUR_SUPABASE_PROJECT_URL"
NEXT_PUBLIC_SUPABASE_ANON_KEY="YOUR_SUPABASE_ANON_KEY"
APP_SERVICE_ROLE_KEY="YOUR_SECRET_KEY"

# Gemini
GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

  • Supabase Project URL & Anon Key: Found in your Supabase project settings > API.
  • Secret Key (APP_SERVICE_ROLE_KEY): Found in your Supabase project settings > API. Use the new sb_secret_... key.
  • Gemini API Key: Obtainable from Google AI Studio.
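
A missing or placeholder value in .env.local tends to surface later as a confusing runtime error, so it can help to check the variables up front. The sketch below uses the four variable names from the template above; the missingEnvVars helper itself is hypothetical and not part of the repository.

```typescript
// Variable names taken from the .env.local template.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "APP_SERVICE_ROLE_KEY",
  "GEMINI_API_KEY",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the names of variables that are unset,
// empty, or still hold a "YOUR_..." placeholder from the template.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name]?.trim();
    return !value || value.startsWith("YOUR_");
  });
}
```

In a Node.js script this could be called as missingEnvVars(process.env), failing fast if the returned list is not empty.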

2. Supabase Database Setup

  1. Create Supabase Project: Log in to your Supabase account and create a new project.
  2. Enable pgvector Extension: Navigate to the SQL Editor in your Supabase dashboard and run:
    CREATE EXTENSION IF NOT EXISTS vector;
  3. Create chunks Table: In the SQL Editor, run the following to create the table for your document chunks:
    CREATE TABLE chunks (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      document_name TEXT NOT NULL,
      section TEXT,
      chunk_text TEXT NOT NULL,
      embedding vector(768) NOT NULL,
      metadata JSONB,
      created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
    );
    
    CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
  4. Create match_chunks Function: This PostgreSQL function runs the similarity search. It returns up to match_count chunks whose cosine similarity to query_embedding exceeds match_threshold. Run this in your SQL Editor:
    CREATE OR REPLACE FUNCTION match_chunks (
      query_embedding vector(768),
      match_threshold float,
      match_count int
    )
    RETURNS TABLE (
      id UUID,
      document_name TEXT,
      section TEXT,
      chunk_text TEXT,
      metadata JSONB,
      similarity float
    )
    LANGUAGE plpgsql
    AS $$
    BEGIN
      RETURN QUERY
      SELECT
        chunks.id,
        chunks.document_name,
        chunks.section,
        chunks.chunk_text,
        chunks.metadata,
        1 - (chunks.embedding <=> query_embedding) AS similarity
      FROM chunks
      WHERE 1 - (chunks.embedding <=> query_embedding) > match_threshold
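      -- Order by the distance operator itself (ascending distance = descending
      -- similarity) so that the ivfflat index can be used.
      ORDER BY chunks.embedding <=> query_embedding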
      LIMIT match_count;
    END;
    $$;
  5. Create saved_content Table: For the save-content function, create this table in your SQL Editor:
    CREATE TABLE saved_content (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      title TEXT NOT NULL,
      content TEXT NOT NULL,
      metadata JSONB,
      created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
    );
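
To make the match_chunks logic concrete, here is a small in-memory sketch of the same steps: compute cosine similarity (pgvector's <=> operator returns the cosine distance, which is 1 minus this value), keep rows above the threshold, sort by similarity, and cap the result count. The Chunk interface and function names are illustrative assumptions. Note that the ivfflat index performs an approximate search, so the database may return slightly different rows than an exact scan like this one.

```typescript
interface Chunk {
  id: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact, in-memory counterpart of match_chunks (illustration only).
function matchChunks(
  chunks: Chunk[],
  queryEmbedding: number[],
  matchThreshold: number,
  matchCount: number,
): { id: string; similarity: number }[] {
  return chunks
    .map((chunk) => ({
      id: chunk.id,
      similarity: cosineSimilarity(chunk.embedding, queryEmbedding),
    }))
    .filter((row) => row.similarity > matchThreshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, matchCount);
}
```

The example uses tiny vectors for readability; the real table stores 768-dimensional embeddings.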

3. Install Dependencies

From the project root:

pnpm install

4. Prepare Document Data

  1. Place Documents: Ensure your 14 educational .txt documents are placed in the packages/data-prep/knowledge-feed/ directory.
  2. Run Seeding Script: Execute the data preparation script to populate your Supabase chunks table:
    pnpm db:seed
    This process will chunk your documents, generate embeddings, and insert them into Supabase. It might take some time.
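
The README does not describe how the seeding script splits documents, so the following is only a sketch of one common approach: fixed-size character windows with overlap, followed by a dimension check before a row is inserted. The chunkText and toChunkRow helpers and the default sizes are assumptions; only the 768 dimension and the column names come from the chunks table definition.

```typescript
const EMBEDDING_DIMENSIONS = 768; // matches vector(768) in the chunks table

// Hypothetical chunker: overlapping character windows over normalized text.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const normalized = text.replace(/\s+/g, " ").trim();
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < normalized.length; start += step) {
    chunks.push(normalized.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= normalized.length) break;
  }
  return chunks;
}

// Hypothetical row builder: rejects embeddings the table would not accept.
function toChunkRow(documentName: string, text: string, embedding: number[]) {
  if (embedding.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(
      `Expected ${EMBEDDING_DIMENSIONS} dimensions, got ${embedding.length}`,
    );
  }
  return { document_name: documentName, chunk_text: text, embedding };
}
```

Overlap keeps sentences that straddle a boundary retrievable from either neighbouring chunk, at the cost of storing some text twice.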

5. Deploy Supabase Edge Functions

You need the Supabase CLI installed and configured. From the project root:

  1. Login to Supabase CLI:
    supabase login
  2. Link project:
    supabase link --project-ref your-supabase-project-id
    (You can find your project ID in your Supabase project settings > General > Project ID).
  3. Set Environment Variables for Functions: Before deploying, you must upload your secrets so the Edge Functions can access them.
    supabase secrets set --env-file apps/arti-assistant/.env.local
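  4. Deploy Functions: Deploy each function in turn. The --no-verify-jwt flag turns off JWT verification, so the functions accept requests without a valid Authorization token.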
    supabase functions deploy search-knowledge-base --no-verify-jwt
    supabase functions deploy generate-learning-objectives --no-verify-jwt
    supabase functions deploy save-content --no-verify-jwt
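
Once deployed, a function is reachable under /functions/v1/<function-name> on the project URL. The sketch below builds a request for search-knowledge-base without sending it. The buildSearchRequest helper and the { query } body shape are assumptions, since the function's actual request contract is not documented here.

```typescript
interface SearchRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: prepares the URL and fetch options for a search call.
function buildSearchRequest(
  supabaseUrl: string,
  anonKey: string,
  query: string,
): SearchRequest {
  // Drop trailing slashes so the path is joined with exactly one "/".
  const base = supabaseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/functions/v1/search-knowledge-base`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${anonKey}`,
      },
      // Assumed body shape; adjust to whatever the function expects.
      body: JSON.stringify({ query }),
    },
  };
}
```

The result can be passed straight to fetch(url, init). Keeping request construction separate from the network call makes it easy to test.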

6. Run the Frontend Application

From the project root:

pnpm dev

This will start the Next.js development server. Open http://localhost:3000 (or the port indicated in your console) in your browser to interact with the chat interface.

Security

The project's security policy is in security/SECURITY.md. It explains how to report vulnerabilities and lists general security best practices for the project.

Next Steps / Future Improvements

  • Implement the planned generate-learning-objectives and save-content functions.
  • Integrate Gemini's generative capabilities into the frontend to provide more conversational and contextually rich responses using the retrieved chunks.
  • Enhance the UI for source display, potentially linking to specific sections or documents.
  • Add user authentication (e.g., Supabase Auth).
  • Improve error handling and user feedback.
  • Implement Gemini function calling in the main API handler to orchestrate calls to different Supabase Edge Functions based on user intent.
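
As a rough sketch of the orchestration idea in the last item, a function call returned by the model (a name plus JSON arguments) could be mapped to one of the three Edge Functions through a lookup table. The tool names, the FunctionCall interface, and the resolveEdgeFunction helper are all hypothetical; none of this is implemented in the repository yet.

```typescript
// Hypothetical tool names mapped to the Edge Functions named in this README.
const TOOL_TO_EDGE_FUNCTION = new Map<string, string>([
  ["search_knowledge_base", "search-knowledge-base"],
  ["generate_learning_objectives", "generate-learning-objectives"],
  ["save_content", "save-content"],
]);

// Minimal assumed shape of a model function call.
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

// Resolves which Edge Function to invoke and serializes its arguments.
function resolveEdgeFunction(call: FunctionCall): {
  functionName: string;
  payload: string;
} {
  const functionName = TOOL_TO_EDGE_FUNCTION.get(call.name);
  if (functionName === undefined) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  return { functionName, payload: JSON.stringify(call.args) };
}
```

Rejecting unknown tool names explicitly prevents the handler from calling an arbitrary endpoint if the model returns an unexpected name.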
