Rift through operational complexity
Autonomous infrastructure orchestration and remediation powered by DigitalOcean Gradient AI + Model Context Protocol.
This is a vibe-coded project that I built for fun over a hackathon weekend because I liked the idea.
Opening rifts to create, closing rifts to fix - all at machine speed.
- What is Rift?
- Project Structure
- Frontend Setup
- Backend Setup
- Running the Complete System
- Design System
- Demo Guide
- Architecture
- Tech Stack
Rift is an AI-powered infrastructure orchestrator that autonomously detects, diagnoses, and fixes infrastructure issues using multi-agent systems and the Model Context Protocol (MCP).
- DevOps engineers spend 40% of their time on routine infrastructure incidents
- Average incident response time: 2-4 hours
- Manual fixes are error-prone and inconsistent
- On-call fatigue leads to burnout
FixBot uses three specialized AI agents:
- Monitor Agent - Detects issues in seconds via DigitalOcean MCP + Prometheus
- Diagnostic Agent - Uses RAG to analyze root causes from knowledge base
- Remediation Agent - Fixes problems automatically via Terraform + MCP
Result: Incident response time reduced from hours to ~90 seconds, fully autonomous.
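As a rough illustration of how the three agents could hand an incident to one another, here is a minimal Python sketch. The `Incident` dataclass, the `handle` function, and the 90% CPU threshold are hypothetical names and values chosen for illustration. The diagnosis and remediation steps are hard-coded stand-ins for the RAG lookup and the Terraform apply, not the project's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Incident:
    """Hypothetical incident record passed between the three agents."""
    resource: str
    metric: str
    value: float
    root_cause: Optional[str] = None
    action: Optional[str] = None
    resolved: bool = False
    trace: list[str] = field(default_factory=list)


# Illustrative threshold (assumption); tune per metric in a real setup.
CPU_ALERT_THRESHOLD = 90.0


def monitor(resource: str, cpu_percent: float) -> Optional[Incident]:
    """Monitor step: open an incident when CPU reaches the threshold."""
    if cpu_percent < CPU_ALERT_THRESHOLD:
        return None
    incident = Incident(resource=resource, metric="cpu", value=cpu_percent)
    incident.trace.append(f"detected cpu={cpu_percent:.0f}% on {resource}")
    return incident


def diagnose(incident: Incident) -> Incident:
    """Diagnostic step: hard-coded stand-in for the RAG lookup."""
    incident.root_cause = "undersized droplet"
    incident.action = "resize"
    incident.trace.append(f"root cause: {incident.root_cause}")
    return incident


def remediate(incident: Incident) -> Incident:
    """Remediation step: stand-in for the Terraform apply (changes nothing)."""
    incident.trace.append(f"executed action: {incident.action}")
    incident.resolved = True
    return incident


def handle(resource: str, cpu_percent: float) -> Optional[Incident]:
    """Run detect -> diagnose -> remediate; return None when healthy."""
    incident = monitor(resource, cpu_percent)
    if incident is None:
        return None
    return remediate(diagnose(incident))
```

Calling `handle("web-app", 95.0)` returns a resolved `Incident` whose `trace` list records one entry per agent step, which is the kind of data a traceability view could display.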
fixbot/
│
├── frontend/  # 🎨 Next.js Dashboard (Pre-built)
│   ├── app/
│   │   ├── page.tsx  # Main dashboard
│   │   ├── layout.tsx  # Root layout
│   │   ├── globals.css  # Dark punk theme styles
│   │   └── favicon.ico
│   │
│   ├── components/
│   │   ├── ui/  # shadcn/ui components
│   │   │   ├── button.tsx
│   │   │   ├── card.tsx
│   │   │   └── badge.tsx
│   │   │
│   │   ├── StatusCard.tsx  # Agent status display
│   │   ├── IncidentFeed.tsx  # Real-time event stream
│   │   ├── AgentStatus.tsx  # Agent health monitor
│   │   ├── MetricsChart.tsx  # System metrics visualization
│   │   ├── TraceViewer.tsx  # AI decision traceability
│   │   └── Terminal.tsx  # Terminal-style output
│   │
│   ├── lib/
│   │   ├── api.ts  # Backend API client
│   │   ├── websocket.ts  # WebSocket connection
│   │   └── utils.ts  # Utility functions
│   │
│   ├── public/
│   │   └── fixbot-logo.svg
│   │
│   ├── package.json
│   ├── tsconfig.json
│   ├── tailwind.config.ts  # Dark punk theme
│   ├── next.config.js
│   ├── .env.local  # Configure this!
│   └── README.md
│
└── backend/  # 🔧 Python Backend (Build this!)
    ├── agents/
    │   ├── base_agent.py  # Base agent class
    │   ├── monitor_agent.py  # Monitoring logic
    │   ├── diagnostic_agent.py  # Diagnosis with RAG
    │   └── remediation_agent.py  # Auto-remediation
    │
    ├── mcp_clients/
    │   ├── do_mcp.py  # DigitalOcean MCP
    │   ├── terraform_mcp.py  # Terraform MCP
    │   └── prometheus_mcp.py  # Custom Prometheus MCP
    │
    ├── orchestrator/
    │   └── coordinator.py  # Agent coordination
    │
    ├── models/
    │   └── incident.py  # Pydantic models
    │
    ├── terraform/
    │   ├── main.tf
    │   └── modules/
    │
    ├── demo/
    │   └── failure_injection.py  # Demo scenarios
    │
    ├── knowledge-base/
    │   ├── do-docs.md
    │   ├── runbooks.md
    │   └── past-incidents.json
    │
    ├── main.py  # FastAPI backend
    ├── requirements.txt
    ├── .env  # Configure this!
    └── README.md
Frontend (fixbot/frontend/):
- ✅ Pre-built and ready to use - Just run npm install and configure .env.local
- Next.js 14+ with App Router
- Real-time dashboard with WebSocket updates
- Dark Punk Professional Theme - Cyberpunk aesthetics meets Bloomberg Terminal
- TypeScript + Tailwind CSS + shadcn/ui
- Minimal configuration needed
Backend (fixbot/backend/):
- ⚠️ You build this during the hackathon
- Python FastAPI application
- AI agents (Monitor, Diagnostic, Remediation)
- MCP server integrations
- WebSocket server for real-time updates
- Infrastructure as Code (Terraform)
The frontend is pre-built with a professional dark punk theme. You just need to install and configure it.
- Node.js 18+ and npm 9+
- A running backend API (see Backend Setup)
# Navigate to frontend directory
cd fixbot/frontend
# Install dependencies
npm install
# Configure environment variables
cp .env.example .env.local
Edit .env.local:
# Backend API endpoint
NEXT_PUBLIC_API_URL=http://localhost:8000
# WebSocket endpoint
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws
# Run development server
npm run dev
# Open browser
open http://localhost:3000
- Framework: Next.js 14 (App Router)
- Language: TypeScript
- Styling: Tailwind CSS
- Components: shadcn/ui
- State: React Hooks
- Real-time: WebSocket client
- API Client: Fetch API with error handling
The backend is what you'll build during the hackathon.
- Python 3.11+
- DigitalOcean account with API token
- Gradient AI Platform access
- Terraform installed
- Docker (for MCP servers)
# Navigate to backend directory
cd fixbot/backend
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure environment variables
cp .env.example .env
Edit .env:
# DigitalOcean
DO_API_TOKEN=your_do_token_here
DO_SPACES_KEY=your_spaces_key
DO_SPACES_SECRET=your_spaces_secret
# Gradient AI
GRADIENT_AI_API_KEY=your_gradient_key
MONITOR_AGENT_ID=agent_xxx
DIAGNOSTIC_AGENT_ID=agent_yyy
REMEDIATION_AGENT_ID=agent_zzz
# MCP Servers
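# Note: the Next.js dev server also defaults to port 3000; change one of the two if both run on the same machine
DO_MCP_URL=http://localhost:3000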
TERRAFORM_MCP_URL=http://localhost:3001
PROMETHEUS_URL=http://your-prometheus:9090
# Start MCP servers (in separate terminals)
# See MCP Integration section in full guide
# Run FastAPI backend
python main.py
# Backend should start on http://localhost:8000
- Framework: FastAPI
- AI Platform: DigitalOcean Gradient AI
- Protocol: Model Context Protocol (MCP)
- IaC: Terraform
- Monitoring: Prometheus
- Language: Python 3.11+
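The variables listed for .env could be validated once at startup so that a missing key fails fast rather than halfway through an incident. The sketch below is one way to do that under stated assumptions: `Settings`, `load_settings`, and `REQUIRED_VARS` are hypothetical names, and the code reads the process environment only. Loading the .env file itself would need a separate loader such as python-dotenv, which this README does not confirm is in requirements.txt.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables treated as mandatory in this sketch (assumption).
REQUIRED_VARS = (
    "DO_API_TOKEN",
    "GRADIENT_AI_API_KEY",
    "MONITOR_AGENT_ID",
    "DIAGNOSTIC_AGENT_ID",
    "REMEDIATION_AGENT_ID",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view over the backend environment variables."""
    do_api_token: str
    gradient_ai_api_key: str
    monitor_agent_id: str
    diagnostic_agent_id: str
    remediation_agent_id: str
    prometheus_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from env, raising if any required variable is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        do_api_token=env["DO_API_TOKEN"],
        gradient_ai_api_key=env["GRADIENT_AI_API_KEY"],
        monitor_agent_id=env["MONITOR_AGENT_ID"],
        diagnostic_agent_id=env["DIAGNOSTIC_AGENT_ID"],
        remediation_agent_id=env["REMEDIATION_AGENT_ID"],
        # Fallback to a local Prometheus on its default port (assumption).
        prometheus_url=env.get("PROMETHEUS_URL", "http://localhost:9090"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.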
Terminal 1: Backend API
cd fixbot/backend
source venv/bin/activate
python main.py
# Runs on http://localhost:8000
Terminal 2: Frontend Dashboard
cd fixbot/frontend
npm run dev
# Runs on http://localhost:3000
Terminal 3: Monitor Logs (Optional)
cd fixbot/backend
tail -f logs/fixbot.log
# Check backend health
curl http://localhost:8000/agents/health
# Check frontend loads
curl http://localhost:3000
# Open dashboard in browser
open http://localhost:3000 # macOS
xdg-open http://localhost:3000 # Linux
You should see:
- ✅ All three agent status cards showing "Active" with green indicators
- ✅ System metrics displaying normal values
- ✅ Live connection indicator showing "Connected"
- ✅ Empty incident feed (no incidents yet)
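The curl check against /agents/health can also be scripted, which helps when the backend needs a few seconds to come up before the dashboard is opened. This is a minimal sketch using only the standard library. `wait_for_backend` and `fetch_status` are hypothetical helper names, and the only assumption about the endpoint is that it answers HTTP 200 when healthy.

```python
import time
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for url (urlopen raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def wait_for_backend(
    url: str = "http://localhost:8000/agents/health",
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], int] = fetch_status,
) -> bool:
    """Poll url until it answers 200 or the attempts run out."""
    for _ in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # Connection refused, timeout, or HTTP error: backend not ready.
            pass
        time.sleep(delay)
    return False
```

For example, `wait_for_backend(attempts=30)` gives the backend roughly 30 seconds to start; the `fetch` parameter exists so the polling logic can be tested without a running server.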
"Professional Cyberpunk" - The aesthetic of a high-tech operations center. Think: Blade Runner meets Bloomberg Terminal. Dark, sleek, with neon accents that convey urgency and precision.
/* Background & Surfaces */
--background: #0a0e17; /* Deep space black */
--surface: #111827; /* Card/panel background */
--surface-elevated: #1f2937; /* Elevated elements */
/* Brand Colors (Neon Accents) */
--primary: #00ff9f; /* Neon green - success/active */
--secondary: #00d4ff; /* Cyber blue - info */
--accent: #ff00ff; /* Neon magenta - alerts */
/* Status Colors */
--success: #00ff9f; /* Neon green */
--warning: #ffaa00; /* Electric amber */
--danger: #ff3366; /* Hot pink red */
/* Text */
--text-primary: #e5e7eb; /* Almost white */
--text-secondary: #9ca3af; /* Muted gray */
--text-muted: #6b7280; /* Very muted */
/* Borders */
--border: #1f2937; /* Subtle borders */
--border-bright: #374151; /* Highlighted borders */
Fonts:
- Headers: "JetBrains Mono" or "Space Mono" (monospace, technical feel)
- Body: "Inter" or "DM Sans" (clean, readable)
- Code/Terminal: "Fira Code" or "Cascadia Code" (with ligatures)
Guidelines:
- Use UPPERCASE for labels and status indicators
- Use monospace for anything technical (IDs, timestamps, metrics)
- Use medium-large sizes for important info (remember: projector demo!)
- Use color to convey meaning (green = good, red = critical, blue = info)
<Card className="bg-[#111827] border border-[#1f2937] hover:border-[#00ff9f] transition-all">
<div className="flex items-center gap-3">
{/* Active indicator - pulsing green dot */}
<div className="h-2 w-2 rounded-full bg-[#00ff9f] animate-pulse" />
{/* Agent name - monospace, uppercase, neon green */}
<span className="text-[#00ff9f] font-mono uppercase tracking-wider">
Monitor Agent
</span>
</div>
{/* Status info - secondary text */}
<div className="mt-2 text-[#9ca3af] text-sm">
Status: Active • Last check: 2s ago
</div>
</Card>
<div className="bg-black border border-[#00ff9f] rounded p-4 font-mono">
<div className="flex gap-2 text-[#00ff9f]">
<span className="text-[#00ff9f]">●</span>
<span>14:32:15 | FixBot detected high CPU (95%)</span>
</div>
<div className="flex gap-2 text-[#00d4ff]">
<span className="text-[#00d4ff]">●</span>
<span>14:32:18 | Analyzing root cause...</span>
</div>
<div className="flex gap-2 text-[#00ff9f]">
<span className="text-[#00ff9f]">●</span>
<span>14:33:45 | ✅ RESOLVED - Droplet resized</span>
</div>
</div>
<div className="space-y-2">
<div className="flex justify-between text-sm">
<span className="text-[#9ca3af]">CPU Usage</span>
<span className="text-[#00ff9f] font-mono">42%</span>
</div>
{/* Progress bar with gradient */}
<div className="h-2 bg-[#1f2937] rounded-full overflow-hidden">
<div
className="h-full bg-gradient-to-r from-[#00ff9f] to-[#00d4ff]"
style={{ width: "42%" }}
/>
</div>
</div>
Use sparingly and professionally:
/* Pulse for active states */
@keyframes pulse {
0%,
100% {
opacity: 1;
}
50% {
opacity: 0.6;
}
}
/* Glow effect on hover */
.hover-glow:hover {
box-shadow: 0 0 20px rgba(0, 255, 159, 0.3);
}
/* Subtle scan line (optional) */
@keyframes scan {
0% {
transform: translateY(-100%);
}
100% {
transform: translateY(100%);
}
}
DO:
- ✅ Pulse indicators for active/live states
- ✅ Smooth transitions (0.2-0.3s)
- ✅ Hover effects (glow, border color change)
- ✅ Fade in/out for notifications
DON'T:
- ❌ Excessive animations
- ❌ Constant movement
- ❌ Distracting effects during demo
- ❌ Flashy transitions
Dashboard Grid (Desktop):
┌─────────────────────────────────────────────────┐
│ 🤖 FixBot                              [●] LIVE │
├─────────────────────────────────────────────────┤
│                                                 │
│  [Monitor]  [Diagnostic]  [Remediation]         │ ← Agent status cards
│                                                 │
├─────────────────────────────────────────────────┤
│                                                 │
│  📊 System Metrics                              │ ← Metrics display
│  CPU | Memory | Disk                            │
│                                                 │
├─────────────────────────────────────────────────┤
│                                                 │
│  🔴 Live: Incident Timeline                     │ ← Real-time feed
│  [Scrolling event stream...]                    │
│                                                 │
└─────────────────────────────────────────────────┘
Spacing:
- Use gap-4 (1rem) or gap-6 (1.5rem) between elements
- Generous padding inside cards: p-6 or p-8
- Consistent margins: my-4 or my-6
5 minutes before demo:
- Start Backend:
  cd fixbot/backend && python main.py
- Start Frontend:
  cd fixbot/frontend && npm run dev
- Open Dashboard:
  open http://localhost:3000
- Verify Status:
  - All agents show green "Active"
  - System metrics display normally
  - Live indicator shows "Connected"
- Prepare Failure Injection:
  cd fixbot/backend/demo # Have terminal ready with injection command
[0:00-0:30] Hook + Dashboard Intro
YOU: "Infrastructure breaks. That's a fact of life.
But what if you had a bot that fixed things automatically -
before they wake up your on-call engineer at 3 AM?
That's FixBot."
[Show dashboard on screen - point to it]
"This is FixBot's operations center.
Three AI agents monitoring our infrastructure 24/7."
[0:30-1:00] Architecture Walkthrough
[Point to each agent card]
YOU: "Three specialized agents:
Monitor Agent - detects issues via DigitalOcean MCP and Prometheus
Diagnostic Agent - uses RAG to analyze root causes
Remediation Agent - fixes problems automatically via Terraform
All powered by DigitalOcean Gradient AI with Model Context Protocol."
[1:00-4:00] Live Demo: CPU Spike
# Run in terminal (don't show this to judges, just run it)
python failure_injection.py --inject cpu --target web-app
[FOCUS ON DASHBOARD - this is the star]
YOU: "Let me trigger a real incident. I'm overloading our web server..."
[Dashboard comes alive:]
- Monitor Agent: Status changes to "⚠ DETECTING..."
- Incident feed starts scrolling:
"14:32:15 | 🔴 ALERT: High CPU detected (95%)"
[CPU metric bar turns red, shows 95%]
YOU: "Three seconds. FixBot detected it."
[Diagnostic Agent activates:]
"14:32:18 | 🔍 Analyzing root cause..."
"14:32:22 | 💡 Root cause: Undersized droplet"
"14:32:22 | 📋 Recommended: Resize to s-2vcpu-4gb"
YOU: "Now it's using RAG - querying our knowledge base of past incidents,
DigitalOcean documentation, best practices..."
[Remediation Agent executes:]
"14:32:25 | 🔧 Executing: Terraform resize"
"14:32:30 | ⚙️ Applying infrastructure changes..."
"14:33:45 | ✅ RESOLVED: Droplet resized"
[CPU drops to 42%, turns green]
[All agents return to "Active" status]
YOU: "90 seconds total. From detection to resolution.
Completely autonomous. No human intervention."
[Pause for impact]
[4:00-5:00] Show Traceability
[Click on resolved incident in feed]
[Opens trace viewer panel]
YOU: "Here's what makes this special - full traceability."
[Point to trace view showing:]
- Input metrics and system state
- RAG retrieval results from knowledge base
- Decision logic and confidence scores
- Terraform config generated
- Success validation
"Every decision the AI makes is auditable.
This isn't a black box. You can see exactly why FixBot chose this solution."
[5:00-6:00] Quick Second Demo (If Time)
python failure_injection.py --inject disk --target api-server
YOU: "One more. Disk full on API server..."
[Faster walkthrough on dashboard]
- Detect (5s)
- Diagnose (15s)
- Attach new volume (45s)
- Resolved
YOU: "Same pattern. Different problem. Fixed automatically."
[6:00-7:00] Closing
[Return to clean dashboard - all green]
YOU: "FixBot - the infrastructure fixer that never sleeps.
Key features:
• Detects issues in seconds using DigitalOcean MCP
• Diagnoses with AI-powered RAG
• Fixes automatically via Terraform
• Full traceability of every decision
• Built entirely on DigitalOcean Gradient AI
This is the future of infrastructure management.
No more 3 AM wake-up calls.
No more manual emergency fixes.
Just autonomous, intelligent infrastructure.
Questions?"
[Confident smile, pause]
DO:
- ✅ Keep dashboard fullscreen during demo
- ✅ Speak slowly and clearly
- ✅ Pause after key points for impact
- ✅ Point to screen elements as you explain
- ✅ Show enthusiasm - this is cool tech!
- ✅ Have backup video if live demo fails
DON'T:
- ❌ Switch between terminal and browser constantly
- ❌ Rush through the demo
- ❌ Get lost in technical details
- ❌ Apologize for delays (they're normal)
- ❌ Turn your back to audience
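The demo commands call failure_injection.py with an --inject scenario and a --target name. A minimal sketch of how that command line could be parsed is shown below. Only the two flags and the cpu and disk scenarios come from the demo script; `build_parser`, `describe`, and `main` are hypothetical names, and the sketch prints what it would do instead of injecting a real fault.

```python
import argparse
from typing import Optional, Sequence

# Scenarios used in the demo script; others would be added here.
SCENARIOS = ("cpu", "disk")


def build_parser() -> argparse.ArgumentParser:
    """Create the CLI parser with the two flags used in the demo."""
    parser = argparse.ArgumentParser(
        prog="failure_injection.py",
        description="Inject a demo failure into a target resource.",
    )
    parser.add_argument("--inject", choices=SCENARIOS, required=True,
                        help="failure scenario to trigger")
    parser.add_argument("--target", required=True,
                        help="name of the resource to affect")
    return parser


def describe(scenario: str, target: str) -> str:
    """Return a human-readable summary of the requested injection."""
    return f"Injecting {scenario} failure on {target}"


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse arguments and report the action; real injection is omitted."""
    args = build_parser().parse_args(argv)
    print(describe(args.inject, args.target))
    return 0

# When saved as a script, call main() from an
# `if __name__ == "__main__":` guard and pass its result to sys.exit.
```

Restricting --inject with `choices` means a typo during the live demo fails immediately with a usage message instead of silently doing nothing.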
┌──────────────────────────────────────────────────────────────┐
│                        USER INTERFACE                        │
│                     (Next.js Dashboard)                      │
│                                                              │
│  • Dark Punk Professional Theme                              │
│  • Real-time WebSocket Updates                               │
│  • Agent Status Monitoring                                   │
│  • Incident Timeline                                         │
│  • Decision Traceability                                     │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ WebSocket + REST API
                 │
┌────────────────▼─────────────────────────────────────────────┐
│                       FASTAPI BACKEND                        │
│                        (Orchestrator)                        │
│                                                              │
│  Endpoints:                                                  │
│  • POST /incidents/detect                                    │
│  • POST /incidents/diagnose                                  │
│  • POST /incidents/remediate                                 │
│  • GET  /status                                              │
│  • GET  /agents/health                                       │
│  • WS   /ws (WebSocket for real-time)                        │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ Agent API Calls
                 │
┌────────────────▼─────────────────────────────────────────────┐
│              DIGITALOCEAN GRADIENT AI PLATFORM               │
│                     (Multi-Agent System)                     │
│                                                              │
│  ┌──────────┐  ┌──────────────┐  ┌────────────────┐          │
│  │ MONITOR  │──│ DIAGNOSTIC   │──│ REMEDIATION    │          │
│  │ AGENT    │  │ AGENT        │  │ AGENT          │          │
│  │          │  │              │  │                │          │
│  │ • Detect │  │ • RAG Query  │  │ • Terraform    │          │
│  │ • Alert  │  │ • Analyze    │  │ • DO API       │          │
│  │ • Triage │  │ • Recommend  │  │ • Validate     │          │
│  └────┬─────┘  └──────┬───────┘  └────────┬───────┘          │
│       │               │                   │                  │
│       └───────────────┼───────────────────┘                  │
│                       │                                      │
│  ┌────────────────────▼───────────────────────────────────┐  │
│  │                  KNOWLEDGE BASE (RAG)                  │  │
│  │ • DO Documentation (auto-indexed)                      │  │
│  │ • Runbooks & Best Practices                            │  │
│  │ • Past Incident History                                │  │
│  └────────────────────────────────────────────────────────┘  │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ MCP Protocol
                 │
┌────────────────▼─────────────────────────────────────────────┐
│                         MCP SERVERS                          │
│                                                              │
│  ┌─────────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ DigitalOcean    │  │ Terraform    │  │ Prometheus   │     │
│  │ MCP Server      │  │ MCP Server   │  │ MCP (Custom) │     │
│  │                 │  │              │  │              │     │
│  │ • Droplets      │  │ • Validate   │  │ • Query      │     │
│  │ • Monitoring    │  │ • Plan       │  │ • Alerts     │     │
│  │ • Spaces        │  │ • Apply      │  │ • Metrics    │     │
│  │ • Kubernetes    │  │ • State      │  │              │     │
│  └─────────────────┘  └──────────────┘  └──────────────┘     │
└──────────────────────────────────────────────────────────────┘
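The diagram shows the backend pushing real-time updates to the dashboard over /ws. One way to fan a single incident event out to every connected dashboard is a small in-process hub with one queue per connection. The sketch below is an assumption-laden illustration: `EventHub` and the event fields (timestamp, agent, message) are hypothetical, and it deliberately leaves out the FastAPI wiring.

```python
import asyncio
import json
from datetime import datetime, timezone


class EventHub:
    """Fan incident events out to every subscribed dashboard connection."""

    def __init__(self) -> None:
        self._subscribers: set[asyncio.Queue[str]] = set()

    def subscribe(self) -> asyncio.Queue[str]:
        """Register a new connection and return its private event queue."""
        queue: asyncio.Queue[str] = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue[str]) -> None:
        """Stop delivering events to a closed connection."""
        self._subscribers.discard(queue)

    def publish(self, agent: str, message: str) -> None:
        """Serialize one event as JSON and enqueue it for every subscriber."""
        event = json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "message": message,
        })
        for queue in self._subscribers:
            queue.put_nowait(event)
```

In a FastAPI WebSocket handler, each connection would call `subscribe()`, forward items from its queue to the client, and call `unsubscribe()` on disconnect. Per-connection queues keep one slow dashboard from blocking the agents or the other viewers.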
- Framework: Next.js 14 (App Router)
- Language: TypeScript 5+
- Styling: Tailwind CSS 3.4
- Components: shadcn/ui
- State Management: React Hooks (useState, useEffect, useContext)
- Real-time: WebSocket API
- HTTP Client: Fetch API
- Build Tool: Next.js built-in (Turbopack is opt-in for the dev server in Next.js 14)
- Framework: FastAPI 0.109+
- Language: Python 3.11+
- AI Platform: DigitalOcean Gradient AI
- Multi-agent system
- RAG (Retrieval-Augmented Generation)
- Function calling
- Agent evaluations
- Traceability
- Protocol: Model Context Protocol (MCP)
- Infrastructure: Terraform 1.6+
- Monitoring: Prometheus
- State Management: DO Spaces (S3-compatible)
- WebSocket: FastAPI WebSocket support
- Cloud: DigitalOcean
- Droplets (compute)
- Spaces (object storage)
- Managed Kubernetes (optional)
- Monitoring (built-in)
- IaC: Terraform with DO provider
- Orchestration: FastAPI + asyncio
- Monitoring: Prometheus + node_exporter
- Real-time dashboard with WebSocket
- Dark punk professional theme
- Agent status monitoring
- Live incident feed
- System metrics visualization
- Decision traceability viewer
- Responsive layout (desktop-focused)
- Monitor Agent with DO MCP integration
- Diagnostic Agent with RAG
- Remediation Agent with Terraform
- FastAPI orchestrator
- WebSocket server for real-time updates
- MCP client implementations
- Knowledge base setup
- Demo failure injection scripts
- Agent evaluations
# Install
npm install
# Dev mode
npm run dev
# Build
npm run build
# Production
npm start
# Type check
npm run type-check
# Lint
npm run lint
# Install
pip install -r requirements.txt
# Run dev
python main.py
# Run with reload
uvicorn main:app --reload
# Run tests
pytest tests/
# Type check
mypy .
- Check .env.local has correct NEXT_PUBLIC_API_URL
- Verify backend is running on expected port
- Check CORS settings in FastAPI backend
- Look for errors in browser console (F12)
- Check NEXT_PUBLIC_WS_URL in .env.local
- Verify WebSocket endpoint exists in backend
- Check firewall/proxy settings
- Test with: wscat -c ws://localhost:8000/ws
- Clear browser cache
- Check globals.css is imported in layout.tsx
- Verify Tailwind is processing CSS correctly
- Run npm run dev with clean cache
- Check Gradient AI API keys in backend .env
- Verify agent IDs are correct
- Test agent endpoints individually
- Check Gradient AI dashboard for errors
MIT License - see LICENSE file for details
MLH + DigitalOcean AI Hackathon NYC
December 12-13, 2025
Built with ❤️ and ☕ by [Your Name]
- DigitalOcean Gradient AI
- Model Context Protocol
- Next.js Documentation
- FastAPI Documentation
- Terraform DigitalOcean Provider
Questions? Found a bug? Want to contribute?
Open an issue or PR on GitHub!
🤖 FixBot - Breaking things? We fix them before you notice. 🤖