A Universal LLM API Gateway & Transformation Layer.
Plexus is a high-performance API gateway that unifies access to multiple AI providers (OpenAI, Anthropic, Google, GitHub Copilot, and more) under a single endpoint. Switch models and providers without rewriting client code.
Plexus sits in front of your LLM providers and handles protocol translation, load balancing, failover, and usage tracking — transparently. Send any supported request format to Plexus and it routes to the right provider, transforms as needed, and returns the response in the format your client expects.
Key capabilities:
- Unified API surface — Accept OpenAI (`/v1/chat/completions`), Anthropic (`/v1/messages`), Gemini, Embeddings, Audio, Images, and Responses (`/v1/responses`) formats
- Multi-provider routing — Route to OpenAI, Anthropic, Google Gemini, DeepSeek, Groq, OpenRouter, and any OpenAI-compatible provider
- OAuth providers — Authenticate via GitHub Copilot, Anthropic Claude, OpenAI Codex, Gemini CLI, and Antigravity through OAuth (no API key required)
- Model aliasing & load balancing — Define virtual model names backed by multiple real providers with `random`, `cost`, `performance`, `latency`, or `in_order` selectors
- Intelligent failover — Exponential backoff cooldowns automatically remove unhealthy providers from rotation
- Usage tracking — Per-request cost, token counts, latency, and TPS metrics with a built-in dashboard
- MCP proxy — Proxy Model Context Protocol servers through Plexus with per-request session isolation
- User quotas — Per-API-key rate limiting by requests or tokens with rolling, daily, or weekly windows
- Admin dashboard — Web UI for configuration, usage analytics, debug traces, and quota monitoring
Start with a minimal config file that all options below share:
```yaml
# config/plexus.yaml
adminKey: "change-me"

providers:
  openai:
    api_base_url: https://api.openai.com/v1
    api_key: "sk-your-openai-key"
    models:
      - gpt-4o
      - gpt-4o-mini

models:
  fast:
    targets:
      - provider: openai
        model: gpt-4o-mini

keys:
  my-app:
    secret: "sk-plexus-my-key"
```

`DATABASE_URL` is required and tells Plexus where to store usage data. Use a local SQLite file for simple deployments, or a PostgreSQL connection string for production.
```shell
docker run -p 4000:4000 \
  -v $(pwd)/config/plexus.yaml:/app/config/plexus.yaml \
  -v plexus-data:/app/data \
  -e DATABASE_URL=sqlite:///app/data/plexus.db \
  ghcr.io/mcowger/plexus:latest
```

Download the latest pre-built binary from GitHub Releases:
```shell
# macOS (Apple Silicon)
curl -L https://github.com/mcowger/plexus/releases/latest/download/plexus-macos -o plexus
chmod +x plexus
DATABASE_URL=sqlite://./data/plexus.db ./plexus

# Linux (x64)
curl -L https://github.com/mcowger/plexus/releases/latest/download/plexus-linux -o plexus
chmod +x plexus
DATABASE_URL=sqlite://./data/plexus.db ./plexus

# Windows (x64) — download plexus.exe from the releases page, then:
# set DATABASE_URL=sqlite://./data/plexus.db && plexus.exe
```

The binary is self-contained (no runtime or dependencies required). By default it looks for `config/plexus.yaml` relative to the working directory.
```shell
curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-plexus-my-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "fast", "messages": [{"role": "user", "content": "Hello!"}]}'
```

The dashboard is at http://localhost:4000 — log in with your `adminKey`.
OAuth providers (GitHub Copilot, Anthropic, OpenAI Codex, etc.) use credentials managed through the Admin UI. These are stored in `./auth.json` by default — no manual setup required. Set `AUTH_JSON` to override the path. See Configuration: OAuth Providers.
See Installation Guide for Docker Compose, building from source, and all environment variable options.
- Responses API: Full OpenAI `/v1/responses` endpoint with multi-turn `previous_response_id` tracking and conversation management
- Image & Speech APIs: `/v1/images/generations`, `/v1/images/edits`, and `/v1/audio/speech` endpoints
- Per-Request Pricing: Flat dollar amount per API call, independent of token count
- MCP Proxy Support: Proxy streamable HTTP MCP servers with per-request session isolation
- OAuth Providers: Authenticate to Anthropic, GitHub Copilot, Gemini CLI, Antigravity, and OpenAI Codex via the Admin UI
- User Quota Enforcement: Per-API-key limits using rolling (leaky bucket), daily, or weekly windows
- Escalating Cooldown System: Exponential backoff for provider failures (2 min → 5 hr cap); success resets failure count
- Quota Tracking System: Monitor provider rate limits with configurable per-provider checkers
- Dynamic Key Attribution: Append `:label` to any API key secret to track usage by feature or team
Define model aliases backed by one or more providers. Choose how targets are selected:
| Selector | Behavior |
|---|---|
| `random` | Distribute requests randomly across healthy targets (default) |
| `in_order` | Try providers in order; fall back when one is unhealthy |
| `cost` | Always route to the cheapest configured provider |
| `performance` | Route to the highest tokens/sec provider (with exploration) |
| `latency` | Route to the lowest time-to-first-token provider |
Use `priority: api_match` to prefer providers that natively speak the incoming API format, enabling pass-through optimization.
→ See Configuration: models
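As a sketch, an alias with an ordered fallback might look like the fragment below. The `selector` field name and the `groq` fallback target are assumptions for illustration; check Configuration: models for the exact keys.

```yaml
models:
  fast:
    selector: in_order        # assumed field name: try targets top to bottom
    targets:
      - provider: openai
        model: gpt-4o-mini
      - provider: groq        # assumed fallback provider, for illustration
        model: llama-3.1-8b-instant
```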
Plexus supports protocol translation between:
- OpenAI chat completions format (`/v1/chat/completions`)
- Anthropic messages format (`/v1/messages`)
- Google Gemini native format
- Any OpenAI-compatible provider (DeepSeek, Groq, OpenRouter, Together, etc.)
A request sent in Anthropic format can be routed to an OpenAI provider — Plexus handles the transformation in both directions, including streaming and tool use.
→ See API Reference
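The Anthropic-to-OpenAI direction can be pictured with a toy converter. This illustrates the shape of the mapping only; it is not Plexus's internal code, and it ignores streaming and tool use:

```python
# Toy illustration of Anthropic -> OpenAI request translation.
# Not Plexus's internal code; streaming and tool use are omitted.

def anthropic_to_openai(req: dict) -> dict:
    """Map an Anthropic /v1/messages body to an OpenAI /v1/chat/completions body."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first message.
    if "system" in req:
        messages.append({"role": "system", "content": req["system"]})
    messages.extend(req["messages"])
    return {
        "model": req["model"],
        "messages": messages,
        # max_tokens is required by Anthropic, optional for OpenAI.
        "max_tokens": req["max_tokens"],
    }

anthropic_req = {
    "model": "fast",
    "max_tokens": 256,
    "system": "You are terse.",
    "messages": [{"role": "user", "content": "Hello!"}],
}
openai_req = anthropic_to_openai(anthropic_req)
print(openai_req["messages"][0])  # the system prompt becomes the first message
```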
Use AI services you already have subscriptions to without managing API keys. Plexus integrates with pi-ai to support OAuth-backed providers:
- Anthropic Claude
- OpenAI Codex
- GitHub Copilot
- Google Gemini CLI
- Google Antigravity
OAuth credentials are stored in `auth.json` and managed through the Admin UI.
→ See Configuration: OAuth Providers
Limit how much each API key can consume using rolling, daily, or weekly windows:
```yaml
user_quotas:
  premium:
    type: rolling
    limitType: tokens
    limit: 100000
    duration: 1h

keys:
  my-app:
    secret: "sk-plexus-app-key"
    quota: premium
```

→ See Configuration: user_quotas
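A rolling window like the `premium` quota above (100,000 tokens per hour) behaves roughly as sketched below. This is an illustration of the accounting idea, not Plexus's enforcement code:

```python
# Sketch of a rolling-window token quota (limit: 100000 tokens per 1h).
# Illustration only — not Plexus's actual enforcement logic.
from collections import deque

class RollingQuota:
    def __init__(self, limit_tokens: int, window_s: float):
        self.limit = limit_tokens
        self.window = window_s
        self.events = deque()  # (timestamp, tokens) pairs still in the window

    def allow(self, now: float, tokens: int) -> bool:
        # Drop usage that has aged out of the rolling window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()
        used = sum(t for _, t in self.events)
        if used + tokens > self.limit:
            return False  # this request would exceed the quota
        self.events.append((now, tokens))
        return True

q = RollingQuota(limit_tokens=100_000, window_s=3600)
print(q.allow(0, 60_000))     # True: 60k of 100k used
print(q.allow(10, 50_000))    # False: would reach 110k within the hour
print(q.allow(3700, 50_000))  # True: the first request has aged out
```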
When a provider fails, Plexus removes it from rotation using exponential backoff: 2 min → 4 min → 8 min → ... → 5 hr cap. Successful requests reset the counter. Set `disable_cooldown: true` on a provider to opt it out entirely.
→ See Configuration: cooldown
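The escalating schedule works out as below. This is a sketch of the stated 2 min base, doubling, 5 hr cap; Plexus's exact timings may differ in detail:

```python
# Escalating cooldown sketch: 2 min base, doubling per consecutive
# failure, capped at 5 hours. Illustrative only.

BASE_S = 2 * 60      # 2 minutes
CAP_S = 5 * 60 * 60  # 5 hours

def cooldown_seconds(consecutive_failures: int) -> int:
    """Cooldown applied after the nth consecutive failure (1-indexed)."""
    return min(BASE_S * 2 ** (consecutive_failures - 1), CAP_S)

schedule_min = [cooldown_seconds(n) // 60 for n in range(1, 10)]
print(schedule_min)  # [2, 4, 8, 16, 32, 64, 128, 256, 300] — capped at 300 min
```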
Proxy Model Context Protocol servers through Plexus. Only streamable HTTP transport is supported. Each request gets an isolated MCP session, preventing tool sprawl across clients.

```yaml
mcp_servers:
  my-tools:
    upstream_url: https://my-mcp-server.example.com/mcp
```

→ See Configuration: MCP Servers
Full support for OpenAI's `/v1/responses` endpoint including stateful multi-turn conversations via `previous_response_id`, response storage with a 7-day TTL, and function calling.
→ See Responses API Reference
MIT License — see LICENSE file.




