An AI-powered assistant for home and business automation. Gilbert combines a modular, interface-driven architecture with an agentic AI core — giving it the ability to control speakers, greet people at the door, manage email, spin up a radio DJ, expose its tools over MCP, and much more, all orchestrated through natural conversation or automated event-driven workflows.
Everything in Gilbert is an abstraction. Swap your AI provider, your speaker system, your presence detector, or your storage backend without touching a single line of business logic. The core ships with only vendor-free backends (local auth, local filesystem documents, local + browser speaker playback, local Whisper speech-to-text, internal-URL via sslip.io, MCP transports); every third-party integration — Anthropic, Sonos, Google, UniFi, ElevenLabs, Tavily, Slack, ngrok, Tesseract — is a plugin. Plugins live in a separate gilbert-plugins repo that's included as a git submodule at std-plugins/, and new plugins can be added at runtime from any GitHub URL.
Gilbert is a multi-user system from the ground up — every piece of state (mailboxes, chat history, documents, MCP servers, scheduled jobs) is owned by a specific user, shared via roles and per-collection ACLs, and gated by a role-based access control layer that consistently applies across the web UI, chat, tools, events, and the MCP endpoint.
- Python 3.12+
- uv (Python package and project manager)
- Git (with submodule support — any recent version)
- Node.js + npm —
gilbert.sh startrebuilds the frontend SPA on every launch (the compiled bundle undersrc/gilbert/web/spa/is gitignored, so a fresh clone has to build it).
Some plugins have additional OS-level prerequisites — e.g. the tesseract plugin needs the Tesseract binary (apt install tesseract-ocr, brew install tesseract, or pacman -S tesseract tesseract-data-eng on Arch — note that Arch ships language data as a separate package). Check std-plugins/README.md for per-plugin requirements.
git clone https://github.com/briandilley/gilbert.git
cd gilbert
# Either let gilbert.sh init the submodule for you...
./gilbert.sh start
# ...or do it manually:
git submodule update --init --recursive
uv syncuv sync resolves the entire workspace — Gilbert core plus every plugin's third-party deps — into a single venv. If you add or update a plugin later, re-running uv sync picks up the changes.
Gilbert ships with sensible bootstrap defaults in gilbert.yaml. Only a small handful of settings live in YAML — storage, logging, web server binding, and the plugin directory list — because those are needed before entity storage is available. Everything else is configured at runtime through the Settings UI under the System → Settings menu at http://localhost:8000/settings.
On first run, non-bootstrap sections from gilbert.yaml (AI config, plugin config, etc.) are seeded into the entity store. After that, the Settings UI is the source of truth — changes there persist to .gilbert/gilbert.db and take effect immediately (or on restart for params marked restart_required).
If you need to override bootstrap values for this specific installation, create a local override file:
mkdir -p .gilbert
cat > .gilbert/config.yaml <<'EOF'
# Only include values you're changing. Deep-merged on top of gilbert.yaml.
auth:
root_password: "pick-a-strong-password"
web:
port: 9000
EOFThe .gilbert/ directory is gitignored — your API keys, database, and logs stay local. Runtime config (AI backend selection, TTS API keys, plugin settings, etc.) is managed through the Settings UI, not this file.
Set a root password before the first boot. On first run Gilbert seeds non-bootstrap YAML values into entity storage and from then on the database is the source of truth — so editing .gilbert/config.yaml after the first start has no effect on already-seeded keys. If you boot without auth.root_password set, the bootstrapped root user is created with no usable password and nobody can log in (local visitors get the everyone role, which can't reach Settings). To recover, stop Gilbert (./gilbert.sh stop), delete .gilbert/gilbert.db*, add auth.root_password to .gilbert/config.yaml, and start again. The admin username is root.
At minimum, before Gilbert is useful, open the Settings UI and configure:
- AI → select the
anthropicbackend and enter your API key (sk-ant-…). - Whatever plugins you care about — e.g. Sonos speakers (discovery is automatic), Google personal-account or Workspace backends, UniFi presence (host + credentials), and so on. Each plugin's settings page has a Test connection button to verify credentials before you commit to them.
# Start Gilbert (auto-inits the std-plugins submodule if empty, then uv sync, then launches)
./gilbert.sh start
# Dev mode (same as start, with verbose logging suitable for iteration)
./gilbert.sh dev
# Stop Gilbert (sends SIGTERM to the running PID)
./gilbert.sh stopgilbert.sh start runs Gilbert under a supervisor loop that distinguishes normal stops from "please restart me" exits — when a runtime-installed plugin needs uv sync to pick up new Python deps, it sets an internal flag, Gilbert exits with code 75 (EX_TEMPFAIL), and the supervisor loop re-syncs and relaunches automatically. Ctrl+C and ./gilbert.sh stop propagate cleanly and do not trigger a restart.
On first run, Gilbert creates the .gilbert/ directory and initializes the SQLite database, log files, and default AI profiles. The web UI is available at http://localhost:8000 — log in as root with the password you set under auth.root_password in .gilbert/config.yaml (see Configure), then head to Security → Users to add more accounts.
Gilbert also serves HTTPS on https://localhost:8443 using a self-signed certificate auto-generated on first boot under .gilbert/credentials/tls.{crt,key}. The HTTPS listener is what unlocks browser features that require a secure context — notably getUserMedia for the voice agent and any other mic/camera-driven flow. Open /setup-https for the cert download and per-OS install steps so phones and other LAN browsers can trust it. Disable by setting web.tls.enabled: false in .gilbert/config.yaml if you terminate TLS elsewhere.
Gilbert is designed for households and small teams, not a single desktop user. Every request — whether from the web UI, a Slack DM, a chat with the AI, or an external MCP client — is attributed to a specific user and passed through a consistent authorization layer.
- Users. Local accounts (
LocalAuthin core) or external identity providers (GoogleAuthBackendin thegoogleplugin). Users are managed under Security → Users in the web UI; an admin can create accounts, assign roles, and configure per-user mailbox access. External directory backends like Google Workspace can auto-sync user lists. - Roles. The built-in role hierarchy is
admin > user > everyone. Every tool, RPC method, event, and entity collection declares arequired_rolethat the caller's effective role level has to meet or exceed. Custom roles can be added on top. Managed under Security → Roles. - Per-capability ACLs. RBAC isn't a single boolean — it's layered:
- Security → Tools — per-tool role requirements (e.g. the
delete_documenttool requiresadmin, theplay_musictool acceptsuser). - Security → AI Profiles — named bundles of tool allowlist + backend + model that any AI call resolves through. Built-in profiles (
light/standard/advanced) are tier-shaped; services that drive the AI (chat, greeting, scheduler, Slack, MCP sampling, etc.) each declare anai_profileconfig so admins can route different use cases to different tiers/models. Profiles control which tools are available; RBAC controls who can invoke them. Both always apply. Pure text-generation callers (greetings, roasts) force zero tools at the call site (complete_one_shot(tools_override=[])) so their profile choice only selects backend/model — they can't accidentally invoke tools likeannounce. - Security → Collections — per-entity-collection read/write ACLs for the generic storage layer, so e.g.
inbox.mailboxescan be read byuserbut only written by the mailbox owner. - Security → Events — per-event-type visibility so sensitive event types (e.g. presence updates, doorbell events) don't leak to clients who shouldn't see them.
- Security → RPC — per-WebSocket-method permissions for direct RPC frames outside the tool pipeline.
- Security → Tools — per-tool role requirements (e.g. the
- Ownership. Mailboxes, knowledge sources, MCP servers, and scheduled jobs are owned by a user and can be shared with individual users or roles. Shared items respect the owner's access chain — an admin sharing their mailbox with a
usergrants read/send but not ownership transfer. - MCP multi-user support.
- Client side (Gilbert → external servers): each MCP server record has a
scope(private/shared/public) and an optionalallowed_userslist. The rule is "if you can see it, you can use it" — discovery and invocation are gated by the same visibility check. - Server side (external clients → Gilbert): Gilbert exposes its own tools as an MCP server at
/api/mcp. Admins register external clients under MCP → Clients, pick an owner user and an AI profile, and hand the client a one-time bearer token. Every tool call from that client runs under the owner'sUserContextwith the profile's tool allowlist applied, so external agents can't see or call anything their owner couldn't. - Browser-bridged local servers (user → their own machine): any user can point Gilbert at MCP servers running on their own laptop (or their LAN) under MCP → Local without opening firewall holes. The browser tab acts as a transport proxy — Gilbert sends MCP JSON-RPC calls over the authenticated WebSocket, the tab POSTs them to a URL the user configured locally, and the results flow back the same way. These entries are session-ephemeral, strictly private to the owning user (invisible to admins), and disappear the moment the tab closes. No server-side config, no tunnel, no extra process to run.
- Client side (Gilbert → external servers): each MCP server record has a
- Consistent across surfaces. The same authorization layer filters the web UI nav (items you can't access simply don't appear), the tool set the AI sees in chat, the commands available in slash-command autocomplete, the events streamed over WebSocket, and the tool list returned to an MCP client's
tools/listcall.
Out of the box — once the std-plugins submodule is initialized — Gilbert provides:
- AI chat with tool use — ask Gilbert to play music, check who's home, search your documents, compose an email, or push content to a wall-mounted display. Claude is the default AI via the
anthropicplugin; swap it for any other backend that implementsAIBackend. - Presence detection — know who's home (and where) via WiFi clients, cameras with facial recognition, and badge readers. The
unifiplugin aggregates UniFi Network, UniFi Protect, and UniFi Access signals into a single presence stream. - Doorbell monitoring — detect ring events from UniFi Protect cameras and announce visitors over your speakers with a custom TTS voice.
- Object-detection cameras (Frigate) — the
frigateplugin subscribes to Frigate's MQTT event stream for live object-detection events ("person at the side gate", "package on the porch", "car in the driveway") and exposes AI tools (list_cameras,latest_clips,get_snapshot,who_was_seen,count_detections). Snapshots and clips proxy through Gilbert withRangesupport so a<video>tag works over a tunnel without exposing your Frigate URL or auth token. Per-camera role overrides admin-gate sensitive cameras (bedrooms, vault) without leaking events to non-admin sockets, and the greeting service can announce package deliveries across adjacent cameras as a single event via configurable zone groups + per-label dedup keys. - Music and speaker control — the
sonosplugin discovers Sonos speakers on the LAN, handles playback/volume/grouping, and uses Spotify's Web API for browse/search. The Music service exposes search, queue, station ("play more like this"), and loop/repeat tools — capabilities are gated per-backend so swappable backends only surface what they actually support. - Video library + casting — the
media_libraryservice aggregates Plex and Jellyfin via a multi-backend ABC (MediaLibraryBackend). Theplexandjellyfinplugins each register a backend; the service fans library queries out across both, merges results, and dispatches playback to whichever server owns the target client. AI tools (/media search,/media recent,/media on-deck,/media now,/media play <title> on <client>,/media pause//media resume//media stop//media seek,/media recommend) cover the common cases. Per-user identity mapping persists Gilbert↔Plex-Home and Gilbert↔Jellyfin-user pairs so resume / on-deck reflect the calling user's library, never an admin-token leak.play_onfor a SHOW resolves to the user's next-unwatched / on-deck episode before dispatch — no silently playing the pilot. Visual UIBlock disambiguation when a search matches multiple high-confidence items, state-aware Play / Resume / Watch-again buttons. - Text-to-speech — the
elevenlabsplugin provides high-quality cloud-hosted synthesized voices; thekokoroplugin provides a local alternative (no API key, runs entirely in-process). Either backend works for announcements, greetings, and any AI-generated spoken output. - Speech-to-text — the bundled
local_whisperbackend (faster-whisper, no API key) transcribes audio files and browser-mic streams. Extensible via the multi-backend aggregator: theopenai,groq, andelevenlabsplugins add batch transcription backends; thedeepgramandelevenlabsplugins add streaming backends; theporcupineandopenwakewordplugins add wake-word detection backends. - Outbound phone calls — the
phoneplugin (carrier-agnosticPhoneCallService) places PSTN calls on the user's behalf via a chosenTelephonyBackend(thetelnyxplugin ships in std-plugins). The corevoice_brainengine drives the live conversation — STT → LLM → TTS, opening disclosure required by federal AI-call rules, per-turn transcript posted to the originating chat, mid-call "intervene with a directive" from the/callsSPA page, hard cap on call duration. Slash command/call <number> <brief>; AI toolmake_phone_call. One active call per user. - Bidirectional text messaging — the
messagingplugin (carrier-agnosticMessagingService) sends and receives messages across three transport tiers via a chosenMessagingBackend(thetelnyxplugin contributes one). The default tier is RCS (Rich Communication Services — typed text, read receipts, media, no segment length limit, end-to-end encryption with Universal Profile 2.0+, supported by Android Messages, Google Messages, and iOS 18+); the carrier downgrades to MMS (media + recipient not RCS-capable) or SMS (plain-text fallback, 160-char-per-segment) as needed. The actual tier the carrier picked round-trips intoMessage.typeso the SPA renders a per-message badge — useful for debugging "why isn't this thread showing read receipts?". The operator can force a different default in/settings → Messaging → default_message_type(e.g.smswhen your carrier prices RCS differently). Inbound messages route to the configured owner-user and persist as threads; the/messagesSPA page renders them as a familiar two-pane conversation view with a live compose box. Optionalauto_replyflag routes inbound messages throughAIService.chat()so Gilbert can text back on the user's behalf — tagged withsource="messaging"so the AI's history doesn't pollute the chat sidebar. Slash command/msg send <number> <body>; AI toolsend_text_message(optionalmessage_typearg to force a tier). Webhook lands at/api/telnyx/messages/webhook. - Smart glasses (Mentra) — the
mentraplugin connects Gilbert to the MentraOS platform so Gilbert lives on Even Realities G1, Vuzix Z100, Mentra Live, and any other Mentra-supported glasses. The plugin reimplements the Mentra cloud SDK in Python (JSON-over-WebSocket + raw-PCM-audio binary frames) so the integration lives natively in Gilbert with no Node sidecar. When a user launches the Gilbert app from their phone, Mentra Cloud posts a webhook to/api/mentra/webhook; the plugin dials back to the cloud-supplied WS URL, authenticates with the app's API key, and binds per-session managers for transcription, button presses, display layouts (text walls, reference cards), the persistent glance dashboard, and TTS. Voice input on the glasses → final transcription →AIService.chat(with the mapped Gilbert user as the caller) → short reply rendered on the heads-up display + spoken via cloud TTS. Operator-tunable system prompt biases the AI toward glasses-appropriate brevity ("one or two sentences, no markdown"). Email → user mapping table opts each Mentra account into a specific Gilbert user explicitly — no auto-create. Adapts per-device viasession.capabilities(the same code runs on display-only G1 and camera-only Mentra Live, just exposes different surfaces). Camera, LED, location, and livestream managers land in follow-up work. - Voice agent (browser) — the
voice-agentplugin runs the samevoice_brainengine over the browser microphone instead of a carrier. Press "Start Conversation" on/voice(or trigger a wake-word) and the LLM gets the full Gilbert tool ecosystem during a turn — knowledge search, MCP tools, agent dispatch, scheduler. Toggleable per service; disabled by default until a mic source is wired. - Email inbox — multi-mailbox, multi-user. Every mailbox is owned by a user and can be shared with individual users or roles for full read/send access. Messages land in a per-mailbox persisted store; outbound drafts queue through a shared outbox with crash-resilient delayed sends. The
googleplugin's Gmail backend is the reference implementation — add more by implementingEmailBackend. Incoming mail can also be handed off to an AI chat loop via the Inbox AI Chat service. - Calendar — multi-account, multi-user. Every calendar account is owned by a user and can be shared with individual users or roles. The Calendar service runs one backend instance per
poll_enabledaccount, caches events for fastget_schedule/next_event/find_free_time, and emitscalendar.event.upcomingnotifications. Eight AI tools (list_calendar_accounts,get_schedule,next_event,get_event,find_free_time,create_event,update_event,delete_event) handle every common use case; the three mutating tools default to a preview/confirmUIBlockflow so the AI can never silently fire real invite emails. Thegoogleplugin's Google Calendar backend is the reference implementation. - RSS / news feeds + daily briefing — multi-feed, multi-user. Every feed subscription is owned by a user and can be shared with individuals or roles. The Feeds service runs one backend instance per
poll_enabledfeed (cold-start jitter, source-suggested cadence honored, conditional-GETetag/If-Modified-Since, body-size cap, graceful give-up at 20 consecutive failures), scores each item against a configurable AI prompt on a bounded async worker pool, optionally ingests article bodies into the knowledge base for vector search (with robots.txt + paywall + SSRF guards and a per-user-per-day budget), and exposes eight AI tools (news_briefing,search_feeds,summarize_feed,subscribe_feed,unsubscribe_feed,list_feeds,read_feed_item,recommend_knowledge_ingestion). Thesubscribe_feed/unsubscribe_feedtools route through a Confirm/Cancel UI block so the AI can't subscribe you to a hallucinated URL. The built-inRssAtomFeedBackend(usingfeedparser) parses any RSS 2.0 / Atom 1.0 source — provider-specific backends (Reddit, HackerNews, podcasts) slot in as plugins. The companionfeed_briefingservice fans out a daily presence-fallback briefing event so the greeting service can splice today's top stories into the morning arrival announcement. - Tasks / todos — multi-list, multi-user. Every task list is owned by a user and can be shared with individuals or roles. The Tasks service runs one backend instance per list (the local backend is the source of truth for vendor-free lists; external backends like Google Tasks poll on a per-list schedule with
updatedMindeltas), persists tasks in a single cross-backendtaskscollection sodue_today/overdueaggregate across every list a user can see, and uses local-first reconciliation for writes — every mutation lands in storage immediately withsync_status=pending_push, the upstream push happens inline, and a recurringtasks-sync-tickretries pending rows. Ten AI tools (task_lists,add_task,get_task,list_tasks,complete_task,update_task,cancel_task,delete_task,tasks_due,summarize_today) cover every common use case;delete_taskdefaults to a soft-delete with UIBlock confirm (recoverable untilretention_dayselapses), andadd_taskreturns a UIBlock select when the user has multiple lists with no default. Inbox-AI gainsadd_taskfor free — the AI lifts action items out of incoming email and uses theMessage-Idas the idempotency key so re-deliveries don't double-add. Per-user time zones (viaUserContext.tz) drive day-boundary math so a Pacific user gets their "today," not the host's. Thegoogleplugin's Google Tasks backend ships in v1; Todoist and CalDAV are sketched in the spec for v1.1 and slot in without core changes. - Personal health metrics — multi-backend, multi-user. The Health service aggregates data from
apple-health(iOS Shortcut webhook),withings(OAuth 2.0 pull every 6 hours), andhk-webhook(generic catch-all webhook for Garmin Connect IQ widgets, Home Assistant automations, custom scripts) into a single per-user metrics store. Privacy posture is PHI-adjacent: owner-only reads enforced in the service (not in routes); cross-user reads gated behind a dedicatedhealth-adminrole (not auto-granted, not the same asadmin), audit-logged with target-user notification. Webhook tokens are SHA-256 hash-at-rest withhmac.compare_digestconfirmation; OAuth state is server-side-bound to the initiating user (defeats confused-deputy), backend-namespaced, expiry-bound, and one-shot consumed. Right-to-delete is a two-step preview-then-confirm wizard requiring the literal"DELETE"string; the cascade revokes upstream OAuth grants before local cleanup and writes ahealth_auditrow that survives the delete. The hourly TZ-aware daily-summary tick useszoneinfofor DST-correct boundaries (spring-forward 23h, fall-back 25h by construction) and feeds a non-clinical AI summary plus a code-driven flag vocabulary (low_sleep,sedentary,weight_drift) that the proposals integration consumes via the existing observation pipeline. The greeting integration receives a structuredGreetingBriefsnapshot (NOT a pre-canned paragraph) so the greeting model reasons about facts in its own voice, subject to the same forbidden-word rules as the daily-summary prompt. - Personalized greetings — When a known user arrives (via the Presence service), Gilbert generates a 1-2 sentence personalized welcome via the AI, then announces it over speakers. The Greeting service is extensible via the GreetingContextProvider capability — any service can advertise
"greeting_context"and contribute a labeled prose fact that enriches the greeting prompt. The bundled Weather, Feed Briefing, and Health services each contribute context; users control which providers are active via a toggle-per-provider settings page. Installing a new plugin that implements this capability is plug-and-play with no greeting-service edits. The user's custom arrival-greeting prompt template decides what to mention; the AI sees all available context facts and integrates them based on the prompt's instructions. - Knowledge base — index local files (built-in
local_documentsbackend) and Google Drive folders (googleplugin) into a ChromaDB vector store for semantic search. - Web search — the
tavilyplugin surfaces a/web search,/web images, and/web fetchcommand set for up-to-date answers grounded in real results. - Weather — the Weather service exposes
current_weather,forecast,weather_alerts, andgeocode_locationAI tools (plus slash-only/weather set_homeand/weather set_units). The defaultopen-meteoplugin needs no API key and covers global current + hourly + daily forecasts. The interface accommodates NWS (US severe-weather alerts) and OpenWeatherMap as future plugins without breaking changes. - External notification fan-out — every notification persisted by the Notifications service is also dispatched, on a per-user opt-in basis, to external push providers via the
push_notificationsservice. Users add "notification routes" on/account/notifications(ntfy,pushover,discord-webhook, andtelegramplugins ship out of the box) with per-route urgency floors, source allow/deny lists, and tz-aware quiet hours. The fan-out runs through a bounded queue + worker pool so a slow provider can never back-pressure the in-app dispatcher; URGENT-failure exhaustion escalates to an in-apppush_failurenotification so operators see drops without re-implementing alerting. - OCR — the
tesseractplugin extracts text from images locally (no network, no API key) for document indexing and vision workflows. - Public tunnel — the
ngrokplugin provides a public HTTPS URL so OAuth callbacks (Google login, Slack Socket Mode) work behind NAT. - Internal URL (LAN OAuth) — the core internal-URL service hands OAuth flows a LAN-reachable DNS hostname via sslip.io (
192-168-1-50.sslip.io→ that IP), so providers that reject raw IP redirect URIs (e.g. Google) work without a public tunnel. Auto-detects the host's outbound IP; the Google auth backend'scallback_url_sourcepickstunnel,internal_url, or the request origin. Vendor-free, no API key. - Slack bridge — the
slackplugin connects a Socket Mode bot so users can chat with Gilbert from Slack DMs and mentions, with the same tool access as the web UI. - MCP (Model Context Protocol) — Gilbert is both an MCP client (connect to external MCP servers, per-server RBAC, OAuth 2.1 support, stdio + streamable HTTP + SSE transports, with the external tools merged into Gilbert's own AI pipeline) and an MCP server (expose Gilbert's own tools to external agents like Claude Desktop or Cursor over a bearer-authenticated endpoint at
/api/mcp, with per-client owner identity and AI profile filtering). - Remote screens — push content (PDFs, images, HTML) to browser-based displays via SSE (core).
- Personalized greetings, roasts, scheduled jobs, RBAC, interactive tool forms — all core services.
- AI usage reporting — every AI round's token consumption (input / output / cache creation / cache read) and USD cost is recorded to the
ai_token_usageentity collection. Per-round and per-turn totals render inline in chat; an admin-only/usagepage groups by user / backend / model / profile / tool / date with filterable bar and area charts. - Plugin system — add runtime integrations from any GitHub URL via
/plugin install, with automatic dependency resolution through the uv workspace. Plugins that need new Python packages trigger a supervised restart; plugins without extra deps hot-load immediately. - Self-improvement proposals — Gilbert observes his own usage (events, conversations, in-chat notes) and on a schedule asks his most capable AI profile to propose concrete improvements: new plugins, new core services, or config tweaks that would close the gaps he just saw. Each proposal lands in the admin-only
/proposalspage with a self-contained "implementation prompt" — paste it into a fresh Claude Code session and it has everything needed to build the thing. Gilbert can read his own source tree (read-only, path-allowlisted) while reflecting, so suggestions are grounded in what's actually there. He's biased toward additive changes (new plugins) over edits to his core code; an opt-inallow_core_modificationsflag lets him propose deeper changes when you trust him to.
Every component in Gilbert is defined as a Python ABC (abstract base class) with one or more concrete implementations. The core never depends on a specific integration — it depends on the interface. All vendor-specific implementations live in the gilbert-plugins submodule; core ships only with vendor-free backends (local auth, local filesystem documents, local + browser speaker playback, local Whisper speech-to-text, MCP transports).
Interface (core) → Implementation (plugin)
────────────────────────────────────────────────
AIBackend → anthropic plugin → AnthropicAI (Claude)
openai plugin → OpenAIAI (GPT)
qwen plugin → QwenAI (Alibaba DashScope)
deepseek plugin → DeepSeekAI (V3 + R1)
groq plugin → GroqAI (Llama/Qwen/Mixtral on LPUs)
mistral plugin → MistralAI (La Plateforme)
xai plugin → XAIAI (Grok 4 / 3 / 2 Vision)
openrouter plugin → OpenRouterAI (~200 models, multi-provider)
ollama plugin → OllamaAI (local / self-hosted open-weights)
gemini plugin → GeminiAI (Google Gemini 2.5)
bedrock plugin → BedrockAI (AWS Bedrock Converse API)
VisionBackend → anthropic plugin → AnthropicVision
TTSBackend → elevenlabs plugin → ElevenLabsTTS
BatchTranscriptionBackend → core (local_whisper / faster-whisper, no API key)
openai plugin → OpenAIWhisperBackend (whisper-1 / gpt-4o-transcribe)
groq plugin → GroqWhisperBackend (whisper-large-v3 family)
elevenlabs plugin → ElevenLabsScribeBackend (scribe_v1, diarization)
StreamingTranscriptionBackend → elevenlabs plugin → ElevenLabsScribeLiveBackend (WebSocket)
deepgram plugin → DeepgramBackend (Nova-3 WebSocket)
WakeWordBackend → porcupine plugin → PorcupineBackend (Picovoice, custom .ppn)
openwakeword plugin → OpenWakeWordBackend (local ONNX, no API key)
SpeakerBackend → core (LocalSpeaker, BrowserSpeaker) + sonos plugin → SonosSpeaker
MusicBackend → sonos plugin → SonosMusic (Spotify via Sonos)
PresenceBackend → unifi plugin → UniFiPresenceBackend (Network + Protect + Access)
DoorbellBackend → unifi plugin → UniFiDoorbellBackend
CameraEventBackend → frigate plugin → FrigateCameraBackend (MQTT push + HTTP snapshots/clips)
EmailBackend → google plugin → GmailBackend
CalendarBackend → google plugin → GoogleCalendarBackend
TaskBackend → core (LocalTaskBackend) + google plugin (GoogleTasksBackend)
HealthBackend → apple-health plugin → AppleHealthBackend (push via iOS Shortcut)
withings plugin → WithingsBackend (OAuth pull)
hk-webhook plugin → HKWebhookBackend (generic catch-all push)
DocumentBackend → core (LocalDocuments) + google plugin (GDriveDocuments)
AuthBackend → core (LocalAuth) + google plugin (GoogleAuthBackend)
UserProviderBackend → google plugin → GoogleDirectoryBackend
WebSearchBackend → tavily plugin → TavilySearch
WeatherBackend → open-meteo plugin → OpenMeteoWeather
OCRBackend → tesseract plugin → TesseractOCR
TunnelBackend → ngrok plugin → NgrokTunnel
InternalUrlBackend → core (SslipInternalUrlBackend — LAN hostname for OAuth redirects)
MCPBackend → core (stdio, http, sse, browser — consume external MCP servers)
StorageBackend → core → SQLiteStorage
Want to add support for a different speaker system, AI provider, or presence detector? Implement the interface in a new plugin (or an existing one) and Gilbert picks it up through the backend registry on the next boot. Callers never change.
Services are the building blocks of Gilbert. Each service declares its capabilities (what it provides) and dependencies (what it needs), and the service manager handles lifecycle, ordering, and discovery.
class GreetingService(Service):
def info(self) -> ServiceInfo:
return ServiceInfo(
name="greeting",
capabilities=frozenset({"greeting", "ai_tools"}),
dependencies=frozenset({"speaker_control", "presence", "tts"}),
)Services are started in dependency order and stopped in reverse. Any service can discover others at runtime through capability queries — no hardcoded references.
Services communicate through a publish-subscribe event bus with pattern matching. When someone arrives home, the presence service publishes presence.arrived. The greeting service hears it and welcomes them.
presence.arrived → GreetingService (personalized welcome)
doorbell.ring → DoorbellService (announce visitor on speakers)
email.received → InboxAIChatService (AI processes the email)
This decoupled design means new services can react to existing events without modifying the publishers.
Gilbert's AI service runs an agentic tool-use loop. Services that implement the ToolProvider protocol automatically expose their capabilities as AI-callable tools. The AI can chain multiple tools in a single conversation turn — search for a song, play it on a specific speaker group, and announce it over TTS, all from one natural language request.
AI context profiles control which tools are available for different interaction types. A sales agent profile might only see the sales_lead tool, while a human chat profile sees everything except sales tools. Profiles are managed at runtime under Security → AI Profiles or via AI tools themselves.
Subagent types are entity-backed, admin-managed agent definitions. Each type bundles a system prompt, model/backend selection, tool gating, round/time budget, and an execution mode (sync for inline results, background for long-running tasks that deliver a report file). The spawn_agent tool launches a subagent by type ID; the AI picks the type based on the task. Types are seeded with 10 built-ins (general-purpose, deep-research, software-engineer, code-reviewer, etc.) and are editable, resettable, and extensible by admins under Security → Subagent Types.
Tools are filtered through two layers:
- Profile filtering — which tools are available for this type of interaction
- RBAC filtering — which tools this user's role is allowed to invoke
Gilbert sits on both sides of MCP:
- As an MCP client, Gilbert connects out to external MCP servers (stdio subprocesses, HTTP streamable, or SSE) and merges their tools into its own agentic pipeline. Servers are configured per-user with a
scope(private / shared / public) plus an optionalallowed_userslist; a supervisor loop reconnects with exponential backoff; OAuth 2.1 with dynamic client registration handles authenticated servers; each external server can optionally be allowed to request sampling (completions from Gilbert's AI) under a named profile with a token budget cap. - As an MCP server, Gilbert exposes its own tools at
/api/mcpfor external agents (Claude Desktop, Cursor, etc.). Admins register client tokens under MCP → Clients, each bound to an owner user and an AI profile. Tool discovery and invocation run under the owner'sUserContextthrough the exact same profile + RBAC pipeline as chat, so an external agent never sees more than what its owner could. The defaultmcp_server_clientprofile ships empty (include-mode with zero tools) — new clients can authenticate but call nothing until an admin adds tools to the profile, so the fail-safe for untrusted integrations is "no access." - Browser-bridged local servers. Gilbert can also consume MCP servers running on a user's own machine without any tunnel or inbound firewall hole. A user configures
{slug, name, url}entries under MCP → Local (stored in browser localStorage — the URL never reaches the server), and on every WebSocket connect the tab announces the available slugs. When the AI calls a tool from one of those servers, Gilbert sends anmcp.bridge.callframe over the authenticated WebSocket; the browser proxies the JSON-RPC body to the local URL viafetchand streams the response back. These session-ephemeral servers live in a per-user in-memory registry, are strictly private to the owner (invisible even to admins), and are torn down the moment the tab disconnects — the perfect shape for personal tools a user wants available only during an active session.
Both sides honour the same "if you can see it, you can use it" principle and the same multi-user ownership model.
Tools can push structured forms directly into the chat UI — text inputs, dropdowns, radio buttons, checkboxes, range sliders, and button groups. A tool returns a ToolOutput instead of a plain string, and the form renders inline after the AI's response. When the user submits, the values flow back through the normal conversation as a structured message the AI can process.
This enables richer workflows without leaving the chat: configuring settings, confirming actions, picking from options, or filling out multi-field forms — all driven by tools, not by the AI generating markup.
Gilbert exposes a bidirectional WebSocket at /ws/events that serves as the primary real-time channel for the web UI, inter-Gilbert communication, and external integrations.
All messages are JSON frames with a type field using namespace.resource.verb naming:
gilbert.event — server pushes bus events to clients
gilbert.welcome — sent after auth with user identity and roles
gilbert.sub.add — subscribe to event patterns (glob matching)
gilbert.sub.remove — unsubscribe from patterns
gilbert.ping/pong — heartbeat (30s interval)
gilbert.peer.publish — peer instances publish events to the local bus
chat.message.send — send a chat message (RPC with .result response)
chat.form.submit — submit a UI form
chat.history.load — load conversation as turn-grouped history
Events are filtered server-side through three layers: pattern matching (client subscribes to what it cares about), role-based visibility (admin/user/everyone tiers), and content filtering (chat membership and message visibility).
Authentication supports both session cookies (web UI) and bearer tokens via query parameter (peers and integrations). Peer Gilbert instances can authenticate, subscribe to filtered events, and publish events that propagate to the local bus.
Gilbert uses a generic entity store — not raw SQL tables. Entities are stored as typed documents with indexes and foreign keys, all through an abstract StorageBackend interface. The default implementation is SQLite, but the interface is designed to be swappable. New entity types require no migrations.
Every third-party integration is a plugin in the gilbert-plugins repository, included as a git submodule at std-plugins/. The full inventory — with each plugin's backend names, third-party deps, configuration keys, and slash commands — lives in std-plugins/README.md. At a glance:
| Plugin | What it adds |
|---|---|
| anthropic | Claude AI and Vision backends (default for chat and image understanding) |
| arr | Radarr + Sonarr services for movie/TV library management from chat |
| deepgram | Deepgram Nova streaming speech-to-text backend |
| elevenlabs | High-quality TTS + Scribe batch and streaming speech-to-text backends |
| frigate | Frigate NVR object-detection events via MQTT, plus snapshot/clip retrieval over HTTP |
| OAuth login, Workspace directory sync, Gmail backend, Google Drive documents, Google Calendar, Google Tasks | |
| groq | Groq LPU chat backend + Groq Whisper batch speech-to-text backend |
| guess-that-song | Multiplayer music guessing game managed by the AI |
| jellyfin | Jellyfin media library + casting — MediaLibraryBackend parity with Plex |
| ngrok | Public HTTPS tunnel for OAuth callbacks and webhooks |
| ntfy / pushover / discord-webhook / telegram | External notification fan-out backends |
| open-meteo | Weather backend (no API key) — current, hourly, and daily forecasts |
| openai | OpenAI GPT chat backend + Whisper batch speech-to-text backend |
| openai-compatible | Vendor-neutral Chat Completions backend for vLLM, LM Studio, corporate proxies, and other endpoints without a dedicated plugin |
| openwakeword | Local wake-word detection (ONNX models, no API key required) |
| plex | Plex Media Server library + casting — search, recently-added, on-deck, episode resolution, playback dispatch via the media_library service |
| porcupine | Picovoice Porcupine wake-word detection (built-in and custom keywords) |
| slack | Socket Mode bot bridging Slack DMs/mentions to the AI service |
| sonos | Sonos speaker control and Sonos-linked music service access |
| tavily | Web search backend with AI-generated summaries |
| tesseract | Local OCR (offline, no API key) for document indexing |
| unifi | UniFi Network + Protect + Access aggregated into presence and doorbell backends |
Configuration for every plugin is done through the Gilbert Settings UI at System → Settings (/settings) — no file editing needed. The Settings UI reads each plugin's config_params(), renders type-appropriate inputs (with sensitive values masked), and persists changes to entity storage. See std-plugins/README.md for each plugin's configuration keys.
Vector database for the knowledge base (core dependency). Documents from local files and Google Drive are chunked, embedded, and indexed for semantic search. The AI uses this to answer questions grounded in your actual documents. ChromaDB runs in-process via the chromadb Python package — no external service required.
Plugins extend Gilbert without modifying core code. A plugin is a directory containing a plugin.yaml manifest, a plugin.py entry point with a create_plugin() factory, its own pyproject.toml declaring third-party Python deps, and the actual integration source. Gilbert loads every plugin discovered under the configured plugin directories at startup.
See std-plugins/README.md for the full template and a walkthrough of adding a new plugin.
Gilbert scans three directories at startup, each with a distinct role:
| Directory | Purpose | Tracked in gilbert? |
|---|---|---|
std-plugins/ |
First-party plugins from briandilley/gilbert-plugins. Included as a git submodule. |
Submodule pointer |
local-plugins/ |
Plugins you write yourself for this installation. | No (gitignored) |
installed-plugins/ |
Plugins installed at runtime via /plugin install <url>. |
No (gitignored) |
Every subdirectory of each of these that contains a plugin.yaml is loaded at startup. Every plugin also must carry its own pyproject.toml — those are all uv workspace members of the root Gilbert project, so a single uv sync resolves and installs every plugin's third-party Python deps into the shared venv.
First-party plugins live in a separate repository so they can be versioned, reviewed, and released independently of Gilbert's core. Cloning Gilbert fresh:
git clone https://github.com/briandilley/gilbert.git
cd gilbert
git submodule update --init --recursive # populates std-plugins/
uv sync # installs core + every plugin's depsOr just run ./gilbert.sh start — the launcher script auto-runs git submodule update --init --recursive if std-plugins/ is empty, then uv sync, then boots Gilbert.
New integrations should be opened as pull requests against the gilbert-plugins repo, not this one. Plugin PRs should include the plugin directory complete with plugin.yaml, plugin.py, pyproject.toml, backend source, and tests, plus an updated entry in std-plugins/README.md so the plugin inventory stays accurate — see the existing plugins for the pattern.
Admins can install plugins at runtime from any GitHub URL via the /plugin install <url> slash command (or the plugins.install WebSocket RPC). Gilbert fetches the plugin into installed-plugins/<name>/, validates it, and either:
- Hot-loads it immediately if the plugin's
pyproject.tomldeclares no third-party Python dependencies (nothing new for the venv to resolve), or - Defers loading until restart if it has deps. In that case, run
/plugin restartto trigger Gilbert's supervised restart loop —gilbert.shre-runsuv sync(which picks up the new workspace member), relaunches Gilbert, and the boot-time loader imports the plugin with its deps now available.
Uninstall with /plugin uninstall <name>; list with /plugin list.
Gilbert includes a React SPA with pages for chat, inbox, MCP administration, security (users / roles / tool permissions / AI profiles / collection ACLs / event visibility / RPC permissions), and system operations (settings / scheduler / entity browser / plugins / service inspector / AI usage reporting). All data operations use the WebSocket protocol — the only HTTP endpoints are authentication (OAuth callbacks, login), the raw ASGI MCP endpoint (/api/mcp), and static file serving. The SPA connects to /ws/events on load and communicates exclusively via typed RPC frames.
Top-level navigation is organized into dropdown groups (Chat · Inbox · MCP · Security · System) that render as a horizontal nav on desktop and a drawer on mobile. Menu items are filtered per-user by the dashboard.get RPC so users only see what they can actually access — e.g. non-admins don't see Security or System at all, and users without the mcp_server capability enabled don't see MCP → Clients. Clicking a parent group lands on its default child (Users for Security, Settings for System, Servers for MCP).
# Install dev tools (ruff, mypy, pytest-cov)
uv sync --extra dev
# Run all tests — includes every std-plugin's tests via pyproject.toml testpaths
uv run pytest
# Run tests with coverage
uv run pytest --cov=gilbert
# Run tests for a single plugin
uv run pytest std-plugins/tesseract/tests/ -v
# Type checking
uv run mypy src/
# Linting
uv run ruff check src/ tests/
# Formatting
uv run ruff format src/ tests/Plugin tests live inside each plugin's tests/ directory and are collected automatically because pyproject.toml lists std-plugins, local-plugins, and installed-plugins in testpaths. When adding a new plugin, put its tests under <plugin>/tests/ and create a conftest.py that registers the plugin directory as a Python package for pytest — see std-plugins/tesseract/tests/conftest.py for the single-module case or std-plugins/unifi/tests/conftest.py for a multi-module plugin with relative imports.
See CLAUDE.md for full architecture documentation, design decisions, and development guidelines. See std-plugins/CLAUDE.md for plugin-specific conventions.
MIT