This guide is for AI agents developing Remind itself. For using Remind as a memory layer, see docs/AGENTS.md.
Remind is a generalization-capable memory layer for LLMs. It differs from simple RAG by extracting and maintaining generalized concepts from episodic experiences, mimicking how human memory consolidates specific events into abstract knowledge.
Core insight: Episodes → Consolidation (LLM-powered "sleep") → Concepts with relations
src/remind/
├── models.py # Data models (Concept, Episode, Entity, Relation, Topic)
├── store.py # SQLite persistence layer
├── interface.py # MemoryInterface - main public API
├── config.py # Configuration loading (config file, env vars, defaults)
├── consolidation.py # LLM-powered episode → concept transformation
├── extraction.py # Entity/type extraction from episodes
├── retrieval.py # Spreading activation retrieval
├── reranker.py # Optional cross-encoder reranking (requires [rerank] extra)
├── triage.py # Auto-ingest: buffered intake + density scoring
├── cli.py # Command-line interface (project-aware)
├── mcp_server.py # MCP (Model Context Protocol) server
├── background.py # Background consolidation spawning
├── background_worker.py # Subprocess entry point for background consolidation
├── api/ # REST API for web UI
│ ├── __init__.py # Exports api_routes
│ └── routes.py # Starlette route handlers
├── static/ # Web UI assets (compiled)
│ ├── index.html # Entry point
│ └── assets/ # CSS/JS bundles
└── providers/ # LLM and embedding provider implementations
    ├── base.py # Abstract base classes
    ├── anthropic.py # Claude
    ├── openai.py # OpenAI
    ├── azure_openai.py # Azure OpenAI
    └── ollama.py # Local models via Ollama
| Model | Purpose |
|---|---|
| Episode | Raw experience/interaction. Temporary, gets consolidated. |
| Concept | Generalized knowledge with confidence, relations, conditions. Has concept_type: pattern, fact_cluster, or legacy. |
| ConceptType | Enum: LEGACY, PATTERN, FACT_CLUSTER. Determines how concepts are created and displayed. |
| Entity | External referent (file, person, concept, tool). Format: type:name |
| Relation | Typed edge between concepts (implies, contradicts, specializes, etc.) |
| Topic | Named knowledge area grouping episodes/concepts. Has id (slug), name, description. |
Episode types: observation, decision, question, meta, preference, outcome, fact
Episode lifecycle: Created via remember() or ingest() → Entity extraction → Consolidation → Marked consolidated
Fact episodes: Specific factual assertions (config values, names, dates, concrete technical details). Consolidation preserves fact details verbatim in concept summaries rather than generalizing them away.
Outcome episode metadata: strategy (approach used), result (success/failure/partial), prediction_error (low/medium/high)
Entity ID format: type:name (e.g., file:src/auth.ts, person:alice, concept:caching)
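As a small illustration of the type:name format, the sketch below splits and rebuilds entity IDs. The helper names parse_entity_id and make_entity_id are hypothetical and are not part of Remind's API; splitting on the first colon only is an assumption so that names containing colons survive.

```python
def parse_entity_id(entity_id: str) -> tuple[str, str]:
    """Split a 'type:name' entity ID into (type, name)."""
    # partition() splits on the first colon only, so the name may contain colons.
    entity_type, sep, name = entity_id.partition(":")
    if not sep or not entity_type or not name:
        raise ValueError(f"expected 'type:name', got {entity_id!r}")
    return entity_type, name


def make_entity_id(entity_type: str, name: str) -> str:
    """Build a 'type:name' entity ID from its two parts."""
    return f"{entity_type}:{name}"


file_entity = parse_entity_id("file:src/auth.ts")
person_entity = parse_entity_id("person:alice")
```

Because IDs are plain strings, the same name with a different case or type prefix is a different entity in this sketch.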
Two abstract base classes:
class LLMProvider(ABC):
    async def complete(prompt, system, temperature, max_tokens) -> str
    async def complete_json(prompt, system, temperature, max_tokens) -> dict
    @property name: str

class EmbeddingProvider(ABC):
    async def embed(text) -> list[float]
    async def embed_batch(texts) -> list[list[float]]
    @property dimensions: int
    @property name: str

Adding a new provider: Implement these interfaces. See ollama.py for a complete example with error handling.
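To make the embedding interface concrete, here is a minimal sketch of a provider that satisfies it. The EmbeddingProvider class below is a local stand-in written from the summary above, not the real class in providers/base.py, and HashEmbeddingProvider is a hypothetical, deterministic provider that is only useful for offline tests.

```python
import asyncio
import hashlib
from abc import ABC, abstractmethod


class EmbeddingProvider(ABC):
    """Local stand-in for the base class in providers/base.py (assumed shape)."""

    @abstractmethod
    async def embed(self, text: str) -> list[float]: ...

    @abstractmethod
    async def embed_batch(self, texts: list[str]) -> list[list[float]]: ...

    @property
    @abstractmethod
    def dimensions(self) -> int: ...

    @property
    @abstractmethod
    def name(self) -> str: ...


class HashEmbeddingProvider(EmbeddingProvider):
    """Hypothetical provider: derives a fixed vector from a SHA-256 digest."""

    def __init__(self, dimensions: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, so this sketch supports at most 32 dimensions.
        self._dimensions = dimensions

    async def embed(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale the first `dimensions` bytes into the range [0.0, 1.0].
        return [b / 255 for b in digest[: self._dimensions]]

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        return [await self.embed(t) for t in texts]

    @property
    def dimensions(self) -> int:
        return self._dimensions

    @property
    def name(self) -> str:
        return "hash-embedding"


provider = HashEmbeddingProvider(dimensions=4)
vectors = asyncio.run(provider.embed_batch(["alpha", "beta"]))
```

These vectors carry no semantic meaning; a real provider would call an embedding model and handle retries and rate limits itself.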
MemoryInterface is the main entry point. Key design decisions:
- remember() is synchronous and fast - no LLM calls, just stores the episode
- ingest() is async with LLM triage - buffers, extracts episodes, immediately consolidates
- consolidate() does all LLM work - extraction + generalization
- Two consolidation modes: Automatic (threshold-based) or manual (hook-based)
# Consolidation happens in two phases:
# 1. Extract entities/types from unextracted episodes
# 2. Generalize episodes into concepts (the "sleep" process)

Consolidation is the "brain" of the system. It uses dual-track processing:
Fact track (no LLM generalization):
- Fact episodes are clustered by shared entity
- Existing fact_clusters are looked up via entity recall
- Facts are preserved verbatim in the specifics field
- Conflicts are detected and flagged (not auto-resolved)
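The clustering step of the fact track can be sketched as below. FactEpisode and cluster_by_entity are hypothetical names used only for illustration; how Remind handles an episode that mentions several entities is not documented here, so this sketch simply lists it under each of them.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FactEpisode:
    """Hypothetical minimal episode: only the fields this sketch needs."""
    id: str
    content: str
    entity_ids: list[str] = field(default_factory=list)


def cluster_by_entity(episodes: list[FactEpisode]) -> dict[str, list[FactEpisode]]:
    """Group fact episodes under every entity they mention."""
    clusters: dict[str, list[FactEpisode]] = defaultdict(list)
    for episode in episodes:
        for entity_id in episode.entity_ids:
            clusters[entity_id].append(episode)
    return dict(clusters)


episodes = [
    FactEpisode("e1", "Redis cache TTL is 300s", ["tool:redis"]),
    FactEpisode("e2", "Redis runs on port 6380", ["tool:redis", "file:docker-compose.yml"]),
    FactEpisode("e3", "Alice owns the auth service", ["person:alice"]),
]
clusters = cluster_by_entity(episodes)
```

The episode content is carried through untouched, which mirrors the rule that facts are preserved verbatim rather than generalized.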
Pattern track (LLM generalization):
- Classify episode types (observation, decision, question, meta, preference, outcome)
- Extract entity mentions from natural language
- Identify patterns across episodes
- Create generalized concepts with relations
- Detect contradictions
- Identify causal patterns in outcome episodes (strategy-outcome relations)
triage.py is the "input selection" subsystem. Two classes:
- IngestionBuffer: character-threshold accumulator. Buffers raw text until the threshold (~4000 chars) is reached.
- IngestionTriager: LLM-based episode extraction. The LLM decides directly what's worth remembering, extracts distilled episodes, and detects action-result pairs as outcome episodes. A density score (0.0-1.0) is produced for diagnostics but doesn't gate extraction.
Topics: Topics are explicit and optional. When ingest() is called with an explicit topic, all extracted episodes are stamped with that topic. When topic is omitted, episodes get topic_id=None. Sub-chunks are triaged concurrently, bounded by llm_concurrency.
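Bounding concurrent triage by llm_concurrency can be sketched with an asyncio.Semaphore. The function triage_all and its stand-in "LLM call" are hypothetical; only the llm_concurrency setting comes from this guide, and the real implementation may bound concurrency differently.

```python
import asyncio


async def triage_all(sub_chunks: list[str], llm_concurrency: int = 3) -> tuple[list[str], int]:
    """Process sub-chunks concurrently, never more than llm_concurrency at once.

    Returns the results in input order plus the peak number of concurrent calls.
    """
    semaphore = asyncio.Semaphore(llm_concurrency)
    in_flight = 0
    peak = 0

    async def triage_one(chunk: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the LLM call
            in_flight -= 1
            return chunk.upper()

    # gather() preserves the order of its arguments in the result list.
    results = await asyncio.gather(*(triage_one(c) for c in sub_chunks))
    return list(results), peak


results, peak = asyncio.run(triage_all(["a", "b", "c", "d", "e"], llm_concurrency=2))
```

The peak counter exists only so the bound can be observed; production code would not need it.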
Instructions: The instructions parameter (optional) lets callers steer what the triage LLM extracts. When provided, instructions are appended to the triage system prompt as a prioritized directive. This enables focused ingestion (e.g. "extract only architectural decisions", "capture all config values"). Threaded through the full pipeline: ingest()/flush_ingest() → _process_ingest_chunk() → _triage_sub_chunk() → IngestionTriager.triage(). Also serialized in the background queue JSON payload for CLI background workers.
Pipeline: ingest() → buffer → triage + extract (LLM) → remember() → consolidate(force=True)
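The buffer stage of this pipeline can be sketched as a character-threshold accumulator. CharBuffer is a hypothetical class, not Remind's IngestionBuffer; joining parts with a newline and flushing everything at once are assumptions.

```python
from typing import Optional


class CharBuffer:
    """Hypothetical character-threshold accumulator."""

    def __init__(self, threshold: int = 4000) -> None:
        self.threshold = threshold
        self._parts: list[str] = []
        self._size = 0

    def add(self, text: str) -> Optional[str]:
        """Buffer text; return the accumulated chunk once the threshold is reached."""
        self._parts.append(text)
        self._size += len(text)
        if self._size >= self.threshold:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Return whatever is buffered (or None if empty) and reset the buffer."""
        if not self._parts:
            return None
        chunk = "\n".join(self._parts)
        self._parts.clear()
        self._size = 0
        return chunk


buffer = CharBuffer(threshold=10)
first = buffer.add("hello")    # 5 chars buffered, below the threshold
second = buffer.add("world!")  # 11 chars buffered, threshold reached
leftover = buffer.flush()      # nothing left after the automatic flush
```

An explicit flush() corresponds to forcing out a partial buffer, for example at the end of a session.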
Hybrid recall with spreading activation + entity-based episode retrieval:
- Query is embedded and matched to concepts via cosine similarity (native vector index when available)
- Embedding scores are optionally fused with keyword overlap scores (hybrid_keyword_weight config, default 0.3)
- Matched concepts activate related concepts through the graph
- Activation spreads with decay over multiple hops
- Optional cross-encoder reranking blends activation scores with query-document relevance (reranking_enabled config, requires the [rerank] extra)
- Highest-activation concepts are returned with source episodes (including type labels and entity context)
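The spreading step in the list above can be sketched as follows. This is a simplified illustration, not MemoryRetriever's implementation: the decay factor, the hop limit, the edge weights, and the "keep the strongest activation" rule are all assumptions.

```python
def spread_activation(
    seeds: dict[str, float],
    graph: dict[str, list[tuple[str, float]]],
    decay: float = 0.5,
    max_hops: int = 2,
) -> dict[str, float]:
    """Propagate activation from seed concepts along weighted edges."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier: dict[str, float] = {}
        for node, score in frontier.items():
            for neighbor, weight in graph.get(node, []):
                passed = score * weight * decay
                # Keep only the strongest activation seen for each concept.
                if passed > activation.get(neighbor, 0.0):
                    activation[neighbor] = passed
                    next_frontier[neighbor] = passed
        frontier = next_frontier
    return activation


# Adjacency list: concept -> [(related concept, edge weight)].
graph = {
    "caching": [("redis", 1.0), ("latency", 0.25)],
    "redis": [("persistence", 1.0)],
}
# "caching" matched the query embedding with similarity 0.8.
scores = spread_activation({"caching": 0.8}, graph)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here "persistence" is never matched by the query directly, yet it is activated through "redis" two hops away, which is the behavior that distinguishes this approach from plain vector search.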
Key class: MemoryRetriever with ActivatedConcept results.
Helper function: _keyword_score(query, text) — normalized token overlap for hybrid scoring.
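One plausible reading of "normalized token overlap" and of the hybrid fusion is sketched below. The tokenizer, the normalization by query token count, and the linear fusion formula are assumptions; only the default weight of 0.3 comes from this guide.

```python
import re


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query tokens that also occur in the text."""
    query_tokens = set(re.findall(r"\w+", query.lower()))
    text_tokens = set(re.findall(r"\w+", text.lower()))
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def hybrid_score(embedding_score: float, kw_score: float, keyword_weight: float = 0.3) -> float:
    """Linear fusion of embedding similarity and keyword overlap (assumed formula)."""
    return (1 - keyword_weight) * embedding_score + keyword_weight * kw_score


kw = keyword_score("redis cache ttl", "The Redis cache uses a 300s TTL")
fused = hybrid_score(embedding_score=0.5, kw_score=kw)
```

With this formula, a keyword weight of 0.0 reduces the score to pure embedding similarity.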
Reranking (reranker.py): Optional cross-encoder post-processing. When reranking_enabled=True, a Reranker wrapping sentence-transformers CrossEncoder rescores candidates after spreading activation. Blending: 0.4 × activation + 0.6 × rerank_score. Model is lazy-loaded on first recall. Requires pip install "remind-mcp[rerank]".
Multi-database persistence via SQLAlchemy Core. Supports SQLite (default), PostgreSQL, and MySQL. Tables:
- concepts - Stores concepts with JSON-serialized relations
- episodes - Raw episodes with consolidation status
- entities - Entity registry
- mentions - Episode-entity junction table
- relations - Concept-to-concept graph edges
- entity_relations - Entity-to-entity relationships
- metadata - Persistent key-value store
Vector search backends (auto-detected at startup):
- sqlite-vec (SQLite): vec0 virtual tables (vec_concepts, vec_episodes) with cosine distance KNN. Extension is loaded via sqlite_vec.load() on connection creation.
- pgvector (PostgreSQL): vector(N) columns (embedding_vec) with HNSW indexes and the <=> cosine distance operator.
- Brute-force fallback: If neither extension is available, falls back to Python-side numpy cosine similarity (O(n) per query).
Embedding dimensions are recorded in the metadata table on first write. Vector tables/columns are created lazily when the first embedding is stored.
The MemoryStore ABC defines the interface. SQLAlchemyMemoryStore is the concrete implementation (aliased as SQLiteMemoryStore for backward compatibility).
Centralized configuration management with provider-specific settings:
REMIND_DIR = Path.home() / ".remind"
CONFIG_FILE = REMIND_DIR / "remind.config.json"
@dataclass
class AnthropicConfig:
    api_key: Optional[str] = None
    model: str = "claude-sonnet-4-20250514"
    ingest_model: Optional[str] = None

@dataclass
class OpenAIConfig:
    api_key: Optional[str] = None
    base_url: Optional[str] = None
    model: str = "gpt-4.1"
    embedding_model: str = "text-embedding-3-small"
    ingest_model: Optional[str] = None

@dataclass
class AzureOpenAIConfig:
    api_key: Optional[str] = None
    base_url: Optional[str] = None  # e.g. https://myresource.openai.azure.com (/openai/v1 appended automatically)
    deployment_name: Optional[str] = None
    embedding_deployment_name: Optional[str] = None
    embedding_size: int = 1536
    ingest_deployment_name: Optional[str] = None

@dataclass
class OllamaConfig:
    url: str = "http://localhost:11434"
    llm_model: str = "llama3.2"
    embedding_model: str = "nomic-embed-text"
    ingest_model: Optional[str] = None

@dataclass
class RemindConfig:
    llm_provider: str = "anthropic"
    embedding_provider: str = "openai"
    consolidation_threshold: int = 5
    concepts_per_pass: int = 64
    auto_consolidate: bool = True
    extraction_batch_size: int = 50
    extraction_llm_batch_size: int = 10
    consolidation_batch_size: int = 25
    llm_concurrency: int = 3
    # Database URL (SQLAlchemy format; None = SQLite default)
    db_url: Optional[str] = None
    # Auto-ingest settings
    ingest_buffer_size: int = 4000
    # Retrieval tuning
    hybrid_keyword_weight: float = 0.3
    recall_initial_candidates: int = 10
    # Reranking (requires `pip install "remind-mcp[rerank]"`)
    reranking_enabled: bool = False
    reranking_model: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"
    # Logging
    logging_enabled: bool = False
    # Episode types (controls which types are valid + gates CLI/MCP features)
    episode_types: list[str] = field(default_factory=lambda: list(DEFAULT_EPISODE_TYPES))
    # Nested provider configs
    anthropic: AnthropicConfig
    openai: OpenAIConfig
    azure_openai: AzureOpenAIConfig
    ollama: OllamaConfig

def load_config() -> RemindConfig:
    """Priority: env vars > config file > defaults"""

def resolve_db_url(db_name: Optional[str], project_aware: bool = False) -> str:
    """Resolve a database name/path/URL to a SQLAlchemy database URL.

    - Full URL (postgresql://..., mysql://...): returned as-is
    - db_name provided: sqlite:///~/.remind/{name}.db
    - db_name=None + project_aware=True: sqlite:///<cwd>/.remind/remind.db
    - db_name=None + project_aware=False: sqlite:///~/.remind/memory.db
    """

def resolve_db_path(db_name: Optional[str], project_aware: bool = False) -> str:
    """Legacy function - resolves to a file path (SQLite only)."""

Database URL (db_url): Supports any SQLAlchemy-compatible database URL. Env var: REMIND_DB_URL. Examples:
- SQLite (default): sqlite:///~/.remind/memory.db
- PostgreSQL: postgresql+psycopg://user:pass@localhost:5432/remind
- MySQL: mysql+pymysql://user:pass@localhost:3306/remind
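The resolution rules documented for resolve_db_url can be sketched as below. This is an illustrative re-implementation under assumptions, not the function in config.py: the "://" check for full URLs is a guess, file paths passed as db_name are not handled, and ~ is expanded to an absolute home path (so on POSIX systems the result has four slashes, which is the SQLAlchemy form for absolute SQLite paths).

```python
from pathlib import Path
from typing import Optional


def resolve_db_url_sketch(db_name: Optional[str], project_aware: bool = False) -> str:
    """Illustrative version of the four documented resolution rules."""
    if db_name and "://" in db_name:
        return db_name  # already a full database URL
    if db_name:
        path = Path.home() / ".remind" / f"{db_name}.db"
    elif project_aware:
        path = Path.cwd() / ".remind" / "remind.db"
    else:
        path = Path.home() / ".remind" / "memory.db"
    return f"sqlite:///{path}"


named = resolve_db_url_sketch("work")
project = resolve_db_url_sketch(None, project_aware=True)
default = resolve_db_url_sketch(None)
passthrough = resolve_db_url_sketch("postgresql+psycopg://user:pass@localhost:5432/remind")
```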
Config file format (~/.remind/remind.config.json):
{
"llm_provider": "anthropic",
"embedding_provider": "openai",
"consolidation_threshold": 5,
"auto_consolidate": true,
"hybrid_keyword_weight": 0.3,
"logging_enabled": false,
"cli_output_mode": "table",
"db_url": null,
"anthropic": { "api_key": "sk-ant-..." },
"openai": { "api_key": "sk-..." },
"azure_openai": { "api_key": "...", "base_url": "..." },
"ollama": { "url": "http://localhost:11434" }
}

cli_output_mode may be table (default), json, or compact-json (minimal id/title/summary for browse commands).
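The documented priority (env vars > config file > defaults) can be sketched as a layered merge. load_settings and the DEFAULTS subset are hypothetical; REMIND_DB_URL is the only environment variable named in this guide, so no other variable names are assumed.

```python
import json
import tempfile
from pathlib import Path

# A small subset of the defaults listed in RemindConfig.
DEFAULTS = {"llm_provider": "anthropic", "consolidation_threshold": 5, "db_url": None}


def load_settings(config_file: Path, env: dict[str, str]) -> dict:
    """Merge defaults, then the JSON config file, then environment overrides."""
    settings = dict(DEFAULTS)
    if config_file.exists():
        settings.update(json.loads(config_file.read_text()))
    # Environment variables win over both the file and the defaults.
    if "REMIND_DB_URL" in env:
        settings["db_url"] = env["REMIND_DB_URL"]
    return settings


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "remind.config.json"
    config_path.write_text(json.dumps({"consolidation_threshold": 10, "db_url": "sqlite:///file.db"}))
    settings = load_settings(config_path, {"REMIND_DB_URL": "postgresql+psycopg://localhost/remind"})
```

Passing the environment as a plain dict keeps the sketch testable; a real caller would pass the process environment instead.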
Non-blocking consolidation for CLI:
- spawn_background_consolidation() - Spawns a subprocess to consolidate
- Uses filelock for cross-platform file locking to prevent concurrent runs
- Lock files stored at ~/.remind/.consolidate-{hash}.lock
- Logs to ~/.remind/logs/consolidation.log
The CLI automatically triggers background consolidation after remember when the episode threshold is reached.
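A per-database lock file name following the .consolidate-{hash}.lock pattern could be derived as sketched below. The function name, the choice of SHA-256, the 12-character digest, and hashing the database URL are all assumptions; this guide documents only the naming pattern.

```python
import hashlib
from pathlib import Path


def consolidation_lock_path(db_url: str, remind_dir: Path) -> Path:
    """Derive a stable lock file path for one database (hash scheme assumed)."""
    digest = hashlib.sha256(db_url.encode("utf-8")).hexdigest()[:12]
    return remind_dir / f".consolidate-{digest}.lock"


remind_dir = Path.home() / ".remind"
lock_a = consolidation_lock_path("sqlite:///a.db", remind_dir)
lock_b = consolidation_lock_path("sqlite:///b.db", remind_dir)
```

Hashing the database identity gives each database its own lock, so consolidating one database does not block another while two runs against the same database still contend for the same file.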
- All LLM operations are async
- remember() is deliberately sync (fast path)
- Use asyncio.run() in CLI, native async in MCP server
- Full type hints on all public APIs
- Use Optional[T] explicitly, not T | None, for consistency
- Dataclasses with field(default_factory=...) for mutable defaults
- Providers handle their own retries and rate limiting
- Store operations raise on critical errors, return None for "not found"
- Consolidation is fault-tolerant (individual episode failures don't abort)
- Use the logging.getLogger(__name__) pattern
- Debug for operation traces, Info for consolidation events, Warning for recoverable issues
- All models have to_dict() and from_dict() class methods
- Enums serialize to their string value
- Datetimes serialize to ISO format
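The serialization conventions above can be illustrated with a small model. Note and Kind are hypothetical names and are not classes from models.py; the sketch only shows enums round-tripping through their string value and datetimes through ISO format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional


class Kind(Enum):
    """Hypothetical enum standing in for an episode type."""
    OBSERVATION = "observation"
    FACT = "fact"


@dataclass
class Note:
    """Hypothetical model following the conventions listed above."""
    id: str
    kind: Kind
    created_at: datetime
    topic_id: Optional[str] = None
    entity_ids: list[str] = field(default_factory=list)

    def to_dict(self) -> dict[str, Any]:
        return {
            "id": self.id,
            "kind": self.kind.value,                    # enum -> string value
            "created_at": self.created_at.isoformat(),  # datetime -> ISO format
            "topic_id": self.topic_id,
            "entity_ids": list(self.entity_ids),
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Note":
        return cls(
            id=data["id"],
            kind=Kind(data["kind"]),
            created_at=datetime.fromisoformat(data["created_at"]),
            topic_id=data.get("topic_id"),
            entity_ids=list(data.get("entity_ids", [])),
        )


note = Note("n1", Kind.FACT, datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
payload = note.to_dict()
restored = Note.from_dict(payload)
```

Because the payload contains only strings, lists, and None, it can be passed straight to json.dumps().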
Tests are in tests/ using pytest. Key patterns:
# Temporary database fixture
import os
import tempfile

import pytest

@pytest.fixture
def store():
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    store = SQLiteMemoryStore(path)
    yield store
    os.unlink(path)

Run tests:
pytest # All tests
pytest tests/test_store.py # Specific file
pytest -v # Verbose

Note: Tests requiring LLM calls should use mocks or be marked as integration tests.
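A test that mocks the LLM rather than calling it might look like the sketch below. The summarize function is a hypothetical unit under test, and the keyword arguments mirror the complete() signature summarized earlier in this guide; only the standard library's unittest.mock.AsyncMock is used.

```python
import asyncio
from unittest.mock import AsyncMock


async def summarize(llm, text: str) -> str:
    """Hypothetical function under test; it awaits the provider's complete()."""
    return await llm.complete(
        prompt=f"Summarize: {text}", system="", temperature=0.0, max_tokens=64
    )


def test_summarize_uses_llm():
    llm = AsyncMock()  # attributes such as llm.complete are awaitable mocks
    llm.complete.return_value = "short summary"

    result = asyncio.run(summarize(llm, "a long episode"))

    assert result == "short summary"
    llm.complete.assert_awaited_once()
```

No network access or API key is needed, so a test like this can run in every pytest invocation.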
- Create providers/newprovider.py
- Implement LLMProvider and/or EmbeddingProvider
- Add to providers/__init__.py exports
- Add to interface.py factory map in create_memory()
- Update CLI in cli.py if needed
- Document in README.md
- Add to EpisodeType enum in models.py
- Update extraction prompt in extraction.py
- Update consolidation prompts in consolidation.py
- Add MCP tool if type-specific querying is useful
Note: episode_types config controls which types are active. Custom types (strings not in EpisodeType enum) are also supported.
- Add to EntityType enum in models.py
- Update extraction prompt in extraction.py
- No other changes needed (entities are dynamically typed)
- Add to RelationType enum in models.py
- Update consolidation prompts to use new relation
- Consider retrieval implications (spreading activation weights)
- Add function to mcp_server.py using FastMCP decorators
- Document in docs/AGENTS.md
- Test via MCP client
- Add route handler function to api/routes.py
- Add route to api_routes list at bottom of file
- Use _get_memory_from_request() helper to get MemoryInterface
- Return JSONResponse for data or StreamingResponse for SSE
The REST API uses Starlette and serves the web UI. Endpoints:
- GET /api/v1/stats - Memory statistics
- GET /api/v1/concepts - Paginated concepts list
- GET /api/v1/concepts/{id} - Concept detail with source episodes
- GET /api/v1/episodes - Paginated episodes with filters
- GET /api/v1/episodes/{id} - Episode detail
- GET /api/v1/entities - All entities with mention counts
- GET /api/v1/entities/{id} - Entity detail
- GET /api/v1/entities/{id}/episodes - Episodes mentioning entity
- GET /api/v1/entities/{id}/concepts - Concepts derived from entity's episodes
- GET /api/v1/graph - Full concept graph for D3 visualization
- POST /api/v1/query - Execute recall query
- POST /api/v1/chat - Streaming chat with memory context (SSE)
- GET /api/v1/databases - List available databases
# Clone and install in development mode
git clone <repo>
cd remind
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
# Set up environment
cp .env.example .env
# Edit .env with API keys
# Run tests
pytest
# Run CLI
remind --help
# Run MCP server
remind-mcp --port 8765

uv is a fast Python package manager that simplifies development:
# Install dependencies and run tests
uv run pytest
# Run CLI commands
uv run remind --help
uv run remind remember "Some observation"
uv run remind recall "query"
# Run MCP server
uv run remind-mcp --port 8765

With uv, you don't need to create a virtual environment or install dependencies manually; uv run handles both automatically.
IMPORTANT: Prefix all shell commands, including uv ones, with oo (e.g. oo uv run pytest).
- Separation of concerns: Storage, providers, consolidation, retrieval are independent
- Fast path for writes: remember() never blocks on LLM
- Batch LLM work: Consolidation processes multiple episodes together
- Graceful degradation: Missing embeddings fall back to keyword matching
- Provider agnostic: Core logic doesn't depend on specific LLM/embedding provider
- Explicit over implicit: Episode types/entities can be auto-detected or manually specified
When making changes, these files are most commonly modified together:
| Change Type | Files |
|---|---|
| New data structure | models.py, store.py |
| New provider | providers/newprovider.py, providers/__init__.py, interface.py |
| Consolidation logic | consolidation.py, extraction.py |
| Retrieval behavior | retrieval.py, reranker.py, config.py |
| Auto-ingest pipeline | triage.py, llm_protocol.py, interface.py, config.py |
| CLI commands | cli.py |
| Configuration | config.py, interface.py, mcp_server.py, cli.py |
| Background consolidation | background.py, background_worker.py, cli.py |
| MCP tools | mcp_server.py, docs/AGENTS.md |
| REST API endpoints | api/routes.py |
| Public API | interface.py, __init__.py, README.md |
- Consolidation issues: Check episode entities_extracted and consolidated flags
- Retrieval misses: Verify concepts have embeddings (embedding field not None)
- Entity linking: Entity IDs are case-sensitive, use canonical forms
- MCP issues: Check db query parameter in MCP URL, verify server is running
- Config issues: Check ~/.remind/remind.config.json exists and is valid JSON
- Background consolidation: Check ~/.remind/logs/consolidation.log for errors
- CLI database path: Without --db, uses <cwd>/.remind/remind.db (project-aware). Accepts database URLs (e.g. --db postgresql+psycopg://...)
CREATE TABLE concepts (
    id TEXT PRIMARY KEY,
    data JSON NOT NULL -- Serialized Concept
);
CREATE TABLE episodes (
    id TEXT PRIMARY KEY,
    data JSON NOT NULL -- Serialized Episode
);
CREATE TABLE entities (
    id TEXT PRIMARY KEY,
    data JSON NOT NULL -- Serialized Entity
);
CREATE TABLE mentions (
    episode_id TEXT,
    entity_id TEXT,
    PRIMARY KEY (episode_id, entity_id)
);

The store handles JSON serialization/deserialization transparently.
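The id-plus-JSON layout can be exercised with a few lines of standard-library code. This sketch uses the sqlite3 module directly rather than SQLAlchemy Core, and save_concept and get_concept are hypothetical helpers, not MemoryStore methods; returning None for a missing row follows the "not found" convention described earlier.

```python
import json
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concepts (id TEXT PRIMARY KEY, data JSON NOT NULL)")


def save_concept(concept: dict) -> None:
    """Serialize the concept to JSON and insert or replace it by id."""
    conn.execute(
        "INSERT OR REPLACE INTO concepts (id, data) VALUES (?, ?)",
        (concept["id"], json.dumps(concept)),
    )


def get_concept(concept_id: str) -> Optional[dict]:
    """Return the deserialized concept, or None when the id is unknown."""
    row = conn.execute(
        "SELECT data FROM concepts WHERE id = ?", (concept_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


save_concept({"id": "c1", "summary": "Caching reduces latency", "confidence": 0.8})
found = get_concept("c1")
missing = get_concept("does-not-exist")
```

Storing the whole object in one JSON column means new model fields need no schema migration, at the cost of not being able to index individual fields without extra columns.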