This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
High-performance embedded segmented log for edge observability. Built for single-digit microsecond latency and bounded resources.
- `Client` - The main entry point for the package. Manages shards, handles writes/reads, and coordinates retention policies. Thread-safe for concurrent access.
- `Shard` - Represents a single data stream partition. Each shard has its own set of files and index, and manages its own writes. Shards enable horizontal scaling and parallel processing.
- `Consumer` - Provides a high-level reading interface with consumer groups, automatic offset management, and batch processing capabilities. Supports exactly-once semantics through ACK tracking.
- `Reader` - Low-level memory-mapped file reader with lock-free concurrent access. Uses atomic operations for dynamic file remapping as files grow.
- `ShardIndex` - Maintains metadata for each shard, including file locations, consumer offsets, and a binary-searchable index for O(log n) entry lookups.
- `BinarySearchableIndex` - Enables fast entry lookups by maintaining periodic checkpoints (every N entries). Reduces memory usage while providing efficient access patterns.
- `FileInfo` - Tracks individual data files, including byte ranges, entry counts, and timestamps. Used for file rotation and retention management.
- `CometState` - Memory-mapped state file (1KB) per shard providing lock-free metrics and coordination. Cache-line-aligned structure with comprehensive metrics across 15 cache lines.
- `StateRecovery` - Validates and recovers from corrupted state files. Provides automatic recovery with synchronization between state and index.
- `ProcessID` - Manages process slot assignment for multi-process deployments. Supports automatic dead-process detection and slot reclamation.
- `Retention` - Background goroutine for automated data lifecycle management. Implements time- and size-based retention with consumer protection.
- Wire Format - Each entry consists of `[uint32 length][uint64 timestamp][byte[] data]`, where length is the data size only (excludes the 12-byte header)
- Compression - Automatic zstd compression for entries above a configurable threshold. Detected by the zstd magic bytes (0x28B52FFD) at the start of the data
- File Rotation - Automatic rotation when files exceed size limits (default: 1GB)
- Smart Sharding - Built-in support for consistent hash-based sharding with helper functions for shard selection and distribution
- Multi-Process Safety - Process-based shard ownership for safe concurrent access from multiple processes
- Retention Management - Configurable time and size-based retention with protection for unconsumed data
- Metrics - Comprehensive metrics tracking including write latency, compression ratios, and consumer lag
From a clean architecture perspective:
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Writer    │────▶│   Buffer    │────▶│    Disk     │
└─────────────┘     └─────────────┘     └─────────────┘
                                               │
                                               ▼
                                        ┌─────────────┐
                                        │    Index    │
                                        └─────────────┘
```
The index should only be updated AFTER data is durable. Otherwise, we're lying about the system state.
- Index = Durable State (what's persisted to disk via `index.CurrentEntryNumber`)
- Consumers = Read via Index (only see durable data)
- Writers = Use `nextEntryNumber` (for volatile entry assignment)
- File Rotation = Uses `pendingWriteOffset`, which includes both synced and pending data
- Periodic Flush = Updates both data AND index (making data visible to consumers)
- Tests = Call `Sync()` explicitly when consumers need to read data, except in integration tests that simulate real flush and sync behavior
This architecture ensures:
- Crash Safety: Consumers never see data that could be lost on crash
- Consistency: Index always reflects what's actually persisted
- Performance: Writers don't block on disk I/O for each entry
- Reliability: System can recover to last known good state
For a more detailed explanation, please refer to the Architecture document.
IMPORTANT: Consumer offsets are stored separately from the writer's index to maintain architectural separation:
- Location: `shard-XXXX/offsets.state` (memory-mapped file)
- Structure: Fixed-size slots for up to 512 consumer groups per shard
- Access: Lock-free, using atomic operations for multi-process safety
- Migration: Automatically migrates from the old `offsets.bin` format if found
Key points:
- Consumers NEVER write to the writer's `index.bin` file
- Offsets are immediately visible across processes (memory-mapped)
- No explicit sync needed - OS handles memory-mapped file persistence
- Consumer group names limited to 48 bytes
- Maximum 512 consumer groups per shard
The package uses a hierarchical configuration structure:
```go
config := comet.DefaultCometConfig()
config.Compression.MinCompressSize = 1024 // Compression settings
config.Indexing.BoundaryInterval = 100    // Indexing behavior
config.Storage.MaxFileSize = 1 << 30      // File storage
config.Retention.MaxAge = 4 * time.Hour   // Retention policy
```

Convenience constructors are available:

- `HighCompressionConfig()` - Optimizes for storage efficiency
- `MultiProcessConfig()` - Enables multi-process coordination
- `HighThroughputConfig()` - Optimizes for write performance
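A construction sequence might look like the fragment below. The `comet.NewClient` name and signature are assumptions, shown only to illustrate how the hierarchical config is passed in; check the package's actual entry point.

```go
config := comet.HighThroughputConfig()
config.Retention.MaxAge = 4 * time.Hour // override one field of a preset

// Hypothetical constructor: the real entry point may differ.
client, err := comet.NewClient("/var/lib/comet", config)
if err != nil {
	log.Fatal(err)
}
defer client.Close()
```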
After making any significant change, run the following commands:

```
mise run lint   # Run all linting (staticcheck, go vet, deadcode)
mise run test   # Run tests with race detection
mise run bench  # Run benchmarks
```

Or the individual commands:

```
go vet ./...
go test ./... -race
go test ./... -bench=. -benchmem
deadcode -test ./...
```

IMPORTANT: Always use the `-race` flag when running tests to detect data races and ensure thread safety.
- Generally speaking, do not write any code comments unless they document a function, e.g. jsdoc or godoc comments.
- Do not worry about formatting. We use `dprint` for this.
- Prefer modern Golang, e.g. `any` over `interface{}`. We are using the latest versions of Go.
- Try to add tests to existing test files if possible. If a test really doesn't fit or a file is getting too big, you can create a new one.
- To avoid hanging tests/deadlocks, use the `-timeout` flag with a reasonable value, e.g. `go test -timeout 10s ./...`.
- If you're writing Go code with structs that will be heavily used, make sure they are properly memory-aligned/padded. We have a test for this now.
- Our philosophy is "minimum effective abstraction". Keep things simple and avoid unnecessary complexity. Justify your decisions with benchmarks (i.e. data) and ensure that the code is easy to understand and maintain. Software should be elegant.
- If you want to know whether some Go code is likely to compile, use `go vet ./...` instead of `go build ./...`.
- If you're writing tests and the tests are failing, before changing the test code, make sure it's the test that needs fixing and not the implementation.
- Tests should be comprehensive and cover failure modes, resource exhaustion, and the critical edge cases.
- To run the benchmarks we care most about not regressing, use `mise run bench:core`.
- Run `go vet ./...`, `go test ./...`, and `go test ./... -bench=. -benchmem` after significant changes.
- Use `COMET_DEBUG=1` to enable debug logging.