Karma

Karma is a small TCP database for hot time-series counters. It is designed for cases where an application needs fresh pre-aggregated counters without hitting a heavier analytical store on every request.

Typical use case:

application reads link metadata
  -> application asks Karma for counters for many link ids
  -> client receives links with fresh click counters

Karma keeps data in memory, persists it with snapshots and WAL, and exposes a newline-delimited JSON protocol over TCP.

Russian version: README.ru.md.

Status

Karma is currently best understood as a production-oriented hot counter read model, not as a general-purpose TSDB.

Supported today:

  • day-bucketed counters with YYYYMMDD buckets;
  • single-key writes and reads;
  • batch reads and batch writes;
  • streaming ingest for rebuild/backfill flows;
  • atomic snapshots and WAL replay;
  • recovery checkpoints and reconciliation reporting;
  • async master -> slave replication through snapshot bootstrap and WAL polling;
  • Prometheus-style operational metrics.

Important boundaries:

  • command execution is serialized by one process-local state lock;
  • replication is asynchronous and manual-failover only;
  • there is no automatic leader election, quorum, or master-master mode;
  • object-storage snapshot transport and replication.subscribe are not part of the current implementation.

For production use, run with a persistent volume, WAL enabled, --wal-fsync=true, health checks, metrics scraping, and regular snapshot.create_all or SIGUSR1 snapshots.
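
One way to schedule the regular snapshots is a cron job that sends snapshot.create_all over TCP with nc, in the same style as the client examples below. A minimal sketch; the token value is illustrative and assumes --auth-token is configured:

# hypothetical crontab entry: hourly atomic snapshots plus WAL truncation
0 * * * * printf '{"v":2,"op":"snapshot.create_all","token":"write-secret"}\n' | nc 127.0.0.1 8080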

Build

Requirements:

  • Crystal 1.17.1
  • Shards

Build:

shards build --release

The binary is created at:

bin/karma

Docker

Build:

docker build -t karma:local .

Run:

docker run --rm \
  -p 8080:8080 \
  -v karma-data:/data \
  karma:local \
  --bind=0.0.0.0 \
  --port=8080 \
  --directory=/data \
  --restore=true \
  --wal=true \
  --wal-fsync=true

Run

Recommended master node:

bin/karma \
  --bind=0.0.0.0 \
  --port=8080 \
  --directory=/var/lib/karma \
  --role=master \
  --restore=true \
  --wal=true \
  --wal-fsync=true \
  --auth-token=write-secret \
  --read-auth-token=read-secret

The same configuration can be provided through environment variables. Command line options are applied after environment variables and override them:

KARMA_HOST=0.0.0.0 \
KARMA_PORT=8080 \
KARMA_DUMP_DIR=/var/lib/karma \
KARMA_RESTORE=true \
KARMA_WAL=true \
KARMA_WAL_FSYNC=true \
bin/karma

Configuration

Boolean values use true or false. Timeout values use seconds unless the option name ends with -ms.

CLI option                           Env var                               Default  Description
--bind=host                          KARMA_HOST                            0.0.0.0  Host to bind.
--port=port                          KARMA_PORT                            8080     TCP port.
--directory=path                     KARMA_DUMP_DIR                        .        Directory for snapshots, WAL, and metadata.
--role=master|slave                  KARMA_ROLE                            master   Node role.
--restore=true|false                 KARMA_RESTORE                         true     Restore snapshots and replay WAL on startup.
--nodelay=true|false                 KARMA_TCP_NODELAY                     true     Enable TCP_NODELAY.
--wal=true|false                     KARMA_WAL                             true     Persist mutating commands to WAL.
--wal-fsync=true|false               KARMA_WAL_FSYNC                       true     Fsync every WAL append/truncate.
--max-request-bytes=bytes            KARMA_MAX_REQUEST_BYTES               4096     Maximum JSON request line size. Must be greater than 0.
--max-response-bytes=bytes           KARMA_MAX_RESPONSE_BYTES              1048576  Maximum JSON response size. Use 0 to disable.
--read-timeout=seconds               KARMA_READ_TIMEOUT_SECONDS            5        Client socket read timeout. Use 0 to disable.
--write-timeout=seconds              KARMA_WRITE_TIMEOUT_SECONDS           5        Client socket write timeout. Use 0 to disable.
--query-timeout-ms=ms                KARMA_QUERY_TIMEOUT_MS                1000     Timeout for expensive tree-level reads. Use 0 to disable.
--shutdown-timeout=seconds           KARMA_SHUTDOWN_TIMEOUT_SECONDS        5        Graceful shutdown drain timeout.
--auth-token=token                   KARMA_AUTH_TOKEN                      unset    Token required for all commands. Empty env value disables it.
--read-auth-token=token              KARMA_READ_AUTH_TOKEN                 unset    Token allowed only for read-only commands. Empty env value disables it.
--dump-retention-per-tree=count      KARMA_DUMP_RETENTION_PER_TREE         5        Snapshots to keep per series after snapshot.create_all.
--replication-source-host=host       KARMA_REPLICATION_SOURCE_HOST         unset    Master host used by slave polling.
--replication-source-port=port       KARMA_REPLICATION_SOURCE_PORT         8080     Master port used by slave polling.
--replication-token=token            KARMA_REPLICATION_TOKEN               unset    Token used by slave replication requests.
--replication-poll-interval-ms=ms    KARMA_REPLICATION_POLL_INTERVAL_MS    1000     Slave polling interval.
--replication-batch-size=count       KARMA_REPLICATION_BATCH_SIZE          1000     Maximum WAL entries fetched by one slave poll. Max: 10000.
--log=true|false                     KARMA_LOG                             true     Emit structured JSON logs.

Protocol

Karma speaks newline-delimited JSON over TCP:

  • one request is one JSON object followed by \n;
  • one response is one JSON object followed by \r\n.

Protocol v2 is preferred for new clients. It uses v: 2, namespaced op values, and series/key/bucket/value terminology:

{"v":2,"op":"counter.increment","series":"links","key":42,"bucket":20260505,"value":1}

The legacy v1 protocol remains supported for compatibility and WAL replay; legacy requests use command, tree_name, date, and time_from/time_to instead of the v2 fields.

Response:

{
  "protocol_version": 2,
  "success": true,
  "response": "OK",
  "error_code": null
}

Error response:

{
  "protocol_version": 2,
  "success": false,
  "response": "Field tree or series is required",
  "error_code": "validation_error"
}

Stable error codes:

  • invalid_json
  • unknown_command
  • validation_error
  • not_found
  • unauthorized
  • forbidden
  • request_too_large
  • response_too_large
  • query_timeout
  • replication_gap
  • replication_error
  • internal_error

If --auth-token is configured, include token in every client request. If --read-auth-token is configured, that token can execute read-only commands only. Tokens are not written to WAL.
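
For example, an authenticated read carries the token as a top-level request field (token value illustrative):

{"v":2,"op":"counter.sum","series":"links","key":42,"token":"read-secret"}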

Data Model

  • A series is a named collection of counters. The storage layer and legacy API still use the word tree.
  • A key is an unsigned 64-bit integer inside a series.
  • A bucket is a UTC day in YYYYMMDD format, for example 20260505.
  • A value is an unsigned 64-bit integer.
  • Increment/decrement commands use today's UTC bucket when bucket is omitted.
  • Counter values never go below zero.

Read commands do not create missing series. Missing series return not_found. For an existing series, reading a missing key returns zero or an empty result.
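
As an illustration, reading from a series that was never created fails with not_found (the message text here is a guess, not the server's exact wording):

{"v":2,"op":"counter.sum","series":"missing","key":1}
{"protocol_version":2,"success":false,"response":"Series not found","error_code":"not_found"}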

Command Examples

Basic Counters

Create a series:

{"v":2,"op":"tree.create","series":"links"}

Increment today's counter:

{"v":2,"op":"counter.increment","series":"links","key":42,"value":1}

Increment an explicit bucket:

{"v":2,"op":"counter.increment","series":"links","key":42,"bucket":20260505,"value":1}

Decrement:

{"v":2,"op":"counter.decrement","series":"links","key":42,"bucket":20260505,"value":1}

Read a key total:

{"v":2,"op":"counter.sum","series":"links","key":42}

Read a date range:

{"v":2,"op":"counter.sum","series":"links","key":42,"range":{"from":20260501,"to":20260505}}

Read daily points:

{"v":2,"op":"counter.series","series":"links","key":42,"range":{"from":20260501,"to":20260505}}

Batch Reads and Writes

Read many totals in one request:

{"v":2,"op":"counter.batch_sum","series":"links","keys":[41,42,43]}

Read many totals for a range:

{"v":2,"op":"counter.batch_sum","series":"links","keys":[41,42,43],"range":{"from":20260501,"to":20260505}}

Add many [key, bucket, value] items:

{"v":2,"op":"series.batch_add","series":"links","items":[[42,20260505,10],[43,20260505,3]]}

Large batch requests must fit within --max-request-bytes.
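
A minimal Crystal sketch of a client-side size guard, assuming the default limit of 4096 bytes; keep the constant in sync with the actual --max-request-bytes setting:

require "json"

# Build a series.batch_add request line and check it against the request size
# limit before sending.
MAX_REQUEST_BYTES = 4096

items = (1_u64..500_u64).map { |key| [key, 20260505_u64, 1_u64] }
line = {v: 2, op: "series.batch_add", series: "links", items: items}.to_json + "\n"

if line.bytesize > MAX_REQUEST_BYTES
  puts "request is #{line.bytesize} bytes; split the batch and retry"
else
  puts "request fits in #{line.bytesize} bytes"
end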

Tree/Series Inspection

List series:

{"v":2,"op":"tree.list"}

Inspect one series:

{"v":2,"op":"tree.info","series":"links"}

Return keys with cursor pagination:

{"v":2,"op":"tree.keys","series":"links","limit":1000,"cursor":0}

Return top keys:

{"v":2,"op":"tree.top","series":"links","limit":100}

Return summary:

{"v":2,"op":"tree.summary","series":"links","range":{"from":20260501,"to":20260505}}

Retention and Maintenance

Delete old buckets:

{"v":2,"op":"series.delete_before","series":"links","before":20260401}

Compact a series:

{"v":2,"op":"series.compact","series":"links"}

Compact all series:

{"v":2,"op":"system.compact"}

Reset one key or a whole series:

{"v":2,"op":"counter.reset","series":"links","key":42}
{"v":2,"op":"tree.reset","series":"links"}

Delete a date range:

{"v":2,"op":"counter.delete_range","series":"links","key":42,"range":{"from":20260501,"to":20260505}}
{"v":2,"op":"tree.delete_range","series":"links","range":{"from":20260501,"to":20260505}}

Streaming Ingest

Streaming ingest is useful for rebuilds, backfills, and large imports. Supported modes:

  • add: add item values to the live series;
  • set: set item bucket values in the live series;
  • replace_series: build a staged series and atomically replace the live series on commit.

Example:

{"v":2,"op":"ingest.begin","stream_id":"import-20260505","mode":"add","granularity":"day"}
{"v":2,"op":"ingest.chunk","stream_id":"import-20260505","series":"links","chunk_seq":1,"items":[[42,20260505,10]]}
{"v":2,"op":"ingest.commit","stream_id":"import-20260505"}

Abort an active stream:

{"v":2,"op":"ingest.abort","stream_id":"import-20260505"}

Duplicate chunks are skipped, out-of-order chunks are rejected, and a stream is bound to the series used by its first chunk.
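
A minimal Crystal sketch of an add-mode import that sends begin, chunks with a strictly increasing chunk_seq, and commit over one connection:

require "json"
require "socket"

socket = TCPSocket.new("127.0.0.1", 8080)
stream_id = "import-20260505"

# Send one request line and print the server's reply.
send_line = ->(request : String) do
  socket << request << "\n"
  puts socket.gets
end

send_line.call({v: 2, op: "ingest.begin", stream_id: stream_id, mode: "add", granularity: "day"}.to_json)

# The stream binds to the "links" series on the first chunk.
items = [[42_u64, 20260505_u64, 10_u64], [43_u64, 20260505_u64, 3_u64]]
items.each_with_index(1) do |item, seq|
  send_line.call({v: 2, op: "ingest.chunk", stream_id: stream_id, series: "links", chunk_seq: seq, items: [item]}.to_json)
end

send_line.call({v: 2, op: "ingest.commit", stream_id: stream_id}.to_json)
socket.close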

Snapshots, WAL, and Recovery

Karma uses two persistence mechanisms:

  • snapshots: MessagePack .tree files, one per series;
  • WAL: newline-delimited JSON entries in karma.wal.

Create and inspect snapshots:

{"v":2,"op":"snapshot.create","series":"links"}
{"v":2,"op":"snapshot.create_all"}
{"v":2,"op":"snapshot.list"}
{"v":2,"op":"snapshot.info"}

Load and fetch snapshots:

{"v":2,"op":"snapshot.load","file":"1777925811_links.tree"}
{"v":2,"op":"snapshot.fetch","file":"1777925811_links.tree"}
{"v":2,"op":"snapshot.fetch_chunk","file":"1777925811_links.tree","offset":0,"limit":262144}

Verify the restore path:

{"v":2,"op":"snapshot.verify"}

snapshot.verify restores data into a temporary cluster and checks:

  • snapshot sidecar metadata;
  • latest snapshot last_lsn consistency;
  • WAL LSN continuity;
  • snapshot/WAL boundaries;
  • persisted karma.wal.lsn.

New WAL lines use an LSN envelope:

{"v":2,"lsn":1,"entry":{"v":2,"op":"counter.increment","tree":"links","key":42,"date":20260505,"value":1}}

Each new snapshot has a sidecar metadata file named <snapshot>.meta.json. It records file, tree, timestamp, bytes, and last_lsn.
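
An illustrative sidecar (values invented; the field set follows the list above):

{"file":"1777925811_links.tree","tree":"links","timestamp":1777925811,"bytes":524288,"last_lsn":110}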

Startup with --restore=true:

  1. Load the latest snapshot per series.
  2. Replay WAL entries.
  3. On slave nodes, initialize karma.replication.lsn from snapshot metadata before polling the master.

snapshot.create_all writes atomic snapshots, fsyncs them, truncates WAL after successful snapshotting, and prunes old snapshots according to --dump-retention-per-tree.

Recovery checkpoints can record external source positions such as ClickHouse export ids or durable queue offsets:

{"v":2,"op":"recovery.checkpoint","source":"clickhouse-links","offset":"export-2026-05-05","event_id":"batch-42"}
{"v":2,"op":"recovery.status"}
{"v":2,"op":"recovery.status","source":"clickhouse-links"}

External reconciliation jobs can report drift back to Karma:

{"v":2,"op":"reconciliation.report","checked_points":1000,"mismatch_count":2,"absolute_drift":15,"max_abs_delta":10}

Replication

Karma supports async master -> slave replication through snapshot bootstrap and WAL polling.

Start a slave:

bin/karma \
  --role=slave \
  --port=8081 \
  --directory=/var/lib/karma-slave \
  --restore=true \
  --replication-source-host=127.0.0.1 \
  --replication-source-port=8080 \
  --replication-token=read-secret

If the slave data directory has no local snapshots and --restore=true, the slave fetches the latest master snapshots through snapshot.fetch_chunk, sets karma.replication.lsn from snapshot metadata, and then polls replication.entries.

Useful commands:

{"v":2,"op":"replication.status"}
{"v":2,"op":"replication.entries","after_lsn":120,"limit":1000}

replication.entries is bounded by both limit and the master's max_response_bytes. If the byte limit truncates a page, the response includes truncated_by_bytes: true, and next_lsn points to the last entry that was returned.
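
A Crystal sketch of tailing the WAL the way a slave does. next_lsn and truncated_by_bytes are documented above; the entries payload field name is an assumption for illustration:

require "json"
require "socket"

socket = TCPSocket.new("127.0.0.1", 8080)
after_lsn = 0_i64

loop do
  socket << {v: 2, op: "replication.entries", after_lsn: after_lsn, limit: 1000}.to_json << "\n"
  reply = JSON.parse(socket.gets.not_nil!)
  break unless reply["success"].as_bool

  payload = reply["response"]
  entries = payload["entries"].as_a # field name assumed
  break if entries.empty?           # caught up; a real poller would sleep and retry

  entries.each { |entry| puts entry } # a real slave replays each entry here
  after_lsn = payload["next_lsn"].as_i64
end

socket.close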

Operational notes:

  • slave nodes reject direct mutating client commands;
  • failover is manual;
  • stop the old master before promoting a slave;
  • rebuild remaining slaves from the promoted master;
  • watch karma_replication_lag_entries, karma_replication_poll_errors_total, and karma_replication_last_poll_success_unix.

Detailed runbook: docs/replication-operations-runbook.md.

Metrics and Health

Basic health:

{"v":2,"op":"system.ping"}
{"v":2,"op":"system.health"}

Operational stats:

{"v":2,"op":"system.stats"}

Prometheus-style metrics:

{"v":2,"op":"system.metrics"}

Metric groups include:

  • uptime, role, memory, trees, keys, snapshots;
  • WAL bytes and current LSN;
  • command counts, errors, latency, and protocol v1 usage;
  • batch read/write counters;
  • retention and compaction counters;
  • ingest stream counters and latency;
  • reconciliation and recovery counters;
  • replication lag, replayed LSN, polling/bootstrap success and error counters.
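
Since system.metrics is served over the TCP protocol rather than HTTP, one way to feed a Prometheus scraper is a small bridge that forwards the command on every scrape. A sketch, assuming the metrics text arrives in the response field of the reply:

require "http/server"
require "json"
require "socket"

# Every HTTP request (for example GET /metrics) forwards one system.metrics
# command to Karma and returns the payload as plain text.
server = HTTP::Server.new do |context|
  karma = TCPSocket.new("127.0.0.1", 8080)
  karma << {v: 2, op: "system.metrics"}.to_json << "\n"
  reply = JSON.parse(karma.gets.not_nil!)
  karma.close

  context.response.content_type = "text/plain"
  context.response.print reply["response"] # payload location assumed
end

address = server.bind_tcp "0.0.0.0", 9108
puts "metrics bridge listening on #{address}"
server.listen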

Client Examples

Using nc:

printf '{"v":2,"op":"counter.increment","series":"links","key":42,"value":1}\n' | nc 127.0.0.1 8080
printf '{"v":2,"op":"counter.sum","series":"links","key":42}\n' | nc 127.0.0.1 8080

Using Crystal:

require "json"
require "socket"

socket = TCPSocket.new("127.0.0.1", 8080)
socket << {v: 2, op: "counter.increment", series: "links", key: 42_u64, value: 1_u64}.to_json << "\n"
puts socket.gets
socket.close

Using Ruby:

require "json"
require "socket"

socket = TCPSocket.new("127.0.0.1", 8080)
socket.write({v: 2, op: "counter.sum", series: "links", key: 42}.to_json + "\n")
puts socket.gets
socket.close

Performance Checks

Local results depend on CPU, disk, filesystem, container runtime, network, and workload mix. The scripts below are intended as repeatable local checks, not as universal benchmarks.

Last recorded local results from 2026-05-05:

Test                    Mode                                Throughput                 p95 latency
single_increment        in-process, WAL off                 289,908 ops/sec            0.004 ms
single_sum              in-process, WAL off                 403,080 ops/sec            0.0026 ms
series.batch_add        in-process, WAL off                 1,917,010 items/sec        0.9878 ms
counter.batch_sum       in-process, WAL off                 2,389,520 key reads/sec    0.8685 ms
tcp_single_increment    TCP, 4 clients, WAL on, fsync on    12,818 ops/sec             0.5561 ms
tcp_single_sum          TCP, 4 clients, WAL on, fsync on    41,340 ops/sec             0.1339 ms
tcp_series.batch_add    TCP, 4 clients, WAL on, fsync on    744,551 items/sec          2.9103 ms
tcp_counter.batch_sum   TCP, 4 clients, WAL on, fsync on    1,708,312 key reads/sec    1.5604 ms

Volume sensitivity test, in-process, WAL off, 7 daily buckets per key:

Keys      Data points   Heap         Snapshot size   Batch sum                  Batch p95   Summary    Snapshot time   Restore
10,000    70,000        8.23 MiB     0.57 MiB        2,255,027 key reads/sec    0.8820 ms   4.59 ms    4.59 ms         2.87 ms
50,000    350,000       28.48 MiB    2.86 MiB        1,778,740 key reads/sec    0.4863 ms   33.41 ms   20.35 ms        15.18 ms
100,000   700,000       47.70 MiB    5.79 MiB        1,558,846 key reads/sec    0.4809 ms   88.58 ms   42.81 ms        32.21 ms

High-cardinality yearly profile, in-process, WAL off, 356 daily buckets per key:

Keys      Data points   Heap           Snapshot size   Batch sum                  Batch p95   Summary     Snapshot time   Restore
10,000    3,560,000     234.11 MiB     20.58 MiB       2,299,570 key reads/sec    4.4254 ms   140.71 ms   80.07 ms        115.79 ms
25,000    8,900,000     570.14 MiB     51.45 MiB       1,936,145 key reads/sec    8.9875 ms   401.43 ms   201.91 ms       323.58 ms
50,000    17,800,000    1,114.17 MiB   102.90 MiB      1,918,123 key reads/sec    2.3001 ms   960.62 ms   426.69 ms       561.90 ms

Replication load test on the same date used clients=4, keys=10000, batch_size=1000, write_batches=100, read_rounds=100, replication_poll_interval_ms=10, and replication_batch_size=1000. The slave bootstrapped from snapshot, replayed WAL from LSN 10 to LSN 110, ended with final_lag_entries=0, and matched the master total: master_total=110000, slave_total=110000.

In-process command-layer test:

crystal build --release scripts/load_test.cr -o bin/karma_load_test
bin/karma_load_test

TCP loopback test:

crystal build --release scripts/tcp_load_test.cr -o bin/karma_tcp_load_test
bin/karma_tcp_load_test \
  --clients=4 \
  --wal=true \
  --wal-fsync=false

Volume sensitivity test:

crystal build --release scripts/volume_load_test.cr -o bin/karma_volume_load_test
bin/karma_volume_load_test \
  --sizes=10000,50000,100000 \
  --bucket-count=7 \
  --batch-size=1000 \
  --single-rounds=1000 \
  --read-rounds=100

High-cardinality yearly profile with 356 buckets per key:

bin/karma_volume_load_test --profile=year-356

Master/slave replication test:

shards build --release
crystal build --release scripts/replication_load_test.cr -o bin/karma_replication_load_test
bin/karma_replication_load_test \
  --binary=bin/karma \
  --clients=4 \
  --keys=10000 \
  --batch-size=1000 \
  --write-batches=100 \
  --read-rounds=100

CSV reconciliation against exported aggregates:

crystal run scripts/reconcile_csv.cr -- \
  --host=127.0.0.1 \
  --port=8080 \
  --series=links \
  --csv=clickhouse-links.csv \
  --report

Signals

  • SIGINT: stop accepting new TCP clients, dump all series, truncate WAL after successful snapshots, and exit with status 0.
  • SIGUSR1: dump all series, truncate WAL after successful snapshots, and keep running.
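
For example, to snapshot a running server from the shell without stopping it:

kill -USR1 "$(pgrep -x karma)"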

Development

Run tests:

crystal spec
crystal spec lib/counter_tree/spec

Build:

shards build --release

The counter_tree library is vendored in lib/counter_tree, so counter storage changes can be developed and tested inside this repository.

License

MIT
