Karma

Karma is a small TCP database for hot time-series counters. It is designed for cases where an application needs fresh pre-aggregated counters without hitting a heavier analytical store on every request.

Typical use case:

application reads link metadata
  -> application asks Karma for counters for many link ids
  -> client receives links with fresh click counters

Karma keeps data in memory, persists it with snapshots and WAL, and exposes a newline-delimited JSON protocol over TCP.

Russian version: README.ru.md.

Status

Karma is currently best understood as a production-oriented hot counter read model, not as a general-purpose TSDB.

Supported today:

  • day-bucketed counters with YYYYMMDD buckets;
  • single-key writes and reads;
  • batch reads and batch writes;
  • streaming ingest for rebuild/backfill flows;
  • atomic snapshots and WAL replay;
  • recovery checkpoints and reconciliation reporting;
  • async master -> slave replication through snapshot bootstrap and WAL polling;
  • Prometheus-style operational metrics.

Important boundaries:

  • command execution is serialized by one process-local state lock;
  • replication is asynchronous and manual-failover only;
  • there is no automatic leader election, quorum, or master-master mode;
  • object-storage snapshot transport and replication.subscribe are not part of the current implementation.

For production use, run with a persistent volume, WAL enabled, --wal-fsync=true, health checks, metrics scraping, and regular snapshot.create_all or SIGUSR1 snapshots.
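
One way to schedule the regular snapshots is a cron job that sends snapshot.create_all over TCP with nc, in the same style as the client examples below. A minimal sketch; the token value is illustrative and assumes --auth-token is configured:

# hypothetical crontab entry: hourly atomic snapshots plus WAL truncation
0 * * * * printf '{"v":2,"op":"snapshot.create_all","token":"write-secret"}\n' | nc 127.0.0.1 8080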

Build

Requirements:

  • Crystal 1.17.1
  • Shards

Build:

shards build --release

The binary is created at:

bin/karma

Docker

Build:

docker build -t karma:local .

Run:

docker run --rm \
  -p 8080:8080 \
  -v karma-data:/data \
  karma:local \
  --bind=0.0.0.0 \
  --port=8080 \
  --directory=/data \
  --restore=true \
  --wal=true \
  --wal-fsync=true

Run

Recommended master node:

bin/karma \
  --bind=0.0.0.0 \
  --port=8080 \
  --directory=/var/lib/karma \
  --role=master \
  --restore=true \
  --wal=true \
  --wal-fsync=true \
  --auth-token=write-secret \
  --read-auth-token=read-secret

The same configuration can be provided through environment variables. Command line options are applied after environment variables and override them:

KARMA_HOST=0.0.0.0 \
KARMA_PORT=8080 \
KARMA_DUMP_DIR=/var/lib/karma \
KARMA_RESTORE=true \
KARMA_WAL=true \
KARMA_WAL_FSYNC=true \
bin/karma

Configuration

Boolean values use true or false. Timeout values use seconds unless the option name ends with -ms.

CLI option                           Env var                               Default  Description
--bind=host                          KARMA_HOST                            0.0.0.0  Host to bind.
--port=port                          KARMA_PORT                            8080     TCP port.
--directory=path                     KARMA_DUMP_DIR                        .        Directory for snapshots, WAL, and metadata.
--role=master|slave                  KARMA_ROLE                            master   Node role.
--restore=true|false                 KARMA_RESTORE                         true     Restore snapshots and replay WAL on startup.
--nodelay=true|false                 KARMA_TCP_NODELAY                     true     Enable TCP_NODELAY.
--wal=true|false                     KARMA_WAL                             true     Persist mutating commands to WAL.
--wal-fsync=true|false               KARMA_WAL_FSYNC                       true     Fsync every WAL append/truncate.
--max-request-bytes=bytes            KARMA_MAX_REQUEST_BYTES               4096     Maximum JSON request line size. Must be greater than 0.
--max-response-bytes=bytes           KARMA_MAX_RESPONSE_BYTES              1048576  Maximum JSON response size. Use 0 to disable.
--read-timeout=seconds               KARMA_READ_TIMEOUT_SECONDS            5        Client socket read timeout. Use 0 to disable.
--write-timeout=seconds              KARMA_WRITE_TIMEOUT_SECONDS           5        Client socket write timeout. Use 0 to disable.
--query-timeout-ms=ms                KARMA_QUERY_TIMEOUT_MS                1000     Timeout for expensive tree-level reads. Use 0 to disable.
--shutdown-timeout=seconds           KARMA_SHUTDOWN_TIMEOUT_SECONDS        5        Graceful shutdown drain timeout.
--auth-token=token                   KARMA_AUTH_TOKEN                      unset    Token required for all commands. Empty env value disables it.
--read-auth-token=token              KARMA_READ_AUTH_TOKEN                 unset    Token allowed only for read-only commands. Empty env value disables it.
--dump-retention-per-tree=count      KARMA_DUMP_RETENTION_PER_TREE         5        Snapshots to keep per series after snapshot.create_all.
--replication-source-host=host       KARMA_REPLICATION_SOURCE_HOST         unset    Master host used by slave polling.
--replication-source-port=port       KARMA_REPLICATION_SOURCE_PORT         8080     Master port used by slave polling.
--replication-token=token            KARMA_REPLICATION_TOKEN               unset    Token used by slave replication requests.
--replication-poll-interval-ms=ms    KARMA_REPLICATION_POLL_INTERVAL_MS    1000     Slave polling interval.
--replication-batch-size=count       KARMA_REPLICATION_BATCH_SIZE          1000     Maximum WAL entries fetched by one slave poll. Max: 10000.
--log=true|false                     KARMA_LOG                             true     Emit structured JSON logs.

Protocol

Karma speaks newline-delimited JSON over TCP:

  • one request is one JSON object followed by \n;
  • one response is one JSON object followed by \r\n.

Protocol v2 is preferred for new clients. It uses v: 2, namespaced op values, and series/key/bucket/value terminology:

{"v":2,"op":"counter.increment","series":"links","key":42,"bucket":20260505,"value":1}

The legacy v1 protocol remains supported for compatibility and WAL replay; legacy requests use command, tree_name, date, and time_from/time_to instead of the v2 fields.

Response:

{
  "protocol_version": 2,
  "success": true,
  "response": "OK",
  "error_code": null
}

Error response:

{
  "protocol_version": 2,
  "success": false,
  "response": "Field tree or series is required",
  "error_code": "validation_error"
}

Stable error codes:

  • invalid_json
  • unknown_command
  • validation_error
  • not_found
  • unauthorized
  • forbidden
  • request_too_large
  • response_too_large
  • query_timeout
  • replication_gap
  • replication_error
  • internal_error

If --auth-token is configured, include token in every client request. If --read-auth-token is configured, that token can execute read-only commands only. Tokens are not written to WAL.
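
For example, an authenticated read carries the token as a top-level request field (token value illustrative):

{"v":2,"op":"counter.sum","series":"links","key":42,"token":"read-secret"}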

Data Model

  • A series is a named collection of counters. The storage layer and legacy API still use the word tree.
  • A key is an unsigned 64-bit integer inside a series.
  • A bucket is a UTC day in YYYYMMDD format, for example 20260505.
  • A value is an unsigned 64-bit integer.
  • Increment/decrement commands use today's UTC bucket when bucket is omitted.
  • Counter values never go below zero.

Read commands do not create missing series. Missing series return not_found. For an existing series, reading a missing key returns zero or an empty result.
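
As an illustration, reading from a series that was never created fails with not_found (the message text here is a guess, not the server's exact wording):

{"v":2,"op":"counter.sum","series":"missing","key":1}
{"protocol_version":2,"success":false,"response":"Series not found","error_code":"not_found"}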

Command Examples

Basic Counters

Create a series:

{"v":2,"op":"tree.create","series":"links"}

Increment today's counter:

{"v":2,"op":"counter.increment","series":"links","key":42,"value":1}

Increment an explicit bucket:

{"v":2,"op":"counter.increment","series":"links","key":42,"bucket":20260505,"value":1}

Decrement:

{"v":2,"op":"counter.decrement","series":"links","key":42,"bucket":20260505,"value":1}

Read a key total:

{"v":2,"op":"counter.sum","series":"links","key":42}

Read a date range:

{"v":2,"op":"counter.sum","series":"links","key":42,"range":{"from":20260501,"to":20260505}}

Read daily points:

{"v":2,"op":"counter.series","series":"links","key":42,"range":{"from":20260501,"to":20260505}}

Batch Reads and Writes

Read many totals in one request:

{"v":2,"op":"counter.batch_sum","series":"links","keys":[41,42,43]}

Read many totals for a range:

{"v":2,"op":"counter.batch_sum","series":"links","keys":[41,42,43],"range":{"from":20260501,"to":20260505}}

Add many [key, bucket, value] items:

{"v":2,"op":"series.batch_add","series":"links","items":[[42,20260505,10],[43,20260505,3]]}

Large batch requests must fit within --max-request-bytes.
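
A minimal Crystal sketch of a client-side size guard, assuming the default limit of 4096 bytes; keep the constant in sync with the actual --max-request-bytes setting:

require "json"

# Build a series.batch_add request line and check it against the request size
# limit before sending.
MAX_REQUEST_BYTES = 4096

items = (1_u64..500_u64).map { |key| [key, 20260505_u64, 1_u64] }
line = {v: 2, op: "series.batch_add", series: "links", items: items}.to_json + "\n"

if line.bytesize > MAX_REQUEST_BYTES
  puts "request is #{line.bytesize} bytes; split the batch and retry"
else
  puts "request fits in #{line.bytesize} bytes"
end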

Tree/Series Inspection

List series:

{"v":2,"op":"tree.list"}

Inspect one series:

{"v":2,"op":"tree.info","series":"links"}

Return keys with cursor pagination:

{"v":2,"op":"tree.keys","series":"links","limit":1000,"cursor":0}

Return top keys:

{"v":2,"op":"tree.top","series":"links","limit":100}

Return summary:

{"v":2,"op":"tree.summary","series":"links","range":{"from":20260501,"to":20260505}}

Retention and Maintenance

Delete old buckets:

{"v":2,"op":"series.delete_before","series":"links","before":20260401}

Compact a series:

{"v":2,"op":"series.compact","series":"links"}

Compact all series:

{"v":2,"op":"system.compact"}

Reset one key or a whole series:

{"v":2,"op":"counter.reset","series":"links","key":42}
{"v":2,"op":"tree.reset","series":"links"}

Delete a date range:

{"v":2,"op":"counter.delete_range","series":"links","key":42,"range":{"from":20260501,"to":20260505}}
{"v":2,"op":"tree.delete_range","series":"links","range":{"from":20260501,"to":20260505}}

Streaming Ingest

Streaming ingest is useful for rebuilds, backfills, and large imports. Supported modes:

  • add: add item values to the live series;
  • set: set item bucket values in the live series;
  • replace_series: build a staged series and atomically replace the live series on commit.

Example:

{"v":2,"op":"ingest.begin","stream_id":"import-20260505","mode":"add","granularity":"day"}
{"v":2,"op":"ingest.chunk","stream_id":"import-20260505","series":"links","chunk_seq":1,"items":[[42,20260505,10]]}
{"v":2,"op":"ingest.commit","stream_id":"import-20260505"}

Abort an active stream:

{"v":2,"op":"ingest.abort","stream_id":"import-20260505"}

Duplicate chunks are skipped, out-of-order chunks are rejected, and a stream is bound to the series used by its first chunk.
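
A minimal Crystal sketch of an add-mode import that sends begin, chunks with a strictly increasing chunk_seq, and commit over one connection:

require "json"
require "socket"

socket = TCPSocket.new("127.0.0.1", 8080)
stream_id = "import-20260505"

# Send one request line and print the server's reply.
send_line = ->(request : String) do
  socket << request << "\n"
  puts socket.gets
end

send_line.call({v: 2, op: "ingest.begin", stream_id: stream_id, mode: "add", granularity: "day"}.to_json)

# The stream binds to the "links" series on the first chunk.
items = [[42_u64, 20260505_u64, 10_u64], [43_u64, 20260505_u64, 3_u64]]
items.each_with_index(1) do |item, seq|
  send_line.call({v: 2, op: "ingest.chunk", stream_id: stream_id, series: "links", chunk_seq: seq, items: [item]}.to_json)
end

send_line.call({v: 2, op: "ingest.commit", stream_id: stream_id}.to_json)
socket.close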

Snapshots, WAL, and Recovery

Karma uses two persistence mechanisms:

  • snapshots: MessagePack .tree files, one per series;
  • WAL: newline-delimited JSON entries in karma.wal.

Create and inspect snapshots:

{"v":2,"op":"snapshot.create","series":"links"}
{"v":2,"op":"snapshot.create_all"}
{"v":2,"op":"snapshot.list"}
{"v":2,"op":"snapshot.info"}

Load and fetch snapshots:

{"v":2,"op":"snapshot.load","file":"1777925811_links.tree"}
{"v":2,"op":"snapshot.fetch","file":"1777925811_links.tree"}
{"v":2,"op":"snapshot.fetch_chunk","file":"1777925811_links.tree","offset":0,"limit":262144}

Verify the restore path:

{"v":2,"op":"snapshot.verify"}

snapshot.verify restores data into a temporary cluster and checks:

  • snapshot sidecar metadata;
  • latest snapshot last_lsn consistency;
  • WAL LSN continuity;
  • snapshot/WAL boundaries;
  • persisted karma.wal.lsn.

New WAL lines use an LSN envelope:

{"v":2,"lsn":1,"entry":{"v":2,"op":"counter.increment","tree":"links","key":42,"date":20260505,"value":1}}

Each new snapshot has a sidecar metadata file named <snapshot>.meta.json. It records file, tree, timestamp, bytes, and last_lsn.
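
An illustrative sidecar (values invented; the field set follows the list above):

{"file":"1777925811_links.tree","tree":"links","timestamp":1777925811,"bytes":524288,"last_lsn":110}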

Startup with --restore=true:

  1. Load the latest snapshot per series.
  2. Replay WAL entries.
  3. On slave nodes, initialize karma.replication.lsn from snapshot metadata before polling the master.

snapshot.create_all writes atomic snapshots, fsyncs them, truncates WAL after successful snapshotting, and prunes old snapshots according to --dump-retention-per-tree.

Recovery checkpoints can record external source positions such as ClickHouse export ids or durable queue offsets:

{"v":2,"op":"recovery.checkpoint","source":"clickhouse-links","offset":"export-2026-05-05","event_id":"batch-42"}
{"v":2,"op":"recovery.status"}
{"v":2,"op":"recovery.status","source":"clickhouse-links"}

External reconciliation jobs can report drift back to Karma:

{"v":2,"op":"reconciliation.report","checked_points":1000,"mismatch_count":2,"absolute_drift":15,"max_abs_delta":10}

Replication

Karma supports async master -> slave replication through snapshot bootstrap and WAL polling.

Start a slave:

bin/karma \
  --role=slave \
  --port=8081 \
  --directory=/var/lib/karma-slave \
  --restore=true \
  --replication-source-host=127.0.0.1 \
  --replication-source-port=8080 \
  --replication-token=read-secret

If the slave data directory has no local snapshots and --restore=true, the slave fetches the latest master snapshots through snapshot.fetch_chunk, sets karma.replication.lsn from snapshot metadata, and then polls replication.entries.

Useful commands:

{"v":2,"op":"replication.status"}
{"v":2,"op":"replication.entries","after_lsn":120,"limit":1000}

replication.entries is bounded by both limit and the master's max_response_bytes. If the byte limit truncates a page, the response includes truncated_by_bytes: true, and next_lsn points to the last entry that was returned.
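
A Crystal sketch of tailing the WAL the way a slave does. next_lsn and truncated_by_bytes are documented above; the entries payload field name is an assumption for illustration:

require "json"
require "socket"

socket = TCPSocket.new("127.0.0.1", 8080)
after_lsn = 0_i64

loop do
  socket << {v: 2, op: "replication.entries", after_lsn: after_lsn, limit: 1000}.to_json << "\n"
  reply = JSON.parse(socket.gets.not_nil!)
  break unless reply["success"].as_bool

  payload = reply["response"]
  entries = payload["entries"].as_a # field name assumed
  break if entries.empty?           # caught up; a real poller would sleep and retry

  entries.each { |entry| puts entry } # a real slave replays each entry here
  after_lsn = payload["next_lsn"].as_i64
end

socket.close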

Operational notes:

  • slave nodes reject direct mutating client commands;
  • failover is manual;
  • stop the old master before promoting a slave;
  • rebuild remaining slaves from the promoted master;
  • watch karma_replication_lag_entries, karma_replication_poll_errors_total, and karma_replication_last_poll_success_unix.

Detailed runbook: docs/replication-operations-runbook.md.

Metrics and Health

Basic health:

{"v":2,"op":"system.ping"}
{"v":2,"op":"system.health"}

Operational stats:

{"v":2,"op":"system.stats"}

Prometheus-style metrics:

{"v":2,"op":"system.metrics"}

Metric groups include:

  • uptime, role, memory, trees, keys, snapshots;
  • WAL bytes and current LSN;
  • command counts, errors, latency, and protocol v1 usage;
  • batch read/write counters;
  • retention and compaction counters;
  • ingest stream counters and latency;
  • reconciliation and recovery counters;
  • replication lag, replayed LSN, polling/bootstrap success and error counters.
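
Since system.metrics is served over the TCP protocol rather than HTTP, one way to feed a Prometheus scraper is a small bridge that forwards the command on every scrape. A sketch, assuming the metrics text arrives in the response field of the reply:

require "http/server"
require "json"
require "socket"

# Every HTTP request (for example GET /metrics) forwards one system.metrics
# command to Karma and returns the payload as plain text.
server = HTTP::Server.new do |context|
  karma = TCPSocket.new("127.0.0.1", 8080)
  karma << {v: 2, op: "system.metrics"}.to_json << "\n"
  reply = JSON.parse(karma.gets.not_nil!)
  karma.close

  context.response.content_type = "text/plain"
  context.response.print reply["response"] # payload location assumed
end

address = server.bind_tcp "0.0.0.0", 9108
puts "metrics bridge listening on #{address}"
server.listen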

Client Examples

Using nc:

printf '{"v":2,"op":"counter.increment","series":"links","key":42,"value":1}\n' | nc 127.0.0.1 8080
printf '{"v":2,"op":"counter.sum","series":"links","key":42}\n' | nc 127.0.0.1 8080

Using Crystal:

require "json"
require "socket"

socket = TCPSocket.new("127.0.0.1", 8080)
socket << {v: 2, op: "counter.increment", series: "links", key: 42_u64, value: 1_u64}.to_json << "\n"
puts socket.gets
socket.close

Using Ruby:

require "json"
require "socket"

socket = TCPSocket.new("127.0.0.1", 8080)
socket.write({v: 2, op: "counter.sum", series: "links", key: 42}.to_json + "\n")
puts socket.gets
socket.close

Performance Checks

Local results depend on CPU, disk, filesystem, container runtime, network, and workload mix. The scripts below are intended as repeatable local checks, not as universal benchmarks.

Last recorded local results from 2026-05-05:

Test                    Mode                                Throughput                 p95 latency
single_increment        in-process, WAL off                 289,908 ops/sec            0.004 ms
single_sum              in-process, WAL off                 403,080 ops/sec            0.0026 ms
series.batch_add        in-process, WAL off                 1,917,010 items/sec        0.9878 ms
counter.batch_sum       in-process, WAL off                 2,389,520 key reads/sec    0.8685 ms
tcp_single_increment    TCP, 4 clients, WAL on, fsync on    12,818 ops/sec             0.5561 ms
tcp_single_sum          TCP, 4 clients, WAL on, fsync on    41,340 ops/sec             0.1339 ms
tcp_series.batch_add    TCP, 4 clients, WAL on, fsync on    744,551 items/sec          2.9103 ms
tcp_counter.batch_sum   TCP, 4 clients, WAL on, fsync on    1,708,312 key reads/sec    1.5604 ms

Volume sensitivity test, in-process, WAL off, 7 daily buckets per key:

Keys      Data points   Heap         Snapshot size   Batch sum                  Batch p95   Summary    Snapshot time   Restore
10,000    70,000        8.23 MiB     0.57 MiB        2,255,027 key reads/sec    0.8820 ms   4.59 ms    4.59 ms         2.87 ms
50,000    350,000       28.48 MiB    2.86 MiB        1,778,740 key reads/sec    0.4863 ms   33.41 ms   20.35 ms        15.18 ms
100,000   700,000       47.70 MiB    5.79 MiB        1,558,846 key reads/sec    0.4809 ms   88.58 ms   42.81 ms        32.21 ms

High-cardinality yearly profile, in-process, WAL off, 356 daily buckets per key:

Keys      Data points   Heap           Snapshot size   Batch sum                  Batch p95   Summary     Snapshot time   Restore
10,000    3,560,000     234.11 MiB     20.58 MiB       2,299,570 key reads/sec    4.4254 ms   140.71 ms   80.07 ms        115.79 ms
25,000    8,900,000     570.14 MiB     51.45 MiB       1,936,145 key reads/sec    8.9875 ms   401.43 ms   201.91 ms       323.58 ms
50,000    17,800,000    1,114.17 MiB   102.90 MiB      1,918,123 key reads/sec    2.3001 ms   960.62 ms   426.69 ms       561.90 ms

Replication load test on the same date used clients=4, keys=10000, batch_size=1000, write_batches=100, read_rounds=100, replication_poll_interval_ms=10, and replication_batch_size=1000. The slave bootstrapped from snapshot, replayed WAL from LSN 10 to LSN 110, ended with final_lag_entries=0, and matched the master total: master_total=110000, slave_total=110000.

In-process command-layer test:

crystal build --release scripts/load_test.cr -o bin/karma_load_test
bin/karma_load_test

TCP loopback test:

crystal build --release scripts/tcp_load_test.cr -o bin/karma_tcp_load_test
bin/karma_tcp_load_test \
  --clients=4 \
  --wal=true \
  --wal-fsync=false

Volume sensitivity test:

crystal build --release scripts/volume_load_test.cr -o bin/karma_volume_load_test
bin/karma_volume_load_test \
  --sizes=10000,50000,100000 \
  --bucket-count=7 \
  --batch-size=1000 \
  --single-rounds=1000 \
  --read-rounds=100

High-cardinality yearly profile with 356 buckets per key:

bin/karma_volume_load_test --profile=year-356

Master/slave replication test:

shards build --release
crystal build --release scripts/replication_load_test.cr -o bin/karma_replication_load_test
bin/karma_replication_load_test \
  --binary=bin/karma \
  --clients=4 \
  --keys=10000 \
  --batch-size=1000 \
  --write-batches=100 \
  --read-rounds=100

CSV reconciliation against exported aggregates:

crystal run scripts/reconcile_csv.cr -- \
  --host=127.0.0.1 \
  --port=8080 \
  --series=links \
  --csv=clickhouse-links.csv \
  --report

Signals

  • SIGINT: stop accepting new TCP clients, dump all series, truncate WAL after successful snapshots, and exit with status 0.
  • SIGUSR1: dump all series, truncate WAL after successful snapshots, and keep running.
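
For example, to snapshot a running server from the shell without stopping it:

kill -USR1 "$(pgrep -x karma)"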

Development

Run tests:

crystal spec
crystal spec lib/counter_tree/spec

Build:

shards build --release

The counter_tree library is vendored in lib/counter_tree, so counter storage changes can be developed and tested inside this repository.

License

MIT
