Observability, tracing, evals, and optimization signals for nullclaw.
nullwatch is the execution-intelligence layer in the null* stack. It does not run agents, it does not schedule work, and it does not manage UI. It ingests execution traces and eval results, stores them durably, and exposes them through a JSON HTTP API and CLI so nullhub or any other client can consume them.
- nullclaw executes work.
- nulltickets owns durable task state.
- nullboiler owns orchestration policy.
- nullhub owns install, config, and UI.
- nullwatch owns traces, evals, run summaries, costs, latency, and regression signals.
This repository intentionally stays headless. The product surface is:
- JSON HTTP API for ingestion and querying.
- CLI commands for local automation and scripts.
- File-backed storage for the bootstrap implementation.
UI belongs elsewhere, primarily in nullhub.
In scope:
- Run and span ingest for nullclaw execution telemetry.
- Eval result ingest for scorers, rubrics, regression checks, and datasets.
- Run-level summaries for latency, errors, token usage, and cost.
- Machine-readable capabilities and summary endpoints.
- Headless workflows that a separate UI can compose.
Out of scope:
- Agent runtime logic.
- Queue ownership or task lifecycle source of truth.
- Scheduling, balancing, routing, retries, or orchestration policy.
- Web UI, dashboards, or installer flows.
The implementation is intentionally small but already usable:
- Single Zig binary.
- Local JSONL persistence under ~/.nullwatch/data by default.
- HTTP API on 127.0.0.1:7710 by default.
- CLI commands for ingesting spans/evals and querying runs, spans, evals, and summaries.
- OTLP/HTTP JSON ingest on /v1/traces and /otlp/v1/traces.
- nullhub integration via --export-manifest and --from-json.
This gives you a real executable contract now, while keeping room to swap storage later for SQLite or another embedded engine without changing the product boundary.
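To make the JSONL persistence model concrete, here is a minimal Python sketch of an append-only store with one JSON object per line. It is not the Zig implementation: the file name spans.jsonl and the helper names are hypothetical, and the real on-disk layout under the data directory may differ.

```python
import json
import tempfile
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    """Append one record as a single JSON line (append-only write)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_records(path: Path) -> list[dict]:
    """Read every non-empty line back as a JSON object."""
    if not path.exists():
        return []
    with path.open("r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical file name inside a throwaway directory, used only for this demo.
demo_dir = Path(tempfile.mkdtemp())
spans_file = demo_dir / "spans.jsonl"
append_record(spans_file, {"run_id": "run-123", "span_id": "span-1"})
append_record(spans_file, {"run_id": "run-123", "span_id": "span-2"})
loaded = read_records(spans_file)
```

Because each record occupies exactly one line, a reader can process the file incrementally, and a later move to SQLite only needs to preserve the record shape, not the file format.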
A span represents one timed execution unit inside a run, for example:
- model call
- tool invocation
- memory lookup
- task transition bridge
- retry or fallback branch
Core fields:
- run_id
- trace_id
- span_id
- parent_span_id
- source
- operation
- status
- started_at_ms
- ended_at_ms or duration_ms
- model, tool_name, prompt_version
- input_tokens, output_tokens, cost_usd
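The field list suggests that a span carries either an end timestamp or an explicit duration. The Python sketch below shows one way a client could normalize the two. The helper name span_duration_ms, the preference for an explicit duration_ms, and the clamping of negative values are assumptions for illustration, not documented nullwatch behavior.

```python
from typing import Optional


def span_duration_ms(span: dict) -> Optional[int]:
    """Return the span duration in milliseconds, or None if it cannot be derived."""
    # Assumption: an explicit duration_ms wins over the timestamp pair.
    if span.get("duration_ms") is not None:
        return int(span["duration_ms"])
    start = span.get("started_at_ms")
    end = span.get("ended_at_ms")
    if start is None or end is None:
        return None
    # Assumption: clamp clock skew to zero instead of reporting a negative value.
    return max(0, int(end) - int(start))


span = {
    "run_id": "run-123",
    "trace_id": "trace-123",
    "span_id": "span-1",
    "source": "nullclaw",
    "operation": "model.call",
    "status": "ok",
    "started_at_ms": 1710000000000,
    "ended_at_ms": 1710000000320,
}
duration = span_duration_ms(span)  # 320
```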
An eval is a scored assertion attached to a run, for example:
- helpfulness
- policy compliance
- routing correctness
- tool success rate
- regression gate
Core fields:
- run_id
- eval_key
- scorer
- score
- verdict
- dataset
- notes
Run summaries are computed views over spans and evals:
- span count
- eval count
- error count
- total duration
- total cost
- total input/output tokens
- pass/fail counts
- overall verdict
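The summary fields above can be illustrated with a small Python sketch that folds spans and evals into one view. The output key names and the aggregation rules are assumptions: total duration is taken as the sum of span durations, and the overall verdict is "fail" if any eval fails or any span has status "error". The actual nullwatch rules may differ.

```python
def summarize_run(spans: list[dict], evals: list[dict]) -> dict:
    """Compute a run-level summary from span and eval records."""

    def duration(span: dict) -> int:
        if span.get("duration_ms") is not None:
            return int(span["duration_ms"])
        return int(span.get("ended_at_ms", 0)) - int(span.get("started_at_ms", 0))

    pass_count = sum(1 for e in evals if e.get("verdict") == "pass")
    fail_count = sum(1 for e in evals if e.get("verdict") == "fail")
    error_count = sum(1 for s in spans if s.get("status") == "error")

    # Assumed rule: any failure or error fails the run; no evals means "unknown".
    if fail_count or error_count:
        verdict = "fail"
    elif pass_count:
        verdict = "pass"
    else:
        verdict = "unknown"

    return {
        "span_count": len(spans),
        "eval_count": len(evals),
        "error_count": error_count,
        "total_duration_ms": sum(duration(s) for s in spans),
        "total_cost_usd": sum(s.get("cost_usd", 0) for s in spans),
        "input_tokens": sum(s.get("input_tokens", 0) for s in spans),
        "output_tokens": sum(s.get("output_tokens", 0) for s in spans),
        "pass_count": pass_count,
        "fail_count": fail_count,
        "verdict": verdict,
    }


spans = [
    {"status": "ok", "started_at_ms": 1710000000000, "ended_at_ms": 1710000000320,
     "input_tokens": 420, "output_tokens": 96, "cost_usd": 0.018},
    {"status": "ok", "started_at_ms": 1710000000000, "ended_at_ms": 1710000000140},
]
evals = [
    {"eval_key": "helpfulness", "verdict": "pass"},
    {"eval_key": "tool_success", "verdict": "pass"},
]
summary = summarize_run(spans, evals)
```

Keeping the summary a pure function of the stored records means it can be recomputed at query time instead of being persisted.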
Build:
zig build
Run the API server:
zig build run -- serve
Run the API server on all interfaces:
zig build run -- serve --host 0.0.0.0 --port 7710
Query summary:
zig build run -- summary
List runs:
zig build run -- runs --verdict pass --limit 20
List spans:
zig build run -- spans --source nullclaw --tool-name shell --limit 50
List evals:
zig build run -- evals --dataset prod-shadow --verdict fail
Seed local demo runs:
zig build run -- demo-seed
zig build run -- runs --limit 20
zig build run -- run demo-tool-failure
demo-seed creates a deterministic, idempotent local dataset for demos and
manual testing without API keys, hosted services, or a running agent workload.
It includes a passing code-review run, a failed tool-call run, and a
handoff/retry run with checkpoint context.
Ingest a span from the CLI:
zig build run -- ingest-span --json '{
"run_id": "run-123",
"trace_id": "trace-123",
"span_id": "span-1",
"source": "nullclaw",
"operation": "model.call",
"status": "ok",
"started_at_ms": 1710000000000,
"ended_at_ms": 1710000000320,
"model": "gpt-5",
"prompt_version": "reply-v3",
"input_tokens": 420,
"output_tokens": 96,
"cost_usd": 0.018
}'
Ingest an eval:
zig build run -- ingest-eval --json '{
"run_id": "run-123",
"eval_key": "helpfulness",
"scorer": "llm-judge",
"score": 0.94,
"verdict": "pass",
"dataset": "prod-shadow"
}'
Inspect a run:
zig build run -- run run-123
curl http://127.0.0.1:7710/health
curl http://127.0.0.1:7710/v1/capabilities
curl -X POST http://127.0.0.1:7710/v1/spans \
-H 'content-type: application/json' \
-d '{
"run_id": "run-123",
"trace_id": "trace-123",
"span_id": "span-1",
"source": "nullclaw",
"operation": "tool.call",
"status": "ok",
"started_at_ms": 1710000000000,
"ended_at_ms": 1710000000140,
"tool_name": "bash"
}'
curl -X POST http://127.0.0.1:7710/v1/spans/bulk \
-H 'content-type: application/json' \
-d '{
"items": [
{
"run_id": "run-123",
"trace_id": "trace-123",
"span_id": "span-1",
"source": "nullclaw",
"operation": "model.call",
"started_at_ms": 1710000000000,
"ended_at_ms": 1710000000100
}
]
}'
curl -X POST http://127.0.0.1:7710/v1/evals \
-H 'content-type: application/json' \
-d '{
"run_id": "run-123",
"eval_key": "tool_success",
"scorer": "heuristic",
"score": 1.0,
"verdict": "pass"
}'
Point the nullclaw diagnostics OTLP endpoint at http://127.0.0.1:7710.
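OTLP payloads carry nanosecond timestamps as strings and key/value attribute lists, while nullwatch spans use millisecond integers and flat fields. The Python sketch below shows one plausible mapping between the two shapes. The field mapping (nullwatch.run_id to run_id, tool to tool_name, service.name to source, span name to operation) is an assumption inferred from the request example in this README, and both function names are hypothetical.

```python
def attr_value(value: dict):
    """Unwrap an OTLP AnyValue such as {"stringValue": "x"}."""
    for key in ("stringValue", "boolValue", "intValue", "doubleValue"):
        if key in value:
            return value[key]
    return None


def otlp_span_to_record(otlp_span: dict, service_name: str) -> dict:
    """Map one OTLP/JSON span to a flat span record (assumed mapping)."""
    attrs = {a["key"]: attr_value(a["value"]) for a in otlp_span.get("attributes", [])}
    # OTLP status codes: 0 = unset, 1 = ok, 2 = error.
    code = otlp_span.get("status", {}).get("code", 0)
    return {
        "run_id": attrs.get("nullwatch.run_id"),
        "trace_id": otlp_span["traceId"],
        "span_id": otlp_span["spanId"],
        "source": service_name,
        "operation": otlp_span["name"],
        "status": "error" if code == 2 else "ok",
        # Nanoseconds (decimal strings in OTLP/JSON) to integer milliseconds.
        "started_at_ms": int(otlp_span["startTimeUnixNano"]) // 1_000_000,
        "ended_at_ms": int(otlp_span["endTimeUnixNano"]) // 1_000_000,
        "tool_name": attrs.get("tool"),
    }


otlp_span = {
    "traceId": "trace-otlp",
    "spanId": "span-otlp",
    "name": "tool.call",
    "startTimeUnixNano": "1710000000200000000",
    "endTimeUnixNano": "1710000000250000000",
    "attributes": [
        {"key": "nullwatch.run_id", "value": {"stringValue": "run-otlp"}},
        {"key": "tool", "value": {"stringValue": "shell"}},
    ],
    "status": {"code": 1},
}
record = otlp_span_to_record(otlp_span, service_name="nullclaw")
```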
curl -X POST http://127.0.0.1:7710/v1/traces \
-H 'content-type: application/json' \
-d '{
"resourceSpans": [
{
"resource": {
"attributes": [
{ "key": "service.name", "value": { "stringValue": "nullclaw" } }
]
},
"scopeSpans": [
{
"spans": [
{
"traceId": "trace-otlp",
"spanId": "span-otlp",
"name": "tool.call",
"startTimeUnixNano": "1710000000200000000",
"endTimeUnixNano": "1710000000250000000",
"attributes": [
{ "key": "nullwatch.run_id", "value": { "stringValue": "run-otlp" } },
{ "key": "tool", "value": { "stringValue": "shell" } },
{ "key": "success", "value": { "boolValue": true } }
],
"status": { "code": 1 }
}
]
}
]
}
]
}'
curl 'http://127.0.0.1:7710/v1/spans?source=nullclaw&status=error&limit=50'
curl 'http://127.0.0.1:7710/v1/evals?verdict=fail&dataset=shadow&limit=50'
curl 'http://127.0.0.1:7710/v1/runs?limit=20'
curl http://127.0.0.1:7710/v1/runs/run-123
Default config path:
~/.nullwatch/config.json
Default config:
{
"host": "127.0.0.1",
"port": 7710,
"data_dir": "data",
"api_token": null
}
Because data_dir is resolved relative to the config file, the default data directory becomes ~/.nullwatch/data.
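The resolution rule can be sketched in Python as follows. This is an illustration of the documented behavior, not the Zig code: the function name resolve_config, the derived base_url key, and the handling of an absolute data_dir (left unchanged) are assumptions.

```python
from pathlib import Path

# Defaults as documented in the default config.
DEFAULTS = {"host": "127.0.0.1", "port": 7710, "data_dir": "data", "api_token": None}


def resolve_config(config_path: Path, raw: dict) -> dict:
    """Merge defaults and resolve data_dir relative to the config file's directory."""
    cfg = {**DEFAULTS, **raw}
    data_dir = Path(cfg["data_dir"])
    # Assumption: an absolute data_dir is used as-is.
    if not data_dir.is_absolute():
        data_dir = config_path.parent / data_dir
    cfg["data_dir"] = data_dir
    # Convenience value for HTTP clients; not part of the config file.
    cfg["base_url"] = f"http://{cfg['host']}:{cfg['port']}"
    return cfg


# Example home directory chosen for illustration only.
cfg = resolve_config(Path("/home/alice/.nullwatch/config.json"), {"port": 7710})
```

With the example path, cfg["data_dir"] resolves to /home/alice/.nullwatch/data and cfg["base_url"] to http://127.0.0.1:7710.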
nullwatch exports a nullhub manifest directly from the binary:
zig build run -- --export-manifest
And it can bootstrap its own config from wizard answers:
zig build run -- --from-json '{"home":"~/.nullwatch","port":7710,"data_dir":"data"}'
This keeps the service headless while letting nullhub own install/setup UI.
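An installer that drives --from-json programmatically has to serialize the wizard answers into a single argument. The Python sketch below builds that argument list without going through a shell, which avoids quoting problems. The helper name from_json_argv is hypothetical, and only the three keys shown in the command above are used; whether other keys are accepted is not stated here.

```python
import json


def from_json_argv(home: str, port: int, data_dir: str) -> list[str]:
    """Build the argument vector for bootstrapping a config from wizard answers."""
    answers = {"home": home, "port": port, "data_dir": data_dir}
    # One JSON string as a single argv element, so no shell quoting is needed.
    return ["zig", "build", "run", "--", "--from-json", json.dumps(answers)]


argv = from_json_argv("~/.nullwatch", 7710, "data")
# subprocess.run(argv, check=True) would execute it from the repository root.
```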
For a local NullHub flight-recorder demo:
zig build run -- demo-seed
zig build run -- serve --port 7710
Start NullHub with NULLWATCH_URL=http://127.0.0.1:7710 and open the
Observability page to inspect the seeded runs, spans, evals, token usage, cost,
and failure context.
- tests/test_e2e.sh boots a real server and validates auth, ingest, OTLP mapping, and CLI queries.
- .github/workflows/ci.yml delegates unit tests, Linux E2E, and host builds to nullclaw/nullbuilder.
- .github/workflows/release.yml delegates tagged release artifacts for Linux, macOS, and Windows to nullclaw/nullbuilder.
- scripts/build-release.sh produces the same release artifact names locally plus SHA256SUMS.
- Replace JSONL storage with embedded SQLite while preserving the API contract.
- Extend demo fixtures with GenAI/OpenInference attributes and scenario selection.
- Add dataset, prompt version, and experiment entities.
- Add regression diff endpoints for comparing prompt/model/strategy versions.
- Add alert rules and anomaly summaries that nullhub can render.
- nullwatch-python-sdk — Python SDK with zero required dependencies. Ships built-in eval scorers for RAG hallucination detection (LettuceDetect) and tool-call schema validation.