nullwatch

Observability, tracing, evals, and optimization signals for nullclaw.

nullwatch is the execution-intelligence layer in the null* stack. It does not run agents, it does not schedule work, and it does not manage UI. It ingests execution traces and eval results, stores them durably, and exposes them through a JSON HTTP API and CLI so nullhub or any other client can consume them.

Role in the stack

nullclaw executes work.
nulltickets owns durable task state.
nullboiler owns orchestration policy.
nullhub owns install, config, and UI.
nullwatch owns traces, evals, run summaries, costs, latency, and regression signals.

This repository intentionally stays headless. The product surface is:

JSON HTTP API for ingestion and querying.
CLI commands for local automation and scripts.
File-backed storage for the bootstrap implementation.

UI belongs elsewhere, primarily in nullhub.

What lives here

Run and span ingest for nullclaw execution telemetry.
Eval result ingest for scorers, rubrics, regression checks, and datasets.
Run-level summaries for latency, errors, token usage, and cost.
Machine-readable capabilities and summary endpoints.
Headless workflows that a separate UI can compose.

What does not live here

Agent runtime logic.
Queue ownership or task lifecycle source of truth.
Scheduling, balancing, routing, retries, or orchestration policy.
Web UI, dashboards, or installer flows.

Current MVP shape

The implementation is intentionally small but already usable:

Single Zig binary.
Local JSONL persistence under ~/.nullwatch/data by default.
HTTP API on 127.0.0.1:7710 by default.
CLI commands for ingesting spans/evals and querying runs, spans, evals, and summaries.
OTLP/HTTP JSON ingest on /v1/traces and /otlp/v1/traces.
nullhub integration via --export-manifest and --from-json.

This gives you a real executable contract now, while keeping room to swap storage later for SQLite or another embedded engine without changing the product boundary.
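JSONL keeps the bootstrap storage easy to inspect and to script against. Below is a minimal sketch of the append/read pattern, assuming one JSON object per line. The file name spans.jsonl is hypothetical, because the README does not document the file names under ~/.nullwatch/data.

```python
import json
import tempfile
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    """Append one record as a single JSON line (JSONL)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        f.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_records(path: Path) -> list[dict]:
    """Read every non-empty line back as a dict."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        spans = Path(tmp) / "spans.jsonl"  # hypothetical file name
        append_record(spans, {"run_id": "run-123", "span_id": "span-1"})
        append_record(spans, {"run_id": "run-123", "span_id": "span-2"})
        print(len(read_records(spans)))
```

Appending whole lines is also why swapping in SQLite later can stay invisible to API clients. The storage format is internal, and only the HTTP and CLI contract is public.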

Data model

Span

A span represents one timed execution unit inside a run, for example:

model call
tool invocation
memory lookup
task transition bridge
retry or fallback branch

Core fields:

run_id
trace_id
span_id
parent_span_id
source
operation
status
started_at_ms
ended_at_ms or duration_ms
model, tool_name, prompt_version
input_tokens, output_tokens, cost_usd
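One way to model these fields on the client side is a Python dataclass. This is a sketch, not nullwatch's own schema. Which fields are required, and the rule that duration_ms wins when both timing fields are present, are assumptions. The status field is optional here only because the bulk example below omits it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Identity and provenance (assumed required)
    run_id: str
    trace_id: str
    span_id: str
    source: str
    operation: str
    started_at_ms: int
    status: Optional[str] = None
    parent_span_id: Optional[str] = None
    # Timing: callers supply either ended_at_ms or duration_ms
    ended_at_ms: Optional[int] = None
    duration_ms: Optional[int] = None
    # Optional model/tool metadata and usage
    model: Optional[str] = None
    tool_name: Optional[str] = None
    prompt_version: Optional[str] = None
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    def effective_duration_ms(self) -> int:
        """Duration in ms; assumes duration_ms takes precedence if both are set."""
        if self.duration_ms is not None:
            return self.duration_ms
        if self.ended_at_ms is not None:
            return self.ended_at_ms - self.started_at_ms
        raise ValueError("span needs ended_at_ms or duration_ms")
```

The ingest-span payload shown in the CLI section maps directly onto this class with Span(**payload), giving an effective duration of 320 ms.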

Eval

An eval is a scored assertion attached to a run, for example:

helpfulness
policy compliance
routing correctness
tool success rate
regression gate

Core fields:

run_id
eval_key
scorer
score
verdict
dataset
notes
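A matching client-side sketch for evals follows. It rejects verdicts other than "pass" and "fail", which are the only values this README shows. Treat that whitelist as an assumption rather than the server's validation rule. to_json drops unset optional fields so the output matches the ingest examples.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

KNOWN_VERDICTS = {"pass", "fail"}  # assumption: only values seen in this README


@dataclass
class Eval:
    run_id: str
    eval_key: str
    scorer: str
    score: float
    verdict: str
    dataset: Optional[str] = None
    notes: Optional[str] = None

    def __post_init__(self) -> None:
        if self.verdict not in KNOWN_VERDICTS:
            raise ValueError(f"unexpected verdict: {self.verdict!r}")

    def to_json(self) -> str:
        # Omit unset optional fields (dataset, notes) from the payload.
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})
```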

Run summary

Run summaries are computed views over spans and evals:

span count
eval count
error count
total duration
total cost
total input/output tokens
pass/fail counts
overall verdict
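The summary can be reproduced from raw span and eval records, for example to sanity-check the server's numbers. The following sketch rests on several assumptions, none of them documented here. Total duration is taken as the wall-clock extent from the earliest start to the latest end, so overlapping child spans are not double-counted. Any failing eval makes the run "fail". A run with no evals has no verdict. The output key names are hypothetical.

```python
from typing import Any, Optional


def span_end_ms(span: dict[str, Any]) -> int:
    if span.get("ended_at_ms") is not None:
        return span["ended_at_ms"]
    return span["started_at_ms"] + span.get("duration_ms", 0)


def summarize_run(spans: list[dict[str, Any]], evals: list[dict[str, Any]]) -> dict[str, Any]:
    passes = sum(1 for e in evals if e.get("verdict") == "pass")
    fails = sum(1 for e in evals if e.get("verdict") == "fail")
    verdict: Optional[str] = "fail" if fails else ("pass" if passes else None)

    duration = 0
    if spans:
        # Wall-clock extent of the run, not the sum of span durations.
        start = min(s["started_at_ms"] for s in spans)
        end = max(span_end_ms(s) for s in spans)
        duration = end - start

    return {
        "span_count": len(spans),
        "eval_count": len(evals),
        "error_count": sum(1 for s in spans if s.get("status") == "error"),
        "total_duration_ms": duration,
        "total_cost_usd": round(sum(s.get("cost_usd", 0.0) for s in spans), 6),
        "input_tokens": sum(s.get("input_tokens", 0) for s in spans),
        "output_tokens": sum(s.get("output_tokens", 0) for s in spans),
        "pass_count": passes,
        "fail_count": fails,
        "verdict": verdict,
    }
```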

CLI

Build:

zig build

Run the API server:

zig build run -- serve

Run the API server on all interfaces:

zig build run -- serve --host 0.0.0.0 --port 7710

Query summary:

zig build run -- summary

List runs:

zig build run -- runs --verdict pass --limit 20

List spans:

zig build run -- spans --source nullclaw --tool-name shell --limit 50

List evals:

zig build run -- evals --dataset prod-shadow --verdict fail

Seed local demo runs:

zig build run -- demo-seed
zig build run -- runs --limit 20
zig build run -- run demo-tool-failure

demo-seed creates a deterministic, idempotent local dataset for demos and manual testing without API keys, hosted services, or a running agent workload. It includes a passing code-review run, a failed tool-call run, and a handoff/retry run with checkpoint context.

Ingest a span from the CLI:

zig build run -- ingest-span --json '{
  "run_id": "run-123",
  "trace_id": "trace-123",
  "span_id": "span-1",
  "source": "nullclaw",
  "operation": "model.call",
  "status": "ok",
  "started_at_ms": 1710000000000,
  "ended_at_ms": 1710000000320,
  "model": "gpt-5",
  "prompt_version": "reply-v3",
  "input_tokens": 420,
  "output_tokens": 96,
  "cost_usd": 0.018
}'

Ingest an eval:

zig build run -- ingest-eval --json '{
  "run_id": "run-123",
  "eval_key": "helpfulness",
  "scorer": "llm-judge",
  "score": 0.94,
  "verdict": "pass",
  "dataset": "prod-shadow"
}'

Inspect a run:

zig build run -- run run-123
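Scripts that drive the CLI can avoid shell quoting of the --json argument by building the argument list directly. The sketch below uses only the subcommands and flags shown above. Running it assumes zig is on PATH and the working directory is the repository root.

```python
import json
import subprocess
from typing import Any


def ingest_argv(kind: str, payload: dict[str, Any]) -> list[str]:
    """Build argv for `zig build run -- ingest-span|ingest-eval --json ...`."""
    if kind not in ("span", "eval"):
        raise ValueError("kind must be 'span' or 'eval'")
    # A list argv bypasses the shell, so the JSON needs no extra quoting.
    return ["zig", "build", "run", "--", f"ingest-{kind}", "--json", json.dumps(payload)]


def ingest(kind: str, payload: dict[str, Any]) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(ingest_argv(kind, payload), capture_output=True, text=True, check=True)
    return result.stdout
```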

HTTP API

Health

curl http://127.0.0.1:7710/health

Capabilities

curl http://127.0.0.1:7710/v1/capabilities

Ingest span

curl -X POST http://127.0.0.1:7710/v1/spans \
  -H 'content-type: application/json' \
  -d '{
    "run_id": "run-123",
    "trace_id": "trace-123",
    "span_id": "span-1",
    "source": "nullclaw",
    "operation": "tool.call",
    "status": "ok",
    "started_at_ms": 1710000000000,
    "ended_at_ms": 1710000000140,
    "tool_name": "bash"
  }'
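The same POST can be issued from Python's standard library. This is a sketch against the default address. It assumes the response body, if any, is JSON, since this README does not show ingest responses. It also omits authentication, because the header format for api_token is not documented here.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://127.0.0.1:7710"  # default host/port from the config


def build_post(path: str, payload: dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request, for example to /v1/spans or /v1/evals."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict[str, Any], base_url: str = BASE_URL) -> Any:
    """Send the request; needs a running `serve` process."""
    with urllib.request.urlopen(build_post(path, payload, base_url), timeout=10) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None
```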

Ingest spans in bulk

curl -X POST http://127.0.0.1:7710/v1/spans/bulk \
  -H 'content-type: application/json' \
  -d '{
    "items": [
      {
        "run_id": "run-123",
        "trace_id": "trace-123",
        "span_id": "span-1",
        "source": "nullclaw",
        "operation": "model.call",
        "started_at_ms": 1710000000000,
        "ended_at_ms": 1710000000100
      }
    ]
  }'
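When exporting many spans, a client can chunk them into {"items": [...]} bodies for /v1/spans/bulk. The default batch size of 500 below is an arbitrary assumption. This README does not state a server-side limit.

```python
from typing import Any, Iterable, Iterator


def bulk_payloads(spans: Iterable[dict[str, Any]], batch_size: int = 500) -> Iterator[dict[str, Any]]:
    """Yield request bodies for POST /v1/spans/bulk, at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: list[dict[str, Any]] = []
    for span in spans:
        batch.append(span)
        if len(batch) == batch_size:
            yield {"items": batch}
            batch = []
    if batch:  # flush the final partial batch
        yield {"items": batch}
```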

Ingest eval

curl -X POST http://127.0.0.1:7710/v1/evals \
  -H 'content-type: application/json' \
  -d '{
    "run_id": "run-123",
    "eval_key": "tool_success",
    "scorer": "heuristic",
    "score": 1.0,
    "verdict": "pass"
  }'

Ingest OTLP traces from nullclaw

Point nullclaw's diagnostics OTLP endpoint at http://127.0.0.1:7710. nullwatch accepts OTLP/HTTP JSON on both /v1/traces and /otlp/v1/traces.

curl -X POST http://127.0.0.1:7710/v1/traces \
  -H 'content-type: application/json' \
  -d '{
    "resourceSpans": [
      {
        "resource": {
          "attributes": [
            { "key": "service.name", "value": { "stringValue": "nullclaw" } }
          ]
        },
        "scopeSpans": [
          {
            "spans": [
              {
                "traceId": "trace-otlp",
                "spanId": "span-otlp",
                "name": "tool.call",
                "startTimeUnixNano": "1710000000200000000",
                "endTimeUnixNano": "1710000000250000000",
                "attributes": [
                  { "key": "nullwatch.run_id", "value": { "stringValue": "run-otlp" } },
                  { "key": "tool", "value": { "stringValue": "shell" } },
                  { "key": "success", "value": { "boolValue": true } }
                ],
                "status": { "code": 1 }
              }
            ]
          }
        ]
      }
    ]
  }'
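To clarify what the OTLP payload carries, here is a hypothetical mapping from OTLP/JSON spans onto the span fields in the data model. It is not nullwatch's actual mapping code. The following choices are assumptions:

- service.name becomes source.
- The nullwatch.run_id attribute becomes run_id, falling back to traceId.
- The tool attribute becomes tool_name.
- OTLP status code 2 (ERROR) becomes "error", and everything else becomes "ok".

Timestamps are converted from nanosecond strings to integer milliseconds.

```python
from typing import Any


def attr_map(attributes: list[dict[str, Any]]) -> dict[str, Any]:
    """Flatten an OTLP KeyValue list into {key: python_value}."""
    out: dict[str, Any] = {}
    for kv in attributes or []:
        value = kv.get("value", {})
        if "stringValue" in value:
            out[kv["key"]] = value["stringValue"]
        elif "boolValue" in value:
            out[kv["key"]] = value["boolValue"]
        elif "intValue" in value:
            out[kv["key"]] = int(value["intValue"])  # OTLP JSON encodes int64 as a string
        elif "doubleValue" in value:
            out[kv["key"]] = float(value["doubleValue"])
    return out


def otlp_to_spans(body: dict[str, Any]) -> list[dict[str, Any]]:
    spans: list[dict[str, Any]] = []
    for rs in body.get("resourceSpans", []):
        resource = attr_map(rs.get("resource", {}).get("attributes", []))
        for ss in rs.get("scopeSpans", []):
            for sp in ss.get("spans", []):
                attrs = attr_map(sp.get("attributes", []))
                code = sp.get("status", {}).get("code", 0)
                span = {
                    "run_id": attrs.get("nullwatch.run_id", sp["traceId"]),
                    "trace_id": sp["traceId"],
                    "span_id": sp["spanId"],
                    "source": resource.get("service.name", "unknown"),
                    "operation": sp.get("name", ""),
                    # OTLP StatusCode: 0 = UNSET, 1 = OK, 2 = ERROR
                    "status": "error" if code == 2 else "ok",
                    # Nanosecond strings -> integer milliseconds
                    "started_at_ms": int(sp["startTimeUnixNano"]) // 1_000_000,
                    "ended_at_ms": int(sp["endTimeUnixNano"]) // 1_000_000,
                }
                if sp.get("parentSpanId"):
                    span["parent_span_id"] = sp["parentSpanId"]
                if "tool" in attrs:
                    span["tool_name"] = attrs["tool"]
                spans.append(span)
    return spans
```

Applied to the request body above, this yields a 50 ms "ok" span for run-otlp with source nullclaw and tool_name shell.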

List spans

curl 'http://127.0.0.1:7710/v1/spans?source=nullclaw&status=error&limit=50'

List evals

curl 'http://127.0.0.1:7710/v1/evals?verdict=fail&dataset=shadow&limit=50'

List runs

curl 'http://127.0.0.1:7710/v1/runs?limit=20'

Get run detail

curl http://127.0.0.1:7710/v1/runs/run-123
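Query URLs like the ones above can be built without hand-concatenating strings. The sketch uses only filter names that appear in these examples and drops filters that are unset. run_url percent-encodes the run id into a single path segment. Whether the server decodes encoded ids such as a%2Fb is not documented here.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:7710"


def list_url(resource: str, base_url: str = BASE_URL, **filters: Optional[object]) -> str:
    """Build e.g. /v1/spans?source=nullclaw&status=error&limit=50, skipping None filters."""
    if resource not in ("spans", "evals", "runs"):
        raise ValueError(f"unknown resource: {resource}")
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{base_url}/v1/{resource}" + (f"?{query}" if query else "")


def run_url(run_id: str, base_url: str = BASE_URL) -> str:
    # safe='' also encodes '/', keeping the id a single path segment.
    return f"{base_url}/v1/runs/{quote(run_id, safe='')}"
```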

Config

Default config path:

~/.nullwatch/config.json

Default config:

{
  "host": "127.0.0.1",
  "port": 7710,
  "data_dir": "data",
  "api_token": null
}

Because data_dir is resolved relative to the directory containing the config file, the default data directory becomes ~/.nullwatch/data.
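Tools that need to locate the data directory can mirror this resolution rule. The sketch overlays config.json on the documented defaults. Two behaviors are assumptions rather than documented ones: absolute data_dir values are used as-is, and a missing config file falls back to the defaults.

```python
import json
from pathlib import Path
from typing import Any

DEFAULTS: dict[str, Any] = {"host": "127.0.0.1", "port": 7710, "data_dir": "data", "api_token": None}


def load_config(config_path: Path) -> dict[str, Any]:
    """Merge config.json over the defaults and resolve data_dir against the config's directory."""
    config_path = config_path.expanduser()
    cfg = dict(DEFAULTS)
    if config_path.exists():
        cfg.update(json.loads(config_path.read_text(encoding="utf-8")))
    data_dir = Path(cfg["data_dir"]).expanduser()
    if not data_dir.is_absolute():
        data_dir = config_path.parent / data_dir  # "data" -> ~/.nullwatch/data by default
    cfg["data_dir"] = str(data_dir)
    return cfg
```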

NullHub integration

nullwatch exports a nullhub manifest directly from the binary:

zig build run -- --export-manifest

And it can bootstrap its own config from wizard answers:

zig build run -- --from-json '{"home":"~/.nullwatch","port":7710,"data_dir":"data"}'

This keeps the service headless while letting nullhub own install/setup UI.

For a local NullHub flight-recorder demo:

zig build run -- demo-seed
zig build run -- serve --port 7710

Start NullHub with NULLWATCH_URL=http://127.0.0.1:7710 and open the Observability page to inspect the seeded runs, spans, evals, token usage, cost, and failure context.

CI and releases

tests/test_e2e.sh boots a real server and validates auth, ingest, OTLP mapping, and CLI queries.
.github/workflows/ci.yml delegates unit tests, Linux E2E, and host builds to nullclaw/nullbuilder.
.github/workflows/release.yml delegates tagged release artifacts for Linux, macOS, and Windows to nullclaw/nullbuilder.
scripts/build-release.sh produces the same release artifact names locally plus SHA256SUMS.

Near-term next steps

Replace JSONL storage with embedded SQLite while preserving the API contract.
Extend demo fixtures with GenAI/OpenInference attributes and scenario selection.
Add dataset, prompt version, and experiment entities.
Add regression diff endpoints for comparing prompt/model/strategy versions.
Add alert rules and anomaly summaries that nullhub can render.

Community SDKs

nullwatch-python-sdk: a Python SDK with zero required dependencies. It ships built-in eval scorers for RAG hallucination detection (LettuceDetect) and tool-call schema validation.
