tsuga-dev · ArthurVerrez · Jun 17, 2026 · Jun 17, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -15,6 +15,7 @@ plugins/telemetry/
 ```
 
 Notes:
+
 - **No top-level `skills/` directory.** Every skill lives under its plugin's subdir.
 - **No `.codex-plugin/` directory.** Codex consumes the same `.claude-plugin/marketplace.json` (verified: `~/.codex/.tmp/marketplaces/<name>/.claude-plugin/marketplace.json` is what `codex plugin list` reads).
 - **No `skills[]` array in marketplace.json.** Skills are auto-discovered from each plugin's `skills/` directory. The `skills[]` filter is only meaningful in single-plugin marketplaces and is silently ignored when multiple plugins share a `source`.
@@ -31,6 +32,7 @@ Notes:
    - Patch (`0.6.1 → 0.6.2`): content tweaks, doc fixes, clarifications.
    - Minor (`0.6.x → 0.7.0`): new skill, new reference doc, new behavior.
    - Major (`0.x → 1.0`): breaking change to a SKILL.md contract (rare).
+
 3. Open the PR as a draft. Use the `open-pr` skill if available.
 
 ## When adding a new skill
@@ -164,7 +166,7 @@ Code-reading skills (`signal-choice-advisor`, the `tsuga-audit-*` and `tsuga-smo
 
 ## Runtime behavior rules
 
-These govern how skills *behave when executing* (distinct from the authoring/repo conventions above). They apply to all Tsuga skills; each skill's SKILL.md inlines the rules critical to its workflow. This section is the canonical source.
+These govern how skills _behave when executing_ (distinct from the authoring/repo conventions above). They apply to all Tsuga skills; each skill's SKILL.md inlines the rules critical to its workflow. This section is the canonical source.
 
 1. **CLI-first, no external calls.** Only `tsuga` commands. No curl, no direct API calls, no web browsing during skill execution.
 
@@ -184,7 +186,7 @@ These govern how skills *behave when executing* (distinct from the authoring/rep
 
 9. **Scope containment.** If a request spans multiple skills, address your skill's scope, state what evidence you found, then direct the user to the other skill for the rest.
 
-10. **Stale data acknowledgment.** `monitors list`, `services list`, and `quality-reports list` return config/snapshot state, not live state. State the query time in output. For quality reports, flag if `generatedAt` > 48h ago.
+10. **Stale data acknowledgment.** `monitors list`, `services list`, and `quality-reports list` return config/snapshot state, not live state. State the query time in output. For quality reports, derive the report timestamp as `min(rows.createdAt)` and flag if > 48h ago.
 
 ### Addendum: instrumentation-quality skills
 

diff --git a/plugins/telemetry/skills/otel-go/references/audit-checklist.md b/plugins/telemetry/skills/otel-go/references/audit-checklist.md
@@ -1,6 +1,7 @@
 # Audit Checklist — Go OTel
 
 > This file covers two audit modes:
+>
 > 1. **Code anti-patterns** (static) — section "Anti-Patterns to Flag"
 > 2. **Cross-signal live audit** (tsuga CLI) — section "Cross-Signal Audit Workflow"
 >
@@ -28,12 +29,12 @@ go list -m go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
 
 Expected minimum versions:
 
-| Module | Minimum |
-|---|---|
-| `go.opentelemetry.io/otel` | v1.x (latest stable) |
-| `go.opentelemetry.io/otel/sdk` | v1.x (latest stable) |
-| `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` | v1.x |
-| `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc` | v1.x |
+| Module                                                              | Minimum              |
+| ------------------------------------------------------------------- | -------------------- |
+| `go.opentelemetry.io/otel`                                          | v1.x (latest stable) |
+| `go.opentelemetry.io/otel/sdk`                                      | v1.x (latest stable) |
+| `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`   | v1.x                 |
+| `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc` | v1.x                 |
 
 Check for module version consistency — `otel`, `otel/sdk`, and `otel/exporters` packages should use the same major version.
 
@@ -156,6 +157,7 @@ tsuga metrics list --filter "service.name=<your-service>"
 ```
 
 If no spans appear:
+
 1. Check `OTEL_EXPORTER_OTLP_ENDPOINT` — for gRPC, do not include `http://`
 2. Verify the collector is reachable: `grpc_cli ls localhost:4317`
 3. Confirm `otel.SetTracerProvider(tp)` is called before any `otel.Tracer()` calls
@@ -179,6 +181,7 @@ tsuga services get <id>
 Capture `sources[]`, `logsCount24h`, `tracesCount24h`, `errorLogsCount24h`, `errorTracesCount24h`.
 
 Classify:
+
 - No signals at all → stop; direct to `otel-instrumentation`
 - Logs only / Traces only / Metrics only → note what's missing; continue what's present
 - All 3 present → continue full audit
@@ -230,20 +233,22 @@ In sampled metric attributes: flag `user.id`, `request.id`, raw URL paths, trace
 ### Step 9 — Quality report
 
 ```bash
-tsuga quality-reports list
+tsuga quality-reports list                        # add `--cluster <id>` in multi-cluster orgs
 ```
 
-Note `generatedAt` — flag as stale if > 48h ago. Note rule failures attributed to this service.
+Returns a flat array of rule-evaluation rows. Filter `.[] | select(.status == "failed")` for rule failures attributed to this service. Report timestamp = `min(.[].createdAt)`; flag as stale if > 48h ago.
 
 ### Step 10 — Source code check (if path provided)
 
 Read SDK init file. Check for:
+
 - `TracerProvider`, `MeterProvider`, `LoggerProvider` initialization
 - Log bridge setup (`otelslog.NewHandler` or `otelzap.NewCore`)
 - Resource attributes set at init (`service.name`, `service.version`, `deployment.environment.name`)
 - Exporter endpoint: **fail** if hardcoded (`WithEndpoint("http://localhost:4317")` in code); **pass** if zero-arg constructor AND `OTEL_EXPORTER_OTLP_ENDPOINT` is in deployment config
 
 If metric creation code is available, check instrument type:
+
 - Description containing "current", "active", "open", "in-flight", or "pending" → must use `UpDownCounter` (not `Counter`)
 - Description containing "duration", "latency", "time", or "size" → must use `Histogram` (not `Counter` or `Gauge`)
 
@@ -300,7 +305,7 @@ service.name: <present/MISSING> | service.version: <present/MISSING>
 Endpoint hardcoded: ✅ no / ❌ yes (file:line) | OTEL_EXPORTER_OTLP_ENDPOINT configured: ✅ / ❌ / ⚠️ not checked
 
 ## Quality Report
-<N rule failures / No failures / ⚠️ Report stale — generated: <generatedAt>>
+<N rule failures / No failures / ⚠️ Report stale — generated: <min(rows.createdAt)>>
 
 ## Recommended Actions (in order of impact)
 1. <highest-impact fix> — run `<specific skill>` for detailed guidance
@@ -310,7 +315,7 @@ Endpoint hardcoded: ✅ no / ❌ yes (file:line) | OTEL_EXPORTER_OTLP_ENDPOINT c
 - Condensed audit: 5-record samples per signal; use tsuga-audit-metrics / tsuga-audit-logs / tsuga-audit-traces for full per-signal analysis
 - Metric check covers spot-check of up to 3 metric names only
 - Source code findings (if any) are from the provided path only
-- Quality report reflects state at generatedAt, not live state
+- Quality report reflects state at the row `createdAt` timestamps, not live state
 ```
 
 ---

diff --git a/plugins/telemetry/skills/tsuga-audit-metrics/SKILL.md b/plugins/telemetry/skills/tsuga-audit-metrics/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: tsuga-audit-metrics
-description: "Use when asked to review metric design, check metric naming, audit metric quality, or validate a custom metric."
+description: 'Use when asked to review metric design, check metric naming, audit metric quality, or validate a custom metric.'
 ---
 
 # Tsuga: Audit Metrics
@@ -29,7 +29,7 @@ description: "Use when asked to review metric design, check metric naming, audit
 
 5. For code-side audit (instrument type correctness, OTel SDK init, metric view configuration) → `otel-<lang>/references/audit-checklist.md`
 
-6. `tsuga quality-reports list` — check if any rule failures relate to metric naming or instrumentation for this service. Note `generatedAt`; flag if > 48h ago.
+6. `tsuga quality-reports list` (pass `--cluster <id>` in multi-cluster orgs) — flat array of rule-evaluation rows. Filter `.[] | select(.status == "failed")` for failures related to metric naming or instrumentation for this service. Report timestamp = `min(.[].createdAt)`; flag if > 48h ago.
 
 7. **Stop and validate with user.** Before presenting final findings, share preliminary observations: "Here is what I found so far — [summary of naming issues, instrument type concerns, cardinality risks]. Does this match your understanding of how metrics are instrumented in this service?" Adjust findings based on user context before concluding.
 
@@ -47,15 +47,15 @@ Source: OTel specification (https://opentelemetry.io/docs/specs/otel/metrics/sup
 | `{request}` | Request counts |
 Never encode units in the metric name — set the `unit` field instead.
 
-| Rule | Bad example | Good example |
-|---|---|---|
-| No service name in metric name | `web_backend_request_count` | `http.server.request.count` (service identity in `context.service.name`) |
-| No environment/version in metric name | `prod_latency_ms`, `v2_errors` | `http.server.request.duration` |
-| No units in metric name | `latency_ms`, `memory_bytes` | `request.duration` with `unit: ms` |
-| OTel dot notation (not underscores) | `http_server_request_duration` | `http.server.request.duration` |
-| Verb-object pattern | `duration` (missing namespace) | `http.server.request.duration` |
-| Counter must not describe current state | `active_connections` as Counter | `active_connections` as UpDownCounter |
-| Histogram for distributions | `request_duration` as Counter | `request.duration` as Histogram |
+| Rule                                    | Bad example                     | Good example                                                             |
+| --------------------------------------- | ------------------------------- | ------------------------------------------------------------------------ |
+| No service name in metric name          | `web_backend_request_count`     | `http.server.request.count` (service identity in `context.service.name`) |
+| No environment/version in metric name   | `prod_latency_ms`, `v2_errors`  | `http.server.request.duration`                                           |
+| No units in metric name                 | `latency_ms`, `memory_bytes`    | `request.duration` with `unit: ms`                                       |
+| OTel dot notation (not underscores)     | `http_server_request_duration`  | `http.server.request.duration`                                           |
+| Verb-object pattern                     | `duration` (missing namespace)  | `http.server.request.duration`                                           |
+| Counter must not describe current state | `active_connections` as Counter | `active_connections` as UpDownCounter                                    |
+| Histogram for distributions             | `request_duration` as Counter   | `request.duration` as Histogram                                          |
 
 ## GOOD/BAD Examples
 
@@ -80,11 +80,14 @@ Never encode units in the metric name — set the `unit` field instead.
 ### Cardinality Examples
 
 **BAD:** Metric dimension `user_id` (or `user.id`)
+
 ```
 histogram.record(duration, {"user.id": userId, "http.route": "/api/orders"})
 ```
+
 **Why wrong:** `user_id` has unbounded cardinality — one unique value per user → millions of time series.
 **GOOD:** Remove `user.id` from metric dimensions; use it as a span attribute instead:
+
 ```
 histogram.record(duration, {"http.route": "/api/orders"})  // Low-cardinality dims only
 span.setAttribute("user.id", userId)                        // High-cardinality → spans
@@ -98,12 +101,12 @@ span.setAttribute("user.id", userId)                        // High-cardinality
 
 ## Instrument Type Rules
 
-| Instrument | Correct use |
-|---|---|
-| Counter | Monotonically increasing totals (requests processed, errors, retries) |
-| UpDownCounter | Fluctuating current values (queue depth, active connections, cache size) |
-| Histogram | Distributions — latency, payload size, queue wait time |
-| Observable Gauge | Values sampled at collection time (CPU %, memory usage, thread count) |
+| Instrument       | Correct use                                                              |
+| ---------------- | ------------------------------------------------------------------------ |
+| Counter          | Monotonically increasing totals (requests processed, errors, retries)    |
+| UpDownCounter    | Fluctuating current values (queue depth, active connections, cache size) |
+| Histogram        | Distributions — latency, payload size, queue wait time                   |
+| Observable Gauge | Values sampled at collection time (CPU %, memory usage, thread count)    |
 
 **Detection heuristic (source: CLI):** If a metric's description contains "current", "active", "open", "in-flight", or "pending" AND its type is Counter: likely wrong instrument type.
 
@@ -141,8 +144,8 @@ Metrics audited: <N> | Naming issues: <N> | Instrument type issues: <N> | Cardin
 | <metric.name> | <attr> | >100 (limit reached) | High — possible per-request identifier | tsuga CLI |
 
 ## Quality Report Correlation
-<N> quality rule failures relate to metric instrumentation (report generated: <generatedAt>)
-[If generatedAt > 48h ago: ⚠️ Quality report is stale — results may not reflect current state]
+<N> quality rule failures relate to metric instrumentation (report generated: <min(rows.createdAt)>)
+[If derived timestamp > 48h ago: ⚠️ Quality report is stale — results may not reflect current state]
 
 ## Recommended Actions
 1. Rename <metric.name> to <corrected.name> — eliminates service-identity-in-name violation
@@ -157,6 +160,7 @@ Metrics audited: <N> | Naming issues: <N> | Instrument type issues: <N> | Cardin
 ```
 
 ## Related Skills / Next Steps
+
 - `tsuga-metric-naming-fix` — apply renames found in this audit
 - `tsuga-smoke-test` — verify metrics after fixes
 - `otel-<lang>/references/audit-checklist.md` — code-side audit (instrument type, SDK init, metric views)

diff --git a/plugins/tsuga/skills/tsuga-cli/SKILL.md b/plugins/tsuga/skills/tsuga-cli/SKILL.md
@@ -101,12 +101,13 @@ tsuga investigations create -d '{
 ## Quality Reports
 
 ```bash
-tsuga quality-reports list              # get the latest quality report
-tsuga quality-reports list | jq '.report.overallScore'
-tsuga quality-reports list | jq '.report.teamResults[].teamName'
+tsuga quality-reports list                                    # rows for the cluster's latest report
+tsuga quality-reports list --cluster <id>                     # multi-cluster orgs MUST pass --cluster (else 400)
+tsuga quality-reports list | jq '.[] | select(.status=="failed") | {ruleId, owner}'
+tsuga quality-reports list | jq 'group_by(.owner) | map({owner: .[0].owner, score: .[0].reportOverallScore})'
 ```
 
-Key fields: `report.generatedAt`, `report.overallScore`, `report.teamResults[]{teamId, teamName, score, weightedScore, ruleResults[]}`. Generated at least once per 24h. Only supports `list`.
+Returns a flat list of rows (one per persisted rule evaluation). Key fields per row: `id`, `reportId`, `ruleId`, `owner` (team id or absent for global), `status` (`passed`/`failed`/`ignored`), `score` (0..1), `weight`, `reportOverallScore` and `reportTotalWeight` (per-owner aggregates, identical across rows of the same `reportId`), `createdAt`, optional `recommendation` and `examples`. Generated at least once per 24h. Only supports `list`.
 
 ## Services
 
@@ -163,18 +164,18 @@ Output: `{"nodes": [{"serviceName": ...}], "edges": [{"from": <svc>, "to": <svc>
 
 Applies to all `--query` flags and aggregation `filter` fields.
 
-| Syntax                 | Meaning                         | Example                                                          |
-| ---------------------- | ------------------------------- | ---------------------------------------------------------------- |
-| `field:value`          | Exact match                     | `level:ERROR`                                                    |
-| `term1 term2`          | AND (default)                   | `level:ERROR context.service.name:api`                           |
-| `term1 OR term2`       | OR (must be uppercase)          | `level:ERROR OR level:WARN`                                      |
-| `NOT term`             | Negation (uppercase)            | `level:ERROR NOT context.env:staging`                            |
-| `(...)`                | Grouping                        | `(level:ERROR OR level:WARN) context.service.name:api`           |
-| `(field:a OR field:b)` | Multiple values for one field   | `(context.service.name:web-backend OR context.service.name:api)` |
-| `field:*`              | Field exists                    | `trace_id:*`                                                     |
-| `field:[A TO B]`       | Range, inclusive                | `duration:[100 TO 500]`                                          |
-| `field:>N`             | Numeric (also `>=`, `<`)        | `duration:>100`                                                  |
-| `<bare token>`         | Free-text token in log message  | `refused`, `connection refused`, `(timeout OR refused)`          |
+| Syntax                 | Meaning                        | Example                                                          |
+| ---------------------- | ------------------------------ | ---------------------------------------------------------------- |
+| `field:value`          | Exact match                    | `level:ERROR`                                                    |
+| `term1 term2`          | AND (default)                  | `level:ERROR context.service.name:api`                           |
+| `term1 OR term2`       | OR (must be uppercase)         | `level:ERROR OR level:WARN`                                      |
+| `NOT term`             | Negation (uppercase)           | `level:ERROR NOT context.env:staging`                            |
+| `(...)`                | Grouping                       | `(level:ERROR OR level:WARN) context.service.name:api`           |
+| `(field:a OR field:b)` | Multiple values for one field  | `(context.service.name:web-backend OR context.service.name:api)` |
+| `field:*`              | Field exists                   | `trace_id:*`                                                     |
+| `field:[A TO B]`       | Range, inclusive               | `duration:[100 TO 500]`                                          |
+| `field:>N`             | Numeric (also `>=`, `<`)       | `duration:>100`                                                  |
+| `<bare token>`         | Free-text token in log message | `refused`, `connection refused`, `(timeout OR refused)`          |
 
 AND is the default (space-separated terms); OR binds looser than AND. `OR` and `NOT` must be uppercase — lowercase `or`/`not` become literal search terms.
 
@@ -272,7 +273,7 @@ Example — per-second rate of a counter:
 ```
 
 - `timeRange` uses **Unix seconds** (not relative strings like `"-1h"`)
-- On a **multi-cluster tenant**, `aggregation scalar|timeseries` needs a top-level `"clusterId": "<id>"` **in the body** — the global `--cluster` flag and `defaults.cluster` are *ignored* for `aggregation` (every other command honors them). Without it: `400 clusterId must be provided when the organization has multiple clusters`.
+- On a **multi-cluster tenant**, `aggregation scalar|timeseries` needs a top-level `"clusterId": "<id>"` **in the body** — the global `--cluster` flag and `defaults.cluster` are _ignored_ for `aggregation` (every other command honors them). Without it: `400 clusterId must be provided when the organization has multiple clusters`.
 - `dataSource` and `formula` are **body-level** fields; query items do not have `id` or `dataSource`
 - `formula` references queries by position: `"q1"` = first query, `"q2"` = second, etc.
 - `groupBy` is at **body level** (not inside query items): `[{"fields": ["field.name"], "limit": N}]`