From d9a2513eeda5ff9947d7bc22c5b034f2daa60d5d Mon Sep 17 00:00:00 2001 From: ArthurVerrez Date: Wed, 17 Jun 2026 09:56:55 -0700 Subject: [PATCH] tsuga-cli: update quality-reports docs for flat-row API shape --- AGENTS.md | 6 ++- .../otel-go/references/audit-checklist.md | 25 ++++++----- .../skills/tsuga-audit-metrics/SKILL.md | 42 ++++++++++--------- plugins/tsuga/skills/tsuga-cli/SKILL.md | 35 ++++++++-------- .../playbooks/reliability-review.md | 31 +++++++------- 5 files changed, 76 insertions(+), 63 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 59912a7..ceb0fbb 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -15,6 +15,7 @@ plugins/telemetry/ ``` Notes: + - **No top-level `skills/` directory.** Every skill lives under its plugin's subdir. - **No `.codex-plugin/` directory.** Codex consumes the same `.claude-plugin/marketplace.json` (verified: `~/.codex/.tmp/marketplaces//.claude-plugin/marketplace.json` is what `codex plugin list` reads). - **No `skills[]` array in marketplace.json.** Skills are auto-discovered from each plugin's `skills/` directory. The `skills[]` filter is only meaningful in single-plugin marketplaces and is silently ignored when multiple plugins share a `source`. @@ -31,6 +32,7 @@ Notes: - Patch (`0.6.1 → 0.6.2`): content tweaks, doc fixes, clarifications. - Minor (`0.6.x → 0.7.0`): new skill, new reference doc, new behavior. - Major (`0.x → 1.0`): breaking change to a SKILL.md contract (rare). + 3. Open the PR as a draft. Use the `open-pr` skill if available. ## When adding a new skill @@ -164,7 +166,7 @@ Code-reading skills (`signal-choice-advisor`, the `tsuga-audit-*` and `tsuga-smo ## Runtime behavior rules -These govern how skills *behave when executing* (distinct from the authoring/repo conventions above). They apply to all Tsuga skills; each skill's SKILL.md inlines the rules critical to its workflow. This section is the canonical source. +These govern how skills _behave when executing_ (distinct from the authoring/repo conventions above). They apply to all Tsuga skills; each skill's SKILL.md inlines the rules critical to its workflow. This section is the canonical source. 1. **CLI-first, no external calls.** Only `tsuga` commands. No curl, no direct API calls, no web browsing during skill execution. @@ -184,7 +186,7 @@ These govern how skills *behave when executing* (distinct from the authoring/rep 9. **Scope containment.** If a request spans multiple skills, address your skill's scope, state what evidence you found, then direct the user to the other skill for the rest. -10. **Stale data acknowledgment.** `monitors list`, `services list`, and `quality-reports list` return config/snapshot state, not live state. State the query time in output. For quality reports, flag if `generatedAt` > 48h ago. +10. **Stale data acknowledgment.** `monitors list`, `services list`, and `quality-reports list` return config/snapshot state, not live state. State the query time in output. For quality reports, derive the report timestamp as `min(rows.createdAt)` and flag if > 48h ago. ### Addendum: instrumentation-quality skills diff --git a/plugins/telemetry/skills/otel-go/references/audit-checklist.md b/plugins/telemetry/skills/otel-go/references/audit-checklist.md index 7a08f7c..4fe854c 100644 --- a/plugins/telemetry/skills/otel-go/references/audit-checklist.md +++ b/plugins/telemetry/skills/otel-go/references/audit-checklist.md @@ -1,6 +1,7 @@ # Audit Checklist — Go OTel > This file covers two audit modes: +> > 1. **Code anti-patterns** (static) — section "Anti-Patterns to Flag" > 2. **Cross-signal live audit** (tsuga CLI) — section "Cross-Signal Audit Workflow" > @@ -28,12 +29,12 @@ go list -m go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc Expected minimum versions: -| Module | Minimum | -|---|---| -| `go.opentelemetry.io/otel` | v1.x (latest stable) | -| `go.opentelemetry.io/otel/sdk` | v1.x (latest stable) | -| `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` | v1.x | -| `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc` | v1.x | +| Module | Minimum | +| ------------------------------------------------------------------- | -------------------- | +| `go.opentelemetry.io/otel` | v1.x (latest stable) | +| `go.opentelemetry.io/otel/sdk` | v1.x (latest stable) | +| `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` | v1.x | +| `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc` | v1.x | Check for module version consistency — `otel`, `otel/sdk`, and `otel/exporters` packages should use the same major version. @@ -156,6 +157,7 @@ tsuga metrics list --filter "service.name=" ``` If no spans appear: + 1. Check `OTEL_EXPORTER_OTLP_ENDPOINT` — for gRPC, do not include `http://` 2. Verify the collector is reachable: `grpc_cli ls localhost:4317` 3. Confirm `otel.SetTracerProvider(tp)` is called before any `otel.Tracer()` calls @@ -179,6 +181,7 @@ tsuga services get Capture `sources[]`, `logsCount24h`, `tracesCount24h`, `errorLogsCount24h`, `errorTracesCount24h`. Classify: + - No signals at all → stop; direct to `otel-instrumentation` - Logs only / Traces only / Metrics only → note what's missing; continue what's present - All 3 present → continue full audit @@ -230,20 +233,22 @@ In sampled metric attributes: flag `user.id`, `request.id`, raw URL paths, trace ### Step 9 — Quality report ```bash -tsuga quality-reports list +tsuga quality-reports list # add `--cluster ` in multi-cluster orgs ``` -Note `generatedAt` — flag as stale if > 48h ago. Note rule failures attributed to this service. +Returns a flat array of rule-evaluation rows. Filter `.[] | select(.status == "failed")` for rule failures attributed to this service. Report timestamp = `min(.[].createdAt)`; flag as stale if > 48h ago. ### Step 10 — Source code check (if path provided) Read SDK init file. Check for: + - `TracerProvider`, `MeterProvider`, `LoggerProvider` initialization - Log bridge setup (`otelslog.NewHandler` or `otelzap.NewCore`) - Resource attributes set at init (`service.name`, `service.version`, `deployment.environment.name`) - Exporter endpoint: **fail** if hardcoded (`WithEndpoint("http://localhost:4317")` in code); **pass** if zero-arg constructor AND `OTEL_EXPORTER_OTLP_ENDPOINT` is in deployment config If metric creation code is available, check instrument type: + - Description containing "current", "active", "open", "in-flight", or "pending" → must use `UpDownCounter` (not `Counter`) - Description containing "duration", "latency", "time", or "size" → must use `Histogram` (not `Counter` or `Gauge`) @@ -300,7 +305,7 @@ service.name: | service.version: Endpoint hardcoded: ✅ no / ❌ yes (file:line) | OTEL_EXPORTER_OTLP_ENDPOINT configured: ✅ / ❌ / ⚠️ not checked ## Quality Report -> +> ## Recommended Actions (in order of impact) 1. — run `` for detailed guidance @@ -310,7 +315,7 @@ Endpoint hardcoded: ✅ no / ❌ yes (file:line) | OTEL_EXPORTER_OTLP_ENDPOINT c - Condensed audit: 5-record samples per signal; use tsuga-audit-metrics / tsuga-audit-logs / tsuga-audit-traces for full per-signal analysis - Metric check covers spot-check of up to 3 metric names only - Source code findings (if any) are from the provided path only -- Quality report reflects state at generatedAt, not live state +- Quality report reflects state at the row `createdAt` timestamps, not live state ``` --- diff --git a/plugins/telemetry/skills/tsuga-audit-metrics/SKILL.md b/plugins/telemetry/skills/tsuga-audit-metrics/SKILL.md index 5051406..42bfe96 100644 --- a/plugins/telemetry/skills/tsuga-audit-metrics/SKILL.md +++ b/plugins/telemetry/skills/tsuga-audit-metrics/SKILL.md @@ -1,6 +1,6 @@ --- name: tsuga-audit-metrics -description: "Use when asked to review metric design, check metric naming, audit metric quality, or validate a custom metric." +description: 'Use when asked to review metric design, check metric naming, audit metric quality, or validate a custom metric.' --- # Tsuga: Audit Metrics @@ -29,7 +29,7 @@ description: "Use when asked to review metric design, check metric naming, audit 5. For code-side audit (instrument type correctness, OTel SDK init, metric view configuration) → `otel-/references/audit-checklist.md` -6. `tsuga quality-reports list` — check if any rule failures relate to metric naming or instrumentation for this service. Note `generatedAt`; flag if > 48h ago. +6. `tsuga quality-reports list` (pass `--cluster ` in multi-cluster orgs) — flat array of rule-evaluation rows. Filter `.[] | select(.status == "failed")` for failures related to metric naming or instrumentation for this service. Report timestamp = `min(.[].createdAt)`; flag if > 48h ago. 7. **Stop and validate with user.** Before presenting final findings, share preliminary observations: "Here is what I found so far — [summary of naming issues, instrument type concerns, cardinality risks]. Does this match your understanding of how metrics are instrumented in this service?" Adjust findings based on user context before concluding. @@ -47,15 +47,15 @@ Source: OTel specification (https://opentelemetry.io/docs/specs/otel/metrics/sup | `{request}` | Request counts | Never encode units in the metric name — set the `unit` field instead. -| Rule | Bad example | Good example | -|---|---|---| -| No service name in metric name | `web_backend_request_count` | `http.server.request.count` (service identity in `context.service.name`) | -| No environment/version in metric name | `prod_latency_ms`, `v2_errors` | `http.server.request.duration` | -| No units in metric name | `latency_ms`, `memory_bytes` | `request.duration` with `unit: ms` | -| OTel dot notation (not underscores) | `http_server_request_duration` | `http.server.request.duration` | -| Verb-object pattern | `duration` (missing namespace) | `http.server.request.duration` | -| Counter must not describe current state | `active_connections` as Counter | `active_connections` as UpDownCounter | -| Histogram for distributions | `request_duration` as Counter | `request.duration` as Histogram | +| Rule | Bad example | Good example | +| --------------------------------------- | ------------------------------- | ------------------------------------------------------------------------ | +| No service name in metric name | `web_backend_request_count` | `http.server.request.count` (service identity in `context.service.name`) | +| No environment/version in metric name | `prod_latency_ms`, `v2_errors` | `http.server.request.duration` | +| No units in metric name | `latency_ms`, `memory_bytes` | `request.duration` with `unit: ms` | +| OTel dot notation (not underscores) | `http_server_request_duration` | `http.server.request.duration` | +| Verb-object pattern | `duration` (missing namespace) | `http.server.request.duration` | +| Counter must not describe current state | `active_connections` as Counter | `active_connections` as UpDownCounter | +| Histogram for distributions | `request_duration` as Counter | `request.duration` as Histogram | ## GOOD/BAD Examples @@ -80,11 +80,14 @@ Never encode units in the metric name — set the `unit` field instead. ### Cardinality Examples **BAD:** Metric dimension `user_id` (or `user.id`) + ``` histogram.record(duration, {"user.id": userId, "http.route": "/api/orders"}) ``` + **Why wrong:** `user_id` has unbounded cardinality — one unique value per user → millions of time series. **GOOD:** Remove `user.id` from metric dimensions; use it as a span attribute instead: + ``` histogram.record(duration, {"http.route": "/api/orders"}) // Low-cardinality dims only span.setAttribute("user.id", userId) // High-cardinality → spans @@ -98,12 +101,12 @@ span.setAttribute("user.id", userId) // High-cardinality ## Instrument Type Rules -| Instrument | Correct use | -|---|---| -| Counter | Monotonically increasing totals (requests processed, errors, retries) | -| UpDownCounter | Fluctuating current values (queue depth, active connections, cache size) | -| Histogram | Distributions — latency, payload size, queue wait time | -| Observable Gauge | Values sampled at collection time (CPU %, memory usage, thread count) | +| Instrument | Correct use | +| ---------------- | ------------------------------------------------------------------------ | +| Counter | Monotonically increasing totals (requests processed, errors, retries) | +| UpDownCounter | Fluctuating current values (queue depth, active connections, cache size) | +| Histogram | Distributions — latency, payload size, queue wait time | +| Observable Gauge | Values sampled at collection time (CPU %, memory usage, thread count) | **Detection heuristic (source: CLI):** If a metric's description contains "current", "active", "open", "in-flight", or "pending" AND its type is Counter: likely wrong instrument type. @@ -141,8 +144,8 @@ Metrics audited: | Naming issues: | Instrument type issues: | Cardin | | | >100 (limit reached) | High — possible per-request identifier | tsuga CLI | ## Quality Report Correlation - quality rule failures relate to metric instrumentation (report generated: ) -[If generatedAt > 48h ago: ⚠️ Quality report is stale — results may not reflect current state] + quality rule failures relate to metric instrumentation (report generated: ) +[If derived timestamp > 48h ago: ⚠️ Quality report is stale — results may not reflect current state] ## Recommended Actions 1. Rename to — eliminates service-identity-in-name violation @@ -157,6 +160,7 @@ Metrics audited: | Naming issues: | Instrument type issues: | Cardin ``` ## Related Skills / Next Steps + - `tsuga-metric-naming-fix` — apply renames found in this audit - `tsuga-smoke-test` — verify metrics after fixes - `otel-/references/audit-checklist.md` — code-side audit (instrument type, SDK init, metric views) diff --git a/plugins/tsuga/skills/tsuga-cli/SKILL.md b/plugins/tsuga/skills/tsuga-cli/SKILL.md index 0daffd8..e04fd7f 100644 --- a/plugins/tsuga/skills/tsuga-cli/SKILL.md +++ b/plugins/tsuga/skills/tsuga-cli/SKILL.md @@ -101,12 +101,13 @@ tsuga investigations create -d '{ ## Quality Reports ```bash -tsuga quality-reports list # get the latest quality report -tsuga quality-reports list | jq '.report.overallScore' -tsuga quality-reports list | jq '.report.teamResults[].teamName' +tsuga quality-reports list # rows for the cluster's latest report +tsuga quality-reports list --cluster # multi-cluster orgs MUST pass --cluster (else 400) +tsuga quality-reports list | jq '.[] | select(.status=="failed") | {ruleId, owner}' +tsuga quality-reports list | jq 'group_by(.owner) | map({owner: .[0].owner, score: .[0].reportOverallScore})' ``` -Key fields: `report.generatedAt`, `report.overallScore`, `report.teamResults[]{teamId, teamName, score, weightedScore, ruleResults[]}`. Generated at least once per 24h. Only supports `list`. +Returns a flat list of rows (one per persisted rule evaluation). Key fields per row: `id`, `reportId`, `ruleId`, `owner` (team id or absent for global), `status` (`passed`/`failed`/`ignored`), `score` (0..1), `weight`, `reportOverallScore` and `reportTotalWeight` (per-owner aggregates, identical across rows of the same `reportId`), `createdAt`, optional `recommendation` and `examples`. Generated at least once per 24h. Only supports `list`. ## Services @@ -163,18 +164,18 @@ Output: `{"nodes": [{"serviceName": ...}], "edges": [{"from": , "to": Applies to all `--query` flags and aggregation `filter` fields. -| Syntax | Meaning | Example | -| ---------------------- | ------------------------------- | ---------------------------------------------------------------- | -| `field:value` | Exact match | `level:ERROR` | -| `term1 term2` | AND (default) | `level:ERROR context.service.name:api` | -| `term1 OR term2` | OR (must be uppercase) | `level:ERROR OR level:WARN` | -| `NOT term` | Negation (uppercase) | `level:ERROR NOT context.env:staging` | -| `(...)` | Grouping | `(level:ERROR OR level:WARN) context.service.name:api` | -| `(field:a OR field:b)` | Multiple values for one field | `(context.service.name:web-backend OR context.service.name:api)` | -| `field:*` | Field exists | `trace_id:*` | -| `field:[A TO B]` | Range, inclusive | `duration:[100 TO 500]` | -| `field:>N` | Numeric (also `>=`, `<`) | `duration:>100` | -| `` | Free-text token in log message | `refused`, `connection refused`, `(timeout OR refused)` | +| Syntax | Meaning | Example | +| ---------------------- | ------------------------------ | ---------------------------------------------------------------- | +| `field:value` | Exact match | `level:ERROR` | +| `term1 term2` | AND (default) | `level:ERROR context.service.name:api` | +| `term1 OR term2` | OR (must be uppercase) | `level:ERROR OR level:WARN` | +| `NOT term` | Negation (uppercase) | `level:ERROR NOT context.env:staging` | +| `(...)` | Grouping | `(level:ERROR OR level:WARN) context.service.name:api` | +| `(field:a OR field:b)` | Multiple values for one field | `(context.service.name:web-backend OR context.service.name:api)` | +| `field:*` | Field exists | `trace_id:*` | +| `field:[A TO B]` | Range, inclusive | `duration:[100 TO 500]` | +| `field:>N` | Numeric (also `>=`, `<`) | `duration:>100` | +| `` | Free-text token in log message | `refused`, `connection refused`, `(timeout OR refused)` | AND is the default (space-separated terms); OR binds looser than AND. `OR` and `NOT` must be uppercase — lowercase `or`/`not` become literal search terms. @@ -272,7 +273,7 @@ Example — per-second rate of a counter: ``` - `timeRange` uses **Unix seconds** (not relative strings like `"-1h"`) -- On a **multi-cluster tenant**, `aggregation scalar|timeseries` needs a top-level `"clusterId": ""` **in the body** — the global `--cluster` flag and `defaults.cluster` are *ignored* for `aggregation` (every other command honors them). Without it: `400 clusterId must be provided when the organization has multiple clusters`. +- On a **multi-cluster tenant**, `aggregation scalar|timeseries` needs a top-level `"clusterId": ""` **in the body** — the global `--cluster` flag and `defaults.cluster` are _ignored_ for `aggregation` (every other command honors them). Without it: `400 clusterId must be provided when the organization has multiple clusters`. - `dataSource` and `formula` are **body-level** fields; query items do not have `id` or `dataSource` - `formula` references queries by position: `"q1"` = first query, `"q2"` = second, etc. - `groupBy` is at **body level** (not inside query items): `[{"fields": ["field.name"], "limit": N}]` diff --git a/plugins/tsuga/skills/tsuga-cli/references/playbooks/reliability-review.md b/plugins/tsuga/skills/tsuga-cli/references/playbooks/reliability-review.md index c76ba00..d5d8daf 100644 --- a/plugins/tsuga/skills/tsuga-cli/references/playbooks/reliability-review.md +++ b/plugins/tsuga/skills/tsuga-cli/references/playbooks/reliability-review.md @@ -9,15 +9,16 @@ Use when asked for an observability posture overview, quality scores, reliabilit ## Workflow -1. `tsuga quality-reports list` — extract `report.generatedAt`, `report.overallScore`, `report.teamResults[]` sorted by `score` ascending. -2. If `report.generatedAt` is more than 48 hours ago: flag as potentially stale before continuing. -3. Flag teams below threshold; for each: extract `ruleResults[]` to identify failing rules — key fields: `key`, `title`, `recommendation`. -4. `tsuga teams list` — cross-reference `teamId` → team name (quality report includes `teamName` but verify against live data). -5. `tsuga monitors list` — identify teams with low quality score **and** few or no monitors; these represent the highest combined risk. +1. `tsuga quality-reports list` (pass `--cluster ` in multi-cluster orgs) — returns a flat array of rule-evaluation rows. Each row has `ruleId`, `owner` (team id or absent for global), `status`, `score`, `weight`, `reportOverallScore` and `reportTotalWeight` (per-owner aggregates), `createdAt`, optional `recommendation` and `examples`. +2. Group rows by `owner` to get per-team scores. Within each team's rows, every row carries the same `reportOverallScore` — use that as the team score. The report timestamp is `min(.[].createdAt)`. Rows where `owner` is absent are global (cluster-level) rules. +3. If the derived report timestamp is more than 48 hours ago: flag as potentially stale before continuing. +4. Flag teams below threshold; for each, list rows where `status == "failed"` — key fields: `ruleId`, `recommendation`, optional `examples`. +5. `tsuga teams list` — cross-reference `owner` (team id) → team name. +6. `tsuga monitors list` — identify teams with low quality score **and** few or no monitors; these represent the highest combined risk. ## Evidence Requirements -- Always state `report.generatedAt` — this is a snapshot, not live data. +- Always state the derived report timestamp — this is a snapshot, not live data. - Score conventions (document as conventions, not official Tsuga severity levels): - ≥ 0.8 = healthy - 0.6–0.79 = at risk @@ -28,29 +29,29 @@ Use when asked for an observability posture overview, quality scores, reliabilit ``` ## Reliability Review -Report generated: | Overall score: -[If generatedAt > 48h ago: "⚠️ Report may be stale — generated hours ago."] +Report generated: +[If derived timestamp > 48h ago: "⚠️ Report may be stale — generated hours ago."] Teams assessed: | Below threshold (): ## Team Scores | Team | Score | Status | Top failing rule | |---|---|---|---| -| | | healthy / at-risk / failing | | +| | | healthy / at-risk / failing | | ## Failing Rules Detail (teams below ) -**Team: ** (score: ) -- ❌ : → Recommendation: +**Team: ** (score: ) +- ❌ : ## Double Risk: Low Quality + Low Monitor Coverage Teams flagged in both this review and a monitor coverage audit: - : score=, no monitors found ## Recommended Actions -1. : address +1. : address ## Limitations -- Report generated at ; changes after this time are not reflected +- Report generated at ; changes after this time are not reflected - Scores measure instrumentation quality, not incident risk or SLO compliance - Score thresholds (healthy/at-risk/failing) are conventions, not official Tsuga severity levels - Monitor coverage check is text-based; glob patterns in monitor filters may cover services not directly matched @@ -58,10 +59,10 @@ Teams flagged in both this review and a monitor coverage audit: ## Safety Rules -- Always state `report.generatedAt` in the output. Never present quality scores without the report timestamp. +- Always state the derived report timestamp in the output. Never present quality scores without it. - Never equate quality score with incident probability or current health. - Score thresholds (0.7 default) are conventions — state this explicitly in output. -- If `generatedAt` > 48h ago: flag as potentially stale before presenting findings. +- If the derived timestamp is > 48h ago: flag as potentially stale before presenting findings. ## Related / Next Steps