Summary
cmd/ox/doctor_types.go defines a DoctorCheckRegistry map that every doctor check registers into via RegisterDoctorCheck() at init time. The registry looks like the source of truth for doctor execution, and cmd/ox/doctor_registry_example.go:77-79 documents a "Phase 5" plan to have runDoctorChecks iterate DefaultRegistry.RunAll().
That migration was never completed. runDoctorChecks() in cmd/ox/doctor.go hand-wires execution via literal categoryChecks := []checkResult{checkFoo(...), checkBar(...)} slices, calling each check function by name. The DoctorCheckRegistry map is only consulted after execution by enrichCheckResult() to attach slug/fixLevel metadata to the already-computed checkResult.
Consequence
A developer adding a new check:
- Writes
checkFoo() in doctor_foo.go
- Calls
RegisterDoctorCheck(DoctorCheck{Slug: CheckSlugFoo, ...}) in the file's init()
- Forgets to also add
checkFoo(...) to the relevant categoryChecks slice in doctor.go
Result: the check is fully registered, has unit test coverage (tests call it directly), shows up under ox doctor --fix-slug=foo (which uses targeted dispatch — a separate code path that DOES consult the registry), but is silently skipped by bare ox doctor / ox doctor --fix. No error. No warning. The check's tests pass. It simply never runs in production.
How it was discovered
The journal-timezone revert added checkTimezoneScrub to auto-scrub dead timezone: keys from .sageox/config.json and team config.toml during the team-timezone feature deletion. The unit test TestCheckTimezoneScrub_ScrubsProjectAndTeamConfigs passed because it called checkTimezoneScrub() directly. The E2E test TZ-04 failed because it invoked the full ox doctor --fix --yes pipeline as a subprocess and the scrub never happened.
The fix for the tz revert case was a one-line wire-up to runDoctorChecks (separate commit). But a systemic scan of DoctorCheckRegistry vs. the runDoctorChecks call sites found 10 more checks with the same pattern.
Confirmed dormant checks (registered but never invoked by default)
All 10 are reachable via ox doctor --fix-slug=<slug> (targeted path), but ox doctor / ox doctor --fix skips them:
| Slug constant |
Function |
Defined in |
CheckSlugLedgerPathMismatch |
checkLedgerPathMismatch |
doctor_git_repos.go:1235 |
CheckSlugSessionStartHookBug |
checkSessionStartHookBug |
doctor_hooks.go:31 |
CheckSlugEndpointNormalization |
checkEndpointNormalization |
doctor_sageox.go:1047 |
CheckSlugDuplicateRepoMarkers |
checkDuplicateRepoMarkers |
doctor_sageox.go:1268 |
CheckSlugScoreThresholdRange |
checkScoreThresholdRange |
doctor_project.go:214 |
CheckSlugGCBlockedUntracked |
checkGCBlockedByUntracked |
doctor_team.go:305 |
CheckSlugTeamSparseCheckout |
checkTeamSparseCheckout |
doctor_team.go:402 |
CheckSlugLedgerSparseCheckout |
checkLedgerSparseCheckout |
doctor_ledger_infra.go:18 |
CheckSlugCodeDBConsistency |
checkCodeDBConsistency |
doctor_ledger_infra.go:82 |
CheckSlugDaemonDedup |
checkDaemonDeduplication |
doctor_daemon_dedup.go:15 |
The worst offenders are checkEndpointNormalization and checkDuplicateRepoMarkers — those catch real outage-causing misconfigurations per .claude/rules/endpoints.md and have been effectively dormant.
Fix options
(a) Finish Phase 5: make runDoctorChecks iterate the registry
The real fix. Convert runDoctorChecks() to iterate DoctorCheckRegistry (or its Category-indexed view) and call each entry, rather than hand-wiring category slices. This touches the execution engine and risks reordering behavior or accidentally activating a long-dormant check in production that no one noticed was broken. Needs a careful rollout — possibly a feature flag for the transition or a per-check audit as each is turned on, since some of the 10 dormant checks may have bit-rotted while no one was running them.
(b) Cheap guard: compile-time or test-time assertion
Add a test that asserts every RegisterDoctorCheck() slug has a corresponding call site visible to the compiler. Can be implemented as a go test that walks the AST of doctor.go looking for checkFoo(...) calls matching the registered function names, or as a build-time check. Doesn't fix the 11 dormant checks — just prevents adding a 12th.
Suggested order
Do (b) first to prevent regression, then tackle (a) incrementally with a per-check audit pass. Each dormant check should be individually verified before activation: does it still work? Does it produce false positives? Does its auto-fix behavior match current expectations?
Context
Discovered during the team-timezone revert PR (internal beads: ox-b27, found alongside the ox-ykz one-line fix for checkTimezoneScrub). The revert PR deliberately scoped this out per CLAUDE.md's "revert PR is not cleanup time" rule; fixing it in the same PR would blow scope and make bisect confusing. Filing here so the work can be picked up properly.
Summary
cmd/ox/doctor_types.godefines aDoctorCheckRegistrymap that every doctor check registers into viaRegisterDoctorCheck()at init time. The registry looks like the source of truth for doctor execution, andcmd/ox/doctor_registry_example.go:77-79documents a "Phase 5" plan to haverunDoctorChecksiterateDefaultRegistry.RunAll().That migration was never completed.
runDoctorChecks()incmd/ox/doctor.gohand-wires execution via literalcategoryChecks := []checkResult{checkFoo(...), checkBar(...)}slices, calling each check function by name. TheDoctorCheckRegistrymap is only consulted after execution byenrichCheckResult()to attach slug/fixLevel metadata to the already-computedcheckResult.Consequence
A developer adding a new check:
checkFoo()indoctor_foo.goRegisterDoctorCheck(DoctorCheck{Slug: CheckSlugFoo, ...})in the file'sinit()checkFoo(...)to the relevantcategoryChecksslice indoctor.goResult: the check is fully registered, has unit test coverage (tests call it directly), shows up under
ox doctor --fix-slug=foo(which uses targeted dispatch — a separate code path that DOES consult the registry), but is silently skipped by bareox doctor/ox doctor --fix. No error. No warning. The check's tests pass. It simply never runs in production.How it was discovered
The
journal-timezonerevert addedcheckTimezoneScrubto auto-scrub deadtimezone:keys from.sageox/config.jsonand teamconfig.tomlduring the team-timezone feature deletion. The unit testTestCheckTimezoneScrub_ScrubsProjectAndTeamConfigspassed because it calledcheckTimezoneScrub()directly. The E2E testTZ-04failed because it invoked the fullox doctor --fix --yespipeline as a subprocess and the scrub never happened.The fix for the tz revert case was a one-line wire-up to
runDoctorChecks(separate commit). But a systemic scan ofDoctorCheckRegistryvs. therunDoctorCheckscall sites found 10 more checks with the same pattern.Confirmed dormant checks (registered but never invoked by default)
All 10 are reachable via
ox doctor --fix-slug=<slug>(targeted path), butox doctor/ox doctor --fixskips them:CheckSlugLedgerPathMismatchcheckLedgerPathMismatchdoctor_git_repos.go:1235CheckSlugSessionStartHookBugcheckSessionStartHookBugdoctor_hooks.go:31CheckSlugEndpointNormalizationcheckEndpointNormalizationdoctor_sageox.go:1047CheckSlugDuplicateRepoMarkerscheckDuplicateRepoMarkersdoctor_sageox.go:1268CheckSlugScoreThresholdRangecheckScoreThresholdRangedoctor_project.go:214CheckSlugGCBlockedUntrackedcheckGCBlockedByUntrackeddoctor_team.go:305CheckSlugTeamSparseCheckoutcheckTeamSparseCheckoutdoctor_team.go:402CheckSlugLedgerSparseCheckoutcheckLedgerSparseCheckoutdoctor_ledger_infra.go:18CheckSlugCodeDBConsistencycheckCodeDBConsistencydoctor_ledger_infra.go:82CheckSlugDaemonDedupcheckDaemonDeduplicationdoctor_daemon_dedup.go:15The worst offenders are
checkEndpointNormalizationandcheckDuplicateRepoMarkers— those catch real outage-causing misconfigurations per.claude/rules/endpoints.mdand have been effectively dormant.Fix options
(a) Finish Phase 5: make
runDoctorChecksiterate the registryThe real fix. Convert
runDoctorChecks()to iterateDoctorCheckRegistry(or its Category-indexed view) and call each entry, rather than hand-wiring category slices. This touches the execution engine and risks reordering behavior or accidentally activating a long-dormant check in production that no one noticed was broken. Needs a careful rollout — possibly a feature flag for the transition or a per-check audit as each is turned on, since some of the 10 dormant checks may have bit-rotted while no one was running them.(b) Cheap guard: compile-time or test-time assertion
Add a test that asserts every
RegisterDoctorCheck()slug has a corresponding call site visible to the compiler. Can be implemented as ago testthat walks the AST ofdoctor.golooking forcheckFoo(...)calls matching the registered function names, or as a build-time check. Doesn't fix the 11 dormant checks — just prevents adding a 12th.Suggested order
Do (b) first to prevent regression, then tackle (a) incrementally with a per-check audit pass. Each dormant check should be individually verified before activation: does it still work? Does it produce false positives? Does its auto-fix behavior match current expectations?
Context
Discovered during the team-timezone revert PR (internal beads:
ox-b27, found alongside theox-ykzone-line fix forcheckTimezoneScrub). The revert PR deliberately scoped this out per CLAUDE.md's "revert PR is not cleanup time" rule; fixing it in the same PR would blow scope and make bisect confusing. Filing here so the work can be picked up properly.