Date: 2026-02-24
Project: MedGemma Competition - Terraphim-AI Crate Integration
Status: SUBMITTED. All 42 beads closed. v1.2.0 tagged, released, and pushed.
Handover To: Development Team / Maintainers
Completed full migration of medgemma-competition from standalone reimplementations to
shared terraphim-ai crates behind medical feature flags. Proved end-to-end with real
MedGemma 4B GGUF inference on both GPU (22.7s/case avg, RTX 2070) and CPU (165s/case).
All mock fallbacks removed from production paths. Added interactive demo UI, 85s demo video
(Playwright recording with real GPU inference), 4 clinical workflow state machines with 60
scenario tests, 18 multi-specialty evaluation cases, Axum API server with shared LLM state,
and 9 Playwright e2e tests. Final submission v1.2.0 released on GitHub with demo-video.mp4,
WRITEUP.md, COMPETITION_EVIDENCE.md, and .env.template as release artifacts.
Session 6 (2026-02-24): Wired MedGemma GGUF client into API server via Axum shared state
(Arc<ClinicalService>), added CUDA GPU support, ran 3rd evaluation (18/18 pass, avg 24.8s),
verified demo.html Live mode via WebSocket, created 9 Playwright e2e tests, recorded 85s demo
video, refreshed A/B comparison (BRAF case reproducibly shows class-suggestion vs specific-drug
improvement), updated all docs, tagged v1.2.0, created GitHub release.
559 tests (up from 543), all passing.
MedGemma acts as System 1 (fast intuition) -- generating fluent recommendations from parametric medical knowledge. The Terraphim knowledge graph acts as System 2 (deliberate reasoning) -- grounding, validating, and constraining those recommendations against structured clinical evidence. Neither alone is sufficient.
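One way to picture the System 1 / System 2 split is a gate that only passes model suggestions the knowledge graph can ground. The sketch below is a minimal illustration under stated assumptions, not the project's implementation: `TreatmentGraph`, `GateDecision`, and `check` are hypothetical names, and the treatment subgraph is reduced to a set of (drug, condition) pairs.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the KG treatment subgraph: a set of
/// (drug, condition) pairs for which a "treats" edge exists.
pub struct TreatmentGraph {
    treats: HashSet<(String, String)>,
}

#[derive(Debug, PartialEq)]
pub enum GateDecision {
    /// The suggestion is backed by a treats edge.
    Grounded,
    /// No supporting edge was found, so the suggestion is withheld.
    Blocked,
}

impl TreatmentGraph {
    pub fn new(edges: &[(&str, &str)]) -> Self {
        let treats = edges
            .iter()
            .map(|(drug, condition)| (drug.to_lowercase(), condition.to_lowercase()))
            .collect();
        Self { treats }
    }

    /// System 2 check: pass a System 1 suggestion only if it is grounded.
    pub fn check(&self, drug: &str, condition: &str) -> GateDecision {
        let key = (drug.to_lowercase(), condition.to_lowercase());
        if self.treats.contains(&key) {
            GateDecision::Grounded
        } else {
            GateDecision::Blocked
        }
    }
}
```

With a single edge from Osimertinib to EGFR-mutant NSCLC, `check("Osimertinib", ...)` returns `Grounded`, while a fluent but unsupported suggestion for the same condition returns `Blocked`. The model output is never edited, only accepted or withheld.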
The core innovation is graph-based symbolic embeddings for clinical safety:
- 27 medical node types and 65 edge types with typed graph traversal
- Symbolic similarity: Jaccard (0.7) + path distance (0.3) -- deterministic, auditable
- LeftmostLongest entity extraction: Aho-Corasick automaton always grounds to the most specific SNOMED concept (e.g., "non-small cell lung carcinoma" not "lung carcinoma"), ensuring correct downstream treatment lookups
- Safety gate: Every MedGemma recommendation validated against KG treatment subgraph. Ungrounded recommendations (e.g., Pembrolizumab for EGFR L858R+ NSCLC) blocked before reaching clinician
- Traceable evidence paths: Drug->Treats->Disease->HasVariant->Gene->CitedIn->Trial
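The weighted similarity in the list above can be sketched as follows. Only the 0.7 / 0.3 weights come from this document. How path distance is turned into a similarity is an assumption here (`1 / (1 + hops)`), and the function names are hypothetical.

```rust
use std::collections::HashSet;

/// Jaccard index of two concept sets: |A and B| / |A or B|.
pub fn jaccard<'s>(a: &HashSet<&'s str>, b: &HashSet<&'s str>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0; // two empty sets: define similarity as 0
    }
    a.intersection(b).count() as f64 / union as f64
}

/// ASSUMPTION: graph distance in hops is mapped to (0, 1] via 1 / (1 + hops).
pub fn path_similarity(hops: u32) -> f64 {
    1.0 / (1.0 + f64::from(hops))
}

/// Weighted blend using the 0.7 / 0.3 weights quoted in the list above.
pub fn symbolic_similarity<'s>(
    a: &HashSet<&'s str>,
    b: &HashSet<&'s str>,
    hops: u32,
) -> f64 {
    0.7 * jaccard(a, b) + 0.3 * path_similarity(hops)
}
```

Both terms are pure functions of the graph, so the same pair of nodes always yields the same score, which is what makes the measure deterministic and auditable.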
Two repositories involved:
- `terraphim/terraphim-ai` - upstream crate library (PR #551, branch `medical-extensions`)
- `terraphim/medgemma-competition` - consumer project (commits on `main`, tag `v1.1.0`)
- LeftmostLongest fix (commit `e9ab233`)
  - Found bug: `EntityExtractor::new()` and `from_terms()` used `AhoCorasick::new()` (defaults to LeftmostFirst)
  - Fixed all 3 call sites in `extractor.rs` and `umls_extractor.rs` to use `LeftmostLongest`
  - Added 2 tests: `test_leftmost_longest_prefers_full_concept_over_fragment`, `test_leftmost_longest_from_terms`
  - Test count: 541 -> 543
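The difference between the two match preferences can be shown with a dependency-free sketch. This is a naive scan written for illustration, not the Aho-Corasick automaton used by `EntityExtractor`. `Preference` and `scan` are hypothetical names.

```rust
/// Which pattern wins when several match at the same start position.
#[derive(Clone, Copy)]
pub enum Preference {
    /// The pattern listed first wins.
    First,
    /// The longest pattern wins.
    Longest,
}

/// Naive leftmost, non-overlapping scan. An Aho-Corasick automaton does
/// this in one pass; this loop only illustrates the match semantics.
pub fn scan<'p>(text: &str, patterns: &[&'p str], pref: Preference) -> Vec<(usize, &'p str)> {
    let mut matches = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let rest = &text[pos..];
        let mut candidates = patterns
            .iter()
            .copied()
            .filter(|p| !p.is_empty() && rest.starts_with(*p));
        let best = match pref {
            Preference::First => candidates.next(),
            Preference::Longest => candidates.max_by_key(|p| p.len()),
        };
        match best {
            Some(p) => {
                matches.push((pos, p));
                pos += p.len(); // resume after the match
            }
            // Advance one whole character to stay on a UTF-8 boundary.
            None => pos += rest.chars().next().map_or(1, char::len_utf8),
        }
    }
    matches
}
```

Given the patterns "non-small cell" and "non-small cell lung carcinoma" in that order, `Preference::First` stops at the shorter fragment, while `Preference::Longest` returns the full concept. That is the behaviour the fix enforces for grounding.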
- GPU validation (reports `46d9cca9`, `17a45b91`)
  - All 3 pipelines run sequentially on RTX 2070 (35/35 layers CUDA0)
  - e2e_pipeline: 47 pass, 2 fail (safety gate correct), 55s total
  - e2e_real_model: 18/18 pass, avg 22.7s/case, all 15 checks PASSED
  - ab_comparison: 3/3 cases, KG grounding specificity confirmed
- Submission packaging (tag `v1.1.0`)
  - MIT LICENSE file added
  - README.md rewritten with current stats
  - All submission docs updated with LeftmostLongest explanation
  - System 1 + System 2 framing added to WRITEUP.md and COMPETITION_EVIDENCE.md
- Beads cleanup: Closed 6 stale issues, all 42/42 now closed
- 543 workspace tests passing, 0 failures
- All code committed and pushed to `origin/main`
- Working tree clean (only untracked: `progress.txt`, one stale eval report)
- GPU inference: 22.7s/case avg (RTX 2070, 35/35 layers)
- CPU inference: 165s/case avg (no GPU required)
- Safety gate: 100% across all runs (54 total inference calls)
- Nothing currently blocked
1a40ab0 docs: add GPU validation report 17a45b91 (full sequential run)
c5de069 docs: add System 1 + System 2 dual-process architecture framing
bb25154 docs: highlight LeftmostLongest grounding across all submission docs
aa28269 chore: sync beads - all 42 issues closed
e9ab233 fix: enforce LeftmostLongest match in EntityExtractor for grounding precision
6b10d08 docs: add GPU inference results (RTX 2070, report 79d26e2e)
f4f3582 fix: update HTML demos with real evaluation data (541 tests, 18/18 cases, GPU timing)
59d2c98 docs: add real A/B comparison results, remove criteria table
0b0a849 feat: add A/B comparison example, remove fabricated precision benchmark
- State machines: 60 tests (case_status: 9, genomic_report: 14, treatment_plan: 19, recommendation: 22)
- LeftmostLongest: 2 tests (grounding precision validation)
- Clinical pipeline: ~200 tests
- Agent messaging/supervision: ~100 tests
- PGx, evaluation, other: ~181 tests
Total Issues: 42, Open: 0, In Progress: 0, Blocked: 0, Closed: 42
| Pipeline | Cases | Avg Latency | Result |
|---|---|---|---|
| e2e_pipeline | 49 checks | 55s total | 47 pass, 2 fail (safety gate correct) |
| e2e_real_model | 18/18 | 22.7s/case | All 15 checks PASSED |
| ab_comparison | 3 cases | ~22s each | KG grounding specificity confirmed |
| cargo test | 543/543 | N/A | 0 failures |
| Case | Raw MedGemma (System 1 only) | KG-Grounded (System 1 + 2) |
|---|---|---|
| EGFR NSCLC | Osimertinib 80mg | Osimertinib 80mg (+ SNOMED grounding) |
| CYP2D6 Codeine | Oxycodone 5mg/mL | Codeine 60mg (PGx-aware) |
| BRAF Melanoma | "BRAF inhibitor (e.g., Dabrafenib + Trametinib)" | Vemurafenib 450mg (specific) |
| Deliverable | File | Status |
|---|---|---|
| Technical writeup | `WRITEUP.md` | Done (with System 1+2 and LeftmostLongest sections) |
| Demo UI | `demo.html` | Done (self-contained, 1,813 lines) |
| Demo video | `docs/demo-video/demo-pipeline-3min.mp4` | Done (15 MB, git-lfs) |
| Evaluation results | `tests/evaluation/output/` | Done (7 reports: CPU, GPU, sequential) |
| README | `README.md` | Done (543 tests, GPU/CPU timing) |
| LICENSE | `LICENSE` | Done (MIT) |
| Competition evidence | `COMPETITION_EVIDENCE.md` | Done (real inference, no mock) |
- Added `MedicalNodeType` (27 variants) and `MedicalEdgeType` (65 variants) behind `#[cfg(feature = "medical")]`
- Commit: `e1147d62`

- `MedicalRoleGraph` wrapping `RoleGraph` with typed nodes/edges, IS-A hierarchy
- Symbolic embeddings (Jaccard 0.7 + path distance 0.3), adjacency index
- 78 tests
- Commits: `a4b8dbe8`, `89ef5a62`, `127bd7ee`, `494a6961`
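The adjacency index mentioned above can be sketched as a map from node ID to the positions of its outgoing edges. This is a minimal illustration, not the `MedicalRoleGraph` code: `Graph`, `Edge`, `add_edge`, and `edges_from` are hypothetical names, and edge types are plain strings here rather than the `MedicalEdgeType` enum.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub from: u64,
    pub to: u64,
    pub kind: &'static str, // hypothetical: stands in for a typed edge enum
}

#[derive(Default)]
pub struct Graph {
    edges: Vec<Edge>,
    /// Node ID -> indices into `edges` for the edges leaving that node.
    outgoing: HashMap<u64, Vec<usize>>,
}

impl Graph {
    /// Stores the edge and records its position under the source node.
    pub fn add_edge(&mut self, from: u64, to: u64, kind: &'static str) {
        let idx = self.edges.len();
        self.edges.push(Edge { from, to, kind });
        self.outgoing.entry(from).or_default().push(idx);
    }

    /// Visits only the edges leaving `node`, so the cost is proportional
    /// to that node's degree rather than to the total number of edges.
    pub fn edges_from(&self, node: u64) -> impl Iterator<Item = &Edge> + '_ {
        self.outgoing
            .get(&node)
            .into_iter()
            .flatten()
            .map(move |&i| &self.edges[i])
    }
}
```

The trade-off is one extra `usize` per edge in exchange for avoiding a full edge scan on every traversal step.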
- SNOMED EntityExtractor, UMLS UmlsExtractor, ShardedUmlsExtractor
- daachorse sharded automaton with bincode+zstd serialization
- LeftmostLongest match enforcement (fixed in `e9ab233`)
- 43+ tests (including 2 LeftmostLongest validation tests)
- Commits: `cee1f788`, `fd70034f`, `9f2958fb`, `e9ab233`

- Replaced copied mailbox/router/supervisor with path deps
- Commit: `dfcd2d7`

- Migrated to MedicalRoleGraph, retired terraphim-kg (5,784 lines deleted)
- Commits: `93aa6a1`, `b4a8d8b`, `0e20cff`

- 49-check pipeline example, 18/18 real model eval, Vertex AI backend
- Commits: `2e317ff`, `a5828f6`, `d478f04`

- Writeup, README, repo cleanup, evaluation scenarios
- Commits: `d56c5ec`, `2020cf6`, `8083c09`, `de19398`

- demo.html, 4 state machines (60 tests), 3-min video recording
- Commits: `266dc42`, `56713c9`, `71693a3`

- LeftmostLongest fix, GPU pipeline validation, System 1+2 framing
- MIT LICENSE, README rewrite, all docs updated
- Commits: `e9ab233`, `893d8bf`, `bb25154`, `c5de069`, `1a40ab0`
- Tag: `v1.1.0`
| ID | Severity | Fix | Commit |
|---|---|---|---|
| P0-1 | Critical | `magic_unpair` f32->f64 for SNOMED IDs (100M-900M) | 9f2958fb |
| P0-2 | Critical | Overlap detection: `start < m.span.1 && end > m.span.0` | 9f2958fb |
| P0-3 | Critical | LeftmostLongest enforcement in EntityExtractor (grounding precision) | e9ab233 |
| P1-1 | Important | Adjacency index for O(degree) edge lookups | 127bd7ee |
| P1-2 | Important | Multi-CUI term preservation in ShardedUmlsExtractor | fd70034f |
| P1-3 | Important | SNOMED FSN semantic tag parsing for node types | 494a6961 |
| P1-4 | Important | SASS-only CUDA compilation for RTX 2070 (sm_75) | 4d491fe |
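Two of the fixes in this table (P0-1 and P0-2) come down to small numeric facts that are easy to check in isolation. The sketch below does not reproduce `magic_unpair` or the extractor. It only shows why `f32` cannot carry IDs in the 100M-900M range and what the overlap predicate computes. `Span`, `overlaps`, `via_f32`, and `via_f64` are hypothetical names, and treating spans as half-open `[start, end)` is an assumption.

```rust
/// ASSUMPTION: a match span is a half-open byte range `[start, end)`.
#[derive(Debug, Clone, Copy)]
pub struct Span(pub usize, pub usize);

/// Two half-open spans overlap when each one starts before the other ends.
/// Mirrors the predicate `start < m.span.1 && end > m.span.0` from P0-2.
pub fn overlaps(a: Span, b: Span) -> bool {
    a.0 < b.1 && a.1 > b.0
}

/// Round-trips an ID through f32. f32 has a 24-bit significand, so
/// integers above 2^24 (about 16.7M) are rounded to a nearby value.
pub fn via_f32(id: u64) -> u64 {
    (id as f32) as u64
}

/// Round-trips an ID through f64, which is exact for integers up to 2^53.
pub fn via_f64(id: u64) -> u64 {
    (id as f64) as u64
}
```

For an arbitrary ID such as 123,456,789, `via_f32` returns a different number while `via_f64` returns the input unchanged. With the strict inequalities, adjacent spans such as `Span(0, 5)` and `Span(5, 9)` are not reported as overlapping.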
| File | Size | Purpose |
|---|---|---|
| `data/artifacts/umls_automata.bin.zst` | 209MB | Pre-built UMLS Aho-Corasick automaton |
| `data/artifacts/cpic_database.bin.zst` | 16KB | CPIC PGx rules |
| `data/snomed_thesaurus.json` | 10KB | Curated SNOMED mappings (49 terms) |
| `data/automata/words_cui.tsv` | 789MB | Raw UMLS term-CUI mappings |

    cargo test --workspace                                    # 543 tests, ~50s
    cargo test -p terraphim-medical-agents -- state_machines  # 60 tests, <1s

    # GPU (recommended, ~1min total)
    cargo run --release --example e2e_pipeline --package terraphim-demo --features medgemma-client/cuda
    # CPU fallback (~5min total)
    cargo run --release --example e2e_pipeline --package terraphim-demo

    cargo run --release --example e2e_real_model --package terraphim-demo --features medgemma-client/cuda

    cargo run --release --example ab_comparison --package terraphim-demo --features medgemma-client/cuda

RTX 2070 has 8GB VRAM. Running multiple GGUF inference processes simultaneously causes `Failed to create context: NullReturn` errors. Always run one inference pipeline at a time.

    ./scripts/setup_vertex_ai.sh  # one-time
    cargo run --release --example e2e_vertex_ai --package terraphim-demo

    python3 -m http.server 8091  # then visit http://localhost:8091/demo.html

- Submit to MedGemma Impact Challenge -- all deliverables ready at tag `v1.1.0`
- Merge PR #551 (terraphim-ai) -- unblock path dep cleanup
- Error propagation scenarios (#48) - 11 cross-object failure path tests
- Vaccine design pipeline (#44) - new state machine, same pattern as existing 4
- Evidence retrieval service (#47) - replace stub with real implementation
- Clinical trial matching (#45) - needs external data source decision
- Rare disease differential diagnosis (#46) - complex domain logic
- Meta-Cortex (#33) - multi-disciplinary coordination architecture
- GPU VRAM contention: RTX 2070 (8GB) cannot run two GGUF inference processes simultaneously. Run pipelines sequentially.
- UMLS extraction quality: Full UMLS dataset includes single-character terms producing noisy results. Use SNOMED EntityExtractor with curated terms for cleaner output.
- PR #551 branch history: `medical-extensions` branch carries commits from PR #543. Squash-merge recommended.
- Path dependencies: Relative path deps (`../../terraphim-ai/crates/...`) require both repos checked out as siblings. For CI, consider git deps.
- CUDA version mismatch: nvcc 13.1 vs driver 13.0 requires SASS-only compilation. `.cargo/config.toml` has `CMAKE_CUDA_ARCHITECTURES=75` and `NVCC_FLAGS` set for sm_75.
- Untracked files: `progress.txt` and one stale eval report (`e4e4bada`) are untracked. Safe to delete or .gitignore.