
AgentBoundary v0.1 conformance evaluation of AGT — pre-publication review #2449

@sunilp


Hi @imran-siddique and AGT team —

Following up on the conversation in #302 about PolicyDecision schema interop and drop-in compatibility across backends (APS, AGT, YAML), I wanted to share a conformance evaluation I've been building in parallel and give the AGT team a 7-day right-to-respond window before publication.

I run JamJet Labs and have been authoring an open spec for AI-action receipts called AgentBoundary (jamjet-labs/agentboundary, v0.1 stable + v0.2-alpha draft). Where the PolicyDecision interop work focuses on the decision surface, AgentBoundary focuses on the downstream receipt surface — a portable, tamper-evident JSON record of what the action turned out to be, that a third party can verify without trusting the runtime.

I built a 40-scenario conformance suite and graded four prominent agent-governance products against it, including AGT. AGT scored highest of the four.

What I did:

  • Read docs/specs/AUDIT-COMPLIANCE-1.0.md and docs/ARCHITECTURE.md
  • Built an adapter at adapters/microsoft-agt/ that translates AGT AuditEntry (+ optional DecisionBOM + workflow approval event) into an AgentBoundary v0.2-alpha receipt
  • Ran all 40 conformance scenarios against adapter-translated receipts
  • Per-scenario verdicts in results.md; field-by-field mapping in mapping.md
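For readers unfamiliar with the adapter approach, a minimal Python sketch of the translation step might look like the following. Every field name here (`agent_id`, `issued_at`, `policy_version`, and so on) is a hypothetical placeholder, not taken from AGT's AuditEntry or from the AgentBoundary spec; mapping.md is the authoritative mapping. The idea is only that any required receipt field with no AuditEntry source lands on the missing list, which is roughly what a NOT COVERED verdict reflects.

```python
from typing import Any

# Hypothetical field names, for illustration only: source (audit entry) -> target (receipt).
FIELD_MAP = {
    "action": "action",
    "agent_id": "agent_id",
    "decision": "decision",
    "timestamp": "issued_at",
}

# Hypothetical set of receipt fields a verifier would require.
REQUIRED_RECEIPT_FIELDS = (
    "action", "agent_id", "decision", "issued_at",
    "arguments_hash", "approver", "policy_version", "completed_at", "environment",
)


def translate(entry: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Copy mapped fields into a receipt and report required fields with no source."""
    receipt = {dst: entry[src] for src, dst in FIELD_MAP.items() if src in entry}
    missing = [field for field in REQUIRED_RECEIPT_FIELDS if field not in receipt]
    return receipt, missing


entry = {"action": "send_email", "agent_id": "agent-7",
         "decision": "allow", "timestamp": "2025-01-01T00:00:00Z"}
receipt, missing = translate(entry)
print(missing)  # ['arguments_hash', 'approver', 'policy_version', 'completed_at', 'environment']
```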

Headline:

PASS         17
PARTIAL       5
DOCS-ONLY     1
NOT COVERED  15
N/A           2
──────────────
TOTAL        40

The 15 NOT COVERED rows reflect AGT-side schema gaps where the AuditEntry doesn't carry data AgentBoundary requires:

  • No normative arguments_hash (mutation defense)
  • No approver identity in the audit row (approval-chain verification)
  • No policy version field (downgrade defense)
  • Single timestamp per entry (no issued-vs-completed split)
  • No environment field (prod/staging/dev distinction)
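To make the first gap concrete: an arguments_hash lets a verifier check that the arguments actually executed match the ones the receipt committed to. A minimal sketch, assuming SHA-256 over sorted-key compact JSON (a simplification of RFC 8785 JSON canonicalization, not necessarily what AgentBoundary normatively requires) and a hypothetical `sha256:` prefix:

```python
import hashlib
import json
from typing import Any


def arguments_hash(arguments: dict[str, Any]) -> str:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    # This approximates, but is not identical to, RFC 8785 (e.g. number formatting).
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    # The "sha256:" prefix is a hypothetical algorithm tag.
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def arguments_match(receipt: dict[str, Any], executed_arguments: dict[str, Any]) -> bool:
    # A mutated argument changes the hash, so the receipt no longer matches.
    return receipt.get("arguments_hash") == arguments_hash(executed_arguments)
```

Key order does not affect the hash, so `{"amount": 100, "to": "a@example.com"}` matches a receipt issued for `{"to": "a@example.com", "amount": 100}`, while changing `amount` to 1000 does not.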

Each maps to a known design choice on AGT's side — not bugs, but deliberate scoping. The framing in my report is that AGT and AgentBoundary's design centres are complementary: runtime enforcement + decision-lineage reconstruction (AGT) vs portable third-party verification of receipts (AgentBoundary). Two different layers; same compliance picture.

Two things AGT does better than AgentBoundary v0.2-alpha — both v0.3 adoption candidates on my side:

  1. Merkle chain across actions (previous_hash). v0.2-alpha is only singly-linked (prior_receipt), which is weaker against arbitrary entry-reordering attacks. AGT's approach is structurally stronger.
  2. DecisionBOM.completeness_score with per-BOMField reconstruction confidence. v0.2-alpha has a coarser three-tier provenance enum (observed / inferred / synthesized). A numeric confidence per field is meaningfully richer.
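The difference in item 1 comes down to what each link commits to. Below is a minimal sketch of a linear hash chain in the spirit of AGT's previous_hash, assuming each entry's hash is SHA-256 over its sorted-key JSON and a hypothetical all-zero genesis value (both assumptions, not AGT's documented scheme). Because each link commits to the full content of its predecessor, swapping or editing an earlier entry breaks verification, whereas a pointer that only identifies the prior receipt (if that is how prior_receipt is populated) does not bind its content.

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # hypothetical sentinel for the first entry's previous_hash


def entry_hash(entry: dict[str, Any]) -> str:
    # Hash the whole entry, including its own previous_hash link.
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(chain: list[dict[str, Any]], payload: dict[str, Any]) -> None:
    # Each new entry commits to the hash of the entry before it.
    prev = entry_hash(chain[-1]) if chain else GENESIS
    chain.append({**payload, "previous_hash": prev})


def verify(chain: list[dict[str, Any]]) -> bool:
    # Recompute every link; any reordering or edit of a non-final entry fails.
    expected = GENESIS
    for entry in chain:
        if entry.get("previous_hash") != expected:
            return False
        expected = entry_hash(entry)
    return True


chain: list[dict[str, Any]] = []
for action in ("read", "write", "delete"):
    append(chain, {"action": action})

swapped = [chain[0], chain[2], chain[1]]
tampered = [dict(e) for e in chain]
tampered[1]["action"] = "exfiltrate"

print(verify(chain), verify(swapped), verify(tampered))  # True False False
```

One caveat of any linear chain: editing the newest entry goes undetected unless the head hash is anchored somewhere the writer cannot rewrite.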

Both feed back into the PolicyDecision interop work cleanly — if AGT entries can map round-trip to AgentBoundary receipts (and vice versa), the receipt format becomes another axis of the drop-in replacement story #302 was about.

The ask: if any per-scenario mapping or factual claim is wrong, corrections are welcome via this issue or via PR to jamjet-labs/agentboundary within 7 days of this post. After that, the report publishes with the data as currently mapped.

Happy to discuss here or directly. The draft report stays private until publication, but I can share §7.4 (the AGT section, ~600 words) as a sneak preview if either of you wants one.

Thanks for shipping AGT — it raised the bar for everyone in this space.

— Sunil

Metadata


Labels: needs-review:HIGH (Contributor reputation check flagged HIGH risk)
