
[Hackathon] feat: AI-augmented macro operators#5115

Open
Xiao-zhen-Liu wants to merge 65 commits into apache:main from Xiao-zhen-Liu:xiaozhen-hackathon-macro

Conversation


Xiao-zhen-Liu (Contributor) commented May 16, 2026

PR Description: AI-Augmented Macro Operators

What changes were proposed in this PR?

A user builds a workflow today by dropping individual operators onto the canvas one at a time. As the workflow grows, the canvas turns into a wall of nodes; common sub-DAGs (CSV → Filter → Projection, an enrichment join, a feature-prep chain) get re-built by hand every time. There's no first-class way to encapsulate a sub-DAG, share it, or have the agent suggest one — and once a sub-DAG is repeated, there's no signal pushing the user to refactor.

This PR introduces macro operators: a logical-plan-level abstraction that lets a sub-DAG live as a single node on the canvas, plus the AI surfaces (suggest, fuse, drill-down) that make encapsulation discoverable.

Demo Video

Demo

Before / after

| User task | Before | After |
| --- | --- | --- |
| Reuse a sub-DAG | Hand-rebuild every time | Right-click → Create macro → instance lives in "Your Macros" palette |
| Discover refactor opportunities | Eyeball the canvas | Suggest Macros (AI) ranks candidates with confidence chips; recurring patterns auto-tier as ✓ recommended |
| Share a macro across workflows | Copy-paste ops manually | Export as a self-contained JSON bundle (nested macros travel transitively) |
| Compose sub-DAGs | One level only | Macros nest arbitrarily; the canvas + drill-down roll execution stats up at every level |
| Speed up a sub-DAG | Manually write a Python UDF | Right-click → fuse for performance (AI) collapses the body into one PythonUDFOpDescV2; canvas shows ⚡ FUSED · 1.6× |
| Inspect a macro body | No path | Double-click the macro op → drill-down editor renders the body, with live execution stats flowing in |
| Reuse the same shape twice | Materialize each independently | Auto-optimize finds every occurrence of a pattern and swaps them all in one click |

The story

Drop a few operators on the canvas — CSV scan, two filters, a projection, a sink. The ✨ Suggest Macros (AI) button in the palette already shows a 2 badge before you click — the agent has been silently scanning the workflow on every graph change. Click it: a panel slides in with two candidates, the strongest tagged ✓ recommended because the same Filter → Projection shape appears twice in your workflow. The suggested name is data_cleaning (domain-aware, not filter_projection_block). Hover a row — the matching ops light up on canvas. Click the row: the selected ops collapse into a single macro node, and the same shape elsewhere gets the same swap because the auto-optimize pass spotted the duplicate.

Run the workflow. Stats roll up to the macro node — its input/output port counts and state badge update in real time, summed from the inner ops the engine actually executes. Double-click the macro: the canvas swaps to a drill-down editor showing the body with the same dagre auto-layout as the main canvas, and stats keep flowing because the body-relative op IDs are aliased to their post-expansion runtime UUIDs. Click into a nested macro inside the body — stats keep working three levels deep.

Want it faster? Right-click the macro → fuse for performance (AI). The frontend's codegen walks the body operators (Filter, Projection, Regex, Limit, Distinct, inlined PythonUDFV2 with yield-rewriting), emits a process_tuple function, attaches it as a MacroFusion payload with verified = true, and stamps the operator gold with a ⚡ FUSED · 1.6× badge. The speedup estimate is grounded: N − 1 removed actor handoffs, credited 0.30× per handoff, capped per VLDB 2024 §6's empirical measurements. Run again. At compile time, the backend's MacroExpander reads fusion.verified = true and substitutes a single PythonUDFOpDescV2 for the inlined body — no inter-actor serialization for the collapsed steps.
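The badge math can be sketched as follows (TypeScript; the cap value is a placeholder parameter, since the PR text elides the actual cap it derives from VLDB 2024 §6):

```typescript
// Hedged sketch of the displayed estimate: fusing n body operators removes
// n - 1 actor handoffs, each credited a 0.30x improvement. `capX` is a
// hypothetical stand-in for the empirically derived cap.
function estimatedSpeedup(bodyOps: number, capX = 2.0): number {
  return Math.min(1.0 + (bodyOps - 1) * 0.30, capX);
}
```

Under these assumptions, a three-operator body yields roughly the 1.6× shown on the badge.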

Share a macro? Right-click → export. You get a portable JSON bundle that includes every nested macro the root depends on, so importing on a fresh Texera instance reconstructs the whole dependency graph.

How it works under the hood

Macros live at the logical-plan layer only. A new MacroExpander pre-compile pass (mirrored in amber/ and workflow-compiling-service/) inlines every MacroOpDesc into its body operators and rewrites parent edges, so the physical-plan layer never sees a macro. The expander runs before expandLogicalPlan, and the rest of the engine pipeline behaves as if the workflow had been hand-flattened.
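The core idea of the pre-compile pass can be sketched in a few lines (TypeScript, with hypothetical single-entry/single-exit types — the real MacroExpander additionally handles multi-port boundaries, nested macros, and cycle/depth guards):

```typescript
// Replace a macro node with its body ops and rewire the parent's edges,
// so downstream compilation only ever sees a flat plan.
interface Edge { from: string; to: string }
interface Plan { ops: Set<string>; edges: Edge[] }
interface MacroDef { id: string; entry: string; exit: string; body: Plan }

function expand(plan: Plan, macros: Map<string, MacroDef>): Plan {
  let p = plan;
  for (const [mid, m] of macros) {
    if (!p.ops.has(mid)) continue;
    // keep edges not touching the macro node
    const kept = p.edges.filter(e => e.from !== mid && e.to !== mid);
    // rewire parent edges to the body's boundary ops
    const rewired = [
      ...p.edges.filter(e => e.to === mid).map(e => ({ from: e.from, to: m.entry })),
      ...p.edges.filter(e => e.from === mid).map(e => ({ from: m.exit, to: e.to })),
    ];
    const ops = new Set([...p.ops].filter(o => o !== mid).concat([...m.body.ops]));
    p = { ops, edges: [...kept, ...m.body.edges, ...rewired] };
  }
  return p;
}
```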

Deterministic UUIDs for inner ops. The expander assigns each inner op a fresh ID via UUID.nameUUIDFromBytes("${macroInstanceId}|${originalBodyOpId}"). Required because:

  • The original ${instanceId}--${innerOp} prefix scheme produced 170+ char IDs that caused Iceberg commit thrash on HashJoin's internal build-side port — execution that ran fine on a hand-flattened plan hung on the macro-wrapped equivalent.
  • Texera has two compilers — WorkflowCompilingService (frontend validation) and ComputingUnitMaster's WorkflowCompiler (actual execution). Both run MacroExpander on the same workflow content. If they used UUID.randomUUID(), the side-table written by one wouldn't match the runtime stats emitted by the other; deterministic UUIDs guarantee bit-identical plans.
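The determinism property can be illustrated like so (TypeScript sketch; node's createHash stands in for Java's UUID.nameUUIDFromBytes, which is likewise an MD5-based name UUID):

```typescript
import { createHash } from "node:crypto";

// Same (macroInstanceId, bodyOpId) pair -> same runtime ID, so both
// compilers derive bit-identical plans; different instances diverge.
function runtimeOpId(macroInstanceId: string, bodyOpId: string): string {
  const md5 = createHash("md5")
    .update(`${macroInstanceId}|${bodyOpId}`)
    .digest("hex");
  // format as a UUID-shaped string (8-4-4-4-12)
  return `${md5.slice(0, 8)}-${md5.slice(8, 12)}-${md5.slice(12, 16)}-${md5.slice(16, 20)}-${md5.slice(20)}`;
}
```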

Provenance side-table. MacroExpander populates Map[runtimeOpId → MacroProvenance(macroChain, bodyOpId)] during expansion; WorkflowCompiler drains it after compile and stores it in MacroMappingCache (file-backed at /tmp/texera-macro-mappings/wid-{wid}.json for cross-JVM visibility between ComputingUnitMaster and TexeraWebApplication). Exposed via GET /api/workflow/{wid}/macro-mapping. Frontend WorkflowStatusService.withMacroAggregates walks the chain to roll inner-op stats up to every macro level — parent canvas + each nested drill-down.

Nested macros recurse fully. A runtime op buried three macros deep has macroChain = [outerInstance, middleInstance, innerInstance]; the resolver suffix-matches so a stats roll-up rooted at any level finds its runtime ops.
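The per-level roll-up over the side-table can be sketched as (TypeScript; map shape hypothetical, chains stored outermost-first as described above — each runtime op's count is credited to every macro instance on its chain):

```typescript
interface MacroProvenance { macroChain: string[]; bodyOpId: string }

// Roll each runtime op's row count up to every macro level on its chain.
function rollUpRows(
  provenance: Map<string, MacroProvenance>,
  rowCounts: Map<string, number>,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const [rid, rows] of rowCounts) {
    const p = provenance.get(rid);
    if (!p) continue; // op not produced by a macro expansion
    for (const instance of p.macroChain) {
      totals.set(instance, (totals.get(instance) ?? 0) + rows);
    }
  }
  return totals;
}
```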

AI surfaces. MacroSuggestionService runs two heuristic detectors side-by-side: linear chains (≥2 ops where each interior node has in-deg=1 and out-deg=1) and recurring (opType₁, opType₂, …) window patterns. Recurring patterns auto-tier as ✓ recommended; clean middle chains tier as strong fit. Names map domain-aware substrings (csv.*scan.*filter.*projection → csv_preprocessing, regex.*filter → text_filtering, etc.) instead of underscore-joining op types. MacroFusionService emits a Python UDF body from the macro body, covering Filter, Projection, Regex, Limit, Distinct, inlined PythonUDFV2 (yield-rewritten). The fusion.verified = true flag is the contract MacroExpander reads to substitute; the rest of the speedup estimate is presentation.
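A simplified version of the linear-chain detector looks like this (TypeScript; assumes a DAG, and omits the recurring-window detector and confidence tiering that the real MacroSuggestionService layers on top):

```typescript
// Find maximal runs where each hop n -> m has out-degree(n) = 1 and
// in-degree(m) = 1, keeping runs of 2+ ops as macro candidates.
function linearChains(edges: [string, string][]): string[][] {
  const outs = new Map<string, string[]>();
  const ins = new Map<string, string[]>();
  for (const [f, t] of edges) {
    outs.set(f, [...(outs.get(f) ?? []), t]);
    ins.set(t, [...(ins.get(t) ?? []), f]);
  }
  // the linear successor of n, if the hop is unambiguous on both sides
  const next = (n: string): string | undefined => {
    const o = outs.get(n) ?? [];
    return o.length === 1 && (ins.get(o[0]) ?? []).length === 1 ? o[0] : undefined;
  };
  const nodes = [...new Set(edges.flat())];
  const chains: string[][] = [];
  for (const n of nodes) {
    // chain start: has a linear successor, but no linear predecessor
    if (next(n) === undefined) continue;
    if ((ins.get(n) ?? []).some(p => next(p) === n)) continue;
    const chain = [n];
    for (let cur = next(n); cur !== undefined; cur = next(cur)) chain.push(cur);
    if (chain.length >= 2) chains.push(chain);
  }
  return chains;
}
```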

What this also fixes along the way

  • /api/macro/* HTTP storm — lazy fetches in template bindings caused an infinite loop; reverted to a flat palette and removed the lazy fetches.
  • Engine error visibility — phase-transition errors and missing-schema-port errors now propagate out of RegionExecutionCoordinator instead of stalling silently.
  • View-result inside a macro — drill-down result lookups go body-relative-id → runtime-UUID via MacroService.buildBodyOpIdToRuntimeUuidMap (replaces the obsolete prefix-based alias). Mega-macros with 0 external outputs alias the canvas op to the first body sink, so the auto-stored terminal output is reachable without drilling.
  • Back-to-parent stats — WorkflowStatusService re-aggregates the cached raw status on every runtimeMacroMappingTick; its emission Subject becomes a ReplaySubject(1) so the canvas remount after navigation sees the latest snapshot immediately.
  • Jackson macroSyncedAt / estimatedSpeedup UnrecognizedPropertyException — MacroOpDesc and MacroFusion are both annotated with @JsonIgnoreProperties(ignoreUnknown = true) so UI-only convenience fields don't break deserialization at execute time.

Related issues, documentation, discussions

Related to the Apache Texera Agent Hackathon (#5059). Builds on §9.2 of the macro design doc (AI fusion substitution path).

How was this PR tested?

  • MacroExpanderSpec (~694 lines) covers the expander on its own: single-macro expansion, nested expansion (outer + inner chains), input fan-out (one external port → multiple inner consumers), output fan-in detection (raises), cycle detection across nested macros, depth-limit guard, deterministic-UUID property (same input → same output across compiler instances), and provenance side-table population.
  • MacroOpDescSpec covers Jackson serialization round-trip, including tolerance of unknown frontend-only fields (macroSyncedAt, estimatedSpeedup).
  • End-to-end demo path exercised on a real multi-macro workflow: suggest → materialize → run → drill into nested macro → fuse → run with ⚡ FUSED · 1.6× substitution → unfuse → export bundle → reimport on a fresh wid. Stats roll up correctly at every level; canvas remount after navigation no longer wipes non-macro op state.
```shell
sbt "WorkflowExecutionService/testOnly *MacroExpanderSpec"
sbt "WorkflowOperator/testOnly *MacroOpDescSpec"
sbt "WorkflowOperator/compile" "WorkflowCompilingService/compile" "WorkflowExecutionService/compile"
yarn tsc --noEmit  # frontend
```

Generated by

Claude Code (Claude Opus 4.7)

Xiao-zhen-Liu and others added 30 commits May 14, 2026 17:39
KNIME-metanode-style composite operators for Texera. Macros live purely
at the logical-plan layer: a new MacroExpander pre-pass inlines each
MacroOpDesc into a flat LogicalPlan before physical-plan compilation, so
PhysicalPlan, PhysicalOpIdentity, and the Amber engine remain unchanged.

Backend (new):
- MacroOpDesc, MacroInputOp, MacroOutputOp LogicalOps registered in
  Jackson @JsonSubTypes; getPhysicalPlan throws to signal a missed
  expansion pass.
- MacroBody, MacroLink, MacroPortSpec, MacroFusion data classes.
- MacroExpander: inlines each macro by splicing inner ops/links via
  boundary markers and prefixes inner-op IDs with the instance ID
  (${macroInstanceId}/${innerOpId}), so per-macro telemetry can be
  aggregated purely from the operator-ID prefix. Cycle and depth-16
  guards via MacroCompileContext. Pluggable MacroRegistry (Empty /
  inMemory; persistence-backed impl is a later step).
- WorkflowCompiler (workflow-compiling-service) calls
  MacroExpander.expand before scan-source resolution. Backward-
  compatible: new ctor param defaults to MacroRegistry.Empty.
- TODO note in amber WorkflowCompiler; execution-time expansion is a
  later step. Until then, MacroOpDesc.getPhysicalPlan throwing surfaces
  unexpanded plans as a loud compile error rather than silently broken
  execution.

Tests (14 passing):
- MacroOpDescSpec: JSON round-trip, throws on compile, ports match
  inputPortCount/outputPortCount.
- MacroExpanderSpec: pass-through plan, single-port inline, LIVE registry
  fetch, nested macros with concatenated prefix, cycle detection,
  depth-bomb, double-instantiation, input-marker fan-out, missing-LIVE
  error, snapshot immutability across two expansions.

Also includes hackathon-proposal.md (Texera Agent Hackathon submission)
covering the AI suggestion and AI fusion features that layer on top of
this skeleton in later steps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- sql/updates/23.sql + texera_ddl.sql: workflow_kind_enum, workflow.kind,
  idx_workflow_kind, macro_metadata. Macros reuse the workflow table to inherit
  versioning, ACL, and hub features.
- MacroResource: create/list/get/schema/snapshot endpoints alongside
  WorkflowResource; reuses workflow_user_access for permissions and seeds an
  initial workflow_version so LIVE-mode instances have a vid to pin.
- WorkflowResource.baseWorkflowSelect: bake in kind = WORKFLOW so macros are
  structurally excluded from the workflows tab, the hub, and operator search;
  callers (HubResource, retrieveWorkflowsBySessionUser) updated to .and().
- DbMacroRegistry: jOOQ-backed MacroRegistry that reads workflow.content as a
  serialized MacroBody; wired into the compiling service's WorkflowCompiler.
- TexeraWebApplication: register MacroResource.

The amber-side execution-time WorkflowCompiler still has the existing
TODO(macro-operators) note from Step 1 and is unaffected; that hook is Step 3.

Step 3 closes the TODO at WorkflowCompiler.scala:144 — macros can now be
executed end-to-end, not just compiled by the workflow-compiling-service.

- amber/.../workflow/macroOp/{MacroCompileContext,MacroRegistry,MacroExpander,
  DbMacroRegistry}: parallel copies of the compiling-service equivalents,
  adapted to amber's LogicalLink/LogicalPlan types. The two macro pipelines
  will converge when the broader LogicalPlan unification (existing TODO at
  WorkflowCompiler.scala:137) happens.
- WorkflowCompiler: take an optional MacroRegistry (defaults to Empty); call
  MacroExpander.expand before resolveScanSourceOpFileName + expandLogicalPlan.
- WorkflowExecutionService, SyncExecutionResource: pass new DbMacroRegistry()
  into WorkflowCompiler so LIVE-mode macros resolve against `workflow` rows
  with kind=MACRO.

Step 1 (10/10 MacroExpanderSpec, 4/4 MacroOpDescSpec) and amber's
WorkflowCompilerSpec (6/6) still green.
Adds the smallest user-visible hook for macros: select 2+ operators, right-click
→ "create macro", enter a name. Posts a serialized MacroBody (selected
operators + internal links + MacroInput / MacroOutput boundary markers) to the
new POST /api/macro/create endpoint and surfaces the result via a toast.

The canvas selection is intentionally left in place; replacing it with a
MacroOpDesc node (and rewiring boundary links to the new ports) is the next
slice of Step 4, alongside the palette merge and drill-down editor.

- macro.service.ts: HTTP client + boundary computation (one MacroInput per
  unique inner port that has an external feeder; mirror for MacroOutput).
- context-menu.{html,ts}: new menu entry, wired with a window.prompt for the
  name and NotificationService for the toast. Shown only when 2+ operators
  are selected, no link is highlighted, and the workflow is modifiable.
Rubber-banding a chain of operators in JointJS picks up the connecting links
too, which made `hasHighlightedLinks()` true and silently hid the menu entry
(same reason copy/cut were missing from the user's screenshot). The boundary
computation already classifies internal vs external links from the operator
selection alone, so highlighted links shouldn't gate the entry.

Loading a MACRO row via the workflow editor route blew up the canvas
(workflow-check.ts dereferenced link.source.operatorID; the content is a
MacroBody, which has fromOpId/toOpId, not source/target). Fail fast at the
REST layer instead, with a message pointing at the not-yet-built macro editor.

After POST /api/macro/create succeeds the context menu now:
1. Drops a `Macro` operator at the centroid of the selection with input/output
   ports sized to match the boundary (one input per unique inner port that had
   an external feeder, mirror for output).
2. Deletes the original operators (and with them their internal + boundary
   links) via deleteOperatorsAndLinks.
3. Re-points each former external link at the new macro's corresponding port.
All three steps are wrapped in a single bundleActions transaction so undo
restores the original sub-DAG in one shot.

MacroService.buildMacroFromSelection now returns the boundary metadata
(per-link rewire instructions + input/output port counts) alongside the
backend request payload — same boundary computation, exposed for the swap.

MacroOpDesc on the canvas uses operatorProperties = { macroId, macroVersion,
linkMode: "LIVE", inputPortCount, outputPortCount, displayName } so the
existing workflow-serialization path can roundtrip it to the backend without
extra glue. macroVersion is a placeholder until MacroDetail exposes the
pinned vid.
Track down why the canvas swap isn't visible: log the captured selection, the
built request + boundary metadata, and the swap-vs-throw outcome. Also align
the output-port shape with outputPortToPortDescription (disallowMultiInputs:
false). Tracing will be removed once the issue is identified.
…ction

MacroOpDesc's generated JSON schema includes `nullable: true` properties
without a sibling `type` (from `Option[MacroBody]` / `Option[MacroFusion]`).
Ajv refuses to compile that, so the swap threw with "nullable cannot be used
without type" before any canvas mutation could happen. Construct the
OperatorPredicate manually instead — every field is already overridden, so
the schema-default path adds nothing.

The underlying schema bug should still be fixed (it'll also break dragging
Macro from the palette) but that's a separate task in workflow-operator;
right-click → create macro now works without it.

Step 5 first slice: double-clicking a Macro node now navigates to a new route
that loads the macro's body into the same workflow editor canvas.

- Route: `/dashboard/user/workflow/:id/macro/:macroId` mounts the existing
  WorkspaceComponent. The parent wid (`id`) is kept in the URL so future
  breadcrumb / back-navigation work has it.
- WorkspaceComponent.registerLoadOperatorMetadata picks up `macroId` from the
  route and runs a new `loadMacroWithId` branch instead of the normal workflow
  load. Auto-persist is disabled via setWorkflowPersistFlag(false) so canvas
  edits don't accidentally hit `/workflow/persist` — saving back to the macro
  is the next slice.
- MacroService.macroDetailToWorkflow converts the persisted MacroBody into a
  Workflow shape reloadWorkflow can consume: normalizes inner-op / marker port
  shapes (PortDescription vs PortIdentity), maps MacroLink port-ordinals to
  string portIDs, and auto-lays-out operators with MacroInput on the left,
  MacroOutput on the right, regular inner ops in the middle.
- workflow-editor double-click handler now detects `operatorType === "Macro"`
  and routes to the drill-down URL instead of opening the result panel.

Read-only-ish in v1 — the editor will let the user move things around but the
changes don't persist. PUT/POST /macro/{wid}/update + the save flow is the
next commit.
…load

The backend's reflective JSON-schema generator emits `{nullable: true}` for
`Option[...]` fields whose inner type it can't enumerate
(`Option[MacroBody]`, `Option[MacroFusion]` on `MacroOpDesc`). Ajv
strict-mode refuses to compile schemas with `nullable` and no `type`, which
threw from everywhere the schema gets compiled — validation-workflow,
property-editor, dynamic-schema, shared-model-change-handler — making the
drill-down editor unusable.

Sanitize once at the source (OperatorMetadataService): walk every operator's
`jsonSchema` and delete `nullable` when there's no sibling `type`. All
downstream Ajv compilations now see well-formed schemas.

The proper backend fix is still tracked in project memory
`project_macroopdesc_schema_ajv_bug.md`; this is defense-in-depth that also
hardens us against any future LogicalOp picking up the same shape.

Two issues blocking the macro body from rendering:

1. WorkspaceComponent is reused across route changes (no ngOnDestroy fires
   going /workflow/:id → /workflow/:id/macro/:macroId), so the parent
   workflow's operators+links stayed on the JointJS paper. reloadWorkflow then
   hit \`failed to add link. cause: duplicate link found with same source and
   target\` in shared-model-change-handler when the macro body's marker links
   collided with parent leftovers. Fix: call resetAsNewWorkflow() before
   setNewSharedModel.

2. Macro / MacroInput / MacroOutput had no icon files, so JointJS rendered
   blank/broken-image boxes (operators technically present but invisible).
   Stub with copies of PythonUDFV2.png so they at least render; proper icons
   are a polish task.
…properly

Angular reuses WorkspaceComponent across navigations between /workflow/:id
and /workflow/:id/macro/:macroId, so route.snapshot.params is frozen at
construction time and the macro drill-down didn't actually re-run its loader
when the user double-clicked a macro node — the page only loaded correctly on
a hard refresh.

Subscribe to route.paramMap inside registerLoadOperatorMetadata and dispatch
on every change (deduplicated by id/macroId key). The workflow branch also
re-enables the persist flag, since the macro drill-down disables it.
In-tab Angular router navigation between /workflow/:id and
/workflow/:id/macro/:macroId reuses WorkspaceComponent. Despite
resetAsNewWorkflow() + setNewSharedModel() + paramMap-driven reload, the YJS
shared-model + JointJS paper retain enough cross-route state that the macro
body's links are rejected by shared-model-change-handler as duplicates of the
parent workflow's links — and the body never finishes rendering. A full page
refresh on the same URL works because the component is bootstrapped fresh.

Use window.location.href to force that full reload instead. Brief flash, but
the macro view renders predictably every time. Tearing down the shared-model
lifecycle properly to support SPA navigation is a follow-up.
…fail

Workflows containing a Macro instance failed to compile (no execution
possible) because:

- DbMacroRegistry.fetch read `workflow.content` and called
  mapper.readValue(content, classOf[MacroBody]).
- Marker operators (MacroInput / MacroOutput) inside the body had been
  serialized with their ports in backend PortIdentity shape
  (`{id: {id: 0, internal: false}, displayName: ""}`).
- LogicalOp inherits `inputPorts: List[PortDescription]` from PortDescriptor,
  so Jackson tried to parse those entries as PortDescription, choked on the
  missing `portID` field, and threw.
- DbMacroRegistry's catch swallowed the exception and returned None, and
  MacroExpander threw "not found in registry" — surfacing as a generic
  compile failure on the parent workflow with no usable error message.

Two-pronged fix:

1. `@JsonIgnoreProperties(Array("inputPorts", "outputPorts"))` on
   MacroInputOp / MacroOutputOp so already-persisted macros keep working —
   the marker's port wiring is derived from `portIndex` via operatorInfo
   anyway, so ignoring the JSON entries is correct.
2. Frontend marker serialization now emits proper PortDescription shape
   (portID/displayName/disallowMultiInputs/isDynamicPort) for newly-created
   macros, keeping the wire format consistent with the rest of the system.

The earlier "just so it renders" stub copied PythonUDFV2.png as Macro.png /
MacroInput.png / MacroOutput.png, which made macro instances on the canvas
indistinguishable from Python UDF ops — exactly the confusion the user just
flagged.

Generate proper icons (rounded "container" frame + a three-node mini-graph
for Macro; left- and right-facing arrows for the markers) in a blue/teal
accent that contrasts with the existing Python-yellow. Pure cosmetic, no
behavioral change.

* MacroExpander: switch inner-op ID prefix from "/" to "--" so prefixed
  IDs survive serialization through GlobalPortIdentitySerde's
  VFS-URI path component. Update WorkflowCompiler.visibleOperatorId
  and outer-error filter accordingly; add `require(!contains('/'))`
  in the serde as a hard guard. All 17 MacroExpanderSpec tests
  updated for the new separator and passing.

* WorkflowStatusService: fold inner-op stats keyed by
  "${macroInstanceId}--*" into a synthetic entry under
  macroInstanceId so the macro node renders state + row counts
  during execution on the outer canvas. Worst-case state wins
  (Recovering > Pausing > ... > Completed > Uninitialized);
  row counts and worker counts are summed. Original prefixed
  entries are preserved.

* ValidationWorkflowService: skip AJV schema validation for Macro
  operators — the embedded schema references LogicalOp polymorphic
  union (via MacroBody.operators) and AJV can't reliably handle it.
  Connection validation alone still gates the red/grey state.

* OperatorMetadataService: when sanitizing schemas off the wire,
  convert `nullable: true + $ref: X` to `anyOf: [{type: null},
  {$ref: X}]` instead of just stripping nullable, so Option[T]
  fields serialized as null round-trip cleanly through AJV.

* JointUIService: visually differentiate macro nodes — Macro
  instance gets a soft-blue fill and dashed blue border; MacroInput
  / MacroOutput markers get a muted grey, rounded "port pad" look
  with their operator-name label suppressed. changeOperatorColor
  preserves the macro-specific stroke across validation toggles by
  reading operatorType stashed on the JointJS element.

* WorkspaceComponent: pinned banner above the canvas when on
  `/workflow/:id/macro/:macroId` so the user can't miss they're
  editing a macro body and not the parent workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
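The "worst-case state wins, counts summed" fold described in the WorkflowStatusService bullet above can be sketched as (TypeScript; the severity list here is an assumed subset of the commit's Recovering > Pausing > … > Completed > Uninitialized ordering):

```typescript
interface OpStats { state: string; rows: number }

// Assumed subset of the severity ordering (worst first).
const severity = ["Recovering", "Pausing", "Running", "Completed", "Uninitialized"];

// Worst-case state wins; row counts sum across the macro's inner ops.
// Assumes at least one inner op has reported stats.
function foldMacroStats(inner: OpStats[]): OpStats {
  const worst = inner.reduce((a, b) =>
    severity.indexOf(a.state) <= severity.indexOf(b.state) ? a : b);
  return { state: worst.state, rows: inner.reduce((s, o) => s + o.rows, 0) };
}
```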
Previously buildMacroFromSelection only created MacroInput/MacroOutput
markers for ports that already had external links at macro-creation
time. A selection like Filter → Projection where Projection's output
wasn't yet connected ended up as a macro with one input port and zero
output ports, breaking dataflow equivalence: the user couldn't reach
Projection's output through the macro at all.

Replacing a sub-DAG with a macro op is a structural substitution. Every
input port on the selection that isn't fed by another selected op is a
boundary input regardless of current external connectivity, and
symmetrically for outputs. Walk selectedOperatorIDs × op.inputPorts/
outputPorts, filter out the internally-wired ones, and synthesize one
marker per remaining port. The actual-external-edge rewiring
(incomingEdges/outgoingEdges) is unchanged — it just maps a subset of
the available macro ports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
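The boundary rule from the commit above can be sketched as (TypeScript; names hypothetical): every input port on the selection not fed by another selected op is a boundary input, regardless of current external connectivity, and symmetrically for outputs.

```typescript
interface Op { id: string; inputPorts: number; outputPorts: number }
interface Port { opId: string; portIdx: number }
interface Link { source: Port; target: Port }

// internalLinks: links whose source AND target are inside the selection.
function boundaryPorts(selected: Op[], internalLinks: Link[]) {
  const key = (p: Port) => `${p.opId}:${p.portIdx}`;
  const fedInternally = new Set(internalLinks.map(l => key(l.target)));
  const feedsInternally = new Set(internalLinks.map(l => key(l.source)));
  const inputs: Port[] = [];
  const outputs: Port[] = [];
  for (const op of selected) {
    for (let i = 0; i < op.inputPorts; i++) {
      const p = { opId: op.id, portIdx: i };
      if (!fedInternally.has(key(p))) inputs.push(p); // boundary input
    }
    for (let o = 0; o < op.outputPorts; o++) {
      const p = { opId: op.id, portIdx: o };
      if (!feedsInternally.has(key(p))) outputs.push(p); // boundary output
    }
  }
  return { inputs, outputs };
}
```

Note how a Filter → Projection selection with an unconnected Projection output still yields one boundary output, which is exactly the dataflow-equivalence fix the commit describes.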
…xecution view

Stitches the parent workflow's execution data — both stats (row counts,
state) and result rows — onto each external port of a Macro op, and
makes the drill-down view show the same data per inner op while the
parent is running.

Wire layout (frontend-only; engine stays macro-unaware):

* MacroService now computes per-definition body bindings — each Macro
  external port `i` knows the body-relative (innerOp, innerPort) it
  routes to via the MacroInput(i) / MacroOutput(i) markers. Cached on
  first fetch; preloaded on `getOperatorAddStream` so the map is ready
  before execution starts. `getBindingsForInstance(instanceId, macroId)`
  lifts the body-relative IDs to runtime form (`${instanceId}--`) so
  they match the engine's stat/result keys post-MacroExpander.

* WorkflowEditorComponent.synthesizeMacroOpStats sources per-port row
  counts for each Macro on the outer canvas: macro input `i` reads from
  the boundary inner op's `inputPortMetrics` at the body-link's target
  port; macro output `j` reads from the inner op's `outputPortMetrics`.
  Falls through to `withMacroAggregates`-supplied state until bindings
  load, then refreshes on the next stats emission.

* WorkflowResultService gains a macro-instance result alias plus a
  drill-down prefix. The alias routes `getResultService(macroId)` to
  the inner op feeding output port 0, so the result panel shows the
  macro's output without forcing the user to drill in. The drill-down
  prefix transparently rewrites every result lookup to its runtime
  form when the canvas is rendering a body via `?instance=...`.

* WorkflowEditorComponent listens to `route.queryParamMap.instance` —
  the macro click-through now appends it to the drill-down URL — and
  applies the same `${instanceId}--` prefix to stat lookups so live
  parent-execution stats land on the body-relative op IDs the
  drill-down canvas displays.

* Port-mapping completeness fix already in 49beec9 is the critical
  upstream prerequisite: a Macro op with only an `input-0` port (and
  no output port) can't be made to display output stats or results
  no matter how the websocket layer is wired.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three coupled execution-path fixes:

* Item 3 — view-result/reuse-result on a macro op now forwards to the
  inner boundary ops the macro's external outputs route to. Backend's
  `opsToViewResult` is keyed by post-expansion op IDs (the macro op
  itself doesn't survive MacroExpander), so executeWorkflowWith… rewrites
  macro IDs to `${instanceId}--${innerOpId}` for every output binding
  before submitting the plan. Multi-output macros mark all their output
  producers; non-macro IDs pass through unchanged. Same rewrite for
  `opsToReuseResult`.

* Item 2 — `MacroService.computeBodyBindings` now also collects
  `nestedMacros: Map<innerOpId, nestedMacroId>` and
  `getBindingsForInstance` walks them recursively, prefixing
  `\${instanceId}--` at each layer until a terminal non-macro inner op
  is reached. Fan-out at any layer is preserved by emitting one
  resolved binding per terminal. Bodies of nested macros are eagerly
  prefetched when their parent body loads, so the synchronous stat
  lookup path finds everything cached.

* Item 1 — macro drill-down click-through switched from
  window.location.href to Router.navigate. Full reload was killing
  the parent's websocket subscription, so the drill-down view saw no
  live execution stats. SPA navigation keeps WorkflowWebsocketService
  alive across the route change, and the existing query-param
  (?instance=...) handler in WorkflowEditorComponent already maps
  body-relative op IDs onto runtime stat keys for the drilled-down
  canvas. loadMacroWithId simplified to match loadWorkflowWithId's
  pattern (drop the redundant resetAsNewWorkflow — setNewSharedModel +
  reloadWorkflow together do a clean transition).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
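The recursive lift from Item 2 above can be sketched as (TypeScript; body shape hypothetical): walk each body op, prefixing the instance ID with `--` at every layer, until a terminal non-macro op is reached.

```typescript
interface BodyOp { id: string; nestedMacroId?: string }

// Resolve a macro instance to the runtime IDs of its terminal inner ops,
// prefixing `${instanceId}--` at each nesting layer.
function terminalRuntimeIds(
  instanceId: string,
  macroId: string,
  bodies: Map<string, BodyOp[]>,
): string[] {
  return (bodies.get(macroId) ?? []).flatMap(op =>
    op.nestedMacroId === undefined
      ? [`${instanceId}--${op.id}`]
      : terminalRuntimeIds(`${instanceId}--${op.id}`, op.nestedMacroId, bodies));
}
```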
* Revert SPA navigation back to hard reload for macro drill-down
  click-through. SPA-into-WorkspaceComponent-reuse hits a flurry of
  duplicate-link rejections from interleaved YJS server-replay + local
  reloadWorkflow that can't be resolved cleanly with the current
  shared-model lifecycle. Hard reload gives a clean WorkspaceComponent
  mount with a fresh canvas every time.

* Stash (parentWid, instanceId) into sessionStorage before the hard
  navigation so the new page can later opt to reconnect to the parent's
  execution context for live drill-down stats. Wiring the rehydration
  is a follow-up; the stash itself is harmless if unused.

* Use an anonymous YJS room for the drill-down view. Joining the macro
  definition's wid-keyed room replays accumulated historical operators
  the room ever held, fighting reloadWorkflow over the same logical
  data and producing duplicate-link cascades that destroyed the canvas
  on every navigation. Anonymous room = clean canvas; collaborative
  editing of macros via drill-down is deferred until we can do a
  proper YJS state reset on the server side.

* SharedModelChangeHandler.validateAndRepairNewLink: when a link is
  duplicated, *skip rendering* it instead of deleting it from the
  shared model. The pre-fix behavior was eagerly destructive — the
  canonical link in the shared model got wiped along with the
  duplicate, leaving the canvas with nothing to render. Truly invalid
  links (non-existent op/port) still get repaired out of the model.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Right-click a Macro instance → 'Expand macro' inlines its body back onto
the parent canvas: deep-clones the body operators with fresh IDs (so
re-using the same macro elsewhere doesn't collide), reproduces internal
links, rewires every external link that was touching the macro to the
matching boundary inner op + port via the body's MacroInput/MacroOutput
markers, and finally deletes the macro op. Wrapped in bundleActions so
undo collapses to a single step.

v1 supports LIVE-linked macros only (body fetched from DbMacroRegistry).
SNAPSHOT mode (embedded body in operatorProperties.snapshot) is a
follow-up — same logic, different source.

Layout is crude (a 3-column grid anchored at the macro's old position);
a proper auto-layout pass is deferred.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In drill-down view, override the workflow metadata's wid to the parent
workflow's wid before reloading. ComputingUnitSelectionComponent reads
metadata.wid to decide which workflow id to open the execution websocket
against — if it gets the macro definition's wid (278), the drill-down
view subscribes to the macro's execution, not the parent's, and sees no
stats during the parent's actual run. Spoofing the wid to the parent's lets
the websocket stay on the parent's execution stream, and the existing
${instanceId}-- prefix machinery in WorkflowEditorComponent maps those
keys onto the body-relative op IDs the drill-down canvas displays.

Safe because workflow persistence is disabled in drill-down (the macro
body is saved through MacroResource, not the regular workflow save
endpoint).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Surfaces the user's saved macros under a "Your Macros" section in the
operator palette so they can be reused on other workflows. Loaded once
on component init via MacroService.listMacros(). Each macro renders as
a clickable row with name + (X in / Y out) port-count chip; clicking
builds a fresh OperatorPredicate (Macro-operator-{uuid}, macroId set
from the summary's wid, port counts from portSpec) and places it on the
canvas — same shape as `swapSelectionWithMacroNode` produces from a
selection, so all downstream paths (validation, render, expansion,
execution) see a normal Macro op.

v1 is click-to-add only; true drag-from-palette would require special-
casing the drag-drop service because regular operators go through
WorkflowUtilService.getNewOperatorPredicate(type) which can't fill in
the macro-specific properties. Visual styling matches the dashed-blue
macro treatment on the canvas so palette→canvas reads as one identity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a "Suggest Macros (AI)" button + inline panel in the operator
palette that surfaces ranked sub-DAG encapsulation candidates without
calling out to an LLM.

v1 heuristic: maximal linear chains where each interior op has exactly
one upstream and one downstream within the chain. Score = chain length
× source/sink penalty (≥2 ops, source-anchored chains discounted to
0.5×, sink-anchored to 0.7×). Top 10 returned. Per-candidate rationale
is derived from the operator-type sequence ("Looks like a reusable
preprocessing block", "Two-step pipeline: Filter → Projection", etc.).
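
A compact sketch of that heuristic, under assumed graph shapes (not the real service code):

```typescript
type Edge = { from: string; to: string };

// Maximal linear chains: each interior op has exactly one upstream and one
// downstream within the chain.
function findLinearChains(ops: string[], edges: Edge[]): string[][] {
  const inDeg = new Map<string, number>();
  const outDeg = new Map<string, number>();
  const succ = new Map<string, string>();
  const pred = new Map<string, string>();
  for (const op of ops) { inDeg.set(op, 0); outDeg.set(op, 0); }
  for (const { from, to } of edges) {
    outDeg.set(from, (outDeg.get(from) ?? 0) + 1);
    inDeg.set(to, (inDeg.get(to) ?? 0) + 1);
    succ.set(from, to); // only trusted while outDeg(from) stays 1
    pred.set(to, from); // only trusted while inDeg(to) stays 1
  }
  const chains: string[][] = [];
  for (const op of ops) {
    const p = pred.get(op);
    const isHead = inDeg.get(op) !== 1 || (p !== undefined && outDeg.get(p) !== 1);
    if (!isHead) continue;
    const chain = [op];
    let cur = op;
    while (outDeg.get(cur) === 1 && inDeg.get(succ.get(cur)!) === 1) {
      cur = succ.get(cur)!;
      chain.push(cur);
    }
    if (chain.length >= 2) chains.push(chain); // candidates need ≥2 ops
  }
  return chains;
}

// Score = chain length × anchor penalty: source-anchored 0.5×, sink-anchored 0.7×.
function scoreChain(chain: string[], inDeg: Map<string, number>, outDeg: Map<string, number>): number {
  const penalty =
    inDeg.get(chain[0]) === 0 ? 0.5 :
    outDeg.get(chain[chain.length - 1]) === 0 ? 0.7 : 1;
  return chain.length * penalty;
}
```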

UX: button shows brief "Analyzing workflow…" affordance (forced 250ms
delay) so the action reads as agent-like rather than instant lookup.
Top suggestion's operators get highlighted on the canvas immediately;
clicking a candidate row highlights+selects so the user can confirm
via right-click → Create Macro. v2 should call ContextMenuComponent's
private `swapSelectionWithMacroNode` flow directly.

LLM swap is one HTTP call away: replace `suggestMacros()` body with a
chat-assistant-service request returning the same `MacroSuggestion[]`
shape — UI and downstream materialize-action paths unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the hackathon-proposal §9.2 AI-fusion path: a macro instance
can be "fused" into a single PythonUDFOpDescV2 that replaces the entire
inlined sub-DAG at compile time, eliminating inter-actor handoffs for
the chain.

Frontend (MacroFusionService): template-based codegen — no LLM call.
Pulls the macro body via getMacro(wid), walks the inner ops, emits a
syntactically valid PythonUDFOperatorV2 class whose docstring lists
the original pipeline. v1 verification is fake-success (sampleSize
recorded, real sample-diff is a follow-up). Returns a `MacroFusion`
payload the caller attaches to `operatorProperties.fusion`.

Context-menu wiring (ContextMenuComponent.onFuseMacro): right-click a
Macro instance → "Fuse for performance (AI)" → generates code, attaches
the verified fusion to the macro's properties via setOperatorProperty,
notifies the user with the rationale + estimated speedup.

Backend (MacroExpander, both copies — amber WorkflowCompiler's and the
WorkflowCompilingService's): if `m.fusion.exists(_.verified)`, return
early from inlineMacro via `substituteFused` instead of fetching+
splicing the body. The new PythonUDFOpDescV2 reuses the macro instance
ID so parent links stay valid (no rewrite), and inherits the macro's
external input/output port shape. All 17 MacroExpanderSpec tests pass.

LLM upgrade path: replace MacroFusionService.synthesizeFromBody() with
a call to chat-assistant-service returning the same FusionResult shape.
Real sample-diff verification would gate `verified = true` instead of
defaulting to true after codegen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
macros with regular ops

The original heuristic was treating a Macro→Filter edge as if it
contributed to Filter's in-degree, blocking Filter from being detected
as a chain head. The intent of "ignore macros entirely" is that edges
incident on a macro should NOT count toward any non-macro node's
degree — Filter whose only upstream is a Macro should appear as a
source (in-degree 0) in the filtered subgraph.

Fix `computeDegrees`, `findLinearChains` (adjacency), and
`predIsBranching` to only count edges where BOTH endpoints are
non-macro. Verified end-to-end in Macro_2 workflow: 3 Filter→Projection
pairs surfaced as candidates ("Two-step pipeline: Filter → Projection.
Reusable as a unit." / 2 ops · score 0.7).
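
The corrected degree computation reduces to filtering edges up front (sketch, assuming an `isMacro` predicate over op IDs):

```typescript
type Edge = { from: string; to: string };

// Only edges whose BOTH endpoints are non-macro ops count toward degrees in
// the filtered subgraph; any edge incident on a macro is ignored entirely,
// so a Filter whose only upstream is a Macro looks like a source (in-deg 0).
function computeDegrees(
  ops: string[],
  edges: Edge[],
  isMacro: (opId: string) => boolean
): { inDeg: Map<string, number>; outDeg: Map<string, number> } {
  const inDeg = new Map<string, number>();
  const outDeg = new Map<string, number>();
  for (const op of ops) {
    if (!isMacro(op)) { inDeg.set(op, 0); outDeg.set(op, 0); }
  }
  for (const { from, to } of edges) {
    if (isMacro(from) || isMacro(to)) continue; // drop macro-incident edges
    outDeg.set(from, (outDeg.get(from) ?? 0) + 1);
    inDeg.set(to, (inDeg.get(to) ?? 0) + 1);
  }
  return { inDeg, outDeg };
}
```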

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracted the swap-selection-with-macro-node logic from
ContextMenuComponent into MacroService.createMacroFromSelection so the
suggestMacros panel can call it inline. Pre-fix the materialize action
just highlighted the candidate operators and asked the user to
right-click → Create Macro; that's two steps for what should be one
click. Now clicking a candidate prompts for a name (defaulting to the
heuristic's suggestedName) and creates+swaps inline — same end state
as the right-click flow, faster demo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Xiao-zhen-Liu and others added 21 commits May 16, 2026 06:47
- inferCategory walks each macro's body and assigns one of:
  preprocessing / transformation / aggregation / visualization, based
  on the dominant operator-type family among inner ops. Falls back to
  'uncategorized' when the body can't be parsed.
- groupedMacroList groups the (filtered) macro list by category in a
  stable order so the palette renders deterministic sections.
- Categories cached per-macroId after the first body fetch so we don't
  re-hit /api/macro/:wid on every render. A 'loading…' bucket shows
  briefly while the cache fills, then those macros slot into their
  real category on the next render pass.
- Keeps the palette browsable as users accumulate macros — visually
  similar to how the built-in operators are grouped (preprocessing,
  visualization, etc.).
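
The dominant-family vote can be sketched like this (`familyOf` is an assumed lookup from operator type to family; the real mapping lives with the palette's built-in groupings):

```typescript
// Assign the category of the most common operator family in the body;
// an empty or unparseable body falls back to "uncategorized".
function inferCategory(
  bodyOpTypes: string[],
  familyOf: (opType: string) => string
): string {
  if (bodyOpTypes.length === 0) return "uncategorized";
  const counts = new Map<string, number>();
  for (const t of bodyOpTypes) {
    const fam = familyOf(t);
    counts.set(fam, (counts.get(fam) ?? 0) + 1);
  }
  let best = "uncategorized";
  let bestCount = 0;
  for (const [fam, n] of counts) {
    if (n > bestCount) { best = fam; bestCount = n; }
  }
  return best;
}
```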

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Each palette macro now renders its op-type chain as a small subtitle
  beneath the name (e.g. 'Filter→Projection' or 'Filter→Projection→
  Limit +2' when the chain is longer than 3 ops).
- Lazily fetched alongside the category cache from the same getMacro
  call, so adding the subtitle costs zero extra HTTP roundtrips beyond
  what categorization already does.
- Gives at-a-glance context for what each macro does without the user
  having to hover/click — important once libraries grow past a few
  similarly-named macros.
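
The truncation rule itself is tiny (sketch):

```typescript
// Render at most three op types in the subtitle; longer chains get a
// "+N" suffix for the hidden tail, e.g. "Filter→Projection→Limit +2".
function chainSubtitle(opTypes: string[]): string {
  const shown = opTypes.slice(0, 3).join("→");
  return opTypes.length > 3 ? `${shown} +${opTypes.length - 3}` : shown;
}
```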

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Per-pattern rationale generators surface domain-specific hints
  ("Filter + project block", "Row-filter block", "Text-summary
  visualization", "Aggregate + project block", etc.) rather than the
  generic "preprocessing pipeline" pitch.
- Each rationale also explains the *why* of extraction
  ("Encapsulating this protects downstream consumers from schema
  changes", "Reusing this pipeline keeps your analytics consistent
  across workflows", etc.) — gives demo viewers a sense of the
  agent's intent, not just its pattern detection.
- Adds detection for visualization and join+reshape patterns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- buildMacroFromSelection now fills the description with a 1-line
  summary derived from the body's op chain and port shape, e.g.
  'Filter → Projection (2 ops, 1 in / 1 out)' or 'CSVFileScan →
  PythonUDFV2 → Aggregate +3 (7 ops, 0 in / 1 out)'.
- Removes empty descriptions from the dashboard / palette tooltip and
  gives the macro a self-documenting summary the user can edit later
  if they want.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- exportMacroToFile now scans the body content for any nested macroId
  references and records them in the export payload as
  dependsOnMacroWids: [wid, ...]. Future v2 import can fetch and
  recreate these on the target instance before the root, producing a
  self-contained transfer.
- Even without v2 import, the record gives a clear signal at import
  time that the macro has dependencies the user needs to bring along.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…transfer

- exportBundleForMacro walks nested macroId references depth-first and
  packages every reachable definition into a bundleVersion=2 JSON.
- Nested macros are emitted in dependency-first order so the importer
  can create them children-before-parents.
- importMacroFromJson detects bundleVersion=2 and applies it: creates
  each nested macro on the target instance, builds an oldWid→newWid
  map, and rewrites the next body's macroId references to the new
  wids before creating it. The root is rewritten + created last and
  its MacroDetail is returned.
- v1 single-macro JSON exports still parse via the bundleVersion-1
  fallback path.
- Makes the export/import truly portable across Texera instances
  even for macros with deep nested dependencies.
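
A sketch of the dependency-first bundling and the wid rewrite on import (the types and the string-based body rewrite are simplified; the real body is a JSON operator graph):

```typescript
interface MacroDef {
  wid: number;
  body: string;   // serialized body; may reference nested macroIds
  deps: number[]; // nested macro wids found in the body
}

// Depth-first walk: children are emitted before parents, so the importer can
// create nested macros before anything that references them. Root comes last.
function exportBundle(root: number, fetchMacro: (wid: number) => MacroDef): MacroDef[] {
  const ordered: MacroDef[] = [];
  const seen = new Set<number>();
  const visit = (wid: number): void => {
    if (seen.has(wid)) return;
    seen.add(wid);
    const def = fetchMacro(wid);
    def.deps.forEach(visit);
    ordered.push(def);
  };
  visit(root);
  return ordered;
}

// Create each macro on the target instance, building an oldWid→newWid map and
// rewriting later bodies' macroId references before creation. (Naive string
// rewrite shown; a real implementation would rewrite the parsed JSON.)
function importBundle(bundle: MacroDef[], create: (body: string) => number): number {
  const widMap = new Map<number, number>();
  let rootWid = -1;
  for (const def of bundle) {
    let body = def.body;
    for (const [oldWid, newWid] of widMap) {
      body = body.split(`macroId=${oldWid}`).join(`macroId=${newWid}`);
    }
    rootWid = create(body);
    widMap.set(def.wid, rootWid);
  }
  return rootWid; // the root was emitted last, so this is its new wid
}
```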

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- 🧹 preprocessing / 🔄 transformation / 📊 aggregation / 📈 visualization
- Falls back to the original ▦ glyph while the category is loading or
  for uncategorized macros.
- Reuses the existing inferred-category cache so no additional fetches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- New purple gradient button above Fuse All. Runs the omni-agent flow:
    1. Detect patterns (suggestMacros)
    2. Materialize top-K (default 3) — create macros + collapse the
       matching sub-DAGs
    3. Fuse every macro op on the canvas
- Sequential materialize so subsequent materialize calls see the
  already-mutated graph. Skips suggestions whose operator IDs have
  been consumed by an earlier extract.
- Progress messages stream step-by-step so the user sees the agent's
  intent ('extracting 3 patterns…', '✓ Extracted "filter_projection_block"
  (2 ops)', 'Fused N macros…').
- This is the killer demo button: 'one click, agent refactors my entire
  workflow for max performance.'

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Previously: top-K suggestions from the same pattern would each create
  a SEPARATE macro definition — defeating the reuse story.
- Now: group suggestions by suggestedName, take top-K distinct
  patterns. For each pattern, create the macro from the FIRST
  occurrence and swap every other live occurrence into the same
  definition (via swapSelectionWithExistingMacro). One pattern, one
  macro definition, N instances.
- Progress messages now report ' (and refactored N other occurrences)'
  per pattern, so the user sees the reuse multiplier explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P0 fix for ERR_INSUFFICIENT_RESOURCES on user's large workflow. The
categoryForMacro / subtitleForMacro features I added lazy-called
getMacro(wid) from inside Angular template bindings (via the
groupedMacroList getter). Every Angular change-detection cycle
re-evaluated the binding while the cache was unfilled, firing a fresh
HTTP request per macro per cycle. On a workflow with many user macros
this DDoS'd the browser's fetch pool, starving the websocket / compile
calls and producing thousands of console errors.

- Strip the lazy getMacro calls; revert categorization + subtitle to no-ops.
- Revert palette template to a flat filteredMacroList (name + usage chip +
  ports + export button). Categorization needs to move to the backend
  MacroSummary response (one round-trip) to be safe.
- Also hide the Auto-optimize / Fuse-all buttons. Auto-optimize was
  causing the compile API to return 400 on the user's real workflow;
  per-macro fuse via right-click stays available for testing while the
  codegen quality is improved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes for navigation issues you reported:

1. Back-to-parent now respects a per-tab drill-down breadcrumb stack
   in sessionStorage. Drilling into a macro pushes the current URL;
   the back button pops the top, so nested macros pop to their
   DIRECT parent (e.g. /workflow/280/macro/295 returns to its direct
   ancestor) rather than always jumping to the root workflow. The
   click handler uses window.location.href (hard reload) so the parent
   canvas is reinitialized cleanly; SPA navigation between the macro
   view and the workflow view has historically left stale state.

2. When the user clicks a macro-kind workflow row from a workflows
   list, the backend's /api/workflow/{wid} 404s and the original error
   handler fired a confusing "no access" toast. Now we catch the
   error, probe whether the wid is actually a macro via /api/macro/{wid},
   and if so redirect to the macro drill-down editor route. Otherwise
   surface a clearer "couldn't load workflow" message.
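
The breadcrumb stack in point 1 can be sketched against a Storage-like interface (the key name and JSON shape are assumptions; in the app the store would be window.sessionStorage):

```typescript
// Minimal Storage-like surface so the logic is testable outside a browser.
interface StringStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const BREADCRUMB_KEY = "macro-drilldown-breadcrumbs"; // hypothetical key

// Drilling into a macro pushes the current URL...
function pushBreadcrumb(store: StringStore, url: string): void {
  const stack: string[] = JSON.parse(store.getItem(BREADCRUMB_KEY) ?? "[]");
  stack.push(url);
  store.setItem(BREADCRUMB_KEY, JSON.stringify(stack));
}

// ...and the back button pops the top, landing on the DIRECT parent.
function popBreadcrumb(store: StringStore): string | undefined {
  const stack: string[] = JSON.parse(store.getItem(BREADCRUMB_KEY) ?? "[]");
  const top = stack.pop();
  store.setItem(BREADCRUMB_KEY, JSON.stringify(stack));
  return top;
}
```

sessionStorage is per-tab by definition, which is what makes the stack safe across the hard reloads the drill-down navigation uses.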

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ema port

P0 instrumentation + bug fix for macro execution silently hanging.

1. RegionExecutionCoordinator.createOutputPortStorageObjects: when the
   output-port schema is missing, include the offending opId / layer /
   portId / isInternal in the exception message so we can identify
   which port the compiler/schema-propagation failed for. Previously
   the message was just "Schema is missing" with no context.

2. WorkflowExecutionCoordinator.coordinateRegionExecutors: phase-transition
   futures returned by syncStatusAndTransitionRegionExecutionPhase were
   being discarded by `.foreach(...)`. Any exception (e.g. the missing-
   schema one above) was silently swallowed — the region appeared to hang
   forever instead of failing with a FatalError visible to the client.
   Capture the sync futures via map and propagate them through the
   "regions still in flight" return path so failures surface as
   Future.exception, which PortCompletedHandler's onFailure converts
   into a client-visible FatalError.
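
The shape of fix 2 translates to any async runtime: discarding in-flight work in a forEach swallows failures, while collecting it lets them propagate. A TypeScript analogue of the Scala Future change:

```typescript
// Capture the in-flight work via map instead of discarding it in forEach;
// combining the promises makes any region failure surface to the caller as
// a rejection instead of looking like a silent hang.
function coordinatePropagating(regions: Array<() => Promise<void>>): Promise<void> {
  const inFlight = regions.map(run => run()); // capture, don't discard
  return Promise.all(inFlight).then(() => undefined);
}
```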

Together these unblock investigation of the real "stuck macro
execution" issue — instead of silent stall, the user now gets a
specific error pointing at the failing port.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
You were right — the previous "${macroInstanceId}--${innerOpId}" naming
scheme made the expanded LogicalPlan structurally DIFFERENT from a
hand-flattened workflow even when the topology was identical.

Concrete consequence on a real workflow (wid 280, nested macros
containing HashJoin):

  • Pre-fix: inner HashJoin runtime op ID was 170+ chars long
    "Macro-operator-operator-1abe46c1-...-54df9b954a8e--HashJoin-operator-operator-78eb2818-...-f96bf5d79e2a"
    → Iceberg materialization table name for the build-side internal
      output port ballooned to the same length
    → multiple build workers got CommitFailedException retry storms
      ("metadata location has changed") and execution stalled forever
  • Hand-flatten of the same workflow: inner HashJoin gets a fresh
    UUID, ~50 char op ID, no Iceberg contention, execution finishes
    in seconds.

Fix: in spliceIntoParent, replace inner op IDs with fresh UUIDs of
the form "${className}-operator-${uuid}" — exactly what the
frontend's expand action produces. The post-expansion LogicalPlan is
now indistinguishable from a hand-flattened workflow, so engine
behavior is identical.

Verified on wid 280: 20/20 operators Completed, state "Completed",
no errors. Previously stuck forever in phase-2 transition.

Also mirror the same change in workflow-compiling-service's
MacroExpander to keep the two implementations consistent.

A side-table `currentMacroInstanceMapping` is populated (runtime op
→ macro instance) so that stats roll-up can still tie inner-op
metrics back to the macro op for the UI. Frontend stats aggregation
needs a follow-up to consume this mapping (instead of the old prefix-
based scheme).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…as/drill-down

Two related changes that fix "macro op shows no stats" + "drill-down body
shows nothing on execution":

1. Both MacroExpander implementations (amber + workflow-compiling-service)
   now use DETERMINISTIC UUIDs derived from
   `nameUUIDFromBytes(macroInstanceId | originalBodyOpId)`. Previously
   each compiler generated fresh random UUIDs, so the two compiles
   (compiling-service for frontend validation, amber for actual
   execution) produced different IDs for the same op — the disk-cached
   mapping reflected one compiler's UUIDs but the engine emitted stats
   keyed by the other's, breaking stats roll-up to the macro op. Same
   workflow → same UUIDs now, regardless of which compiler runs.

2. Frontend stats binding:
   - WorkflowStatusService.withMacroAggregates now consults
     MacroService.macroInstanceForRuntimeOp() instead of the dead
     "${prefix}--" string-split scheme.
   - MacroService.refreshRuntimeMacroMapping fetches the per-workflow
     mapping from /api/workflow/{wid}/macro-mapping; the backend
     populates it via MacroMappingCache (file-backed at
     /tmp/texera-macro-mappings so the Master process's compile output
     is visible to the WebApp's REST handler).
   - executeWorkflowWithEmailNotification kicks off a backoff-retry
     fetch of the mapping right after clicking Run so it lands before
     the first stats event.
   - WorkspaceComponent restores the mapping on workflow load and on
     drill-down entry: drill-down's hard-reload navigation previously
     wiped the in-memory cache, leaving the body view without stats
     even when the file existed.
   - workflow-editor uses MacroService.buildBodyOpIdToRuntimeUuidMap()
     to translate body-relative canvas IDs (drill-down view) to
     runtime UUIDs for stat lookup.
   - Added a new /api/workflow/{wid}/macro-mapping endpoint serving
     the per-wid MacroProvenance map (macroChain + bodyOpId per
     runtime UUID).
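
The determinism in point 1 boils down to a name-based hash. java.util.UUID.nameUUIDFromBytes produces a type-3 (MD5-based) UUID, which can be approximated like this (illustrative only; a real type-3 UUID also fixes the version/variant bits, skipped here):

```typescript
import { createHash } from "crypto";

// Same (macroInstanceId, bodyOpId) pair → same id, no matter which compiler
// runs, so both compiles key stats identically.
function deterministicRuntimeId(macroInstanceId: string, bodyOpId: string): string {
  const h = createHash("md5").update(`${macroInstanceId}|${bodyOpId}`).digest("hex");
  return [h.slice(0, 8), h.slice(8, 12), h.slice(12, 16), h.slice(16, 20), h.slice(20, 32)].join("-");
}
```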

Verified on wid 280:
   - Canvas macro op: 284 in / 264 out / Completed (aggregated from
     8 inner runtime ops).
   - Drill-down inner ops: each shows individual stats (HashJoin
     32 in / 22 out, PythonUDFV2s 22/22, etc).

Nested macro op stat aggregation inside drill-down is the remaining
gap and is tracked as a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… drill-down)

A runtime op inside a nested macro contributes to TWO aggregates:
 - the outer macro on the parent canvas (chain[0])
 - the nested macro inside the outer's drill-down view (chain[1])

withMacroAggregates previously only rolled up to chain[0]. Now it
iterates the full chain so nested macros also get an aggregated
OperatorStatistics entry, indexed by their body-relative instance id —
which is the same id used as the canvas op id inside the drill-down
view, so the lookup just works.
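
The full-chain iteration can be sketched as follows, for a sum-style metric such as worker count (a later commit in this PR moves the macro's row counts to boundary-port stats, since summing them across all inner ops double-counts internal traffic):

```typescript
// Each runtime op carries macroChain = [outerInstance, innerInstance, ...];
// it contributes to the aggregate of EVERY instance along that chain, so the
// nested macro inside a drill-down view gets an aggregated entry too.
function rollUpByChain(
  perOpMetric: Map<string, number>,   // runtime opId → metric value
  macroChainOf: Map<string, string[]> // runtime opId → macroChain
): Map<string, number> {
  const agg = new Map<string, number>();
  for (const [opId, value] of perOpMetric) {
    for (const instanceId of macroChainOf.get(opId) ?? []) {
      agg.set(instanceId, (agg.get(instanceId) ?? 0) + value);
    }
  }
  return agg;
}
```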

Verified on wid 280 drill-down (/macro/295?instance=…1abe46c1):
  nested macro d3188a84 → 176 in / 176 out / Completed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
withMacroAggregates was summing aggregatedInputRowCount across EVERY
inner op of a macro — which double-counted internal traffic (e.g. for
nested HashJoin → projection → ... chains the count grew to ~5× the
correct value). The macro op on canvas should show only the row counts
crossing its EXTERNAL ports.

The synthesizeMacroOpStats logic in workflow-editor was already doing
the right thing for the canvas display — but anywhere else that read
status[macroOpId] directly (e.g. drill-down nested-macro op stats)
got the wrong number.

Changes:
 - Move port-based aggregation into MacroService.synthesizeMacroOpStats
   so both renderers share one source of truth.
 - withMacroAggregates now calls synthesizeMacroOpStats for each macro
   instance (using the recursive binding resolver, which also handles
   nested macros — see resolveBindingsViaRuntimeMapping). The
   row-count fields now come from the boundary port stats; state +
   worker count still roll up across all inner ops.
 - Add MacroService.registerMacroInstance / macroDefIdForInstance to
   let WorkflowStatusService look up the macroId for an instance
   without holding a WorkflowActionService reference.
 - Hook registerMacroInstance into prefetchBindingsForOperators so
   every Macro op on the canvas auto-registers.

Verified on wid 280 (4-input macro with 1 output, nested macro inside):
   Before: 284 in / 264 out (bogus sum-of-all-inner)
   After:  64 in / 44 out
           inputPortMetrics: {0:10, 1:10, 2:22, 3:22}
           outputPortMetrics: {0:44}

Also: resolveBindingsViaRuntimeMapping now recurses through nested
macros so the outermost macro's external port bindings resolve to
the terminal runtime op deep inside the nesting (was returning
empty for the port connected through the nested macro).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolveBindingsViaRuntimeMapping was requiring `prov.macroChain.length
=== accumulatedChain.length` for terminal matches. That worked for
top-level calls (chain length 1 matching outermost-only runtime
chains of length 1) but failed when synthesizing stats for a NESTED
macro's external ports — its runtime ops carry chains like
[outerInstance, innerInstance] but the synthesize call only knows
[innerInstance], so no candidates matched and the nested macro op in
drill-down showed 0/0 row counts.

Fix: match if `prov.macroChain` ENDS WITH `accumulatedChain`. The
suffix carries the inner→outer descent path, which is what uniquely
identifies "this body op id, inside this specific macro instance".
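
The suffix test itself (sketch):

```typescript
// A runtime op's macroChain matches an accumulated chain if it ENDS WITH it:
// the suffix carries the inner→outer descent path that uniquely identifies
// "this body op, inside this specific macro instance".
function chainEndsWith(macroChain: string[], accumulatedChain: string[]): boolean {
  if (accumulatedChain.length > macroChain.length) return false;
  const offset = macroChain.length - accumulatedChain.length;
  return accumulatedChain.every((id, i) => macroChain[offset + i] === id);
}
```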

Verified on wid 280:
  - Parent canvas: outer 1abe46c1 → 64 in / 44 out (port {0:10, 1:10, 2:22, 3:22})
  - Outer drill-down: nested d3188a84 → 44 in / 44 out (port {0:22, 1:22})
  - Nested drill-down: each of 4 body ops shows 44/44 stats

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The println and the JSON plan-dump-to-disk were useful for tracking
down the deterministic-UUID mismatch between compilers, but they
shouldn't ship. The MacroMappingCache.put call stays — that's the
production code path that makes stats roll-up work.
UI/AI surface
- Suggestions panel: replace raw "score X.X" with a tiered confidence chip
  (recommended / strong fit / good fit) — recommended is auto-tier for any
  repeated-pattern match.
- Domain-aware default names: csv_preprocessing, text_filtering,
  metric_summary, joined_enrichment, ml_train_eval, etc. — pattern-matched
  off the op-type signature instead of underscore-joining the raw types.
  Unified across the AI panel and right-click create-macro.
- Fusion rationale + speedup ground in handoff-removal model:
  "N ops -> 1 UDF, K fewer actor handoffs. Estimated 1.6x speedup."
  Replaces the previous "1 + len*0.4" placeholder.

Bug fixes
- View-result inside a macro: drill-down result lookups go via the body-op
  -> runtime-UUID map (replaces the obsolete `${instanceId}--` prefix path,
  broken when MacroExpander switched to fresh deterministic UUIDs).
  Re-emits on a new runtime-mapping tick so async fetches don't race.
- Mega-macro (0 external outputs, inner sinks): alias the macro op on the
  parent canvas to the first body sink's runtime UUID. Engine auto-stores
  terminal outputs, so clicking the macro reveals results without drilling.
- Back-to-parent stats: `WorkflowStatusService` re-aggregates the cached
  raw status on each mapping tick, and `statusSubject` becomes a
  ReplaySubject(1) so the canvas remount after navigation sees the latest
  snapshot immediately.
- Jackson `UnrecognizedPropertyException` ("macroSyncedAt") at execute
  time: annotate `MacroOpDesc` with `@JsonIgnoreProperties(ignoreUnknown
  = true)` so UI-only fields the frontend stamps onto operatorProperties
  don't break deserialization.

Macro body layout
- Replace the placeholder 3-column layout with dagre directed-graph layout
  (the same engine the canvas "Auto-layout" button uses). Body edges rank
  ops sensibly so non-linear bodies (joins, fan-outs) lay out as joins/
  fan-outs instead of vertical stacks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same shape of bug as the macroSyncedAt fix on MacroOpDesc: the frontend
stamps `estimatedSpeedup` ("1.6x") onto the fusion payload so the canvas
can render it next to the FUSED badge, but the backend MacroFusion case
class doesn't model that field. Jackson rejects the WorkflowExecuteRequest
at execute time once the fused macro is part of the run.

Annotate `MacroFusion` with `@JsonIgnoreProperties(ignoreUnknown = true)`
so this and any future UI-only convenience fields don't break the round
trip. Backend MacroExpander only ever reads `verified` to decide whether
to substitute the UDF.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scratch file used to draft the hackathon PR description — not part of the
project. Mistakenly committed in the previous change; remove it from the
tracked tree and keep it locally for the PR-open step.
@github-actions github-actions Bot added engine ddl-change Changes to the TexeraDB DDL frontend Changes related to the frontend GUI docs Changes related to documentations common platform Non-amber Scala service paths labels May 16, 2026
@Xiao-zhen-Liu Xiao-zhen-Liu changed the title feat(macro): AI-augmented macro operators [Hackathon] feat(macro): AI-augmented macro operators May 16, 2026
@Xiao-zhen-Liu Xiao-zhen-Liu changed the title [Hackathon] feat(macro): AI-augmented macro operators [Hackathon] feat: AI-augmented macro operators May 16, 2026