[Hackathon] feat: AI-augmented macro operators#5115
Open
Xiao-zhen-Liu wants to merge 65 commits into
KNIME-metanode-style composite operators for Texera. Macros live purely
at the logical-plan layer: a new MacroExpander pre-pass inlines each
MacroOpDesc into a flat LogicalPlan before physical-plan compilation, so
PhysicalPlan, PhysicalOpIdentity, and the Amber engine remain unchanged.
Backend (new):
- MacroOpDesc, MacroInputOp, MacroOutputOp LogicalOps registered in
Jackson @JsonSubTypes; getPhysicalPlan throws to signal a missed
expansion pass.
- MacroBody, MacroLink, MacroPortSpec, MacroFusion data classes.
- MacroExpander: inlines each macro by splicing inner ops/links via
boundary markers and prefixes inner-op IDs with the instance ID
(${macroInstanceId}/${innerOpId}), so per-macro telemetry can be
aggregated purely from the operator-ID prefix. Cycle and depth-16
guards via MacroCompileContext. Pluggable MacroRegistry (Empty /
inMemory; persistence-backed impl is a later step).
- WorkflowCompiler (workflow-compiling-service) calls
MacroExpander.expand before scan-source resolution. Backward-
compatible: new ctor param defaults to MacroRegistry.Empty.
- TODO note in amber WorkflowCompiler; execution-time expansion is a
later step. Until then, MacroOpDesc.getPhysicalPlan throwing surfaces
unexpanded plans as a loud compile error rather than silently broken
execution.
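The expansion pass above can be sketched as follows — a minimal TypeScript illustration under heavily simplified shapes (the real implementation is the Scala MacroExpander/MacroCompileContext; boundary-marker rewiring is elided, and every name here is illustrative):

```typescript
// Hypothetical, simplified shapes; the real types are Scala LogicalOps/Links.
interface Op { id: string }
interface Link { fromOpId: string; toOpId: string }
interface Plan { ops: Op[]; links: Link[] }

const MAX_DEPTH = 16; // mirrors the depth-16 guard

// Inner-op IDs are prefixed with the macro instance ID, so per-macro
// telemetry can be aggregated from the operator-ID prefix alone.
const prefixId = (instanceId: string, innerOpId: string): string =>
  `${instanceId}/${innerOpId}`; // Step-1 separator; a later step switches to "--"

function inlineMacro(
  plan: Plan,
  instanceId: string,
  macroId: string,
  body: Plan,
  expanding: Set<string> = new Set(), // macro definitions on the current path
  depth = 0
): Plan {
  if (depth >= MAX_DEPTH) throw new Error("macro nesting too deep");
  if (expanding.has(macroId)) throw new Error(`macro cycle at ${macroId}`);
  // Splice the (prefixed) body in place of the macro op; rewiring external
  // links through the MacroInput/MacroOutput boundary markers is elided.
  return {
    ops: plan.ops
      .filter(op => op.id !== instanceId)
      .concat(body.ops.map(op => ({ id: prefixId(instanceId, op.id) }))),
    links: plan.links.concat(
      body.links.map(l => ({
        fromOpId: prefixId(instanceId, l.fromOpId),
        toOpId: prefixId(instanceId, l.toOpId),
      }))
    ),
  };
}
```

Nested macros simply re-enter `inlineMacro` with the already-prefixed instance ID, which is how the concatenated-prefix behavior in the nested-macro test falls out.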
Tests (14 passing):
- MacroOpDescSpec: JSON round-trip, throws on compile, ports match
inputPortCount/outputPortCount.
- MacroExpanderSpec: pass-through plan, single-port inline, LIVE registry
fetch, nested macros with concatenated prefix, cycle detection,
depth-bomb, double-instantiation, input-marker fan-out, missing-LIVE
error, snapshot immutability across two expansions.
Also includes hackathon-proposal.md (Texera Agent Hackathon submission)
covering the AI suggestion and AI fusion features that layer on top of
this skeleton in later steps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- sql/updates/23.sql + texera_ddl.sql: workflow_kind_enum, workflow.kind, idx_workflow_kind, macro_metadata. Macros reuse the workflow table to inherit versioning, ACL, and hub features.
- MacroResource: create/list/get/schema/snapshot endpoints alongside WorkflowResource; reuses workflow_user_access for permissions and seeds an initial workflow_version so LIVE-mode instances have a vid to pin.
- WorkflowResource.baseWorkflowSelect: bake in kind = WORKFLOW so macros are structurally excluded from the workflows tab, the hub, and operator search; callers (HubResource, retrieveWorkflowsBySessionUser) updated to .and().
- DbMacroRegistry: jOOQ-backed MacroRegistry that reads workflow.content as a serialized MacroBody; wired into the compiling service's WorkflowCompiler.
- TexeraWebApplication: register MacroResource.
The amber-side execution-time WorkflowCompiler still has the existing TODO(macro-operators) note from Step 1 and is unaffected; that hook is Step 3.
Step 3 closes the TODO at WorkflowCompiler.scala:144 — macros can now be
executed end-to-end, not just compiled by the workflow-compiling-service.
- amber/.../workflow/macroOp/{MacroCompileContext,MacroRegistry,MacroExpander,
DbMacroRegistry}: parallel copies of the compiling-service equivalents,
adapted to amber's LogicalLink/LogicalPlan types. The two macro pipelines
will converge when the broader LogicalPlan unification (existing TODO at
WorkflowCompiler.scala:137) happens.
- WorkflowCompiler: take an optional MacroRegistry (defaults to Empty); call
MacroExpander.expand before resolveScanSourceOpFileName + expandLogicalPlan.
- WorkflowExecutionService, SyncExecutionResource: pass new DbMacroRegistry()
into WorkflowCompiler so LIVE-mode macros resolve against `workflow` rows
with kind=MACRO.
Step 1 (10/10 MacroExpanderSpec, 4/4 MacroOpDescSpec) and amber's
WorkflowCompilerSpec (6/6) still green.
Adds the smallest user-visible hook for macros: select 2+ operators, right-click
→ "create macro", enter a name. Posts a serialized MacroBody (selected
operators + internal links + MacroInput / MacroOutput boundary markers) to the
new POST /api/macro/create endpoint and surfaces the result via a toast.
The canvas selection is intentionally left in place; replacing it with a
MacroOpDesc node (and rewiring boundary links to the new ports) is the next
slice of Step 4, alongside the palette merge and drill-down editor.
- macro.service.ts: HTTP client + boundary computation (one MacroInput per
unique inner port that has an external feeder; mirror for MacroOutput).
- context-menu.{html,ts}: new menu entry, wired with a window.prompt for the
name and NotificationService for the toast. Shown only when 2+ operators
are selected, no link is highlighted, and the workflow is modifiable.
Rubber-banding a chain of operators in JointJS picks up the connecting links too, which made `hasHighlightedLinks()` true and silently hid the menu entry (same reason copy/cut were missing from the user's screenshot). The boundary computation already classifies internal vs external links from the operator selection alone, so highlighted links shouldn't gate the entry.
Loading a MACRO row via the workflow editor route blew up the canvas (workflow-check.ts dereferenced link.source.operatorID; the content is a MacroBody, which has fromOpId/toOpId, not source/target). Fail fast at the REST layer instead, with a message pointing at the not-yet-built macro editor.
After POST /api/macro/create succeeds the context menu now:
1. Drops a `Macro` operator at the centroid of the selection with input/output
ports sized to match the boundary (one input per unique inner port that had
an external feeder, mirror for output).
2. Deletes the original operators (and with them their internal + boundary
links) via deleteOperatorsAndLinks.
3. Re-points each former external link at the new macro's corresponding port.
All three steps are wrapped in a single bundleActions transaction so undo
restores the original sub-DAG in one shot.
MacroService.buildMacroFromSelection now returns the boundary metadata
(per-link rewire instructions + input/output port counts) alongside the
backend request payload — same boundary computation, exposed for the swap.
MacroOpDesc on the canvas uses operatorProperties = { macroId, macroVersion,
linkMode: "LIVE", inputPortCount, outputPortCount, displayName } so the
existing workflow-serialization path can roundtrip it to the backend without
extra glue. macroVersion is a placeholder until MacroDetail exposes the
pinned vid.
Track down why the canvas swap isn't visible: log the captured selection, the built request + boundary metadata, and the swap-vs-throw outcome. Also align the output-port shape with outputPortToPortDescription (disallowMultiInputs: false). Tracing will be removed once the issue is identified.
…ction
MacroOpDesc's generated JSON schema includes `nullable: true` properties
without a sibling `type` (from `Option[MacroBody]` / `Option[MacroFusion]`).
Ajv refuses to compile that, so the swap threw with "nullable cannot be used
without type" before any canvas mutation could happen. Construct the
OperatorPredicate manually instead — every field is already overridden, so the
schema-default path adds nothing. The underlying schema bug should still be
fixed (it'll also break dragging Macro from the palette), but that's a separate
task in workflow-operator; right-click → create macro now works without it.
Step 5 first slice: double-clicking a Macro node now navigates to a new route
that loads the macro's body into the same workflow editor canvas.
- Route: `/dashboard/user/workflow/:id/macro/:macroId` mounts the existing
WorkspaceComponent. The parent wid (`id`) is kept in the URL so future
breadcrumb / back-navigation work has it.
- WorkspaceComponent.registerLoadOperatorMetadata picks up `macroId` from the
route and runs a new `loadMacroWithId` branch instead of the normal workflow
load. Auto-persist is disabled via setWorkflowPersistFlag(false) so canvas
edits don't accidentally hit `/workflow/persist` — saving back to the macro
is the next slice.
- MacroService.macroDetailToWorkflow converts the persisted MacroBody into a
Workflow shape reloadWorkflow can consume: normalizes inner-op / marker port
shapes (PortDescription vs PortIdentity), maps MacroLink port-ordinals to
string portIDs, and auto-lays-out operators with MacroInput on the left,
MacroOutput on the right, regular inner ops in the middle.
- workflow-editor double-click handler now detects `operatorType === "Macro"`
and routes to the drill-down URL instead of opening the result panel.
Read-only-ish in v1 — the editor will let the user move things around but the
changes don't persist. PUT/POST /macro/{wid}/update + the save flow is the
next commit.
…load
The backend's reflective JSON-schema generator emits `{nullable: true}` for
`Option[...]` fields whose inner type it can't enumerate
(`Option[MacroBody]`, `Option[MacroFusion]` on `MacroOpDesc`). Ajv
strict-mode refuses to compile schemas with `nullable` and no `type`, which
threw from everywhere the schema gets compiled — validation-workflow,
property-editor, dynamic-schema, shared-model-change-handler — making the
drill-down editor unusable.
Sanitize once at the source (OperatorMetadataService): walk every operator's
`jsonSchema` and delete `nullable` when there's no sibling `type`. All
downstream Ajv compilations now see well-formed schemas.
The proper backend fix is still tracked in project memory
`project_macroopdesc_schema_ajv_bug.md`; this is defense-in-depth that also
hardens us against any future LogicalOp picking up the same shape.
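The sanitizing walk can be sketched as a small recursive pass over plain JSON values (a hedged sketch — the real code lives in OperatorMetadataService and the function name here is illustrative):

```typescript
// Recursively delete `nullable` wherever there is no sibling `type`,
// so Ajv strict mode accepts the schema. Mutates the schema in place.
function sanitizeNullable(node: unknown): void {
  if (Array.isArray(node)) {
    for (const item of node) sanitizeNullable(item);
    return;
  }
  if (node === null || typeof node !== "object") return;
  const obj = node as Record<string, unknown>;
  // Ajv strict mode rejects `nullable` without a sibling `type`.
  if (obj["nullable"] === true && !("type" in obj)) {
    delete obj["nullable"];
  }
  for (const value of Object.values(obj)) sanitizeNullable(value);
}
```

Run once per operator's `jsonSchema` when the metadata arrives off the wire; every downstream Ajv compile then sees only well-formed nodes.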
Two issues blocking the macro body from rendering:
1. WorkspaceComponent is reused across route changes (no ngOnDestroy fires going /workflow/:id → /workflow/:id/macro/:macroId), so the parent workflow's operators+links stayed on the JointJS paper. reloadWorkflow then hit `failed to add link. cause: duplicate link found with same source and target` in shared-model-change-handler when the macro body's marker links collided with parent leftovers. Fix: call resetAsNewWorkflow() before setNewSharedModel.
2. Macro / MacroInput / MacroOutput had no icon files, so JointJS rendered blank/broken-image boxes (operators technically present but invisible). Stub with copies of PythonUDFV2.png so they at least render; proper icons are a polish task.
…properly
Angular reuses WorkspaceComponent across navigations between /workflow/:id and /workflow/:id/macro/:macroId, so route.snapshot.params is frozen at construction time and the macro drill-down didn't actually re-run its loader when the user double-clicked a macro node — the page only loaded correctly on a hard refresh.
Subscribe to route.paramMap inside registerLoadOperatorMetadata and dispatch on every change (deduplicated by id/macroId key). The workflow branch also re-enables the persist flag, since the macro drill-down disables it.
In-tab Angular router navigation between /workflow/:id and /workflow/:id/macro/:macroId reuses WorkspaceComponent. Despite resetAsNewWorkflow() + setNewSharedModel() + paramMap-driven reload, the YJS shared-model + JointJS paper retain enough cross-route state that the macro body's links are rejected by shared-model-change-handler as duplicates of the parent workflow's links — and the body never finishes rendering. A full page refresh on the same URL works because the component is bootstrapped fresh. Use window.location.href to force that full reload instead. Brief flash, but the macro view renders predictably every time. Tearing down the shared-model lifecycle properly to support SPA navigation is a follow-up.
…fail
Workflows containing a Macro instance failed to compile (no execution
possible) because:
- DbMacroRegistry.fetch read `workflow.content` and called
mapper.readValue(content, classOf[MacroBody]).
- Marker operators (MacroInput / MacroOutput) inside the body had been
serialized with their ports in backend PortIdentity shape
(`{id: {id: 0, internal: false}, displayName: ""}`).
- LogicalOp inherits `inputPorts: List[PortDescription]` from PortDescriptor,
so Jackson tried to parse those entries as PortDescription, choked on the
missing `portID` field, and threw.
- DbMacroRegistry's catch swallowed the exception and returned None, and
MacroExpander threw "not found in registry" — surfacing as a generic
compile failure on the parent workflow with no usable error message.
Two-pronged fix:
1. `@JsonIgnoreProperties(Array("inputPorts", "outputPorts"))` on
MacroInputOp / MacroOutputOp so already-persisted macros keep working —
the marker's port wiring is derived from `portIndex` via operatorInfo
anyway, so ignoring the JSON entries is correct.
2. Frontend marker serialization now emits proper PortDescription shape
(portID/displayName/disallowMultiInputs/isDynamicPort) for newly-created
macros, keeping the wire format consistent with the rest of the system.
The earlier "just so it renders" stub copied PythonUDFV2.png as Macro.png / MacroInput.png / MacroOutput.png, which made macro instances on the canvas indistinguishable from Python UDF ops — exactly the confusion the user just flagged. Generate proper icons (rounded "container" frame + a three-node mini-graph for Macro; left- and right-facing arrows for the markers) in a blue/teal accent that contrasts with the existing Python-yellow. Pure cosmetic, no behavioral change.
* MacroExpander: switch inner-op ID prefix from "/" to "--" so prefixed
IDs survive serialization through GlobalPortIdentitySerde's
VFS-URI path component. Update WorkflowCompiler.visibleOperatorId
and outer-error filter accordingly; add `require(!contains('/'))`
in the serde as a hard guard. All 17 MacroExpanderSpec tests
updated for the new separator and passing.
* WorkflowStatusService: fold inner-op stats keyed by
"${macroInstanceId}--*" into a synthetic entry under
macroInstanceId so the macro node renders state + row counts
during execution on the outer canvas. Worst-case state wins
(Recovering > Pausing > ... > Completed > Uninitialized);
row counts and worker counts are summed. Original prefixed
entries are preserved.
* ValidationWorkflowService: skip AJV schema validation for Macro
operators — the embedded schema references LogicalOp polymorphic
union (via MacroBody.operators) and AJV can't reliably handle it.
Connection validation alone still gates the red/grey state.
* OperatorMetadataService: when sanitizing schemas off the wire,
convert `nullable: true + $ref: X` to `anyOf: [{type: null},
{$ref: X}]` instead of just stripping nullable, so Option[T]
fields serialized as null round-trip cleanly through AJV.
* JointUIService: visually differentiate macro nodes — Macro
instance gets a soft-blue fill and dashed blue border; MacroInput
/ MacroOutput markers get a muted grey, rounded "port pad" look
with their operator-name label suppressed. changeOperatorColor
preserves the macro-specific stroke across validation toggles by
reading operatorType stashed on the JointJS element.
* WorkspaceComponent: pinned banner above the canvas when on
`/workflow/:id/macro/:macroId` so the user can't miss they're
editing a macro body and not the parent workflow.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
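The WorkflowStatusService fold can be sketched as follows — a hedged sketch with a simplified stat shape (the real payload has more fields, and `foldMacroStats` is an illustrative name):

```typescript
// Simplified stat shape; the real per-operator statistics payload differs.
type OpState = "Uninitialized" | "Completed" | "Running" | "Paused" | "Pausing" | "Recovering";
// Ascending severity; worst-case state wins for the synthetic macro entry.
const SEVERITY: OpState[] = ["Uninitialized", "Completed", "Running", "Paused", "Pausing", "Recovering"];

interface OpStats { state: OpState; outputRowCount: number; workerCount: number }

function foldMacroStats(stats: Record<string, OpStats>): Record<string, OpStats> {
  const out: Record<string, OpStats> = { ...stats }; // original prefixed entries preserved
  for (const [opId, s] of Object.entries(stats)) {
    const sep = opId.indexOf("--");
    if (sep < 0) continue; // not an inner op of a macro instance
    const macroInstanceId = opId.slice(0, sep);
    const agg = out[macroInstanceId] ?? {
      state: "Uninitialized" as OpState, outputRowCount: 0, workerCount: 0,
    };
    out[macroInstanceId] = {
      // worst-case state wins
      state: SEVERITY.indexOf(s.state) > SEVERITY.indexOf(agg.state) ? s.state : agg.state,
      outputRowCount: agg.outputRowCount + s.outputRowCount, // row counts summed
      workerCount: agg.workerCount + s.workerCount,          // worker counts summed
    };
  }
  return out;
}
```

Because the first `--` is used as the split point, nested inner ops (`m1--m2--op`) fold under the outermost instance, which is what the outer canvas renders.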
Previously buildMacroFromSelection only created MacroInput/MacroOutput markers for ports that already had external links at macro-creation time. A selection like Filter → Projection where Projection's output wasn't yet connected ended up as a macro with one input port and zero output ports, breaking dataflow equivalence: the user couldn't reach Projection's output through the macro at all.
Replacing a sub-DAG with a macro op is a structural substitution. Every input port on the selection that isn't fed by another selected op is a boundary input regardless of current external connectivity, and symmetrically for outputs. Walk selectedOperatorIDs × op.inputPorts/outputPorts, filter out the internally-wired ones, and synthesize one marker per remaining port. The actual-external-edge rewiring (incomingEdges/outgoingEdges) is unchanged — it just maps a subset of the available macro ports.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
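The structural boundary rule can be sketched as pure data (hypothetical simplified shapes — the real logic lives in MacroService.buildMacroFromSelection):

```typescript
// Hypothetical simplified shapes for illustration only.
interface Port { portID: string }
interface OpDesc { operatorID: string; inputPorts: Port[]; outputPorts: Port[] }
interface SelLink {
  source: { operatorID: string; portID: string };
  target: { operatorID: string; portID: string };
}

// Every input port not fed by another selected op is a boundary input,
// regardless of current external connectivity; symmetrically for outputs.
function boundaryPorts(selected: OpDesc[], links: SelLink[]) {
  const ids = new Set(selected.map(o => o.operatorID));
  const internal = links.filter(
    l => ids.has(l.source.operatorID) && ids.has(l.target.operatorID)
  );
  const fedInternally = new Set(
    internal.map(l => `${l.target.operatorID}:${l.target.portID}`));
  const drainedInternally = new Set(
    internal.map(l => `${l.source.operatorID}:${l.source.portID}`));
  const inputs = selected.flatMap(o =>
    o.inputPorts
      .filter(p => !fedInternally.has(`${o.operatorID}:${p.portID}`))
      .map(p => ({ op: o.operatorID, port: p.portID })));
  const outputs = selected.flatMap(o =>
    o.outputPorts
      .filter(p => !drainedInternally.has(`${o.operatorID}:${p.portID}`))
      .map(p => ({ op: o.operatorID, port: p.portID })));
  return { inputs, outputs };
}
```

Note how the Filter → Projection example from above now yields one input and one output even though Projection's output has no external edge yet.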
…xecution view
Stitches the parent workflow's execution data — both stats (row counts,
state) and result rows — onto each external port of a Macro op, and
makes the drill-down view show the same data per inner op while the
parent is running.
Wire layout (frontend-only; engine stays macro-unaware):
* MacroService now computes per-definition body bindings — each Macro
external port `i` knows the body-relative (innerOp, innerPort) it
routes to via the MacroInput(i) / MacroOutput(i) markers. Cached on
first fetch; preloaded on `getOperatorAddStream` so the map is ready
before execution starts. `getBindingsForInstance(instanceId, macroId)`
lifts the body-relative IDs to runtime form (`${instanceId}--`) so
they match the engine's stat/result keys post-MacroExpander.
* WorkflowEditorComponent.synthesizeMacroOpStats sources per-port row
counts for each Macro on the outer canvas: macro input `i` reads from
the boundary inner op's `inputPortMetrics` at the body-link's target
port; macro output `j` reads from the inner op's `outputPortMetrics`.
Falls through to `withMacroAggregates`-supplied state until bindings
load, then refreshes on the next stats emission.
* WorkflowResultService gains a macro-instance result alias plus a
drill-down prefix. The alias routes `getResultService(macroId)` to
the inner op feeding output port 0, so the result panel shows the
macro's output without forcing the user to drill in. The drill-down
prefix transparently rewrites every result lookup to its runtime
form when the canvas is rendering a body via `?instance=...`.
* WorkflowEditorComponent listens to `route.queryParamMap.instance` —
the macro click-through now appends it to the drill-down URL — and
applies the same `${instanceId}--` prefix to stat lookups so live
parent-execution stats land on the body-relative op IDs the
drill-down canvas displays.
* Port-mapping completeness fix already in 49beec9 is the critical
upstream prerequisite: a Macro op with only an `input-0` port (and
no output port) can't be made to display output stats or results
no matter how the websocket layer is wired.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three coupled execution-path fixes:
* Item 3 — view-result/reuse-result on a macro op now forwards to the
inner boundary ops the macro's external outputs route to. Backend's
`opsToViewResult` is keyed by post-expansion op IDs (the macro op
itself doesn't survive MacroExpander), so executeWorkflowWith… rewrites
macro IDs to `${instanceId}--${innerOpId}` for every output binding
before submitting the plan. Multi-output macros mark all their output
producers; non-macro IDs pass through unchanged. Same rewrite for
`opsToReuseResult`.
* Item 2 — `MacroService.computeBodyBindings` now also collects
`nestedMacros: Map<innerOpId, nestedMacroId>` and
`getBindingsForInstance` walks them recursively, prefixing
`${instanceId}--` at each layer until a terminal non-macro inner op
is reached. Fan-out at any layer is preserved by emitting one
resolved binding per terminal. Bodies of nested macros are eagerly
prefetched when their parent body loads, so the synchronous stat
lookup path finds everything cached.
* Item 1 — macro drill-down click-through switched from
window.location.href to Router.navigate. Full reload was killing
the parent's websocket subscription, so the drill-down view saw no
live execution stats. SPA navigation keeps WorkflowWebsocketService
alive across the route change, and the existing query-param
(?instance=...) handler in WorkflowEditorComponent already maps
body-relative op IDs onto runtime stat keys for the drilled-down
canvas. loadMacroWithId simplified to match loadWorkflowWithId's
pattern (drop the redundant resetAsNewWorkflow — setNewSharedModel +
reloadWorkflow together do a clean transition).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
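The recursive lift in Item 2 can be sketched as follows (hedged: `BodyInfo` / `resolveBindings` are illustrative names for the shapes computed by MacroService.computeBodyBindings / getBindingsForInstance):

```typescript
// A body binding maps a macro's external port to a body-relative
// (innerOp, innerPort); nested macros are walked until a terminal
// non-macro op is reached, prefixing `${instanceId}--` at each layer.
interface Binding { innerOpId: string; innerPort: number }
interface BodyInfo {
  bindings: Map<number, Binding[]>;  // external port -> inner targets
  nestedMacros: Map<string, string>; // innerOpId -> nested macroId
}

function resolveBindings(
  instanceId: string,
  macroId: string,
  bodies: Map<string, BodyInfo>, // assumed prefetched (cached bodies)
  port: number
): Binding[] {
  const body = bodies.get(macroId);
  if (!body) return [];
  const resolved: Binding[] = [];
  for (const b of body.bindings.get(port) ?? []) {
    const runtimeId = `${instanceId}--${b.innerOpId}`;
    const nested = body.nestedMacros.get(b.innerOpId);
    if (nested) {
      // Recurse: one resolved binding per terminal, preserving fan-out.
      resolved.push(...resolveBindings(runtimeId, nested, bodies, b.innerPort));
    } else {
      resolved.push({ innerOpId: runtimeId, innerPort: b.innerPort });
    }
  }
  return resolved;
}
```

The resolved `innerOpId`s match the engine's post-MacroExpander stat/result keys, which is what both the outer-canvas stat synthesis and the view-result rewrite consume.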
* Revert SPA navigation back to hard reload for macro drill-down click-through. SPA-into-WorkspaceComponent-reuse hits a flurry of duplicate-link rejections from interleaved YJS server-replay + local reloadWorkflow that can't be resolved cleanly with the current shared-model lifecycle. Hard reload gives a clean WorkspaceComponent mount with a fresh canvas every time.
* Stash (parentWid, instanceId) into sessionStorage before the hard navigation so the new page can later opt to reconnect to the parent's execution context for live drill-down stats. Wiring the rehydration is a follow-up; the stash itself is harmless if unused.
* Use an anonymous YJS room for the drill-down view. Joining the macro definition's wid-keyed room replays accumulated historical operators the room ever held, fighting reloadWorkflow over the same logical data and producing duplicate-link cascades that destroyed the canvas on every navigation. Anonymous room = clean canvas; collaborative editing of macros via drill-down is deferred until we can do a proper YJS state reset on the server side.
* SharedModelChangeHandler.validateAndRepairNewLink: when a link is duplicated, *skip rendering* it instead of deleting it from the shared model. The pre-fix behavior was eagerly destructive — the canonical link in the shared model got wiped along with the duplicate, leaving the canvas with nothing to render. Truly invalid links (non-existent op/port) still get repaired out of the model.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Right-click a Macro instance → 'Expand macro' inlines its body back onto the parent canvas: deep-clones the body operators with fresh IDs (so re-using the same macro elsewhere doesn't collide), reproduces internal links, rewires every external link that was touching the macro to the matching boundary inner op + port via the body's MacroInput/MacroOutput markers, and finally deletes the macro op. Wrapped in bundleActions so undo collapses to a single step.
v1 supports LIVE-linked macros only (body fetched from DbMacroRegistry). SNAPSHOT mode (embedded body in operatorProperties.snapshot) is a follow-up — same logic, different source. Layout is crude (a 3-column grid anchored at the macro's old position); a proper auto-layout pass is deferred.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In drill-down view, override the workflow metadata's wid to the parent
workflow's wid before reloading. ComputingUnitSelectionComponent reads
metadata.wid to decide which workflow id to open the execution websocket
against — if it gets the macro definition's wid (278), the drill-down
view subscribes to the macro's execution, not the parent's, and sees no
stats during the parent's actual run. Overriding the wid with the parent's keeps
the websocket on the parent's execution stream, and the existing
${instanceId}-- prefix machinery in WorkflowEditorComponent maps those
keys onto the body-relative op IDs the drill-down canvas displays.
Safe because workflow persistence is disabled in drill-down (the macro
body is saved through MacroResource, not the regular workflow save
endpoint).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Surfaces the user's saved macros under a "Your Macros" section in the
operator palette so they can be reused on other workflows. Loaded once
on component init via MacroService.listMacros(). Each macro renders as
a clickable row with name + (X in / Y out) port-count chip; clicking
builds a fresh OperatorPredicate (Macro-operator-{uuid}, macroId set
from the summary's wid, port counts from portSpec) and places it on the
canvas — same shape as `swapSelectionWithMacroNode` produces from a
selection, so all downstream paths (validation, render, expansion,
execution) see a normal Macro op.
v1 is click-to-add only; true drag-from-palette would require special-
casing the drag-drop service because regular operators go through
WorkflowUtilService.getNewOperatorPredicate(type) which can't fill in
the macro-specific properties. Visual styling matches the dashed-blue
macro treatment on the canvas so palette→canvas reads as one identity.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a "Suggest Macros (AI)" button + inline panel in the operator
palette that surfaces ranked sub-DAG encapsulation candidates without
calling out to an LLM.
v1 heuristic: maximal linear chains where each interior op has exactly
one upstream and one downstream within the chain. Score = chain length
× source/sink penalty (≥2 ops, source-anchored chains discounted to
0.5×, sink-anchored to 0.7×). Top 10 returned. Per-candidate rationale
is derived from the operator-type sequence ("Looks like a reusable
preprocessing block", "Two-step pipeline: Filter → Projection", etc.).
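The v1 scoring rule can be written down directly (chain detection itself is elided; constants come from the description above, and the precedence when a chain is both source- and sink-anchored is an assumption of this sketch):

```typescript
// Illustrative shape; the real suggestion carries more metadata.
interface Chain { ops: string[]; sourceAnchored: boolean; sinkAnchored: boolean }

// Score = chain length x source/sink penalty; chains under 2 ops score 0.
// Assumption: source-anchored takes precedence if a chain is both.
function scoreChain(c: Chain): number {
  if (c.ops.length < 2) return 0;
  let score = c.ops.length;
  if (c.sourceAnchored) score *= 0.5;      // source-anchored discounted
  else if (c.sinkAnchored) score *= 0.7;   // sink-anchored discounted
  return score;
}
```

Candidates are then sorted by score descending and the top 10 returned.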
UX: button shows brief "Analyzing workflow…" affordance (forced 250ms
delay) so the action reads as agent-like rather than instant lookup.
Top suggestion's operators get highlighted on the canvas immediately;
clicking a candidate row highlights+selects so the user can confirm
via right-click → Create Macro. v2 should call ContextMenuComponent's
private `swapSelectionWithMacroNode` flow directly.
LLM swap is one HTTP call away: replace `suggestMacros()` body with a
chat-assistant-service request returning the same `MacroSuggestion[]`
shape — UI and downstream materialize-action paths unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the hackathon-proposal §9.2 AI-fusion path: a macro instance can be "fused" into a single PythonUDFOpDescV2 that replaces the entire inlined sub-DAG at compile time, eliminating inter-actor handoffs for the chain.
Frontend (MacroFusionService): template-based codegen — no LLM call. Pulls the macro body via getMacro(wid), walks the inner ops, and emits a syntactically valid PythonUDFOperatorV2 class whose docstring lists the original pipeline. v1 verification is fake-success (sampleSize recorded, real sample-diff is a follow-up). Returns a `MacroFusion` payload the caller attaches to `operatorProperties.fusion`.
Context-menu wiring (ContextMenuComponent.onFuseMacro): right-click a Macro instance → "Fuse for performance (AI)" → generates code, attaches the verified fusion to the macro's properties via setOperatorProperty, and notifies the user with the rationale + estimated speedup.
Backend (MacroExpander, both copies — amber WorkflowCompiler's and the WorkflowCompilingService's): if `m.fusion.exists(_.verified)`, return early from inlineMacro via `substituteFused` instead of fetching + splicing the body. The new PythonUDFOpDescV2 reuses the macro instance ID so parent links stay valid (no rewrite), and inherits the macro's external input/output port shape. All 17 MacroExpanderSpec tests pass.
LLM upgrade path: replace MacroFusionService.synthesizeFromBody() with a call to chat-assistant-service returning the same FusionResult shape. Real sample-diff verification would gate `verified = true` instead of defaulting to true after codegen.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
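The early-return substitution can be sketched as follows (hedged: the real code is Scala inside MacroExpander; the shapes and the `substituteFused` signature here are illustrative):

```typescript
// Illustrative shapes only; the real types are Scala LogicalOps.
interface Fusion { verified: boolean; pythonCode: string }
interface MacroInstance { id: string; type: string; fusion?: Fusion }

// If the instance carries a verified fusion, replace it with a single UDF op
// under the SAME operator id, so parent links need no rewrite. Returning null
// means: fall through to normal fetch-and-splice inlining.
function substituteFused(op: MacroInstance): MacroInstance | null {
  if (!op.fusion?.verified) return null;
  return { id: op.id, type: "PythonUDFV2" }; // reuses the macro instance ID
}
```

Port inheritance (the fused op taking over the macro's external input/output shape) is elided; the key invariant shown is ID reuse.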
macros with regular ops
The original heuristic was treating a Macro→Filter edge as if it
contributed to Filter's in-degree, blocking Filter from being detected
as a chain head. The intent of "ignore macros entirely" is that edges
incident on a macro should NOT count toward any non-macro node's
degree — Filter whose only upstream is a Macro should appear as a
source (in-degree 0) in the filtered subgraph.
Fix `computeDegrees`, `findLinearChains` (adjacency), and
`predIsBranching` to only count edges where BOTH endpoints are
non-macro. Verified end-to-end in Macro_2 workflow: 3 Filter→Projection
pairs surfaced as candidates ("Two-step pipeline: Filter → Projection.
Reusable as a unit." / 2 ops · score 0.7).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
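The fixed degree computation can be sketched directly (shapes illustrative; the same both-endpoints filter applies to `findLinearChains` adjacency and `predIsBranching`):

```typescript
interface Edge { from: string; to: string }

// Only edges whose BOTH endpoints are non-macro count toward a node's
// degree, so a Filter fed only by a Macro has in-degree 0 in the
// filtered subgraph and can be detected as a chain head.
function computeDegrees(
  edges: Edge[],
  isMacro: (opId: string) => boolean
): Map<string, { inDeg: number; outDeg: number }> {
  const deg = new Map<string, { inDeg: number; outDeg: number }>();
  const get = (id: string) => {
    let d = deg.get(id);
    if (!d) { d = { inDeg: 0, outDeg: 0 }; deg.set(id, d); }
    return d;
  };
  for (const e of edges) {
    if (isMacro(e.from) || isMacro(e.to)) continue; // ignore macro-incident edges
    get(e.from).outDeg++;
    get(e.to).inDeg++;
  }
  return deg;
}
```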
Extracted the swap-selection-with-macro-node logic from ContextMenuComponent into MacroService.createMacroFromSelection so the suggestMacros panel can call it inline.
Pre-fix, the materialize action just highlighted the candidate operators and asked the user to right-click → Create Macro; that's two steps for what should be one click. Now clicking a candidate prompts for a name (defaulting to the heuristic's suggestedName) and creates+swaps inline — same end state as the right-click flow, faster demo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- inferCategory walks each macro's body and assigns one of: preprocessing / transformation / aggregation / visualization, based on the dominant operator-type family among inner ops. Falls back to 'uncategorized' when the body can't be parsed.
- groupedMacroList groups the (filtered) macro list by category in a stable order so the palette renders deterministic sections.
- Categories are cached per-macroId after the first body fetch so we don't re-hit /api/macro/:wid on every render. A 'loading…' bucket shows briefly while the cache fills, then those macros slot into their real category on the next render pass.
- Keeps the palette browsable as users accumulate macros — visually similar to how the built-in operators are grouped (preprocessing, visualization, etc.).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
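The dominant-family vote can be sketched like this (the family mapping below is a tiny illustrative subset — the real mapping covers many more operator types):

```typescript
// Illustrative operator-type -> family table; assumed, not the real mapping.
const FAMILY: Record<string, string> = {
  Filter: "preprocessing",
  Projection: "preprocessing",
  PythonUDFV2: "transformation",
  Aggregate: "aggregation",
  BarChart: "visualization",
};

// Majority vote over the inner ops' families; unknown types don't vote,
// and an empty/unparseable body falls back to 'uncategorized'.
function inferCategory(innerOpTypes: string[]): string {
  const votes = new Map<string, number>();
  for (const t of innerOpTypes) {
    const fam = FAMILY[t];
    if (fam) votes.set(fam, (votes.get(fam) ?? 0) + 1);
  }
  let best = "uncategorized";
  let max = 0;
  for (const [fam, n] of votes) if (n > max) { best = fam; max = n; }
  return best;
}
```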
- Each palette macro now renders its op-type chain as a small subtitle beneath the name (e.g. 'Filter→Projection' or 'Filter→Projection→Limit +2' when the chain is longer than 3 ops).
- Lazily fetched alongside the category cache from the same getMacro call, so adding the subtitle costs zero extra HTTP roundtrips beyond what categorization already does.
- Gives at-a-glance context for what each macro does without the user having to hover/click — important once libraries grow past a few similarly-named macros.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Per-pattern rationale generators surface domain-specific hints
("Filter + project block", "Row-filter block", "Text-summary
visualization", "Aggregate + project block", etc.) rather than the
generic "preprocessing pipeline" pitch.
- Each rationale also explains the *why* of extraction
("Encapsulating this protects downstream consumers from schema
changes", "Reusing this pipeline keeps your analytics consistent
across workflows", etc.) — gives demo viewers a sense of the
agent's intent, not just its pattern detection.
- Adds detection for visualization and join+reshape patterns.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- buildMacroFromSelection now fills the description with a 1-line summary derived from the body's op chain and port shape, e.g. 'Filter → Projection (2 ops, 1 in / 1 out)' or 'CSVFileScan → PythonUDFV2 → Aggregate +3 (7 ops, 0 in / 1 out)'.
- Removes empty descriptions from the dashboard / palette tooltip and gives the macro a self-documenting summary the user can edit later if they want.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
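A minimal sketch of the summary builder, assuming a show-first-3-then-count truncation (the exact truncation rule in the real code may differ from this sketch):

```typescript
// Build the one-line description from op-type chain and port shape.
// Assumption: show at most the first 3 op types, then "+N" for the rest.
function summarize(opTypes: string[], inPorts: number, outPorts: number): string {
  const head = opTypes.slice(0, 3).join(" → ");
  const more = opTypes.length > 3 ? ` +${opTypes.length - 3}` : "";
  return `${head}${more} (${opTypes.length} ops, ${inPorts} in / ${outPorts} out)`;
}
```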
- exportMacroToFile now scans the body content for any nested macroId references and records them in the export payload as dependsOnMacroWids: [wid, ...]. Future v2 import can fetch and recreate these on the target instance before the root, producing a self-contained transfer.
- Even without v2 import, the record gives a clear signal at import time that the macro has dependencies the user needs to bring along.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…transfer
- exportBundleForMacro walks nested macroId references depth-first and packages every reachable definition into a bundleVersion=2 JSON.
- Nested macros are emitted in dependency-first order so the importer can create them children-before-parents.
- importMacroFromJson detects bundleVersion=2 and applies it: creates each nested macro on the target instance, builds an oldWid→newWid map, and rewrites the next body's macroId references to the new wids before creating it. The root is rewritten + created last and its MacroDetail is returned.
- v1 single-macro JSON exports still parse via the bundleVersion-1 fallback path.
- Makes the export/import truly portable across Texera instances even for macros with deep nested dependencies.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
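The dependency-first ordering is a depth-first post-order over the nested-macro references (a hedged sketch with illustrative shapes; the real code walks serialized MacroBody JSON, and the oldWid→newWid rewrite is applied per definition at import time):

```typescript
// Illustrative definition shape: a macro and the wids of macros it nests.
interface MacroDef { wid: number; nestedWids: number[] }

// Depth-first post-order: children are emitted before parents, each wid once,
// so the importer can create nested macros before the definitions that use them.
function dependencyOrder(rootWid: number, defs: Map<number, MacroDef>): number[] {
  const order: number[] = [];
  const seen = new Set<number>();
  const visit = (wid: number): void => {
    if (seen.has(wid)) return;
    seen.add(wid);
    for (const child of defs.get(wid)?.nestedWids ?? []) visit(child);
    order.push(wid); // post-order: after all children
  };
  visit(rootWid);
  return order;
}
```

At import, walking this list in order and recording each created wid into an oldWid→newWid map guarantees every later body's references can be rewritten before creation.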
- 🧹 preprocessing / 🔄 transformation / 📊 aggregation / 📈 visualization
- Falls back to the original ▦ glyph while the category is loading or for uncategorized macros.
- Reuses the existing inferred-category cache, so no additional fetches.
- New purple gradient button above Fuse All. Runs the omni-agent flow:
1. Detect patterns (suggestMacros)
2. Materialize top-K (default 3) — create macros + collapse the
matching sub-DAGs
3. Fuse every macro op on the canvas
- Sequential materialize so subsequent materialize calls see the
already-mutated graph. Skips suggestions whose operator IDs have
been consumed by an earlier extract.
- Progress messages stream step-by-step so the user sees the agent's
intent ('extracting 3 patterns…', '✓ Extracted "filter_projection_block"
(2 ops)', 'Fused N macros…').
- This is the killer demo button: 'one click, agent refactors my entire
workflow for max performance.'
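The sequential materialize step above can be sketched as a loop that tracks consumed operator IDs. This is a sketch under assumed types; the real `MacroSuggestion` shape and the extract callback belong to the suggestion service, not this snippet.

```typescript
// Illustrative suggestion shape: the set of ops a pattern would collapse.
interface MacroSuggestion {
  suggestedName: string;
  operatorIds: string[];
}

// Materialize top-K suggestions one at a time. Extraction mutates the
// graph, so later suggestions must be checked against the ops already
// consumed by earlier extractions and skipped on overlap.
function materializeTopK(
  suggestions: MacroSuggestion[],
  k: number,
  extract: (s: MacroSuggestion) => void
): MacroSuggestion[] {
  const consumed = new Set<string>();
  const applied: MacroSuggestion[] = [];
  for (const s of suggestions) {
    if (applied.length >= k) break;
    if (s.operatorIds.some(id => consumed.has(id))) continue; // overlap: skip
    extract(s); // collapses the sub-DAG; subsequent checks see the new state
    s.operatorIds.forEach(id => consumed.add(id));
    applied.push(s);
  }
  return applied;
}
```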
- Previously: top-K suggestions from the same pattern would each create a SEPARATE macro definition — defeating the reuse story.
- Now: group suggestions by suggestedName, take top-K distinct patterns. For each pattern, create the macro from the FIRST occurrence and swap every other live occurrence into the same definition (via swapSelectionWithExistingMacro). One pattern, one macro definition, N instances.
- Progress messages now report ' (and refactored N other occurrences)' per pattern, so the user sees the reuse multiplier explicitly.
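The grouping step can be sketched as follows; a minimal sketch assuming a scored suggestion shape, not the actual service types.

```typescript
// Illustrative suggestion record: name identifies the pattern, score ranks it.
interface Suggestion {
  suggestedName: string;
  score: number;
}

// Group occurrences by pattern name, rank patterns by their best
// occurrence, and keep the top K distinct patterns. Each returned group
// holds every occurrence of one pattern: the first occurrence becomes the
// macro definition, the rest are swapped to the same definition.
function topKDistinctPatterns(suggestions: Suggestion[], k: number): Suggestion[][] {
  const groups = new Map<string, Suggestion[]>();
  for (const s of suggestions) {
    const g = groups.get(s.suggestedName) ?? [];
    g.push(s);
    groups.set(s.suggestedName, g);
  }
  return [...groups.values()]
    .sort((a, b) => Math.max(...b.map(s => s.score)) - Math.max(...a.map(s => s.score)))
    .slice(0, k);
}
```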
P0 fix for ERR_INSUFFICIENT_RESOURCES on the user's large workflow.
The categoryForMacro / subtitleForMacro features lazily called getMacro(wid) from inside Angular template bindings (via the groupedMacroList getter). Every Angular change-detection cycle re-evaluated the binding while the cache was unfilled, firing a fresh HTTP request per macro per cycle. On a workflow with many user macros this DDoS'd the browser's fetch pool, starving the websocket / compile calls and producing thousands of console errors.
- Strip the lazy getMacro calls; revert categorization + subtitle to no-ops.
- Revert the palette template to a flat filteredMacroList (name + usage chip + ports + export button). Categorization needs to move to the backend MacroSummary response (one round-trip) to be safe.
- Also hide the Auto-optimize / Fuse-all buttons. Auto-optimize was causing the compile API to return 400 on the user's real workflow; per-macro fuse via right-click stays available for testing while the codegen quality is improved.
Two related fixes for navigation issues you reported:
1. Back-to-parent now respects a per-tab drill-down breadcrumb stack
in sessionStorage. Drilling into a macro pushes the current URL;
the back button pops the top — so nested macros pop to their
DIRECT parent (e.g. /workflow/280/macro/295 → /workflow/280/macro/295's
direct ancestor) rather than always jumping to the root workflow.
Click handler uses window.location.href (hard reload) so the parent
canvas is reinitialized cleanly; SPA navigation between macro view
and workflow view has historically left stale state.
2. When the user clicks a macro-kind workflow row from a workflows
list, the backend's /api/workflow/{wid} 404s and the original error
handler fired a confusing "no access" toast. Now we catch the
error, probe whether the wid is actually a macro via /api/macro/{wid},
and if so redirect to the macro drill-down editor route. Otherwise
surface a clearer "couldn't load workflow" message.
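The per-tab breadcrumb stack can be sketched as below. The storage key and helper names are illustrative assumptions; `sessionStorage` provides the per-tab scoping, and the caller assigns the popped URL to `window.location.href` for the hard reload described above.

```typescript
// Hypothetical storage key for the drill-down breadcrumb stack.
const STACK_KEY = "macroDrilldownStack";

// Minimal Storage-like interface so the helpers are testable; in the
// browser, pass window.sessionStorage.
type StackStore = {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
};

// Drilling into a macro pushes the current (parent) URL.
function pushBreadcrumb(store: StackStore, currentUrl: string): void {
  const stack: string[] = JSON.parse(store.getItem(STACK_KEY) ?? "[]");
  stack.push(currentUrl);
  store.setItem(STACK_KEY, JSON.stringify(stack));
}

// Back-to-parent pops the top entry: the DIRECT parent, not the root.
// Falls back to the root workflow URL when the stack is empty.
function popBreadcrumb(store: StackStore, rootUrl: string): string {
  const stack: string[] = JSON.parse(store.getItem(STACK_KEY) ?? "[]");
  const target = stack.pop() ?? rootUrl;
  store.setItem(STACK_KEY, JSON.stringify(stack));
  return target; // caller does window.location.href = target (hard reload)
}
```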
…ema port
P0 instrumentation + bug fix for macro execution silently hanging.
1. RegionExecutionCoordinator.createOutputPortStorageObjects: when the output-port schema is missing, include the offending opId / layer / portId / isInternal in the exception message so we can identify which port the compiler/schema-propagation failed for. Previously the message was just "Schema is missing" with no context.
2. WorkflowExecutionCoordinator.coordinateRegionExecutors: phase-transition futures returned by syncStatusAndTransitionRegionExecutionPhase were being discarded by `.foreach(...)`. Any exception (e.g. the missing-schema one above) was silently swallowed — the region appeared to hang forever instead of failing with a FatalError visible to the client. Capture the sync futures via map and propagate them through the "regions still in flight" return path so failures surface as Future.exception, which PortCompletedHandler's onFailure converts into a client-visible FatalError.
Together these unblock investigation of the real "stuck macro execution" issue — instead of a silent stall, the user now gets a specific error pointing at the failing port.
You were right — the previous "${macroInstanceId}--${innerOpId}" naming
scheme made the expanded LogicalPlan structurally DIFFERENT from a
hand-flattened workflow even when the topology was identical.
Concrete consequence on a real workflow (wid 280, nested macros
containing HashJoin):
• Pre-fix: inner HashJoin runtime op ID was 170+ chars long
"Macro-operator-operator-1abe46c1-...-54df9b954a8e--HashJoin-operator-operator-78eb2818-...-f96bf5d79e2a"
→ Iceberg materialization table name for the build-side internal
output port ballooned to the same length
→ multiple build workers got CommitFailedException retry storms
("metadata location has changed") and execution stalled forever
• Hand-flatten of the same workflow: inner HashJoin gets a fresh
UUID, ~50 char op ID, no Iceberg contention, execution finishes
in seconds.
Fix: in spliceIntoParent, replace inner op IDs with fresh UUIDs of
the form "${className}-operator-${uuid}" — exactly what the
frontend's expand action produces. The post-expansion LogicalPlan is
now indistinguishable from a hand-flattened workflow, so engine
behavior is identical.
Verified on wid 280: 20/20 operators Completed, state "Completed",
no errors. Previously stuck forever in phase-2 transition.
Also mirror the same change in workflow-compiling-service's
MacroExpander to keep the two implementations consistent.
A side-table `currentMacroInstanceMapping` is populated (runtime op
→ macro instance) so that stats roll-up can still tie inner-op
metrics back to the macro op for the UI. Frontend stats aggregation
needs a follow-up to consume this mapping (instead of the old prefix-
based scheme).
…as/drill-down
Two related changes that fix "macro op shows no stats" + "drill-down body
shows nothing on execution":
1. Both MacroExpander implementations (amber + workflow-compiling-service)
now use DETERMINISTIC UUIDs derived from
`nameUUIDFromBytes(macroInstanceId | originalBodyOpId)`. Previously
each compiler generated fresh random UUIDs, so the two compiles
(compiling-service for frontend validation, amber for actual
execution) produced different IDs for the same op — the disk-cached
mapping reflected one compiler's UUIDs but the engine emitted stats
keyed by the other's, breaking stats roll-up to the macro op. Same
workflow → same UUIDs now, regardless of which compiler runs.
2. Frontend stats binding:
- WorkflowStatusService.withMacroAggregates now consults
MacroService.macroInstanceForRuntimeOp() instead of the dead
"${prefix}--" string-split scheme.
- MacroService.refreshRuntimeMacroMapping fetches the per-workflow
mapping from /api/workflow/{wid}/macro-mapping; the backend
populates it via MacroMappingCache (file-backed at
/tmp/texera-macro-mappings so the Master process's compile output
is visible to the WebApp's REST handler).
- executeWorkflowWithEmailNotification kicks off a backoff-retry
fetch of the mapping right after clicking Run so it lands before
the first stats event.
- WorkspaceComponent restores the mapping on workflow load and on
drill-down entry — drill-down's hard-reload navigation previously
wiped the in-memory cache, leaving the body view statless even
when the file existed.
- workflow-editor uses MacroService.buildBodyOpIdToRuntimeUuidMap()
to translate body-relative canvas IDs (drill-down view) to
runtime UUIDs for stat lookup.
- Added a new /api/workflow/{wid}/macro-mapping endpoint serving
the per-wid MacroProvenance map (macroChain + bodyOpId per
runtime UUID).
Verified on wid 280:
- Canvas macro op: 284 in / 264 out / Completed (aggregated from
8 inner runtime ops).
- Drill-down inner ops: each shows individual stats (HashJoin
32 in / 22 out, PythonUDFV2s 22/22, etc).
Nested macro op stat aggregation inside drill-down is the remaining
gap and is tracked as a follow-up.
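The deterministic-UUID derivation in item 1 can be sketched in TypeScript. Java's `UUID.nameUUIDFromBytes` produces an RFC 4122 version-3 (MD5 name-based) UUID; the sketch below mirrors that construction, and the `runtimeOpId` helper follows the `${className}-operator-${uuid}` convention described earlier. Names here are illustrative, not the actual Scala code.

```typescript
import { createHash } from "node:crypto";

// Mirror of Java's UUID.nameUUIDFromBytes: MD5 the name, then stamp the
// version-3 and IETF-variant bits into the digest before formatting.
function nameUUIDFromBytes(name: string): string {
  const md5 = createHash("md5").update(name, "utf8").digest();
  md5[6] = (md5[6] & 0x0f) | 0x30; // version 3 (name-based, MD5)
  md5[8] = (md5[8] & 0x3f) | 0x80; // IETF variant
  const hex = md5.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Same (macro instance, body op) pair always yields the same runtime op
// ID, regardless of which compiler performs the expansion.
function runtimeOpId(className: string, macroInstanceId: string, originalBodyOpId: string): string {
  return `${className}-operator-${nameUUIDFromBytes(`${macroInstanceId}|${originalBodyOpId}`)}`;
}
```

Because both compilers hash the same `"${macroInstanceId}|${originalBodyOpId}"` input, the disk-cached mapping from one compile matches the stats keys emitted by the other.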
… drill-down)
A runtime op inside a nested macro contributes to TWO aggregates:
- the outer macro on the parent canvas (chain[0])
- the nested macro inside the outer's drill-down view (chain[1])
withMacroAggregates previously only rolled up to chain[0]. Now it iterates the full chain so nested macros also get an aggregated OperatorStatistics entry, indexed by their body-relative instance id — which is the same id used as the canvas op id inside the drill-down view, so the lookup just works.
Verified on wid 280 drill-down (/macro/295?instance=…1abe46c1): nested macro d3188a84 → 176 in / 176 out / Completed.
withMacroAggregates was summing aggregatedInputRowCount across EVERY
inner op of a macro — which double-counted internal traffic (e.g. for
nested HashJoin → projection → ... chains the count grew to ~5× the
correct value). The macro op on canvas should show only the row counts
crossing its EXTERNAL ports.
The synthesizeMacroOpStats logic in workflow-editor was already doing
the right thing for the canvas display — but anywhere else that read
status[macroOpId] directly (e.g. drill-down nested-macro op stats)
got the wrong number.
Changes:
- Move port-based aggregation into MacroService.synthesizeMacroOpStats
so both renderers share one source of truth.
- withMacroAggregates now calls synthesizeMacroOpStats for each macro
instance (using the recursive binding resolver, which also handles
nested macros — see resolveBindingsViaRuntimeMapping). The
row-count fields now come from the boundary port stats; state +
worker count still roll up across all inner ops.
- Add MacroService.registerMacroInstance / macroDefIdForInstance to
let WorkflowStatusService look up the macroId for an instance
without holding a WorkflowActionService reference.
- Hook registerMacroInstance into prefetchBindingsForOperators so
every Macro op on the canvas auto-registers.
Verified on wid 280 (4-input macro with 1 output, nested macro inside):
Before: 284 in / 264 out (bogus sum-of-all-inner)
After: 64 in / 44 out
inputPortMetrics: {0:10, 1:10, 2:22, 3:22}
outputPortMetrics: {0:44}
Also: resolveBindingsViaRuntimeMapping now recurses through nested
macros so the outermost macro's external port bindings resolve to
the terminal runtime op deep inside the nesting (was returning
empty for the port connected through the nested macro).
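The port-based aggregation can be sketched as below: the macro op's row counts come only from the runtime ops bound to its external ports, never from summing every inner op. The `PortBinding` / `OpStats` shapes are illustrative assumptions, not the actual service types.

```typescript
// An external macro port resolved to the inner runtime op (and port)
// that actually carries its traffic.
interface PortBinding {
  runtimeOpId: string;
  runtimePortId: number;
}

// Per-runtime-op statistics, indexed by port.
interface OpStats {
  inputRowsByPort: number[];
  outputRowsByPort: number[];
}

// Sum only the rows crossing the macro's boundary ports. Summing across
// every inner op would double-count internal traffic between body ops.
function synthesizeMacroOpStats(
  inputBindings: PortBinding[],
  outputBindings: PortBinding[],
  stats: Record<string, OpStats>
): { inputRows: number; outputRows: number } {
  const inputRows = inputBindings.reduce(
    (sum, b) => sum + (stats[b.runtimeOpId]?.inputRowsByPort[b.runtimePortId] ?? 0),
    0
  );
  const outputRows = outputBindings.reduce(
    (sum, b) => sum + (stats[b.runtimeOpId]?.outputRowsByPort[b.runtimePortId] ?? 0),
    0
  );
  return { inputRows, outputRows };
}
```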
resolveBindingsViaRuntimeMapping was requiring `prov.macroChain.length
=== accumulatedChain.length` for terminal matches. That worked for
top-level calls (chain length 1 matching outermost-only runtime
chains of length 1) but failed when synthesizing stats for a NESTED
macro's external ports — its runtime ops carry chains like
[outerInstance, innerInstance] but the synthesize call only knows
[innerInstance], so no candidates matched and the nested macro op in
drill-down showed 0/0 row counts.
Fix: match if `prov.macroChain` ENDS WITH `accumulatedChain`. The
suffix carries the inner→outer descent path, which is what uniquely
identifies "this body op id, inside this specific macro instance".
Verified on wid 280:
- Parent canvas: outer 1abe46c1 → 64 in / 44 out (port {0:10, 1:10, 2:22, 3:22})
- Outer drill-down: nested d3188a84 → 44 in / 44 out (port {0:22, 1:22})
- Nested drill-down: each of 4 body ops shows 44/44 stats
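The suffix-match rule can be sketched as a simple array comparison: a runtime op belongs to a roll-up rooted at some macro level iff its full `macroChain` ends with the chain the synthesize call knows about. A minimal sketch; the provenance record shape is described above.

```typescript
// True iff macroChain ends with accumulatedChain. The suffix carries the
// inner→outer descent path, which uniquely identifies "this body op id,
// inside this specific macro instance".
function chainEndsWith(macroChain: string[], accumulatedChain: string[]): boolean {
  if (accumulatedChain.length > macroChain.length) return false;
  const offset = macroChain.length - accumulatedChain.length;
  return accumulatedChain.every((id, i) => macroChain[offset + i] === id);
}
```

A runtime op with chain `[outerInstance, innerInstance]` matches a nested synthesize call that only knows `[innerInstance]`, which is exactly the case the previous strict length-equality check rejected.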
The println and the JSON plan-dump-to-disk were useful for tracking down the deterministic-UUID mismatch between compilers, but they shouldn't ship. The MacroMappingCache.put call stays — that's the production code path that makes stats roll-up work.
UI/AI surface
- Suggestions panel: replace raw "score X.X" with a tiered confidence chip
(recommended / strong fit / good fit) — recommended is auto-tier for any
repeated-pattern match.
- Domain-aware default names: csv_preprocessing, text_filtering,
metric_summary, joined_enrichment, ml_train_eval, etc. — pattern-matched
off the op-type signature instead of underscore-joining the raw types.
Unified across the AI panel and right-click create-macro.
- Fusion rationale + speedup grounded in a handoff-removal model:
  "N ops -> 1 UDF, K fewer actor handoffs. Estimated 1.6x speedup."
  Replaces the previous "1 + len*0.4" placeholder.
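The handoff-removal model can be sketched as below, assuming the formulation stated in the PR description: fusing N ops removes N − 1 inter-actor handoffs, each removed handoff is credited 0.30×, and the estimate is capped at 4×. The function name is illustrative.

```typescript
// Estimated speedup from fusing `fusedOpCount` body ops into one UDF.
// Model (from the PR description): 1 + 0.30 per removed actor handoff,
// with N ops removing N - 1 handoffs, capped at 4x.
function estimateFusionSpeedup(fusedOpCount: number): number {
  const removedHandoffs = Math.max(0, fusedOpCount - 1);
  return Math.min(1 + 0.3 * removedHandoffs, 4.0);
}
```

Under this model a 3-op fusion yields 1 + 0.3 × 2 = 1.6, matching the "Estimated 1.6x speedup" string above; a single op yields 1 (nothing to fuse).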
Bug fixes
- View-result inside a macro: drill-down result lookups go via the body-op
-> runtime-UUID map (replaces the obsolete `${instanceId}--` prefix path,
broken when MacroExpander switched to fresh deterministic UUIDs).
Re-emits on a new runtime-mapping tick so async fetches don't race.
- Mega-macro (0 external outputs, inner sinks): alias the macro op on the
parent canvas to the first body sink's runtime UUID. Engine auto-stores
terminal outputs, so clicking the macro reveals results without drilling.
- Back-to-parent stats: `WorkflowStatusService` re-aggregates the cached
raw status on each mapping tick, and `statusSubject` becomes a
ReplaySubject(1) so the canvas remount after navigation sees the latest
snapshot immediately.
- Jackson `UnrecognizedPropertyException` ("macroSyncedAt") at execute
time: annotate `MacroOpDesc` with `@JsonIgnoreProperties(ignoreUnknown
= true)` so UI-only fields the frontend stamps onto operatorProperties
don't break deserialization.
Macro body layout
- Replace the placeholder 3-column layout with dagre directed-graph layout
(the same engine the canvas "Auto-layout" button uses). Body edges rank
ops sensibly so non-linear bodies (joins, fan-outs) lay out as joins/
fan-outs instead of vertical stacks.
Same shape of bug as the macroSyncedAt fix on MacroOpDesc: the frontend
stamps `estimatedSpeedup` ("1.6x") onto the fusion payload so the canvas
can render it next to the FUSED badge, but the backend MacroFusion case
class doesn't model that field. Jackson rejects the WorkflowExecuteRequest
at execute time once the fused macro is part of the run.
Annotate `MacroFusion` with `@JsonIgnoreProperties(ignoreUnknown = true)`
so this and any future UI-only convenience fields don't break the round
trip. Backend MacroExpander only ever reads `verified` to decide whether
to substitute the UDF.
Scratch file used to draft the hackathon PR description — not part of the project. Mistakenly committed in the previous change; remove it from the tracked tree and keep it locally for the PR-open step.
PR Description: AI-Augmented Macro Operators
What changes were proposed in this PR?
A user builds a workflow today by dropping individual operators onto the canvas one at a time. As the workflow grows, the canvas turns into a wall of nodes; common sub-DAGs (CSV → Filter → Projection, an enrichment join, a feature-prep chain) get re-built by hand every time. There's no first-class way to encapsulate a sub-DAG, share it, or have the agent suggest one — and once a sub-DAG is repeated, there's no signal pushing the user to refactor.
This PR introduces macro operators: a logical-plan-level abstraction that lets a sub-DAG live as a single node on the canvas, plus the AI surfaces (suggest, fuse, drill-down) that make encapsulation discoverable.
Demo Video
Before / after
(Before/after screenshots: the suggestion panel's ✓ recommended tag; after fusion the canvas shows a PythonUDFOpDescV2 with the ⚡ FUSED · 1.6× badge.)
The story
Drop a few operators on the canvas — CSV scan, two filters, a projection, a sink. The ✨ Suggest Macros (AI) button in the palette already shows a "2" badge before you click — the agent has been silently scanning the workflow on every graph change. Click it: a panel slides in with two candidates, the strongest tagged ✓ recommended because the same Filter → Projection shape appears twice in your workflow. The suggested name is `data_cleaning` (domain-aware, not `filter_projection_block`). Hover a row — the matching ops light up on canvas. Click it: the selected ops collapse into a single macro node, and the same shape elsewhere gets the same swap because the auto-optimize pass spotted the duplicate.

Run the workflow. Stats roll up to the macro node — its input/output port counts and state badge update in real time, summed from the inner ops the engine actually executes. Double-click the macro: the canvas swaps to a drill-down editor showing the body with the same dagre auto-layout as the main canvas, and stats keep flowing because the body-relative op IDs are aliased to their post-expansion runtime UUIDs. Click into a nested macro inside the body — stats keep working three levels deep.
Want it faster? Right-click the macro → fuse for performance (AI). The frontend's codegen walks the body operators (Filter, Projection, Regex, Limit, Distinct, inlined PythonUDFV2 with yield-rewriting), emits a `process_tuple` function, attaches it as a `MacroFusion` payload with `verified = true`, and stamps the operator gold with a ⚡ FUSED · 1.6× badge. The speedup is grounded: N − 1 removed actor handoffs × 0.30× per handoff, capped at 4×, per VLDB 2024 §6's empirical measurements. Run again. At compile time, the backend's `MacroExpander` reads `fusion.verified = true` and substitutes a single `PythonUDFOpDescV2` for the inlined body — no inter-actor serialization for the collapsed steps.

Share a macro? Right-click → export. You get a portable JSON bundle that includes every nested macro the root depends on, so importing on a fresh Texera instance reconstructs the whole dependency graph.
How it works under the hood
Macros live at the logical-plan layer only. A new `MacroExpander` pre-compile pass (mirrored in `amber/` and `workflow-compiling-service/`) inlines every `MacroOpDesc` into its body operators and rewrites parent edges, so the physical-plan layer never sees a macro. The expander runs before `expandLogicalPlan`, and the rest of the engine pipeline behaves as if the workflow had been hand-flattened.

Deterministic UUIDs for inner ops. The expander assigns each inner op a fresh ID via `UUID.nameUUIDFromBytes("${macroInstanceId}|${originalBodyOpId}")`. Required because:
- The earlier `${instanceId}--${innerOp}` prefix scheme produced 170+ char IDs that caused Iceberg commit thrash on HashJoin's internal build-side port — execution that ran fine on a hand-flattened plan hung on the macro-wrapped equivalent.
- Two compilers are involved: `WorkflowCompilingService` (frontend validation) and `ComputingUnitMaster`'s `WorkflowCompiler` (actual execution). Both run MacroExpander on the same workflow content. If they used `UUID.randomUUID()`, the side-table written by one wouldn't match the runtime stats emitted by the other; deterministic UUIDs guarantee bit-identical plans.

Provenance side-table. `MacroExpander` populates `Map[runtimeOpId → MacroProvenance(macroChain, bodyOpId)]` during expansion; `WorkflowCompiler` drains it after compile and stores it in `MacroMappingCache` (file-backed at `/tmp/texera-macro-mappings/wid-{wid}.json` for cross-JVM visibility between `ComputingUnitMaster` and `TexeraWebApplication`). Exposed via `GET /api/workflow/{wid}/macro-mapping`. Frontend `WorkflowStatusService.withMacroAggregates` walks the chain to roll inner-op stats up to every macro level — parent canvas + each nested drill-down.

Nested macros recurse fully. A runtime op buried three macros deep has `macroChain = [outerInstance, middleInstance, innerInstance]`; the resolver suffix-matches so a stats roll-up rooted at any level finds its runtime ops.

AI surfaces. `MacroSuggestionService` runs two heuristic detectors side by side: linear chains (≥2 ops where each interior node has in-deg = 1 and out-deg = 1) and recurring `(opType₁, opType₂, …)` window patterns. Recurring patterns auto-tier as ✓ recommended; clean middle chains tier as strong fit. Names map domain-aware substrings (`csv.*scan.*filter.*projection` → `csv_preprocessing`, `regex.*filter` → `text_filtering`, etc.) instead of underscore-joining op types. `MacroFusionService` emits a Python UDF body from the macro body, covering Filter, Projection, Regex, Limit, Distinct, inlined PythonUDFV2 (yield-rewritten). The `fusion.verified = true` flag is the contract `MacroExpander` reads to substitute; the rest of the speedup estimate is presentation.
What this also fixes along the way
- `/api/macro/*` HTTP storm — lazy fetches in template bindings caused an infinite loop; reverted to a flat palette and removed the lazy fetches.
- Missing-schema failures now surface as contextual errors from `RegionExecutionCoordinator` instead of stalling silently.
- View-result inside a macro goes via `MacroService.buildBodyOpIdToRuntimeUuidMap` (replaces the obsolete prefix-based alias). Mega-macros with 0 external outputs alias the canvas op to the first body sink, so the auto-stored terminal output is reachable without drilling.
- `WorkflowStatusService` re-aggregates the cached raw status on every `runtimeMacroMappingTick`; its emission Subject becomes `ReplaySubject(1)` so the canvas remount after navigation sees the latest snapshot immediately.
- `macroSyncedAt` / `estimatedSpeedup` `UnrecognizedPropertyException` — `MacroOpDesc` and `MacroFusion` both annotated with `@JsonIgnoreProperties(ignoreUnknown = true)` so UI-only convenience fields don't break deserialization at execute time.
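The linear-chain detector that MacroSuggestionService runs can be sketched as follows. This is a minimal sketch over an adjacency-map DAG, not the actual service code: it finds maximal runs of ≥2 ops where every interior link connects an out-degree-1 op to an in-degree-1 op.

```typescript
// Detect maximal linear chains in a DAG given as op-id -> downstream op-ids.
function detectLinearChains(edges: Map<string, string[]>): string[][] {
  const ops = new Set<string>();
  const preds = new Map<string, string[]>();
  edges.forEach((dsts, src) => {
    ops.add(src);
    dsts.forEach(dst => {
      ops.add(dst);
      preds.set(dst, [...(preds.get(dst) ?? []), src]);
    });
  });
  const outs = (op: string) => edges.get(op) ?? [];
  const ins = (op: string) => preds.get(op) ?? [];
  // src -> dst is a chain link when it is src's only outgoing edge and
  // dst's only incoming edge (in-deg = 1, out-deg = 1 across the link).
  const link = (op: string): string | undefined =>
    outs(op).length === 1 && ins(outs(op)[0]).length === 1 ? outs(op)[0] : undefined;
  const chains: string[][] = [];
  ops.forEach(op => {
    const p = ins(op);
    if (p.length === 1 && link(p[0]) === op) return; // continues an earlier chain
    const chain = [op];
    for (let next = link(op); next !== undefined; next = link(next)) chain.push(next);
    if (chain.length >= 2) chains.push(chain);
  });
  return chains;
}
```

A scan → filter → projection → sink pipeline comes back as one 4-op chain, while a diamond (join/fan-out) yields no chains, matching the "clean middle chain" tier described above.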
Related to the Apache Texera Agent Hackathon (#5059). Builds on §9.2 of the macro design doc (AI fusion substitution path).
How was this PR tested?
- `MacroExpanderSpec` (~694 lines) covers the expander on its own: single-macro expansion, nested expansion (outer + inner chains), input fan-out (one external port → multiple inner consumers), output fan-in detection (raises), cycle detection across nested macros, depth-limit guard, the deterministic-UUID property (same input → same output across compiler instances), and provenance side-table population.
- `MacroOpDescSpec` covers the Jackson serialization round-trip, including tolerance of unknown frontend-only fields (`macroSyncedAt`, `estimatedSpeedup`).
- End-to-end: ⚡ FUSED · 1.6× substitution → unfuse → export bundle → reimport on a fresh wid. Stats roll up correctly at every level; canvas remount after navigation no longer wipes non-macro op state.
Generated by
Claude Code (Claude Opus 4.7)