What happened?
ReconfigurationSpec's "Engine should propagate reconfiguration through a source operator in workflow" case (added in #4220) consistently times out at Await.result(completion, Duration.fromMinutes(1)).
The workflow under test is pythonSourceOpDesc(10000) -> pythonOpDesc(), reconfiguring only the downstream UDF. From the worker logs, the reconfiguration ECM does propagate correctly:
- Source receives the test-reconfigure-1 ECM (cmd=None) and forwards it to the UDF.
- UDF receives and processes the embedded UpdateExecutor command.
- After resume, source produces all tuples and sends EndChannel to UDF.
- UDF logs "process channel ECM ... EndChannel" and then goes silent for the full 1-minute timeout: no _process_end_channel-driven port_completed / complete() activity, and no "Region 0 successfully terminated" from the controller.
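For anyone triaging from captured logs, the check above can be mechanized. The following is only a minimal sketch: it assumes plain-text log lines that contain the marker substrings quoted in this issue, and the name missing_markers is hypothetical, not part of Texera.

```python
from typing import Iterable, List, Sequence

# Marker substrings quoted in this issue. The surrounding log format is an
# assumption; only substring matching is used.
EXPECTED_MARKERS = (
    "EndChannel",
    "port_completed",
    "Region 0 successfully terminated",
)


def missing_markers(
    lines: Iterable[str], markers: Sequence[str] = EXPECTED_MARKERS
) -> List[str]:
    """Return the markers that appear in no log line, in their given order."""
    seen = set()
    for line in lines:
        for marker in markers:
            if marker in line:
                seen.add(marker)
    return [marker for marker in markers if marker not in seen]
```

Feeding it the UDF worker and controller log lines from a failing run should, if the description above holds, report port_completed and Region 0 successfully terminated as missing while EndChannel is present.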
The other four cases in the spec pass (single-op Python UDF reconfigure, Java operator reconfigure, source-as-target rejection, two-UDF chain). The failure is specific to the multi-worker source-propagation path with a Python source.
#4531 (which restored the web-service entrypoint and added per-worker completion notification) does not fix this — the test still times out on main after that PR.
To unblock CI I'm temporarily marking this case ignore in a follow-up PR; this issue tracks the real fix.
How to reproduce?
sbt "WorkflowExecutionService/testOnly *ReconfigurationSpec"
Expected: 5/5 tests pass.
Observed: 4/5 pass; "should propagate reconfiguration through a source operator in workflow" fails with:
com.twitter.util.TimeoutException: 1.minutes
at com.twitter.util.Promise.ready(Promise.scala:680)
...
at org.apache.texera.amber.engine.e2e.ReconfigurationSpec.shouldReconfigure(ReconfigurationSpec.scala:180)
Version
1.1.0-incubating (Pre-release/Master)
Commit Hash (Optional)
98ba87e