[trainer,reward] fix: graceful shutdown on Ctrl+C to prevent zombie GPU processes by Jayce-Ping · Pull Request #149 · X-GenGroup/Flow-Factory

Jayce-Ping · 2026-04-30T13:16:20Z

Summary

Add RewardBuffer.shutdown() for non-blocking executor termination on interrupt
Add BaseTrainer.cleanup() that calls shutdown on reward buffers
Wrap trainer.start() with KeyboardInterrupt handler in train.py, using os._exit(0) to bypass NCCL heartbeat noise
Catch SIGINT exit codes in cli.py for clean subprocess teardown

Context

When users press Ctrl+C during multi-GPU training, worker processes could become orphaned holding GPU memory. Root cause: the RewardBuffer's ThreadPoolExecutor (holding GPU references via CUDA streams) was never shut down on interrupt, and NCCL's C++ heartbeat threads kept running during Python's normal shutdown sequence.

Test plan

Launch multi-GPU training with ff-train, press Ctrl+C during evaluation/sampling
Verify nvidia-smi shows no zombie processes after exit
Verify log shows "Training interrupted by user" then clean exit
Verify normal training completion is unaffected

🤖 Generated with Claude Code

When users press Ctrl+C during multi-GPU training, worker processes could become orphaned while still holding GPU memory. The root cause: the RewardBuffer's ThreadPoolExecutor (holding GPU references via CUDA streams) was never shut down on interrupt, and NCCL's C++ heartbeat threads kept running during Python's normal shutdown sequence. Changes: - Add RewardBuffer.shutdown() for non-blocking executor termination - Add BaseTrainer.cleanup() that calls shutdown on reward buffers - Wrap trainer.start() with KeyboardInterrupt handler in train.py - Use os._exit(0) after cleanup to bypass NCCL heartbeat noise - Catch SIGINT exit codes in cli.py for clean subprocess teardown Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR improves Ctrl+C handling during (multi-GPU) training to avoid orphaned worker processes that keep GPU memory allocated, by explicitly shutting down async reward executors and exiting cleanly when interrupted.

Changes:

Add RewardBuffer.shutdown() for fast, non-reinitializing teardown of async reward executors.
Add BaseTrainer.cleanup() to shut down training/eval reward buffers on interrupt.
Wrap trainer.start() in train.py with KeyboardInterrupt handling and force-exit to avoid noisy/hanging shutdown paths; handle SIGINT-like subprocess exit codes in cli.py.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
src/flow_factory/trainers/abc.py	Adds `cleanup()` to shut down reward buffers during interrupt-driven teardown.
src/flow_factory/train.py	Catches `KeyboardInterrupt`, calls trainer cleanup, and exits immediately to avoid lingering shutdown noise.
src/flow_factory/rewards/reward_processor.py	Introduces `RewardBuffer.shutdown()` to terminate the underlying executor without resetting state.
src/flow_factory/cli.py	Treats SIGINT-style subprocess exits as a clean interrupt and exits with code 0.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        trainer.cleanup()
+        os._exit(0)


+        Called on KeyboardInterrupt to ensure ThreadPoolExecutor threads (which
+        may hold GPU references via reward model CUDA streams) are terminated
+        before the process exits. The OS reclaims all GPU memory on exit, so
+        explicit empty_cache() is unnecessary and would race with executor threads.


Address review feedback: - Wrap trainer.cleanup() in try/finally so os._exit(0) is guaranteed to execute even if cleanup fails on partially-initialized state - Correct cleanup() docstring to reflect non-blocking semantics (shutdown signals threads but os._exit does the actual termination) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Adds explicit interrupt/teardown hooks to ensure multi-GPU training exits cleanly on Ctrl+C without leaving orphaned processes holding GPU memory, primarily by shutting down async reward executors and force-exiting to avoid noisy/disruptive NCCL shutdown behavior.

Changes:

Add RewardBuffer.shutdown() for fast, non-blocking threadpool teardown on interrupt.
Add BaseTrainer.cleanup() to trigger reward-buffer shutdowns during interrupt handling.
Handle KeyboardInterrupt in train.py with cleanup + os._exit(0), and treat SIGINT-like subprocess exits as clean interrupts in cli.py.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
src/flow_factory/trainers/abc.py	Introduces `BaseTrainer.cleanup()` that shuts down training/eval reward buffers without blocking.
src/flow_factory/train.py	Wraps `trainer.start()` in a `KeyboardInterrupt` handler, calls `trainer.cleanup()`, then `os._exit(0)`.
src/flow_factory/rewards/reward_processor.py	Adds `RewardBuffer.shutdown()` to terminate the async `ThreadPoolExecutor` without reinitialization.
src/flow_factory/cli.py	Treats SIGINT-style subprocess termination as a clean interrupt to avoid noisy tracebacks and ensure exit(0).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Before this change, os._exit(0) bypassed all atexit handlers, causing wandb/swanlab to lose unsynced metrics. Now cleanup() calls logger.finish() before exit, allowing data sync to complete. Double Ctrl+C still force-exits via try/finally + os._exit(0). - Add Logger.finish() base method (cleans up temp files) - Override in WandbLogger: calls wandb.finish(quiet=True) - Override in SwanlabLogger: calls swanlab.finish() - Override in TensorboardLogger: calls self.platform.close() - BaseTrainer.cleanup(): call self.logger.finish() after executor shutdown Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wandb.finish() did not reliably sync data during interrupt. Revert to keep the scope focused on graceful GPU process exit only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR improves Ctrl+C handling during training to avoid orphaned/zombie GPU processes by introducing explicit non-blocking shutdown hooks for async reward workers and handling SIGINT exit paths more deliberately in both the training entrypoint and CLI launcher.

Changes:

Add RewardBuffer.shutdown() to stop the reward async executor without waiting/reinitializing.
Add BaseTrainer.cleanup() to trigger reward-buffer shutdown on interruption.
Catch KeyboardInterrupt in train.py and force-exit, plus treat SIGINT returns as clean exits in cli.py.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`src/flow_factory/trainers/abc.py`	Adds `BaseTrainer.cleanup()` to shutdown reward buffers on interrupt.
`src/flow_factory/train.py`	Wraps training start in `KeyboardInterrupt` handling and force-exits after cleanup.
`src/flow_factory/rewards/reward_processor.py`	Adds `RewardBuffer.shutdown()` for non-blocking executor termination.
`src/flow_factory/cli.py`	Treats SIGINT/KeyboardInterrupt from the training subprocess as a clean exit.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        try:
+            trainer.cleanup()
+        finally:
+            os._exit(0)


    trainer = load_trainer(config)
-    trainer.start()
-
+    try:
+        trainer.start()


If Ctrl+C arrives during load_trainer() (e.g. while loading large models), the ThreadPoolExecutor may already exist but cleanup() would never run. Widen the try/except to include trainer creation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Improves Ctrl+C (SIGINT) handling during training to avoid orphaned processes holding GPU memory, primarily by proactively shutting down async reward worker executors and exiting quickly.

Changes:

Add a RewardBuffer.shutdown() method for fast, non-blocking executor teardown on interruption.
Introduce BaseTrainer.cleanup() to shut down reward buffers’ async executors.
Add SIGINT/KeyboardInterrupt handling in train.py (including forced exit) and in cli.py to treat Ctrl+C as a clean termination.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
`src/flow_factory/trainers/abc.py`	Adds `BaseTrainer.cleanup()` to shut down reward buffers’ async executors.
`src/flow_factory/train.py`	Wraps `trainer.start()` with `KeyboardInterrupt` handling and forces immediate process exit.
`src/flow_factory/rewards/reward_processor.py`	Adds `RewardBuffer.shutdown()` for final teardown of the thread pool executor.
`src/flow_factory/cli.py`	Treats SIGINT-like subprocess termination and KeyboardInterrupt as clean exits.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    except KeyboardInterrupt:
+        if local_rank == 0:
+            logger.info("Training interrupted. Cleaning up...")
+        try:
+            if trainer is not None:
+                trainer.cleanup()
+        finally:
+            os._exit(0)


+            sys.exit(0)
+        raise
+    except KeyboardInterrupt:
+        logger.info("Training interrupted.")
+        sys.exit(0)


Exit code 130 (128 + SIGINT) is the standard convention for processes interrupted by Ctrl+C. Using 0 would mask the interrupt from shell scripts and cluster schedulers (SLURM, K8s) that rely on exit codes to distinguish successful completion from user cancellation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Resolve conflicts between CRD trainer (this PR) and DGPO trainer (X-GenGroup#133) added to main since branching: - training_args.py: Place DGPOTrainingArguments before CRDTrainingArguments, each class retains its own fields and defaults - algorithms.md: Adopt main's reference numbering, add CRD as ref [13] - crd.py: Add _maybe_offload_samples_to_cpu() call matching the pattern established in GRPO/GRPOGuard on main (PR X-GenGroup#149) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…PU processes (X-GenGroup#149) * fix: add graceful shutdown on Ctrl+C to prevent zombie GPU processes When users press Ctrl+C during multi-GPU training, worker processes could become orphaned while still holding GPU memory. The root cause: the RewardBuffer's ThreadPoolExecutor (holding GPU references via CUDA streams) was never shut down on interrupt, and NCCL's C++ heartbeat threads kept running during Python's normal shutdown sequence. Changes: - Add RewardBuffer.shutdown() for non-blocking executor termination - Add BaseTrainer.cleanup() that calls shutdown on reward buffers - Wrap trainer.start() with KeyboardInterrupt handler in train.py - Use os._exit(0) after cleanup to bypass NCCL heartbeat noise - Catch SIGINT exit codes in cli.py for clean subprocess teardown Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: ensure os._exit runs even if cleanup raises, fix docstring Address review feedback: - Wrap trainer.cleanup() in try/finally so os._exit(0) is guaranteed to execute even if cleanup fails on partially-initialized state - Correct cleanup() docstring to reflect non-blocking semantics (shutdown signals threads but os._exit does the actual termination) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: finalize logger on Ctrl+C to preserve wandb/swanlab data Before this change, os._exit(0) bypassed all atexit handlers, causing wandb/swanlab to lose unsynced metrics. Now cleanup() calls logger.finish() before exit, allowing data sync to complete. Double Ctrl+C still force-exits via try/finally + os._exit(0). - Add Logger.finish() base method (cleans up temp files) - Override in WandbLogger: calls wandb.finish(quiet=True) - Override in SwanlabLogger: calls swanlab.finish() - Override in TensorboardLogger: calls self.platform.close() - BaseTrainer.cleanup(): call self.logger.finish() after executor shutdown Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * revert: remove logger finalization on Ctrl+C wandb.finish() did not reliably sync data during interrupt. Revert to keep the scope focused on graceful GPU process exit only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: cover trainer construction in KeyboardInterrupt handler If Ctrl+C arrives during load_trainer() (e.g. while loading large models), the ThreadPoolExecutor may already exist but cleanup() would never run. Widen the try/except to include trainer creation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: move success log inside try block for cleaner control flow Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update * fix: use exit code 130 for SIGINT in cli launcher Exit code 130 (128 + SIGINT) is the standard convention for processes interrupted by Ctrl+C. Using 0 would mask the interrupt from shell scripts and cluster schedulers (SLURM, K8s) that rely on exit codes to distinguish successful completion from user cancellation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 30, 2026 13:16

Copilot started reviewing on behalf of Jayce-Ping April 30, 2026 13:17 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Jayce-Ping requested a review from Copilot April 30, 2026 13:24

Copilot started reviewing on behalf of Jayce-Ping April 30, 2026 13:25 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Jayce-Ping force-pushed the feat/graceful-shutdown-ctrl-c branch from 4928bc2 to c10f9b1 Compare April 30, 2026 14:00

revert: remove logger finalization on Ctrl+C

f979bd6

wandb.finish() did not reliably sync data during interrupt. Revert to keep the scope focused on graceful GPU process exit only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Jayce-Ping requested a review from Copilot April 30, 2026 14:17

Copilot started reviewing on behalf of Jayce-Ping April 30, 2026 14:18 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Comment thread src/flow_factory/train.py

try:

trainer.cleanup()

finally:

os._exit(0)

Comment thread src/flow_factory/train.py Outdated

Comment on lines +54 to +56

trainer = load_trainer(config)

trainer.start()

try:

trainer.start()

Jayce-Ping and others added 3 commits April 30, 2026 22:26

fix: move success log inside try block for cleaner control flow

cca9f48

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update

4e9211a

Jayce-Ping requested a review from Copilot April 30, 2026 14:44

Copilot started reviewing on behalf of Jayce-Ping April 30, 2026 14:44 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

Jayce-Ping force-pushed the feat/graceful-shutdown-ctrl-c branch from e953d99 to 2a54b8f Compare April 30, 2026 14:52

Jayce-Ping merged commit 5b66da0 into main Apr 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[trainer,reward] fix: graceful shutdown on Ctrl+C to prevent zombie GPU processes#149

[trainer,reward] fix: graceful shutdown on Ctrl+C to prevent zombie GPU processes#149
Jayce-Ping merged 8 commits into
mainfrom
feat/graceful-shutdown-ctrl-c

Jayce-Ping commented Apr 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Jayce-Ping commented Apr 30, 2026

Summary

Context

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants