[trainer,reward] fix: graceful shutdown on Ctrl+C to prevent zombie GPU processes#149
Conversation
When users press Ctrl+C during multi-GPU training, worker processes could become orphaned while still holding GPU memory. The root cause: the RewardBuffer's ThreadPoolExecutor (holding GPU references via CUDA streams) was never shut down on interrupt, and NCCL's C++ heartbeat threads kept running during Python's normal shutdown sequence. Changes: - Add RewardBuffer.shutdown() for non-blocking executor termination - Add BaseTrainer.cleanup() that calls shutdown on reward buffers - Wrap trainer.start() with KeyboardInterrupt handler in train.py - Use os._exit(0) after cleanup to bypass NCCL heartbeat noise - Catch SIGINT exit codes in cli.py for clean subprocess teardown Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR improves Ctrl+C handling during (multi-GPU) training to avoid orphaned worker processes that keep GPU memory allocated, by explicitly shutting down async reward executors and exiting cleanly when interrupted.
Changes:
- Add
RewardBuffer.shutdown()for fast, non-reinitializing teardown of async reward executors. - Add
BaseTrainer.cleanup()to shut down training/eval reward buffers on interrupt. - Wrap
trainer.start()intrain.pywithKeyboardInterrupthandling and force-exit to avoid noisy/hanging shutdown paths; handle SIGINT-like subprocess exit codes incli.py.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/flow_factory/trainers/abc.py | Adds cleanup() to shut down reward buffers during interrupt-driven teardown. |
| src/flow_factory/train.py | Catches KeyboardInterrupt, calls trainer cleanup, and exits immediately to avoid lingering shutdown noise. |
| src/flow_factory/rewards/reward_processor.py | Introduces RewardBuffer.shutdown() to terminate the underlying executor without resetting state. |
| src/flow_factory/cli.py | Treats SIGINT-style subprocess exits as a clean interrupt and exits with code 0. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| trainer.cleanup() | ||
| os._exit(0) |
| Called on KeyboardInterrupt to ensure ThreadPoolExecutor threads (which | ||
| may hold GPU references via reward model CUDA streams) are terminated | ||
| before the process exits. The OS reclaims all GPU memory on exit, so | ||
| explicit empty_cache() is unnecessary and would race with executor threads. |
Address review feedback: - Wrap trainer.cleanup() in try/finally so os._exit(0) is guaranteed to execute even if cleanup fails on partially-initialized state - Correct cleanup() docstring to reflect non-blocking semantics (shutdown signals threads but os._exit does the actual termination) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds explicit interrupt/teardown hooks to ensure multi-GPU training exits cleanly on Ctrl+C without leaving orphaned processes holding GPU memory, primarily by shutting down async reward executors and force-exiting to avoid noisy/disruptive NCCL shutdown behavior.
Changes:
- Add
RewardBuffer.shutdown()for fast, non-blocking threadpool teardown on interrupt. - Add
BaseTrainer.cleanup()to trigger reward-buffer shutdowns during interrupt handling. - Handle
KeyboardInterruptintrain.pywith cleanup +os._exit(0), and treat SIGINT-like subprocess exits as clean interrupts incli.py.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/flow_factory/trainers/abc.py | Introduces BaseTrainer.cleanup() that shuts down training/eval reward buffers without blocking. |
| src/flow_factory/train.py | Wraps trainer.start() in a KeyboardInterrupt handler, calls trainer.cleanup(), then os._exit(0). |
| src/flow_factory/rewards/reward_processor.py | Adds RewardBuffer.shutdown() to terminate the async ThreadPoolExecutor without reinitialization. |
| src/flow_factory/cli.py | Treats SIGINT-style subprocess termination as a clean interrupt to avoid noisy tracebacks and ensure exit(0). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Before this change, os._exit(0) bypassed all atexit handlers, causing wandb/swanlab to lose unsynced metrics. Now cleanup() calls logger.finish() before exit, allowing data sync to complete. Double Ctrl+C still force-exits via try/finally + os._exit(0). - Add Logger.finish() base method (cleans up temp files) - Override in WandbLogger: calls wandb.finish(quiet=True) - Override in SwanlabLogger: calls swanlab.finish() - Override in TensorboardLogger: calls self.platform.close() - BaseTrainer.cleanup(): call self.logger.finish() after executor shutdown Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4928bc2 to
c10f9b1
Compare
wandb.finish() did not reliably sync data during interrupt. Revert to keep the scope focused on graceful GPU process exit only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR improves Ctrl+C handling during training to avoid orphaned/zombie GPU processes by introducing explicit non-blocking shutdown hooks for async reward workers and handling SIGINT exit paths more deliberately in both the training entrypoint and CLI launcher.
Changes:
- Add
RewardBuffer.shutdown()to stop the reward async executor without waiting/reinitializing. - Add
BaseTrainer.cleanup()to trigger reward-buffer shutdown on interruption. - Catch
KeyboardInterruptintrain.pyand force-exit, plus treat SIGINT returns as clean exits incli.py.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/flow_factory/trainers/abc.py |
Adds BaseTrainer.cleanup() to shutdown reward buffers on interrupt. |
src/flow_factory/train.py |
Wraps training start in KeyboardInterrupt handling and force-exits after cleanup. |
src/flow_factory/rewards/reward_processor.py |
Adds RewardBuffer.shutdown() for non-blocking executor termination. |
src/flow_factory/cli.py |
Treats SIGINT/KeyboardInterrupt from the training subprocess as a clean exit. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try: | ||
| trainer.cleanup() | ||
| finally: | ||
| os._exit(0) |
| trainer = load_trainer(config) | ||
| trainer.start() | ||
|
|
||
| try: | ||
| trainer.start() |
If Ctrl+C arrives during load_trainer() (e.g. while loading large models), the ThreadPoolExecutor may already exist but cleanup() would never run. Widen the try/except to include trainer creation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Improves Ctrl+C (SIGINT) handling during training to avoid orphaned processes holding GPU memory, primarily by proactively shutting down async reward worker executors and exiting quickly.
Changes:
- Add a
RewardBuffer.shutdown()method for fast, non-blocking executor teardown on interruption. - Introduce
BaseTrainer.cleanup()to shut down reward buffers’ async executors. - Add SIGINT/KeyboardInterrupt handling in
train.py(including forced exit) and incli.pyto treat Ctrl+C as a clean termination.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/flow_factory/trainers/abc.py |
Adds BaseTrainer.cleanup() to shut down reward buffers’ async executors. |
src/flow_factory/train.py |
Wraps trainer.start() with KeyboardInterrupt handling and forces immediate process exit. |
src/flow_factory/rewards/reward_processor.py |
Adds RewardBuffer.shutdown() for final teardown of the thread pool executor. |
src/flow_factory/cli.py |
Treats SIGINT-like subprocess termination and KeyboardInterrupt as clean exits. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except KeyboardInterrupt: | ||
| if local_rank == 0: | ||
| logger.info("Training interrupted. Cleaning up...") | ||
| try: | ||
| if trainer is not None: | ||
| trainer.cleanup() | ||
| finally: | ||
| os._exit(0) |
| sys.exit(0) | ||
| raise | ||
| except KeyboardInterrupt: | ||
| logger.info("Training interrupted.") | ||
| sys.exit(0) |
Exit code 130 (128 + SIGINT) is the standard convention for processes interrupted by Ctrl+C. Using 0 would mask the interrupt from shell scripts and cluster schedulers (SLURM, K8s) that rely on exit codes to distinguish successful completion from user cancellation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
e953d99 to
2a54b8f
Compare
Resolve conflicts between CRD trainer (this PR) and DGPO trainer (X-GenGroup#133) added to main since branching: - training_args.py: Place DGPOTrainingArguments before CRDTrainingArguments, each class retains its own fields and defaults - algorithms.md: Adopt main's reference numbering, add CRD as ref [13] - crd.py: Add _maybe_offload_samples_to_cpu() call matching the pattern established in GRPO/GRPOGuard on main (PR X-GenGroup#149) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…PU processes (X-GenGroup#149) * fix: add graceful shutdown on Ctrl+C to prevent zombie GPU processes When users press Ctrl+C during multi-GPU training, worker processes could become orphaned while still holding GPU memory. The root cause: the RewardBuffer's ThreadPoolExecutor (holding GPU references via CUDA streams) was never shut down on interrupt, and NCCL's C++ heartbeat threads kept running during Python's normal shutdown sequence. Changes: - Add RewardBuffer.shutdown() for non-blocking executor termination - Add BaseTrainer.cleanup() that calls shutdown on reward buffers - Wrap trainer.start() with KeyboardInterrupt handler in train.py - Use os._exit(0) after cleanup to bypass NCCL heartbeat noise - Catch SIGINT exit codes in cli.py for clean subprocess teardown Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: ensure os._exit runs even if cleanup raises, fix docstring Address review feedback: - Wrap trainer.cleanup() in try/finally so os._exit(0) is guaranteed to execute even if cleanup fails on partially-initialized state - Correct cleanup() docstring to reflect non-blocking semantics (shutdown signals threads but os._exit does the actual termination) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: finalize logger on Ctrl+C to preserve wandb/swanlab data Before this change, os._exit(0) bypassed all atexit handlers, causing wandb/swanlab to lose unsynced metrics. Now cleanup() calls logger.finish() before exit, allowing data sync to complete. Double Ctrl+C still force-exits via try/finally + os._exit(0). - Add Logger.finish() base method (cleans up temp files) - Override in WandbLogger: calls wandb.finish(quiet=True) - Override in SwanlabLogger: calls swanlab.finish() - Override in TensorboardLogger: calls self.platform.close() - BaseTrainer.cleanup(): call self.logger.finish() after executor shutdown Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * revert: remove logger finalization on Ctrl+C wandb.finish() did not reliably sync data during interrupt. Revert to keep the scope focused on graceful GPU process exit only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: cover trainer construction in KeyboardInterrupt handler If Ctrl+C arrives during load_trainer() (e.g. while loading large models), the ThreadPoolExecutor may already exist but cleanup() would never run. Widen the try/except to include trainer creation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: move success log inside try block for cleaner control flow Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update * fix: use exit code 130 for SIGINT in cli launcher Exit code 130 (128 + SIGINT) is the standard convention for processes interrupted by Ctrl+C. Using 0 would mask the interrupt from shell scripts and cluster schedulers (SLURM, K8s) that rely on exit codes to distinguish successful completion from user cancellation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
RewardBuffer.shutdown()for non-blocking executor termination on interruptBaseTrainer.cleanup()that calls shutdown on reward bufferstrainer.start()withKeyboardInterrupthandler intrain.py, usingos._exit(0)to bypass NCCL heartbeat noisecli.pyfor clean subprocess teardownContext
When users press Ctrl+C during multi-GPU training, worker processes could become orphaned holding GPU memory. Root cause: the
RewardBuffer'sThreadPoolExecutor(holding GPU references via CUDA streams) was never shut down on interrupt, and NCCL's C++ heartbeat threads kept running during Python's normal shutdown sequence.Test plan
ff-train, press Ctrl+C during evaluation/samplingnvidia-smishows no zombie processes after exit🤖 Generated with Claude Code