
Master fault tolerance#1

Closed
PDoakORNL wants to merge 13 commits into CompFUSE:master from PDoakORNL:master_fault_tolerance



@PDoakORNL

Contributor

Return of the master fault tolerance branch.

After speaking further with @gbalduzz about this, he thinks we should decide what changes are needed to merge this without a runtime switch, since it makes CUDA failures much easier to understand.

@ubulling

Contributor

test this please

@jenkins-cscs


Can I test this patch?

PDoakORNL closed this Aug 15, 2018
PDoakORNL added a commit that referenced this pull request Aug 3, 2020
PDoakORNL pushed a commit that referenced this pull request Jun 1, 2026
A non-existent output.directory previously surfaced only as a cryptic HDF5
crash. Add OutputParameters::validate(), called from Parameters::readInput
on the first rank only, which throws std::invalid_argument with a clear message
naming the missing directory.

Validation is kept separate from OutputParameters::readWrite so the parsing
path stays disk-free and unit-testable; this also avoids needing the existing
ReadAll fixture to point at a real path. Follow-up PRs for bugs #2 through #4 are
expected to add validate() to other parameter sections under the same
convention.
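The validation described in this commit can be sketched in C++17 roughly as follows. This is a minimal sketch under stated assumptions, not the code from the commit: the class layout, the `directory_` member, the error message text, and the `readInputSketch(output, rank)` helper standing in for the first-rank check inside `Parameters::readInput` are all hypothetical.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>
#include <utility>

// Hypothetical stand-in for the OutputParameters section. The real class has
// more fields and a readWrite() that parses the input file; only the
// directory check is modelled here.
class OutputParameters {
public:
  explicit OutputParameters(std::string directory) : directory_(std::move(directory)) {}

  // Touches the filesystem, so it is kept separate from parsing (readWrite)
  // to leave the parsing path disk-free and unit-testable.
  void validate() const {
    std::error_code ec;  // non-throwing overload: a bad path just yields false
    if (!std::filesystem::is_directory(directory_, ec)) {
      throw std::invalid_argument("output.directory '" + directory_ +
                                  "' does not exist or is not a directory");
    }
  }

private:
  std::string directory_;
};

// Hypothetical call site: only the first rank (rank 0) checks the disk,
// mirroring "called from Parameters::readInput on the first rank only".
void readInputSketch(const OutputParameters& output, int rank) {
  if (rank == 0)
    output.validate();
}
```

In this sketch, because validate() is the only member that touches the disk, a parsing test can construct the section from any string without that directory having to exist; other parameter sections could follow the same convention by adding their own validate().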
PDoakORNL added a commit that referenced this pull request Jun 1, 2026
phys/parameters: throw on missing output directory (issue #300 bug #1)

4 participants