Debug only Matrix resizing bug. Tc threaded solver revamp #362

Merged: PDoakORNL merged 9 commits into CompFUSE:master from PDoakORNL:tc-threaded-solver-revamp on Jun 3, 2026

@PDoakORNL (Contributor)

Fixed a rather devious Debug-build-only bug in the Matrix::resize method. Incorrect zeroing of matrix elements in a corner case of matrix growth resulted in unstable k expansions because of how it affected vertex additions in ctaux. The rather ominous comment on resizing in general still seems to hold; it is long-standing and should become an issue.

While debugging, I found no unit tests for dca::phys::solver::ctaux::CT_AUX_HS_configuration and some odd indexing behavior (see get_first_non_interacting_spin_index()). I generated unit tests that capture the current behavior. Right now there is only one caller, n_tools.hpp, and it does the right thing, but the size check seems to serve more than one purpose in that code.

This PR is part of a series to return the tc tutorial to a working and easy to run state.

PDoakORNL added 9 commits May 22, 2026 22:42
- Change accumulators from 5 to 3
- Add shared-walk-and-accumulation-thread: true
- Update README to reflect 3 accumulators
…=true

The multi-thread per rank configuration (3+ walkers/accumulators with
shared-walk-and-accumulation-thread) triggers a code bug causing expansion
order explosion and chemical potential divergence. shared=false is being
removed from the codebase. The only viable working configuration is:
- walkers: 1
- accumulators: 1
- shared-walk-and-accumulation-thread: true
- MPI parallelism across ranks (mpiexec -n 4) for workstation runs
…_spin_index

Tests cover:
- Empty configuration returns 0
- All-annihilatable config returns size() sentinel
- Manually marking first entry non-annihilatable returns 0
- Inserting non-interacting vertex is found correctly

Uses existing G0Setup test fixture with bilayer lattice and StubRng.
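
The return-value convention these tests pin down can be sketched in isolation. This is a hedged illustration only, not the DCA++ code: `Vertex`, its `annihilatable` flag, and `firstNonAnnihilatable` are hypothetical stand-ins, and the sketch assumes the function simply returns the index of the first non-annihilatable entry, or `size()` when there is none.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a configuration entry; not the DCA++ vertex type.
struct Vertex {
  bool annihilatable;
};

// Returns the index of the first entry that is not annihilatable.
// If no such entry exists, returns config.size() as a sentinel.
std::size_t firstNonAnnihilatable(const std::vector<Vertex>& config) {
  const auto it = std::find_if(config.begin(), config.end(),
                               [](const Vertex& v) { return !v.annihilatable; });
  return static_cast<std::size_t>(std::distance(config.begin(), it));
}
```

Under this convention an empty configuration returns 0 (its `size()`), which is the same value returned when the first entry matches, so a caller has to compare the result against `size()` before using it as an index.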
- Fix critical memory-corruption bug in Matrix::resize debug path:
  The column-major indexing was using row-major stride
  (i * new_size.second + j instead of i + j * leadingDimension()),
  causing zeros to be written to the wrong locations and corrupting
  existing matrix data. Also expanded the zeroing to cover all newly
  exposed elements, not just the bottom-right corner.

  In CT-AUX, this corruption produced incorrect determinant ratios,
  which caused the expansion order (number of vertices k) to spike
  to nonphysical values, eventually grinding the solver to a halt.

- Update tutorial input templates (input_sp.json.in, input_tp.json.in)
  to use max-submatrix-size=256 instead of 16 for reasonable performance.

- Regenerate all preconfigured tutorial inputs from the updated templates.
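
The indexing fix in the first item of this commit message can be illustrated with a minimal sketch. It is not the actual `Matrix::resize` code: `zeroNewElements`, the flat `std::vector<double>` storage, and the explicit `ld` parameter (standing in for `leadingDimension()`) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not the DCA++ implementation.
// Zeroes every element that becomes visible when a column-major matrix
// grows from old_size to new_size (rows, columns) inside a buffer whose
// leading dimension is ld. Existing elements are left untouched.
void zeroNewElements(std::vector<double>& data, std::size_t ld,
                     std::pair<std::size_t, std::size_t> old_size,
                     std::pair<std::size_t, std::size_t> new_size) {
  assert(ld >= new_size.first);
  assert(data.size() >= ld * new_size.second);
  for (std::size_t j = 0; j < new_size.second; ++j) {
    // In columns that already existed, only the rows below the old row
    // count are new; columns past the old column count are entirely new.
    const std::size_t first_new_row = (j < old_size.second) ? old_size.first : 0;
    for (std::size_t i = first_new_row; i < new_size.first; ++i) {
      // Column-major offset: row index plus column index times leading dimension.
      data[i + j * ld] = 0.0;
    }
  }
}
```

For comparison, take a 2x2 matrix growing to 2x3 with a leading dimension of 4. The row-major expression `i * new_size.second + j` maps the new element (1, 2) to offset 5, which in column-major storage is the existing element (1, 1). The column-major expression gives offset 9, the intended location.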
Update both input templates (input_sp.json.in, input_tp.json.in) to use
4 walkers and 4 accumulators with shared-walk-and-accumulation-thread: true.
Regenerate all preconfigured tutorial inputs from the updated templates.
PDoakORNL requested a review from maierta on May 28, 2026 22:39
PDoakORNL merged commit 20ec633 into CompFUSE:master on Jun 3, 2026
2 checks passed
2 participants