Joint dyad-level modeling: nodematch + absdiff (#63 phases 1 & 2) #69

Merged: smjenness merged 1 commit into main from feature/joint-nodematch-absdiff on Apr 20, 2026

smjenness commented on Apr 20, 2026

Summary

Under method = "joint", nodematch_* and absdiff_* target statistics now come from per-ego g-computation on dyad-level joint GLMs, not from scaling univariate marginals by the new edge count. Partially addresses #63 (phases 1 and 2). Phase 3 (duration / dissolution with method flag) lands in a follow-up PR.

With the default method = "existing", output is byte-identical to the pre-refactor output (verified).

Approach (Option A per #63)

The dyad models are fit on partnership-level data (lmain / lcasl / linst) with only ego attributes on the RHS, matching PR #68's synthetic-newdata structure. Per-ego predictions are then aggregated, weighted by the joint-Poisson-predicted degree.

New additive outputs on netparams per layer when method = "joint":

| Field | Model |
| --- | --- |
| joint_nm_age_model | glm(same.age.grp ~ age + race + hiv [+ geog], family = binomial) |
| joint_nm_race_model (race = TRUE only) | glm(same.race ~ age + race + hiv [+ geog], family = binomial) |
| joint_absdiff_age_model | glm(ad ~ age + race + hiv [+ geog], family = gaussian) |
| joint_absdiff_sqrtage_model | glm(ad.sr ~ age + race + hiv [+ geog], family = gaussian) |

All use the existing fit_joint_glm() helper with AIC-based interaction selection over {age.grp:race.cat.num}.
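As a rough illustration of AIC-based interaction selection, the sketch below fits a main-effects GLM and a candidate with the age.grp:race.cat.num interaction, then keeps the one with the lower AIC. The helper name select_by_aic(), the simulated data, and the simplified RHS are assumptions for illustration only; this is not the package's fit_joint_glm(), whose signature is not shown here.

```r
# Hypothetical sketch; NOT the package's fit_joint_glm(). Simulated data only.
set.seed(1)
n <- 500
dat <- data.frame(
  age.grp      = factor(sample(1:3, n, replace = TRUE)),
  race.cat.num = factor(sample(1:2, n, replace = TRUE)),
  hiv          = rbinom(n, 1, 0.2)
)
dat$same.race <- rbinom(n, 1, plogis(-0.5 + 0.8 * (dat$race.cat.num == 2)))

# Fit the main-effects model and the candidate with the interaction,
# then return whichever has the lower AIC.
select_by_aic <- function(data, response, family) {
  base_rhs <- "age.grp + race.cat.num + hiv"
  f_main <- as.formula(paste(response, "~", base_rhs))
  f_int  <- as.formula(paste(response, "~", base_rhs, "+ age.grp:race.cat.num"))
  m_main <- glm(f_main, data = data, family = family)
  m_int  <- glm(f_int,  data = data, family = family)
  if (AIC(m_int) < AIC(m_main)) m_int else m_main
}

fit <- select_by_aic(dat, "same.race", binomial())
```

With a single candidate interaction the search is a two-model comparison; a larger candidate set would need a loop or stepwise search over the same criterion.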

In build_netstats under method = "joint":

nodematch_<attr>[level] <- sum(pred_deg * pred_dyad)[ego in level] / 2
absdiff_age             <- sum(pred_deg * pred_ad_age) / 2

where pred_deg comes from the layer's joint Poisson model and pred_dyad is the per-ego expected partnership property. Each edge is counted twice, once from each endpoint, so the sum is divided by 2.
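The aggregation can be sketched on a toy ego table. The values of pred_deg and pred_dyad below are made up, and computing edges as sum(pred_deg) / 2 is an assumption about how the edge target is derived rather than something taken from build_netstats.

```r
# Hypothetical per-ego predictions; in the PR these come from the layer's
# joint Poisson (pred_deg) and the dyad-level GLM (pred_dyad).
egos <- data.frame(
  race      = c(1, 1, 2, 2, 3),
  pred_deg  = c(0.4, 0.6, 0.2, 0.8, 0.5),
  pred_dyad = c(0.9, 0.7, 0.5, 0.5, 0.2)
)

# Expected number of matched partnerships contributed by each ego.
contrib <- egos$pred_deg * egos$pred_dyad

# Per-level targets: sum within each level, then halve because every
# edge is seen from both endpoints.
nodematch_race <- tapply(contrib, egos$race, sum) / 2

# Overall target and (assumed) edge count from the same predictions.
nodematch_race_diffF <- sum(contrib) / 2
edges <- sum(egos$pred_deg) / 2
```

Because the per-level values partition the same per-ego contributions, they sum to the diffF value by construction, and since each pred_dyad is a probability, the total cannot exceed edges under this assumed edge definition.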

Empirical (Atlanta, race = TRUE, N = 10k)

All 12 dyad models converge without warnings, nobs 2557/5790/7462. Marginal recovery on training data is exact for every model, as expected for canonical-link GLMs that include an intercept.
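The exact marginal recovery can be checked on simulated data. The sketch below uses made-up covariates and outcomes, not the Atlanta data, and shows that the mean fitted value equals the observed mean for a binomial (logit) and a gaussian (identity) GLM with an intercept.

```r
# Simulated data for illustration only.
set.seed(42)
n <- 1000
d <- data.frame(
  age  = runif(n, 15, 65),
  race = factor(sample(1:3, n, replace = TRUE))
)
d$same.race <- rbinom(n, 1, plogis(-1 + 0.02 * d$age))
d$ad <- abs(rnorm(n, mean = 5, sd = 3))

fit_bin <- glm(same.race ~ age + race, data = d, family = binomial)
fit_gau <- glm(ad ~ age + race, data = d, family = gaussian)

# With a canonical link and an intercept, the score equation for the
# intercept forces sum(y - fitted) = 0, so the mean prediction on the
# training data equals the observed mean (up to solver tolerance).
gap_bin <- mean(fitted(fit_bin)) - mean(d$same.race)
gap_gau <- mean(fitted(fit_gau)) - mean(d$ad)
```

This property holds on the training data only; predictions on a different population (for example synthetic newdata) are not guaranteed to reproduce the training marginals.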

AIC-selected interactions: age.grp:race.cat.num picked for 8/12 models. Race-conditional mixing structure is real and heterogeneous with age.

| Layer × Model | AIC | Interactions kept |
| --- | --- | --- |
| main × nm_age | 3259 | |
| main × nm_race | 2725 | age:race |
| main × absdiff_age | 16822 | |
| main × absdiff_sqrtage | 4071 | |
| casl × nm_age | 6973 | age:race |
| casl × nm_race | 7052 | age:race |
| casl × absdiff_age | 41027 | age:race |
| casl × absdiff_sqrtage | 12302 | age:race |
| inst × nm_age | 9315 | |
| inst × nm_race | 9076 | age:race |
| inst × absdiff_age | 51882 | |
| inst × absdiff_sqrtage | 15004 | |

Target-stat shifts vs existing method:

| Field | Existing | Joint | Δ |
| --- | --- | --- | --- |
| main$edges | 1990.3 | 1697.9 | −14.7% |
| main$absdiff_age | 10056 | 10592 | +5.3% |
| main$absdiff_sqrt.age | 848.0 | 863.2 | +1.8% |
| main$nodematch_race_diffF | 1518.9 | 1192.9 | −21.5% |

The per-race nodematch_race[r] values are largely consistent in absolute terms; the large drop in nodematch_race_diffF comes from the edges shift, not from the race-match rate itself.

Validation

  • Backward-compat snapshot harness: default and explicit method = "existing" match 3/3 on all parameter sets (Atlanta+race, national no-geog, Atlanta no-race).
  • Identity: sum(nodematch_race) == nodematch_race_diffF holds to machine precision across all layers.
  • Bound: sum(nodematch_<attr>) <= edges verified for age.grp and race on all layers.
  • Joint ≠ existing: absdiff_age and nodematch_race_diffF both diverge by more than 1% on main, confirming that the joint method changes the target statistics.

Tests

tests/testthat/test-joint-dyad.R — 8 test blocks, 77 assertions:

  • Dyad models present when method = "joint", absent when "existing".
  • All dyad models converge with correct family.
  • Marginal recovery within 1%.
  • sum(nodematch_race) == nodematch_race_diffF.
  • sum(nodematch_<attr>) <= edges.
  • Joint differs from existing by > 1% on at least one key target.
  • race = FALSE path handles nm_race gracefully (no fit, no target stat).

Existing tests still pass: test-joint-model.R (77), test-joint-netstats.R (29). Total: 180/180.

What's still deferred

Test plan

  • Backward-compat snapshot match 3/3
  • All dyad models converge
  • Marginal recovery < 1% on training data
  • nodematch identity + nonneg bound
  • Joint ≠ existing (> 1% divergence on at least one target)
  • race = FALSE path works
  • Unit tests 180/180

Depends on #61, #62 (merged). Part of #63. Unblocks the duration PR + the EpiModelHIV-Template end-to-end.

Under method = "joint", nodematch_* and absdiff_* target statistics
are now produced via g-computation on dyad-level joint GLMs instead
of scaling univariate marginals by new edge counts. Option A per #63:
fit on partnership-level data (lmain/lcasl/linst) with ego attributes
only on the RHS; aggregate per-ego expected values weighted by the
joint-Poisson-predicted degree.

New per-layer additive outputs on netparams under method = "joint":
- joint_nm_age_model       : glm(same.age.grp ~ age + race + hiv [+ geog], binomial)
- joint_nm_race_model      : glm(same.race    ~ age + race + hiv [+ geog], binomial)  [race=TRUE only]
- joint_absdiff_age_model  : glm(ad    ~ age + race + hiv [+ geog], gaussian)
- joint_absdiff_sqrtage_model : glm(ad.sr ~ age + race + hiv [+ geog], gaussian)

All use the shared fit_joint_glm() helper with AIC-based interaction
selection over {age:race}.

In build_netstats under method = "joint":
- nodematch_<attr>[level] = sum(pred_deg * pred_dyad) over egos in level / 2
- nodematch_<attr>_diffF  = sum(pred_deg * pred_dyad) / 2
- absdiff_age / absdiff_sqrt.age = sum(pred_deg * pred_ad) / 2

Replaces the "half-joint" computations from PR #68
(new_edges * univariate_ratio) with fully joint aggregations.

Validation (Atlanta, race = TRUE, N = 10k):
- All 12 dyad models converge; marginal recovery exact on training
  data for all layers.
- AIC picks age.grp:race.cat.num for 8/12 models (race-related
  interactions are real).
- Target-stat shifts relative to existing:
  - main$absdiff_age: +5.3%
  - main$nodematch_race_diffF: -21.5%
  - main$nodematch_race[r]: <1% shift per r (race-conditional mixing
    probabilities carry through; the drop in diffF is driven by the
    drop in edges, not the match rate)
- sum(nodematch_race) == nodematch_race_diffF (identity holds)
- sum(nodematch_<attr>) <= edges (valid bound)

Backward-compat snapshot: default and explicit method = "existing"
match 3/3 on all parameter sets.

New tests: tests/testthat/test-joint-dyad.R (8 test blocks, 77
assertions) covering: presence/absence of dyad models by method,
convergence + family correctness, marginal recovery within 1%,
nodematch identity (sum = diffF), nodematch <= edges bound,
divergence vs existing method, race = FALSE path.

All 180 joint-related assertions across three test files pass.

Next PR: duration.method flag + weibull option (#63 phase 3).
Partially addresses #63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>