Joint dyad-level modeling: nodematch + absdiff (#63 phases 1 & 2) #69
Merged
Under method = "joint", nodematch_* and absdiff_* target statistics are now produced via g-computation on dyad-level joint GLMs instead of scaling univariate marginals by new edge counts.

Option A per #63: fit on partnership-level data (lmain/lcasl/linst) with ego attributes only on the RHS; aggregate per-ego expected values weighted by the joint-Poisson-predicted degree.

New per-layer additive outputs on netparams under method = "joint":
- joint_nm_age_model : glm(same.age.grp ~ age + race + hiv [+ geog], binomial)
- joint_nm_race_model : glm(same.race ~ age + race + hiv [+ geog], binomial) [race=TRUE only]
- joint_absdiff_age_model : glm(ad ~ age + race + hiv [+ geog], gaussian)
- joint_absdiff_sqrtage_model : glm(ad.sr ~ age + race + hiv [+ geog], gaussian)

All use the shared fit_joint_glm() helper with AIC-based interaction selection over {age:race}.

In build_netstats under method = "joint":
- nodematch_<attr>[level] = sum(pred_deg * pred_dyad) over egos in level / 2
- nodematch_<attr>_diffF = sum(pred_deg * pred_dyad) / 2
- absdiff_age / absdiff_sqrt.age = sum(pred_deg * pred_ad) / 2

Replaces the "half-joint" computations from PR #68 (new_edges * univariate_ratio) with fully joint aggregations.

Validation (Atlanta, race = TRUE, N = 10k):
- All 12 dyad models converge; marginal recovery exact on training data for all layers.
- AIC picks age.grp:race.cat.num for 8/12 models (race-related interactions are real).
- Target-stat shifts relative to existing:
  - main$absdiff_age: +5.3%
  - main$nodematch_race_diffF: -21.5%
  - main$nodematch_race[r]: <1% shift per r (race-conditional mixing probabilities carry through; the drop in diffF is driven by the drop in edges, not the match rate)
- sum(nodematch_race) == nodematch_race_diffF (identity holds)
- sum(nodematch_<attr>) <= edges (valid bound)

Backward-compat snapshot: default and explicit method = "existing" match 3/3 on all parameter sets.

New tests: tests/testthat/test-joint-dyad.R (8 test blocks, 77 assertions) covering:
- presence/absence of dyad models by method
- convergence + family correctness
- marginal recovery within 1%
- nodematch identity (sum = diffF)
- nodematch <= edges bound
- divergence vs existing method
- race = FALSE path

All 180 joint-related assertions across three test files pass.

Next PR: duration.method flag + weibull option (#63 phase 3).

Partially addresses #63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
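The aggregation described in the commit message can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the data frames egos and dyads, their columns, and the simulated coefficients are hypothetical, and the models are fitted with plain glm() rather than the package's fit_joint_glm() helper, whose signature is not shown in this PR.

```r
# Hypothetical sketch on simulated data; not the package's data or code.
set.seed(1)

# Ego-level data: one row per ego with an observed degree (partnership count).
n <- 2000
egos <- data.frame(
  age.grp = factor(sample(1:3, n, replace = TRUE)),
  race    = factor(sample(1:2, n, replace = TRUE))
)
lambda <- exp(-0.5 + 0.2 * (egos$age.grp == "2") + 0.3 * (egos$race == "2"))
egos$deg <- rpois(n, lambda)

# Partnership-level data: one row per partnership, carrying ego attributes
# plus a dyad property (here: whether the partner has the same race).
dyads <- egos[rep(seq_len(n), times = egos$deg), c("age.grp", "race")]
p_same <- plogis(0.4 + 0.5 * (dyads$race == "2") - 0.2 * (dyads$age.grp == "3"))
dyads$same.race <- rbinom(nrow(dyads), 1, p_same)

# Degree model (Poisson) and dyad-level model (binomial),
# both with ego attributes only on the right-hand side.
deg_fit  <- glm(deg ~ age.grp + race, family = poisson, data = egos)
dyad_fit <- glm(same.race ~ age.grp + race, family = binomial, data = dyads)

# g-computation: predict both quantities for every ego, then aggregate.
pred_deg  <- predict(deg_fit,  newdata = egos, type = "response")
pred_dyad <- predict(dyad_fit, newdata = egos, type = "response")

# Each edge is counted once from each endpoint, hence the division by 2.
edges                <- sum(pred_deg) / 2
nodematch_race_diffF <- sum(pred_deg * pred_dyad) / 2
nodematch_race       <- tapply(pred_deg * pred_dyad, egos$race, sum) / 2
```

In this sketch the per-level statistics sum to the diffF statistic by construction, and both stay below the edge count because pred_dyad is a probability. Those are the same identity and bound listed under validation.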
This was referenced Apr 20, 2026
Closed
Closed
Summary
Under method = "joint", nodematch_* and absdiff_* target statistics now come from per-ego g-computation on dyad-level joint GLMs, not from scaling univariate marginals by the new edge count. Partially addresses #63 (phases 1 and 2). Phase 3 (duration / dissolution with method flag) lands in a follow-up PR.

Default method = "existing" is byte-identical to pre-refactor (verified).

Approach (Option A per #63)
Fit on partnership-level data (lmain/lcasl/linst) with only ego attributes on the RHS (matching PR #68's synthetic-newdata structure). Aggregate per-ego predictions weighted by the joint-Poisson-predicted degree.

New additive outputs on netparams per layer when method = "joint":
- joint_nm_age_model : glm(same.age.grp ~ age + race + hiv [+ geog], family = binomial)
- joint_nm_race_model (race = TRUE only) : glm(same.race ~ age + race + hiv [+ geog], family = binomial)
- joint_absdiff_age_model : glm(ad ~ age + race + hiv [+ geog], family = gaussian)
- joint_absdiff_sqrtage_model : glm(ad.sr ~ age + race + hiv [+ geog], family = gaussian)

All use the existing fit_joint_glm() helper with AIC-based interaction selection over {age.grp:race.cat.num}.

In build_netstats under method = "joint":
- nodematch_<attr>[level] = sum(pred_deg * pred_dyad) over egos in level / 2
- nodematch_<attr>_diffF = sum(pred_deg * pred_dyad) / 2
- absdiff_age / absdiff_sqrt.age = sum(pred_deg * pred_ad) / 2

where pred_deg comes from the layer's joint Poisson and pred_dyad is the per-ego expected partnership property (pred_ad for the absdiff models). Each edge is double-counted (once from each endpoint), so the sums are divided by 2.

Empirical (Atlanta, race = TRUE, N = 10k)
- All 12 dyad models converge without warnings, nobs 2557/5790/7462. Marginal recovery on training data is exact for every model (expected for canonical-link GLMs with an intercept).
- AIC-selected interactions: age.grp:race.cat.num picked for 8/12 models. Race-conditional mixing structure is real and heterogeneous with age.

Target-stat shifts vs existing method:
- main$edges
- main$absdiff_age: +5.3%
- main$absdiff_sqrt.age
- main$nodematch_race_diffF: -21.5%

nodematch_race[r] per-race is largely consistent in absolute terms (<1% shift per r); the big drop in nodematch_race_diffF comes from the edges shift, not the race-match rate itself.

Validation
- Backward-compat snapshot: default and explicit method = "existing" match 3/3 on all parameter sets (Atlanta+race, national no-geog, Atlanta no-race).
- sum(nodematch_race) == nodematch_race_diffF holds to machine precision across all layers.
- sum(nodematch_<attr>) <= edges verified for age.grp and race on all layers.

Tests

tests/testthat/test-joint-dyad.R, 8 test blocks, 77 assertions:
- Dyad models present when method = "joint", absent when "existing".
- Convergence and family correctness.
- Marginal recovery within 1%.
- Nodematch identity: sum(nodematch_race) == nodematch_race_diffF.
- Nodematch bound: sum(nodematch_<attr>) <= edges.
- Divergence vs the existing method.
- race = FALSE path handles nm_race gracefully (no fit, no target stat).

Existing tests still pass: test-joint-model.R (77), test-joint-netstats.R (29). Total: 180/180.

What's still deferred
- Phase 3 (duration / dissolution): duration.method flag with options {"empirical", "weibull_strat", "joint_lm"}, default "empirical" (byte-identical). See discussion on [Phase 1.3] Joint modeling for nodematch + absdiff + dissolution (dyad-level target stats) #63.

Test plan

- race = FALSE path works

Depends on #61, #62 (merged). Part of #63. Unblocks the duration PR + the EpiModelHIV-Template end-to-end.
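
The marginal-recovery claim and the AIC-based interaction selection mentioned in this PR can be illustrated with a short sketch. It is a simplified stand-in and does not show how fit_joint_glm() is implemented: the data are simulated, the column names are hypothetical, and the selection rule (keep whichever of two candidate models has the lower AIC) is an assumption about what "AIC-based interaction selection" could look like.

```r
# Hypothetical sketch on simulated data; not the package's fit_joint_glm().
set.seed(2)

n <- 1500
d <- data.frame(
  age.grp = factor(sample(1:3, n, replace = TRUE)),
  race    = factor(sample(1:2, n, replace = TRUE))
)

# Simulate a binary dyad property with a genuine age-by-race interaction.
eta <- -0.3 + 0.4 * (d$age.grp == "2") + 0.6 * (d$race == "2") +
  0.8 * (d$age.grp == "3" & d$race == "2")
d$same.race <- rbinom(n, 1, plogis(eta))

# Candidate models: main effects only, and main effects plus the interaction.
main_fit <- glm(same.race ~ age.grp + race, family = binomial, data = d)
int_fit  <- update(main_fit, . ~ . + age.grp:race)

# Assumed selection rule: keep the candidate with the lower AIC.
chosen <- if (AIC(int_fit) < AIC(main_fit)) int_fit else main_fit

# Marginal recovery: for a canonical-link GLM with an intercept, the mean of
# the fitted values on the training data equals the observed mean.
recovered <- mean(fitted(chosen))
observed  <- mean(d$same.race)
```

Whichever candidate is chosen, recovered and observed agree up to the fitting tolerance. This is why exact marginal recovery on training data is expected rather than being evidence of model quality on its own.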