Skip to content

Use joint GLM g-computation for build_netstats target stats (#62)#68

Merged
smjenness merged 2 commits into
mainfrom
feature/joint-gcomp-netstats
Apr 20, 2026
Merged

Use joint GLM g-computation for build_netstats target stats (#62)#68
smjenness merged 2 commits into
mainfrom
feature/joint-gcomp-netstats

Conversation

@smjenness
Copy link
Copy Markdown
Contributor

@smjenness smjenness commented Apr 19, 2026

Summary

Adds method = c("existing", "joint") to build_netstats(). Under method = "joint", per-synthetic-node expected degree is predicted from the joint Poisson GLMs produced by build_netparams(..., method = "joint") in #61, and target statistics are aggregated via g-computation. This makes edges and every nodefactor target internally consistent by construction (sum(nodefactor_<attr>) == 2 * edges to machine precision).

Default method = "existing" is byte-identical to pre-refactor. Closes #62.

Scope note: This PR mixes surface area from #61 and #62. During review, PI pointed out that the concurrent target should also be joint-modeled — using a binomial GLM on the deg > 1 indicator with the same joint RHS as the Poisson. That fit logically belongs in build_netparams (#61 territory) but its use is in build_netstats (#62). Fixing it split across two branches would have been awkward, so the joint concurrent binomial is included here as commit cf558a6.

What method = "joint" does NOT yet cover — these target stats still use univariate marginals and will be handled in #63 (extended to include dissolution/duration per PI request):

  • nodematch_age.grp, nodematch_race, nodematch_race_diffF
  • absdiff_age, absdiff_sqrt.age
  • nodematch(role.class)
  • diss.homog, diss.byage (durations / dissolution coefficients)

Until #63 lands, running method = "joint" end-to-end through EpiModelHIV-Template would mix joint and univariate sources in the ERGM target-stat vector — coherent enough for unit validation but not meaningful for a production estimation run.

What's refactored under method = "joint"

Field Joint formula Model
edges sum(pred) / 2 Poisson joint
nodefactor_race sum(pred[race == r]) per level Poisson joint
nodefactor_age.grp sum(pred[age.grp == k]) per level Poisson joint
nodefactor_deg.casl (main) sum(pred[deg.casl == d]) per level Poisson joint
nodefactor_deg.main (casl) same pattern Poisson joint
nodefactor_deg.tot (inst) same pattern Poisson joint
nodefactor_diag.status sum(pred[diag.status == h]) per level Poisson joint
concurrent (main/casl) sum(pred_conc) over synthetic nodes Binomial joint on deg > 1
nodematch_*, absdiff_* rescaled with new edges/nodefactor (univariate ratio retained — see #63)

On concurrent specifically

First pass (commit 5ef9737) retained concurrent from the univariate binomial because the Poisson-derived P(deg > 1) = 1 - exp(-λ)(1+λ) was inflated ~5× by the truncation of deg.main at 2 and deg.casl at 3 in the training data. The Poisson rejection was correct but the conclusion was wrong: the right fix is to joint-model the binomial on the concurrency indicator, same RHS as the Poisson, different family. Commit cf558a6 does that:

  • fit_joint_poisson() generalized to fit_joint_glm(..., family = poisson())
  • Under method = "joint" in build_netparams, fit glm(deg.<layer>.conc ~ <joint terms>, family = binomial()) and store at netparams$<layer>$joint_concurrent_model (main and casl only — inst has no concurrent target)
  • Under method = "joint" in build_netstats, out$<layer>$concurrent = sum(predict(joint_concurrent_model, synth, type = "response"))

Empirical result (Atlanta, race = TRUE, N = 10k):

  • main: existing 97.09 → joint 95.41 (-1.7%)
  • casl: existing 1456.31 → joint 1614.05 (+10.8%)

Both in reasonable ranges — no Poisson-style inflation. Main barely moves because the univariate intercept-only fit already captures the population-average rate well; casl moves more because the AIC-selected age.grp:deg.main interaction captures real heterogeneity the univariate can't.

Validation

Backward-compat snapshot harness (extended with netstats_extra + symmetric method =):

Call Result
compare_to_snapshot() (default) ALL MATCH 3/3
compare_to_snapshot(method = "existing") (now forwards to both) ALL MATCH 3/3

Internal consistency (Atlanta, race = TRUE, N = 10k):

| Layer | |2*edges − Σnf_race| (existing) | under joint |
|---|---:|---:|
| main | 537 | 0.0 |
| casl | 319 | 0.0 |
| inst | 70 | 0.0 |

Shifted target population (race.prop = c(0.35, 0.25, 0.40)):

Layer existing edges joint edges Δ
main 1990.3 1808.3 −9.1%
casl 2669.9 2929.2 +9.7%
inst 382.7 356.7 −6.8%

Tests

Known minor (pre-existing, not a regression)

Dissolution-coefficient formulas capture the caller's scope environment. Under joint, that scope includes the pred_deg_* / pred_conc_* vectors (~10k doubles × 5), which silently bloats a serialized netstats. Consumers only use $coef.diss, so this is cosmetic — worth a follow-up if on-disk size matters.

Test plan

Depends on #61 (merged). Unblocks #63, #64, #65.

smjenness and others added 2 commits April 19, 2026 15:59
Adds `method = c("existing", "joint")` to `build_netstats()`. Under
`method = "joint"`, expected per-synthetic-node degree is predicted from
the joint Poisson GLMs fit by `build_netparams(..., method = "joint")`,
and target statistics are aggregated via g-computation:

  edges                   = sum(pred) / 2
  nodefactor_race[r]      = sum(pred[race == r])
  nodefactor_age.grp[k]   = sum(pred[age.grp == k])
  nodefactor_deg.<x>[d]   = sum(pred[<x> == d])
  nodefactor_diag.status  = sum(pred[diag.status == h])

This makes edges and every nodefactor target stat internally consistent
by construction: `sum(nodefactor_<attr>) == 2 * edges` to machine
precision. Under the existing path, those sums can differ by hundreds of
edges once the target race or age distribution diverges from ARTnet's
(the `edges.avg` flag was a workaround for exactly this mismatch).

Scope kept to Poisson-degree target stats. Out of scope for #62 and
retained from the univariate marginals:
- nodematch_race / nodematch_age.grp / nodematch_race_diffF (#63)
- absdiff_age / absdiff_sqrt.age (downstream of edges)
- dissolution coefficients
- nodefactor_risk.grp (quantile-based, not a Poisson predictor)
- concurrent (Poisson-derived P(deg > 1) is biased by truncation of
  deg.main at 2 and deg.casl at 3 in the training data; the univariate
  binomial fit remains the right model for this probability)

Default `method = "existing"` is byte-identical to the pre-refactor
behavior, verified against the `inst/validation/` snapshot harness on
all three parameter sets.

The validation harness is extended with `netstats_extra` and a
convenience `method =` arg that forwards a single value to both
`build_netparams()` and `build_netstats()` (the common symmetric case).

Empirical findings (Atlanta, race = TRUE, network.size = 10k):
- Internal consistency: |2*edges - sum(nodefactor_*)| = 0 under joint
  across all three layers; 20-500 edge residuals under existing.
- Under a shifted race.prop = c(0.35, 0.25, 0.40) target population,
  joint and existing edges diverge by 7-10% per layer (main -9.1%,
  casl +9.7%, inst -6.8%) — the scenario the refactor is designed for.

New tests: tests/testthat/test-joint-netstats.R (6 test blocks, 27
assertions). Require ARTnetData; skipped silently otherwise. Seeded at
20260419 for reproducibility.

Closes #62.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first pass kept `concurrent` on the univariate binomial under
method = "joint" because I had initially reached for the Poisson-derived
P(deg > 1) and that is biased by the deg-truncation in the training
data. PI review correctly pointed out that the clean analogue is simply
to joint-model the BINOMIAL on the .conc indicator the same way we
joint-modeled the Poisson — same RHS, different family.

Changes:
- Generalize `fit_joint_poisson()` to `fit_joint_glm()` with a `family`
  argument (defaults to `poisson()`; unchanged for the Poisson call
  sites).
- Under `method = "joint"` in build_netparams, fit a second joint GLM
  per layer on `deg.main.conc` / `deg.casl.conc` with
  `family = binomial()`, storing at
  `netparams$<layer>$joint_concurrent_model`. No concurrent target for
  the one-off layer, so no binomial fit there.
- Under `method = "joint"` in build_netstats,
  `concurrent = sum(predict(joint_concurrent_model, synth, type = "response"))`
  for main and casl. Inactive-age nodes zeroed out under sex.cess.mod
  the same way degree predictions are.
- Drop the concurrent carve-out from the roxygen doc.
- Extend the validation harness `.strip_additive()` to also drop
  `joint_concurrent_model` before diffing.
- Tests: add convergence + marginal-recovery assertions for
  joint_concurrent_model in test-joint-model.R; replace the "concurrent
  preserved" assertion in test-joint-netstats.R with a sanity-range
  check (joint vs existing within 0.5x-2x, confirming no Poisson-style
  inflation).

Empirical check (Atlanta, race = TRUE, N = 10k):
- main: existing concurrent = 97.09, joint = 95.41 (-1.7%)
- casl: existing concurrent = 1456.31, joint = 1614.05 (+10.8%)
  Both in the reasonable range. The earlier Poisson-derived attempt
  would have given main concurrent ~5x higher, which the binomial
  correctly avoids.

Backward-compat snapshot match still 3/3 on default and
`method = "existing"`.

This commit mixes surface area from #61 (joint GLM fitting) and #62
(target-stat consumption) but the logic only makes sense as a unit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@smjenness smjenness merged commit 3e8a3ca into main Apr 20, 2026
@smjenness smjenness deleted the feature/joint-gcomp-netstats branch April 20, 2026 00:45
smjenness added a commit that referenced this pull request Apr 20, 2026
Under method = "joint", nodematch_* and absdiff_* target statistics
are now produced via g-computation on dyad-level joint GLMs instead
of scaling univariate marginals by new edge counts. Option A per #63:
fit on partnership-level data (lmain/lcasl/linst) with ego attributes
only on the RHS; aggregate per-ego expected values weighted by the
joint-Poisson-predicted degree.

New per-layer additive outputs on netparams under method = "joint":
- joint_nm_age_model       : glm(same.age.grp ~ age + race + hiv [+ geog], binomial)
- joint_nm_race_model      : glm(same.race    ~ age + race + hiv [+ geog], binomial)  [race=TRUE only]
- joint_absdiff_age_model  : glm(ad    ~ age + race + hiv [+ geog], gaussian)
- joint_absdiff_sqrtage_model : glm(ad.sr ~ age + race + hiv [+ geog], gaussian)

All use the shared fit_joint_glm() helper with AIC-based interaction
selection over {age:race}.

In build_netstats under method = "joint":
- nodematch_<attr>[level] = sum(pred_deg * pred_dyad) over egos in level / 2
- nodematch_<attr>_diffF  = sum(pred_deg * pred_dyad) / 2
- absdiff_age / absdiff_sqrt.age = sum(pred_deg * pred_ad) / 2

Replaces the "half-joint" computations from PR #68
(new_edges * univariate_ratio) with fully joint aggregations.

Validation (Atlanta, race = TRUE, N = 10k):
- All 12 dyad models converge; marginal recovery exact on training
  data for all layers.
- AIC picks age.grp:race.cat.num for 8/12 models (race-related
  interactions are real).
- Target-stat shifts relative to existing:
  - main$absdiff_age: +5.3%
  - main$nodematch_race_diffF: -21.5%
  - main$nodematch_race[r]: <1% shift per r (race-conditional mixing
    probabilities carry through; the drop in diffF is driven by the
    drop in edges, not the match rate)
- sum(nodematch_race) == nodematch_race_diffF (identity holds)
- sum(nodematch_<attr>) <= edges (valid bound)

Backward-compat snapshot: default and explicit method = "existing"
match 3/3 on all parameter sets.

New tests: tests/testthat/test-joint-dyad.R (8 test blocks, 77
assertions) covering: presence/absence of dyad models by method,
convergence + family correctness, marginal recovery within 1%,
nodematch identity (sum = diffF), nodematch <= edges bound,
divergence vs existing method, race = FALSE path.

All 180 joint-related assertions across three test files pass.

Next PR: duration.method flag + weibull option (#63 phase 3).
Partially addresses #63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
smjenness added a commit that referenced this pull request Apr 20, 2026
Under method = "joint", nodematch_* and absdiff_* target statistics
are now produced via g-computation on dyad-level joint GLMs instead
of scaling univariate marginals by new edge counts. Option A per #63:
fit on partnership-level data (lmain/lcasl/linst) with ego attributes
only on the RHS; aggregate per-ego expected values weighted by the
joint-Poisson-predicted degree.

New per-layer additive outputs on netparams under method = "joint":
- joint_nm_age_model       : glm(same.age.grp ~ age + race + hiv [+ geog], binomial)
- joint_nm_race_model      : glm(same.race    ~ age + race + hiv [+ geog], binomial)  [race=TRUE only]
- joint_absdiff_age_model  : glm(ad    ~ age + race + hiv [+ geog], gaussian)
- joint_absdiff_sqrtage_model : glm(ad.sr ~ age + race + hiv [+ geog], gaussian)

All use the shared fit_joint_glm() helper with AIC-based interaction
selection over {age:race}.

In build_netstats under method = "joint":
- nodematch_<attr>[level] = sum(pred_deg * pred_dyad) over egos in level / 2
- nodematch_<attr>_diffF  = sum(pred_deg * pred_dyad) / 2
- absdiff_age / absdiff_sqrt.age = sum(pred_deg * pred_ad) / 2

Replaces the "half-joint" computations from PR #68
(new_edges * univariate_ratio) with fully joint aggregations.

Validation (Atlanta, race = TRUE, N = 10k):
- All 12 dyad models converge; marginal recovery exact on training
  data for all layers.
- AIC picks age.grp:race.cat.num for 8/12 models (race-related
  interactions are real).
- Target-stat shifts relative to existing:
  - main$absdiff_age: +5.3%
  - main$nodematch_race_diffF: -21.5%
  - main$nodematch_race[r]: <1% shift per r (race-conditional mixing
    probabilities carry through; the drop in diffF is driven by the
    drop in edges, not the match rate)
- sum(nodematch_race) == nodematch_race_diffF (identity holds)
- sum(nodematch_<attr>) <= edges (valid bound)

Backward-compat snapshot: default and explicit method = "existing"
match 3/3 on all parameter sets.

New tests: tests/testthat/test-joint-dyad.R (8 test blocks, 77
assertions) covering: presence/absence of dyad models by method,
convergence + family correctness, marginal recovery within 1%,
nodematch identity (sum = diffF), nodematch <= edges bound,
divergence vs existing method, race = FALSE path.

All 180 joint-related assertions across three test files pass.

Next PR: duration.method flag + weibull option (#63 phase 3).
Partially addresses #63.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
smjenness added a commit that referenced this pull request Apr 21, 2026
Per PI + Steve Goodreau review on PR #71: the Weibull-fit mean
full-partnership duration can't be faithfully reproduced by a
geometric-dissolution TERGM. Under Weibull with k != 1, the mean
full duration and the mean cross-sectional age of extant ties
diverge; TERGM with `offset(edges)` can only match the latter (via
the inspection-paradox identity at exponential). Passing the
length-bias-corrected Weibull mean to `dissolution_coefs()` produces
a sim whose individual-partnership duration matches the Weibull but
whose cross-sectional age structure is mis-matched to reality by a
factor of (1 + CV^2)/2 — i.e., ~2x under k = 0.5.

Adding non-geometric dissolution to tergm is a real upstream project
(C-level ergm term that reads dynamic-network toggle history to get
partnership age), not in ARTnet's scope. So weibull_strat as a
production target was structurally wrong given the framework we're
stuck with.

Keeping it as a diagnostic-only option would have been defensible,
but a diagnostic nobody acts on is cruft in production code. Dropped
entirely. The remaining duration.method menu is:

- "empirical" (default) — unadjusted median age of extant ties,
  geometric transformation to mean.dur.adj.
- "joint_lm" — covariate-adjusted median age of extant ties
  (same target, regression-based), same geometric transformation.

Steve's reframe makes the menu read cleanly: both methods target
the same observable cross-sectional quantity that geometric TERGM
can honor. joint_lm is the covariate-adjusted version of empirical —
the direct duration analog of the marginal-vs-joint fix the rest of
#63 applies to formation stats.

Removed:
- fit_weibull_dur() helper and its optim-based length-biased MLE.
- weibull_strat branches in compute_alt_durs() and in the main/casl
  block override logic (use_direct_mean paths, weibull_shape attrs,
  survival package guard).
- Four weibull-specific test blocks in test-duration-method.R.
- stats::optim from @importFrom (no longer used).

Opened issue #73 to track the remaining joint_lm gap: the stratum
medians are currently computed by predicting the fitted log-duration
lm at ARTnet observations, not at the synthetic target population
constructed in build_netstats. This is "confounding-corrected within
ARTnet" but not fully g-computed. Same pattern as #68 closing the
loop on #61. Issue references point here and at the existing
roxygen for forward discoverability.

Full suite: 508/508 pass. R CMD check: 0/0/0. Backward-compat
snapshot: 3/3 on both default and explicit method = "existing".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
smjenness added a commit that referenced this pull request Apr 25, 2026
* Add duration.method flag with empirical / weibull_strat / joint_lm (#63 phase 3)

New `duration.method` argument to `build_netparams()`. Default
`"empirical"` preserves byte-identical behavior. Two alternatives:

- `"weibull_strat"`: per-stratum Weibull AFT fits via survival::survreg
  with proper right-censoring (ongoing partnerships = censored, completed
  = observed events). Uses more of the data than empirical (which only
  uses ongoing). Attaches the Weibull shape parameter k as a
  `"weibull_shape"` attribute on `durs.<layer>.byage` as a diagnostic
  for the constant-hazard assumption (k == 1 means geometric; the
  current ARTnet data gives k ~= 0.5 in every stratum, a clean
  rejection of the geometric assumption).
- `"joint_lm"`: log-linear regression `lm(log(duration.time) ~
  joint ego + partner + matching terms)` on ongoing partnerships.
  Stratum medians computed from model predictions. Fitted model stored
  at `netparams$<layer>$joint_duration_model` for use by future
  build_netstats refactors that want per-dyad predictions.

All three methods produce `durs.<layer>.{homog,byage}` data.frames with
identical shapes and column names so the geometric transformation
(`mean.dur.adj = 1/(1 - 2^(-1/median))`) and downstream
`dissolution_coefs()` pipeline are unchanged. TERGM parameterization
identical regardless of method.

Implementation is a minimal surgical override at the end of each
layer's empirical duration block in NetParams.R. When duration.method
!= "empirical", the stratum median / mean values are replaced with
model-based estimates and rates.*.adj / mean.dur.adj are recomputed.
Smoothing (smooth.main.dur) and sex.cess.mod logic apply uniformly
after the override. Per-stratum fits fall back to empirical when the
new method fails (sparse data, convergence issues).

New dependency: survival (base-R recommended package) added to
Suggests. The weibull_strat path guards with a requireNamespace
check and errors clearly if survival isn't installed.

Findings from Atlanta + race = TRUE (empirical vs weibull_strat vs
joint_lm on mean.dur.adj):

Main layer match.grp.1:    73 | 379 | 76
Main layer match.grp.5:   934 | 1,482,404 | 446
Casl layer nonmatch:      106 | 359 | 93
Casl layer match.grp.5:   150 | 1,157 | 129

Weibull shape (main): 0.48, 0.72, 0.60, 0.83, 0.35, 0.46 -- all well
below 1, consistent with decreasing hazard. The extreme Weibull
mean.dur.adj values in high-age-matched strata reflect extrapolation
under k << 1 on heavily censored data; the roxygen doc flags this
explicitly and recommends joint_lm as the more production-safe
non-default option.

New tests: tests/testthat/test-duration-method.R (8 blocks, 112
assertions). Covers: default is empirical; weibull_strat produces valid
durs shape + shape diagnostic attribute; joint_lm stores lm model;
output shape invariant across all three methods (for dissolution_coefs
compatibility); non-duration netparams fields preserved across
methods; weibull shapes are all < 1 (substantive finding locked in);
joint_lm composes cleanly with method = "joint" in build_netstats.

Backward-compat snapshot: default and explicit method = "existing"
match 3/3 on all parameter sets.

Full test suite: 563 assertions pass in 10.3s.
R CMD check: 0 errors / 0 warnings / 0 relevant notes
(one "unable to verify current time" environment note, unrelated).

Part of #63.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Length-bias-corrected Weibull MLE for weibull_strat (#71 review)

Replaces the naive survreg-based Weibull fit with a length-biased
MLE. ARTnet is a cross-sectional prevalence sample, not an incidence
cohort: duration.time for ongoing partnerships is the backward
recurrence time (age at inspection in a renewal process), not the
full partnership duration. Under stationarity and Weibull(k, lambda)
durations, the density of observed ages is:

  f_A(a) = S(a; k, lambda) / mu(k, lambda)

where S is the Weibull survival function and mu = lambda * Gamma(1 + 1/k)
is the Weibull mean. Maximizing sum log f_A(t_i) over ongoing obs via
optim() gives unbiased estimates of (k, lambda) of the underlying
full-duration distribution. The naive survreg treatment of the same
data interprets length-biased elapsed durations as incidence-cohort
survival times and produces catastrophically biased estimates (scale
parameter off by 3+ orders of magnitude on ARTnet, median
extrapolations to 15-20 year main partnerships in the oldest matched
age groups, etc.).

Completed partnerships (ongoing2 == 0) are deliberately not used here:
they are subject to a different truncation (past-12-month recall
window) that would require separate modeling. Restricting to ongoing
matches the convention of empirical and joint_lm.

Additionally, under weibull_strat, mean.dur.adj is now set directly
to the Weibull analytical mean (wt * lambda * Gamma(1 + 1/k)) rather
than being derived from median.dur via the geometric-distribution
formula. When k is far from 1, these diverge by orders of magnitude;
the geometric formula is inappropriate and was masking the underlying
fit quality.

Empirical comparison on Atlanta + race = TRUE, mean.dur.adj:

  MAIN        empirical  weibull_lb  joint_lm
  nonmatch      208.5      160.0      235.5
  matched.1      72.6       72.2       75.7
  matched.5     934.4     1368.4      445.9  (was 1,482,404 under naive)

  CASL        empirical  weibull_lb  joint_lm
  nonmatch      105.8       64.4       93.3
  matched.5     150.1       96.6      129.1  (was 1,157,200 under naive)

All three methods now produce values in the same order of magnitude.
Weibull shape parameters are also more informative:
  MAIN k per stratum: 0.68  1.03  1.96  2.62  2.84  4.96  (overall 0.63)
  CASL k per stratum: 0.62  0.68  0.55  0.59  0.53  0.59  (overall 0.57)

Casual partnerships show consistent decreasing hazard (k < 1) across
all strata. Main shows a more interesting pattern: non-matched and
youngest-matched near k = 1 (geometric-like), older-matched strata
with k > 1 (increasing hazard -- stable mature relationships with
break points).

Issue #72 opened for the broader length-bias and 5-partnership
truncation issue affecting formation stats.

Test: "all shapes < 1" assertion replaced with a generous sanity range
(0.1 <= k <= 20) per stratum and a narrower band around the overall
pooled k (0.3-1.2) that would catch a regression back to naive-fit
behavior. Full suite: 570/570 pass. R CMD check: 0/0/0 after adding
stats::optim to @importFrom.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Drop weibull_strat: diagnostic that geometric TERGM can't honor

Per PI + Steve Goodreau review on PR #71: the Weibull-fit mean
full-partnership duration can't be faithfully reproduced by a
geometric-dissolution TERGM. Under Weibull with k != 1, the mean
full duration and the mean cross-sectional age of extant ties
diverge; TERGM with `offset(edges)` can only match the latter (via
the inspection-paradox identity at exponential). Passing the
length-bias-corrected Weibull mean to `dissolution_coefs()` produces
a sim whose individual-partnership duration matches the Weibull but
whose cross-sectional age structure is mis-matched to reality by a
factor of (1 + CV^2)/2 — i.e., ~2x under k = 0.5.

Adding non-geometric dissolution to tergm is a real upstream project
(C-level ergm term that reads dynamic-network toggle history to get
partnership age), not in ARTnet's scope. So weibull_strat as a
production target was structurally wrong given the framework we're
stuck with.

Keeping it as a diagnostic-only option would have been defensible,
but a diagnostic nobody acts on is cruft in production code. Dropped
entirely. The remaining duration.method menu is:

- "empirical" (default) — unadjusted median age of extant ties,
  geometric transformation to mean.dur.adj.
- "joint_lm" — covariate-adjusted median age of extant ties
  (same target, regression-based), same geometric transformation.

Steve's reframe makes the menu read cleanly: both methods target
the same observable cross-sectional quantity that geometric TERGM
can honor. joint_lm is the covariate-adjusted version of empirical —
the direct duration analog of the marginal-vs-joint fix the rest of
#63 applies to formation stats.

Removed:
- fit_weibull_dur() helper and its optim-based length-biased MLE.
- weibull_strat branches in compute_alt_durs() and in the main/casl
  block override logic (use_direct_mean paths, weibull_shape attrs,
  survival package guard).
- Four weibull-specific test blocks in test-duration-method.R.
- stats::optim from @importFrom (no longer used).

Opened issue #73 to track the remaining joint_lm gap: the stratum
medians are currently computed by predicting the fitted log-duration
lm at ARTnet observations, not at the synthetic target population
constructed in build_netstats. This is "confounding-corrected within
ARTnet" but not fully g-computed. Same pattern as #68 closing the
loop on #61. Issue references point here and at the existing
roxygen for forward discoverability.

Full suite: 508/508 pass. R CMD check: 0/0/0. Backward-compat
snapshot: 3/3 on both default and explicit method = "existing".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update gitignore

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Phase 1.2] Refactor build_netstats() to use g-computation predictions from joint model

1 participant