[Phase 1.4] Support post-stratification via user-supplied target population distribution #64

@smjenness

Context

build_netstats() currently assembles a synthetic target population from a mix of reference sources:

  • age: NCHS 2020 general-population pyramid (not MSM-specific)
  • race: ARTnetData::race.dist national or city-specific (not MSM-specific)
  • deg.casl: ARTnet sample's own deg.casl.dist
  • deg.main: ARTnet sample's own deg.main.dist
  • role.class: ARTnet sample distribution
  • risk.grp: uniform (5 equal quintiles)

This is a patchwork: there is no single coherent post-stratification target. If a user wants to parametrize the model for, say, CDC NHBS 2023 MSM demographics, there is no clean API to do so.

Proposed approach

Add a target_pop argument to build_netstats() that accepts either:

  1. A named list of marginal distributions (generalizes the current behavior):

    target_pop = list(
      age.pyramid  = full.age.pyr,         # length = nAges
      race.props   = c(Black=0.15, Hispanic=0.20, White.Other=0.65),
      deg.casl     = c(0.45, 0.30, 0.15, 0.10),
      deg.main     = c(0.60, 0.35, 0.05),
      role.class   = c(0.18, 0.27, 0.55),
      risk.grp     = rep(0.2, 5)
    )

  2. A pre-built data frame of synthetic respondents (for users who have their own joint distribution):

    target_pop = my_synthetic_pop  # data.frame with age, race, deg.casl, etc.

  3. A built-in reference (character flag):

    target_pop = 'nhbs_msm_2022'   # package-provided MSM demographics

For option 3, we would add built-in reference population data to ARTnetData (CDC NHBS or similar).
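
The three input forms could be told apart by type before any sampling happens. The following is a minimal sketch under stated assumptions, not the package's implementation: the helper name `resolve_target_pop()`, its `builtin` lookup argument, and the returned `type`/`data` structure are all hypothetical and do not exist in ARTnet.

```r
# Hypothetical helper: normalize the three proposed `target_pop` forms.
# The function name, the `builtin` argument, and the return structure are
# illustrative assumptions only.
resolve_target_pop <- function(target_pop = NULL, builtin = list()) {
  if (is.null(target_pop)) {
    # Default: the caller falls back to the current patchwork references.
    return(list(type = "default", data = NULL))
  }
  if (is.data.frame(target_pop)) {
    # Checked before is.list(), because a data.frame is also a list.
    return(list(type = "joint", data = target_pop))
  }
  if (is.character(target_pop) && length(target_pop) == 1L) {
    if (!target_pop %in% names(builtin)) {
      stop("Unknown built-in target population: ", target_pop)
    }
    return(list(type = "builtin", data = builtin[[target_pop]]))
  }
  if (is.list(target_pop)) {
    if (is.null(names(target_pop)) || any(names(target_pop) == "")) {
      stop("`target_pop` list must be fully named")
    }
    return(list(type = "marginal", data = target_pop))
  }
  stop("`target_pop` must be NULL, a named list, a data.frame, or a character flag")
}
```

The order of the checks matters: testing `is.data.frame()` first keeps a joint data frame from being mistaken for a list of marginals.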

Tasks

  • Design the target_pop argument API with three input forms: list, data.frame, or character.
  • Keep the default behavior unchanged (the current patchwork of references).
  • When the user provides a joint data.frame, use it directly (skip sampling).
  • When the user provides marginal distributions, sample each attribute independently (a generalization of the current behavior).
  • Add at least one built-in reference population to ARTnetData (CDC NHBS MSM or similar); coordinate with Sam on the data source.
  • Document the trade-offs between resampling from marginals and using a user-provided joint distribution.
  • Add unit tests covering each input form.
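
To illustrate the trade-off between the marginal and joint forms, here is a rough sketch of independent sampling from marginals. The helper `sample_from_marginals()` is hypothetical, and coding unnamed categories as 1-based indices is an assumption made only for this example. Because each attribute is drawn separately, any correlation between attributes (for example, between age and deg.main) is lost, which is what a user-supplied joint data frame would preserve.

```r
# Hypothetical helper: draw n synthetic respondents by sampling each
# attribute independently from its marginal distribution.
sample_from_marginals <- function(marginals, n) {
  cols <- lapply(marginals, function(p) {
    # Each marginal must be a valid probability vector.
    stopifnot(is.numeric(p), all(p >= 0), abs(sum(p) - 1) < 1e-8)
    # Use category names when present, otherwise 1-based category indices
    # (an assumption of this sketch, not the package's coding).
    lv <- if (is.null(names(p))) seq_along(p) else names(p)
    sample(lv, size = n, replace = TRUE, prob = p)
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}

set.seed(1)
marginals <- list(
  race     = c(Black = 0.15, Hispanic = 0.20, White.Other = 0.65),
  deg.main = c(0.60, 0.35, 0.05),
  risk.grp = rep(0.2, 5)
)
pop <- sample_from_marginals(marginals, n = 1000)
```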

Acceptance criteria

  • build_netstats(..., target_pop = NULL) produces output byte-identical to the current output.
  • build_netstats(..., target_pop = user_df) uses the user's joint distribution.
  • At least one built-in reference MSM population is available.
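
The byte-identical criterion could be checked with a fixed seed, since the synthetic population is sampled. A sketch under assumptions: `byte_identical()` is a hypothetical helper, and `draw_a`/`draw_b` are stand-ins for the pre-change and post-change calls to build_netstats(), whose full signature is not shown in this issue.

```r
# Hypothetical regression helper; `old_fn` and `new_fn` stand in for the
# pre-change and post-change calls to build_netstats().
byte_identical <- function(old_fn, new_fn, seed = 123) {
  set.seed(seed)
  old <- old_fn()
  set.seed(seed)
  new <- new_fn()
  # Compare serialized bytes, which is stricter than all.equal().
  identical(serialize(old, NULL), serialize(new, NULL))
}

# Stand-in functions that only mimic stochastic output.
draw_a <- function() sample(5, size = 10, replace = TRUE)
draw_b <- function() sample(5, size = 10, replace = TRUE)
byte_identical(draw_a, draw_b)
```

Resetting the seed before each call matters: without it, two otherwise identical code paths would consume different random draws and the comparison would fail for reasons unrelated to the change.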
