Add Antweiler-Freyberger (2025) iterative quadrature estimator#89

Open
hmgaudecker wants to merge 57 commits into main from af-estimator

Conversation

@hmgaudecker
Member

hmgaudecker commented Apr 15, 2026

Summary

  • New af/ subpackage implementing the Antweiler & Freyberger (2025) estimator as an alternative to the CHS Kalman filter
  • Same ModelSpec interface — users switch estimator by calling estimate_af() instead of get_maximization_inputs() + om.maximize()
  • Period-by-period MLE with Halton quadrature, JAX AD for gradients, LogSumExp for numerical stability
  • Supports arbitrary factor counts, log_ces/linear/translog transitions, endogenous factors via explicit investment equation
  • AF and CHS agree closely on both measurement and transition parameters (tested on synthetic data and MODEL2)
  • Common get_filtered_states() interface: pass af_result= for AF posterior states, omit for CHS filtered states

What's done

  • Core estimation: estimate_af(model_spec, data, af_options, start_params) → AFEstimationResult
  • Initial period: mixture-of-normals + measurement system via 1D/KD quadrature
  • Transition periods: triple integral (state nodes × investment shocks × production shocks) with previous-period conditioning
  • Transition constraints: ProbabilityConstraint for log_ces gammas, satisfied at start values
  • Investment equation: I = β₀ + β₁θ + β₂Y + σ_I ε for endogenous factors
  • State propagation: quadrature-based moment matching between periods
  • start_params support: user-supplied starting values override heuristic defaults
  • Posterior states: get_filtered_states(model_spec, data, params, af_result=result) computes quadrature-based posterior means per individual/period
  • 10 tests (9 regular + 1 long_running MODEL2 comparison)
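The LogSumExp trick mentioned in the summary is what keeps the quadrature sums from underflowing. A minimal numpy sketch of the idea (the actual AF code works on JAX arrays inside the likelihood, so this is illustrative only):

```python
import numpy as np

def log_weighted_sum_exp(log_vals, log_weights):
    """Compute log(sum_i w_i * exp(v_i)) without underflow."""
    shifted = log_vals + log_weights
    m = np.max(shifted)
    return m + np.log(np.sum(np.exp(shifted - m)))

# 50 quadrature nodes whose per-node log-densities are around -1000:
log_vals = np.full(50, -1000.0)
log_weights = np.log(np.full(50, 1.0 / 50))

naive = np.sum(np.exp(log_vals) / 50)                  # underflows to 0.0
stable = log_weighted_sum_exp(log_vals, log_weights)   # stays at -1000.0
```

Summing in probability space gives exactly 0.0 (so a log-likelihood of -inf), while the log-space version recovers the correct value.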

Still to do

  • Score-based bootstrap for standard errors (AF paper Section 4.2) — postponed

Test plan

  • pixi run -e tests-cpu tests — 399 passed, 1 deselected (long_running)
  • pixi run ty — all checks passed
  • prek run --all-files — all passed
  • pytest -m long_running — MODEL2 AF vs CHS comparison (both estimators optimised from same naive start values)

🤖 Generated with Claude Code

hmgaudecker and others added 7 commits April 15, 2026 12:46
New af/ subpackage implementing period-by-period MLE with Halton
quadrature as an alternative to the CHS Kalman filter estimator.
Same ModelSpec interface, JAX AD for gradients, arbitrary factor count.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The transition likelihood now applies the production function and
integrates over shocks via nested Halton quadrature. Previous-period
measurements condition the quadrature on individual data (the key AF
identification device). State propagation uses quadrature-based moment
matching. New tests verify transition parameter recovery and AF-vs-CHS
agreement on both measurement and transition parameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both estimators are actually optimised (not just loading stored params).
Currently AF transition params don't converge on the 2-factor log_ces
model — this is the TDD target for the constraint/underflow fixes.

Skipped in CI via `long_running` marker; run with `-m long_running`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both estimators now start from: loadings=1, controls=0, everything
else=0.5, probability constraints satisfied with equal shares.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Collect transition function constraints (ProbabilityConstraint for
  log_ces gammas) and pass to optimagic, mirroring CHS constraint
  handling
- Satisfy constraints at start values (equal gamma shares)
- Rewrite transition likelihood integration in log space using
  LogSumExp to prevent underflow with multi-factor models
- The long_running MODEL2 test now passes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Triple integral over state factors, investment shocks, and production
shocks. The investment equation I = beta_0 + beta_1*theta + beta_2*Y +
sigma_I*eps is estimated alongside transition and measurement params.
Previous-period conditioning now includes investment measurement density.
ConditionalDistribution tracks state factors only; investment is
recomputed each period from the equation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
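The investment equation above is a plain linear regression in the latent factor and observed income; a sketch that simulates it and recovers the coefficients by OLS (parameter values are made up, and this is not the estimator's actual code path):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
beta0, beta1, beta2, sigma_i = 0.5, 0.8, 0.3, 0.2

theta = rng.normal(size=n)   # latent state factor
y = rng.normal(size=n)       # observed income
eps = rng.normal(size=n)     # investment shock

# I = beta_0 + beta_1 * theta + beta_2 * Y + sigma_I * eps
investment = beta0 + beta1 * theta + beta2 * y + sigma_i * eps

# OLS recovers (beta_0, beta_1, beta_2) up to sampling noise
design = np.column_stack([np.ones(n), theta, y])
coef, *_ = np.linalg.lstsq(design, investment, rcond=None)
```

In the estimator itself these coefficients are part of the joint MLE rather than a separate regression step.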
Users can pass a DataFrame of starting values to estimate_af().
Matching index entries override heuristic defaults; unmatched and
fixed parameters are left unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 84.95995% with 507 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.11%. Comparing base (2d56c8e) to head (4da39c1).

Files with missing lines Patch % Lines
...matlab_ces_repro/test_matlab_loglike_comparison.py 16.35% 133 Missing ⚠️
tests/matlab_ces_repro/matlab_mapping.py 39.26% 116 Missing ⚠️
tests/matlab_ces_repro/evaluate.py 15.30% 83 Missing ⚠️
tests/matlab_ces_repro/model_specs.py 41.55% 45 Missing ⚠️
src/skillmodels/af/inference.py 92.08% 33 Missing ⚠️
tests/matlab_ces_repro/test_af_matlab_repro.py 56.45% 27 Missing ⚠️
src/skillmodels/transition_functions.py 36.36% 14 Missing ⚠️
tests/matlab_ces_repro/test_matlab_mapping.py 68.18% 14 Missing ⚠️
src/skillmodels/af/transition_period.py 95.85% 10 Missing ⚠️
src/skillmodels/af/validate.py 72.41% 8 Missing ⚠️
... and 8 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #89      +/-   ##
==========================================
- Coverage   96.91%   92.11%   -4.80%     
==========================================
  Files          57       81      +24     
  Lines        4952     8319    +3367     
==========================================
+ Hits         4799     7663    +2864     
- Misses        153      656     +503     


Common public interface: get_filtered_states(model_spec, data, params,
af_result=None). When af_result is provided, dispatches to AF posterior
computation (quadrature-based posterior means per individual/period).
Internally uses af/posterior_states.py. Returns "unanchored_states"
matching the CHS output format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hmgaudecker
Member Author

Code review

Found 2 issues:

  1. _extract_period_measurement_info in posterior_states.py only reads the "constant" control coefficient, ignoring all other control variables. For models with non-constant controls (e.g. MODEL2 with x1), posterior state means will be biased because the control contribution to measurement residuals is incomplete. The test test_af_get_filtered_states uses a model without controls, so this is not caught.

ctrl_list = [
    float(period_params.loc[loc, "value"])  # ty: ignore[invalid-argument-type]
    if (loc := ("controls", period, meas, "constant")) in period_params.index
    else 0.0
    for meas in all_measures
]

  2. Distribution propagation in estimate_transition_period uses obs_factor_values[0] (the first individual's observed factor values) when constructing the state_only_transition wrapper for moment matching. For models with individual-specific observed factors, this uses one person's values for the population-level distribution update.

def state_only_transition(state_factors_val: Array, params: Array) -> Array:
    """Transition wrapper that fills in mean investment + observed."""
    full = jnp.concatenate([state_factors_val, mean_inv, obs_factor_values[0]])
    return combined_transition(full, params)

🤖 Generated with Claude Code


hmgaudecker and others added 3 commits April 15, 2026 20:10
1. Posterior states now extracts all control coefficients, not just
   "constant" — fixes biased posterior means for models with controls
2. Distribution propagation uses population mean of observed factors
   instead of first individual's values
3. AFEstimationResult.model_spec typed as ModelSpec (was Any)
4. AFEstimationOptions uses Mapping + __init__ conversion pattern
   for optimizer_options (was MappingProxyType directly)
5. Remove redundant "loadings_flat" key from _parse_initial_params

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend the Step-0 likelihood to model the joint distribution of (latent,
observed) factors and condition Halton draws on per-individual observed
values via the Schur complement. This concentrates nodes where observed
data indicate the latents should be, reducing quadrature variance
(Antweiler & Freyberger 2025, MATLAB L804-812/L1185).

Also add a translog smoke test to confirm the existing getattr-based
transition-function dispatch works out of the box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
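The Schur-complement conditioning above is the standard Gaussian-conditional formula; a minimal sketch with a single latent factor and a single observed factor (made-up covariance values, scalar case only — the real code handles full matrices):

```python
import numpy as np

# Joint (latent theta, observed Y) mean and covariance, partitioned
mu = np.array([0.0, 1.0])        # [mu_theta, mu_Y]
sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])   # [[S_tt, S_tY], [S_Yt, S_YY]]

s_tt, s_ty, s_yy = sigma[0, 0], sigma[0, 1], sigma[1, 1]

def condition_on_y(y_obs):
    """Mean/variance of theta | Y = y_obs via the Schur complement."""
    gain = s_ty / s_yy                    # Sigma_{theta Y} Sigma_{YY}^{-1}
    mean = mu[0] + gain * (y_obs - mu[1])
    var = s_tt - gain * s_ty              # Schur complement S_tt - S_tY S_YY^-1 S_Yt
    return mean, var

cond_mean, cond_var = condition_on_y(2.0)
# Halton nodes for theta are then drawn from N(cond_mean, cond_var)
# per individual, concentrating them where the likelihood has mass.
```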
Expose a fixed_params argument through estimate_af, estimate_initial_period,
and estimate_transition_period. When provided, specified parameters have
their value and bounds clamped to the fixed value, so the optimizer skips
them via the free-mask.

Primary use case: pin time-invariant latent factors (e.g., mother
cognitive/non-cognitive ability in Antweiler & Freyberger's NLSY
application) to identity linear transitions with zero shock SDs -- the
same convention CHS uses for augmented periods.

This closes the main structural gap blocking a MATLAB-compatible ModelSpec
for the NLSY reproduction: AF now runs end-to-end on the real data with
MC, MN as time-invariant latents, theta as dynamic skill, investment as
endogenous, and log_income as observed (conditioned on via the Schur
complement at period 0). Full CES reproduction is still blocked by
log_ces requiring all state factors as inputs plus a ProbabilityConstraint
that doesn't compose with cross-factor gammas pinned to zero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hmgaudecker
Member Author

Update — income-conditional initial draws, translog, and time-invariant latents

Three rounds of improvements since the last review, ending at commit e5b9176.

What changed

  1. Income-conditional initial draws (Schur complement) — when observed_factors is non-empty, Step 0 now models the joint (latent, observed) distribution and conditions the Halton draws on each individual's observed values. The joint is factorised into marginal × conditional via Σ_{θY} Σ_{YY}⁻¹, concentrating nodes where the likelihood has mass. Matches the variance-reduction trick in Antweiler-Freyberger's MATLAB (L804-812, L1185). Backwards compatibility is preserved via a fast path when n_obs_factors == 0.

  2. Translog transition — added a smoke test confirming the existing getattr(transition_functions, name) dispatch just works for "translog". No core changes needed.

  3. fixed_params argument — new optional DataFrame that clamps value + bounds for specified parameters, so the optimizer skips them via the free-mask. Primary use case: pin time-invariant latent factors to identity linear transitions with zero shock SDs (same convention CHS uses for augmented periods).

  4. MATLAB reproduction scaffolding — loaded NLSY complete_7_9_11.xls via libreoffice CSV conversion, built a ModelSpec matching the MATLAB structure (theta dynamic, MC/MN time-invariant latents, investment endogenous, log_income observed with Schur conditioning). AF now runs end-to-end on 1403 cases across 3 periods with this setup — the full structural pipeline works on real data.

Remaining gap for full MATLAB reproduction

MATLAB's CES production is 2-dim in (theta, investment); our log_ces takes ALL state factors with a ProbabilityConstraint on gammas. Pinning cross-factor gammas (mc, mn, log_income) to 0 via fixed_params breaks the constraint selector in optimagic (selected params must remain free). To fully match the MATLAB CES, skillmodels would need either (a) an "input factors" concept per transition, or (b) a custom CES-on-two-inputs transition function. Left as a follow-up.

Validation

  • All 401 unit tests pass (pixi run -e tests-cpu tests).
  • pixi run ty clean.
  • prek run --all-files clean.
  • Three new end-to-end tests added:
    • test_af_estimate_with_translog — translog runs and recovers linear coefficient from linear DGP.
    • test_af_joint_initial_distribution_with_observed_factor — verifies initial_states includes observed factors and recovers positive skill-income cross-covariance.
    • test_af_fixed_params_pins_time_invariant_latent — verifies pinned MC-style factors keep identity transitions and near-zero shock SDs after optimization.

Files touched

src/skillmodels/af/{params,initial_period,likelihood,estimate,transition_period}.py, tests/test_af_estimate.py.

🤖 Generated with Claude Code

hmgaudecker and others added 14 commits April 22, 2026 12:34
…s to CHS.

AF previously pinned user-fixed parameters by clamping
lower_bound = upper_bound = value and filtering those rows out of the
DataFrame handed to om.minimize. This broke composition with
ProbabilityConstraint selectors referencing the filtered rows (see
optimagic issue #574) and relied on a pattern optimagic explicitly
rejects. Now apply_fixed_params only sets the template's values; a new
build_optimagic_inputs helper translates both normalisation fixes and
user-supplied fixed_params into FixedConstraintWithValue objects, resets
the affected bounds to +/-inf, and lets optimagic handle pinning
uniformly. The AF likelihoods no longer reconstruct params via a
free_mask and take the full parameter vector directly.

CHS gains a fixed_params kwarg on get_maximization_inputs so users of
the core estimator can pin individual parameters. Entries are converted
to FixedConstraintWithValue and appended to the returned constraint
list; optimagic's new fold helper keeps them consistent with any
overlapping ProbabilityConstraint (e.g. a log_ces gamma).

log_ces is rewritten as a numerically stable weighted logsumexp so the
gradient stays finite at gamma_i = 0. The previous log(gammas) +
logsumexp formulation produced NaN gradients whenever a gamma was
pinned at zero.

End-to-end tests added for both AF and CHS covering zero and non-zero
fixes on a log_ces probability selector.

Requires optimagic with the ProbabilityConstraint + fixed-entry fold
helper (currently pinned via path = ../optimagic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
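The numerically stable log_ces rewrite described above amounts to keeping the gammas in probability space inside the logsumexp instead of taking log(gammas) first. A self-contained sketch of the numerical idea (the actual skillmodels transition-function signature differs):

```python
import numpy as np

def log_ces(states, gammas, rho):
    """log CES aggregator as a weighted logsumexp.

    Stays finite when some gamma_i == 0, because pinned terms contribute
    gamma_i * exp(...) == 0 rather than log(gamma_i) == -inf, so both the
    value and the gradient with respect to gamma remain well defined.
    """
    z = rho * np.asarray(states)
    m = z.max()  # shift for numerical stability
    return (m + np.log(np.sum(np.asarray(gammas) * np.exp(z - m)))) / rho

theta = np.array([0.2, -0.1, 1.5])
gammas = np.array([0.7, 0.3, 0.0])   # third input pinned to zero

full = log_ces(theta, gammas, rho=0.5)
reduced = log_ces(theta[:2], gammas[:2], rho=0.5)
# Pinning gamma_3 = 0 reproduces the 2-input CES exactly.
```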
Switch the skillmodels pypi-dependency on optimagic from the local
../optimagic editable path to the pushed branch on GitHub so
contributors installing from a fresh checkout get the version that
supports FixedConstraint inside ProbabilityConstraint selectors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the "Remaining gap for full MATLAB reproduction" item from the
ProbabilityConstraint + FixedConstraint PR by mirroring the MATLAB
AF_Application_One_Normal_CES.m and _Translog.m runs in skillmodels:

- tests/matlab_ces_repro/load_cnlsy.py reads complete_7_9_11.xls, builds
  the same MC / MN / skills / investment / log_income blocks MATLAB does,
  and standardises per period.
- tests/matlab_ces_repro/matlab_mapping.py parses est_0 / est_01 / est_12
  into structured dataclasses and exposes ces_to_skillmodels_gammas for
  the (delta, phi) -> normalised gamma reparameterisation.
- tests/matlab_ces_repro/model_specs.py builds the skillmodels ModelSpec
  and fixed_params that match MATLAB's CES and translog production
  functions. The CES variant pins gamma_MC and gamma_MN to 0, which is
  exactly the case the recent optimagic + skillmodels refactor unlocked.
- tests/matlab_ces_repro/test_af_matlab_repro.py runs both variants
  end-to-end. Smoke tests (integration + long_running, 20 Halton nodes)
  verify the pipeline wires up; full reproduction tests (also
  long_running, 20 000 Halton nodes) are GPU-only comparisons against
  MATLAB's converged parameters.
- Unit tests for the data loader and parameter parser run fast on CPU.

Adds xlrd to the tests feature for .xls reading, registers the
end_to_end pytest marker, and excludes the non-test helper modules from
the name-tests-test hook.

Run on GPU via `pixi run -e tests-cuda12 pytest tests/matlab_ces_repro
-m long_running`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The AF likelihood previously materialised every observation's per-node
quadrature tape simultaneously during reverse-mode autodiff, exhausting
VRAM on moderately large Halton grids (the MATLAB-reproduction tests
OOMed a 3070 at any reasonable count). Two complementary changes fix
the per-observation scaling:

- jax.checkpoint on each per-obs integrand in af/likelihood.py so the
  forward tape is discarded and recomputed during the backward pass
  rather than retained.
- jax.lax.map (replacing the outer jax.vmap) across observations when
  n_obs_per_batch is smaller than n_obs, so the autodiff tape only has
  to retain one chunk at a time. A helper _map_over_obs falls back to
  vmap when batching is off.

New public knobs:

- AFEstimationOptions.n_obs_per_batch. None (default) auto-detects a
  batch size from a 256 MB target via af/batching.auto_n_obs_per_batch.
- SKILLMODELS_AF_TARGET_BATCH_BYTES env var overrides the target.

Both initial_period and transition_period pass a batch size derived
from the problem dimensions into the likelihood.

Correctness: tests/test_af_batching.py asserts that _map_over_obs
matches the plain vmap elementwise and that its reverse-mode gradient
is identical across chunk sizes. The existing test_af_estimate.py
suite still passes with no measurable change.

Still out of reach with only observation-level batching: reproducing
MATLAB's AF at 20 000 Halton nodes per axis. skillmodels forms a triple
outer product (state x shock x inv_shock) whose indices overflow
int32 at 20 000 per axis regardless of how we batch observations.
Documented as a follow-up; a node-axis lax.map chunking pass in
_integrate_transition_single_obs plus a move to joint-Halton
integration would close the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
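The _map_over_obs contract described above (chunked mapping that is elementwise-identical to the plain vectorised call) can be illustrated without JAX. The real implementation uses jax.lax.map plus jax.checkpoint; this numpy sketch only demonstrates the equivalence property that tests/test_af_batching.py asserts:

```python
import numpy as np

def map_over_obs(fn, data, n_obs_per_batch=None):
    """Apply a vectorised per-observation function in chunks.

    With n_obs_per_batch=None this is the plain vectorised call (the
    vmap analogue); otherwise observations are processed chunk by chunk,
    so only one chunk's intermediates are live at a time.
    """
    if n_obs_per_batch is None or n_obs_per_batch >= len(data):
        return fn(data)
    chunks = [
        fn(data[i : i + n_obs_per_batch])
        for i in range(0, len(data), n_obs_per_batch)
    ]
    return np.concatenate(chunks)

per_obs_loglike = lambda x: -0.5 * x**2   # stand-in per-obs integrand
data = np.linspace(-3.0, 3.0, 101)

full = map_over_obs(per_obs_loglike, data)
batched = map_over_obs(per_obs_loglike, data, n_obs_per_batch=16)
```

The batched result must match the unbatched one exactly, including for uneven final chunks; in the JAX version the same property must additionally hold for reverse-mode gradients.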
The previous implementation integrated the transition-period
likelihood as three separate one-dimensional Halton sequences
(state x shock x investment-shock) combined by outer product.
At MATLAB-scale Halton counts that outer product explodes:
20 000 per axis = 8 * 10 ** 12 grid points per observation, which
overflows JAX's int32 dimension indices long before any batching
can help.

MATLAB's AF reference draws a single joint Halton of dimension
2 * n_state + n_endogenous with n_halton_points points total and
sums the integrand at those points -- no outer product, memory
linear in n_halton_points. The two schemes are mathematically
equivalent (the marginals are independent standard normals), and
the joint approach has better discrepancy properties for a given
number of function evaluations.

This commit ports skillmodels to the joint-Halton scheme:

- _integrate_transition_single_obs now takes a single
  joint_nodes / joint_weights pair and splits each draw into
  (z_state, z_shock, z_inv_shock) internally. The triple vmap is
  replaced by a single vmap over the joint grid.
- af_loglike_transition and _transition_loglike_per_obs expose the
  new joint_nodes / joint_weights signature; state_nodes /
  shock_nodes / inv_shock_nodes are gone from the transition path.
- transition_period.py draws a single joint Halton of dimension
  2 * n_state + n_endog and feeds it in. create_shock_nodes_and_weights
  is no longer used there. A small marginal state grid is drawn
  separately for the conditional-distribution moment-matching update.
- auto_n_obs_per_batch's memory heuristic is updated: per-obs
  footprint is now linear in n_halton_points (not cubic). Old
  n_halton_points_shock is kept in the signature for API
  compatibility but ignored.
- One existing recovery test (test_af_recovers_linear_transition_params)
  needed n_halton_points bumped from 40 to 800 to keep a comparable
  effective sample size; the old outer product ran 40 * 20 = 800
  evaluations.

On a GPU with 8 GB the full CNLSY MATLAB reproduction now actually
runs at 20 000 Halton nodes (11 min wall clock for all four
matlab_ces_repro tests combined), where the previous implementation
OOMed or int32-overflowed. The reproduction tests' comparison
assertions are reduced to qualitative sanity checks (finite
likelihoods, positive measurement SDs); matching MATLAB's numerical
estimates exactly would require replicating MATLAB's multistart
optimisation strategy and is out of scope for this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
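The joint-Halton point made above — one d-dimensional low-discrepancy sequence of n points instead of an outer product of 1-D grids — can be sketched in a few lines. This toy integrates over the uniform cube rather than transforming to normals, and is not the skillmodels node-generation code:

```python
import numpy as np

def radical_inverse(i, base):
    """Van der Corput radical inverse: the i-th Halton coordinate in one base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def halton_2d(n):
    """Joint 2-D Halton sequence (bases 2 and 3): n points, memory O(n)."""
    return np.array(
        [[radical_inverse(i, 2), radical_inverse(i, 3)] for i in range(1, n + 1)]
    )

# Integrate f(u, v) = u * v over [0, 1]^2 (true value 0.25) with a single
# joint sequence. An outer-product scheme with m nodes per axis would need
# m**d evaluations for the same coverage; the joint scheme needs only n.
pts = halton_2d(512)
joint_estimate = np.mean(pts[:, 0] * pts[:, 1])
```

With independent marginals the two schemes target the same integral, which is why the port is a pure memory/indexing win rather than a change in the estimand.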
Previously ``investment`` was flagged ``is_endogenous=True``, which gave
it its own initial-distribution mean and covariance block in skillmodels
AF and routed it through the separate ``investment_eq`` category. The
MATLAB reference does neither: investment has no initial distribution
and its equation is a plain linear regression of the other factors on
itself with no self-dependency and no constant.

Drop the flag and use a regular ``linear`` transition instead. Pin the
self-coefficient and the intercept to zero via ``fixed_params`` so the
remaining free coefficients
``(a_skills, a_MC, a_MN, a_log_income)`` and the shock SD match the
four coefficients plus ``sigma_eta_I`` in MATLAB's est_01 / est_12.
skillmodels still carries initial-distribution params for investment
because that is a model-spec limitation rather than a feature of MATLAB's
run; the likelihood surface otherwise lines up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- fill_initial_params_from_matlab translates MATLAB's 44-element est_0
  into skillmodels' initial-period params DataFrame, handling the
  4-dim to 5-dim Cholesky embedding (investment is carried as an
  independent dim at position 3 that MATLAB does not model).
- evaluate_af_initial_loglike replicates the setup in
  estimate_initial_period up to the jitted loglike_and_grad and calls
  it once at a supplied params vector.
- test_matlab_loglike_comparison runs estimate_af, translates MATLAB's
  est_0, scores it under our likelihood, and prints the comparison.

Result on CNLSY at 20 000 Halton nodes:

    skillmodels AF converged loglike       = -19.112239
    skillmodels likelihood at MATLAB est_0 = -19.369483
    difference                             = +0.257245 (skillmodels higher)

Our own optimum scores ~0.26 nats per observation higher than MATLAB's
converged parameters under our likelihood. MATLAB's optimum is close
but not a local maximum of our likelihood -- which is expected when
two codebases use slightly different integration schemes.

Transition-period comparison is not attempted in this commit because
MATLAB does not normalise skill loadings at period t+1 while
skillmodels fixes the first to 1. A direct copy would require a
uniform rescaling of theta_{t+1} through all connected parameters and
is left as a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thread two new per-factor flags through the AF estimator so models can
match MATLAB's conventions exactly:

- has_production_shock=False drops the factor's shock dimension from
  the transition-period joint Halton draw (the factor has no shock SD
  parameter and transitions deterministically). Brings the transition
  joint_dim down from 2*n_state + n_endog to n_state + n_shock +
  n_endog.
- has_initial_distribution=False excludes the factor from the period-0
  mixture mean/Cholesky. Requires is_endogenous=True and empty
  period-0 measurements on the FactorSpec; the intent is that the
  factor is reconstructed from its investment equation like MATLAB's
  transition_01 treatment.

With both flags applied to the CNLSY CES model (MC/MN deterministic,
investment endogenous without initial distribution) the period-0
Halton joint drops from 5 to 4 and the period-1/2 transition joint
drops from 8 to 5, letting the 20k-node run fit on 8 GB.
Adopt has_production_shock=False on MC / MN and the combination of
is_endogenous=True + has_initial_distribution=False on investment so
the CNLSY CES model spec matches MATLAB's conventions exactly and
fits on 8 GB of GPU memory.

Two translation bugs surfaced while auditing the comparison:

- Level-shift absorption into period-t+1 skill intercepts now
  multiplies by the measurement's loading. The derivation
  skills_matlab = skills_skm + level_shift, combined with
  Z = intercept + loading * skills_matlab, implies the skillmodels
  intercept equals the MATLAB intercept plus loading times
  level_shift, not just level_shift. Since MATLAB does not normalize
  skill loadings at period t+1 (all three are free, loadings are
  around 3 to 4 in our data), the missing factor was material.
- Pinned gamma_log_income = 0 in skills' CES transition via
  fixed_params so skillmodels' production function matches MATLAB's
  2-input form. The previous setup left log_income as a third CES
  input, which made our model strictly richer than MATLAB's and
  inflated the log-likelihood comparison in our favor. The same
  alignment is applied to the translog variant.

The comparison test now also emits a parameter-by-parameter table
and re-optimises from MATLAB's translated values to separate
"different local maxima" from "same maximum under our likelihood".
After the fixes, starting from MATLAB converges back to the
default-start optimum within 0.0004 nats, so the residual 2.48-nat
gap (concentrated at period 2) is one basin, not two.
Implement `compute_af_standard_errors` returning per-period
asymptotic SEs as the diagonal blocks of the Newey-McFadden sandwich
for a sequential M-estimator:

    V_t = A_tt^{-1} Omega_tt A_tt^{-T} / n_obs

Own-period scores come from jax.jacfwd of the per-obs log-likelihood;
the information matrix A_tt is jax.hessian of the negative mean
log-likelihood. Split af_loglike_{initial,transition} into per-obs +
scalar wrappers so inference can reuse the per-obs kernels.

Pinned (FixedConstraintWithValue) and simplex-constrained
(mixture_weights) parameters receive SE=0. Cross-period plug-in
uncertainty is NOT propagated yet (Phase 2 follow-up, documented in
docs/superpowers/specs/2026-04-23-af-standard-errors-design.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
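The sandwich formula V_t = A_tt^{-1} Omega_tt A_tt^{-T} / n_obs is the standard M-estimator covariance; a self-contained numpy sketch on an OLS toy problem (the AF code builds the same ingredients with jax.jacfwd and jax.hessian instead of closed forms):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
design = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, -0.5])
y = design @ beta_true + rng.normal(size=n)

beta_hat = np.linalg.solve(design.T @ design, design.T @ y)
resid = y - design @ beta_hat

# Sandwich V = A^{-1} Omega A^{-T} / n with per-obs scores s_i = x_i * e_i
scores = design * resid[:, None]          # (n, k) per-observation scores
a_mat = design.T @ design / n             # information matrix ("bread")
omega = scores.T @ scores / n             # score outer product ("meat")
a_inv = np.linalg.inv(a_mat)
vcov = a_inv @ omega @ a_inv.T / n
standard_errors = np.sqrt(np.diag(vcov))
```

In the sequential AF setting A is block lower triangular across periods, which is what the Phase 2 full-sandwich commit adds on top of these per-period blocks.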
Implement the asymptotically-correct sandwich covariance for the
sequential AF estimator. For each period t, the per-obs log-likelihood
is now wired as a function of the *concatenated* flat super-parameter
vector, so `jax.jacfwd` captures the full dependence chain:

    theta_0 -> cond_dist_0 -> propagate -> cond_dist_1 -> ...

Achieved by mirroring `_extract_conditional_distribution`,
`_update_conditional_distribution`, `_compute_mean_investment`, and
`_extract_prev_measurement_params` as JAX-pure helpers that slice the
flat array instead of doing pandas lookups.

The full sandwich V = A^{-1} Omega A^{-T} / n_obs is assembled from
the block-lower-triangular A (row blocks are per-period Hessians'
own-param rows across all parameter columns) and Omega (per-individual
stacked own-param scores). Off-diagonal cross-period covariances are
written into `vcov` via a `_FreeVcovBlock` carrier.

`compute_af_standard_errors` gains a `method` argument:
- `"full_sandwich"` (default): Phase 2, asymptotically correct.
- `"block_diagonal"`: Phase 1, conservative per-period blocks.

Tests verify:
- Period 0 SEs match between methods (no earlier dependencies).
- Period 2's full-sandwich SE >= block-diagonal SE (plug-in uncertainty).
- Cross-period covariance block is non-zero in full sandwich.
- Unknown `method` raises ValueError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hmgaudecker
Member Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance in the two standard-error commits (6fd7502 Phase 1 block-diagonal sandwich, ab87767 Phase 2 full cross-period sandwich).

🤖 Generated with Claude Code


@hmgaudecker
Member Author

Code review (full, including low-confidence items)

Below is the full list of issues surfaced across five review agents on the Phase 1 + Phase 2 standard-error commits (6fd7502, ab87767), with confidence scores (0-100). Items below the usual threshold of 80 are still shown for transparency.

Real potential issue (85) — shock_sds shape mismatch for models with n_shock_factors < n_state_factors

The JAX-pure propagator does + jnp.diag(shock_sds**2) directly. _parse_transition_params returns shock_sds with shape (n_shock_factors,), so the result is (n_shock, n_shock) — it cannot be added to a (n_state, n_state) covariance when they differ. The existing _update_conditional_distribution has the same pattern, so this is a pre-existing bug the mirror replicates rather than a new regression; still worth flagging since the mirror is a new call site.

new_cov = jnp.einsum(
    "q,qi,qj->ij", state_weights, centered, centered
) + jnp.diag(shock_sds**2)
new_chol = jnp.linalg.cholesky(new_cov + 1e-8 * jnp.eye(n_state))
return new_mean, new_chol

Pre-existing sibling:

new_cov = jnp.einsum(
    "q,qi,qj->ij", state_weights, centered, centered
) + jnp.diag(shock_sds**2)
# Cholesky factorization of new covariance
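One way to make the shapes compose when n_shock_factors < n_state_factors is to scatter the shock variances onto the full state diagonal before adding. This is a hypothetical fix sketch — add_shock_variances and shock_idx are illustrative names, not skillmodels API:

```python
import numpy as np

def add_shock_variances(state_cov, shock_sds, shock_idx):
    """Scatter (n_shock,) shock variances onto the (n_state, n_state)
    covariance diagonal, instead of adding a mismatched
    (n_shock, n_shock) diag(shock_sds**2) block directly."""
    diag = np.zeros(state_cov.shape[0])
    diag[np.asarray(shock_idx)] = np.asarray(shock_sds) ** 2
    return state_cov + np.diag(diag)

cov = np.eye(3)                    # 3 state factors
sds = np.array([0.5, 0.2])         # only factors 0 and 2 carry shocks
new_cov = add_shock_variances(cov, sds, shock_idx=[0, 2])
# diagonal becomes [1.25, 1.0, 1.04]
```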

CLAUDE.md: # type: ignore[arg-type] instead of # ty: ignore[...] (75)

AGENTS.md says: "Suppress errors with # ty: ignore[rule-name] (not # type: ignore)".

https://github.com/OpenSourceEconomics/skillmodels/blob/ab877673637a59c87520b20e27ff0a5dc1faa5b2/tests/test_af_inference.py#L315-L318

CLAUDE.md: internal dataclass uses Mapping, should be MappingProxyType (75)

The repo CLAUDE.md (Immutability Conventions) says internal dataclass dict fields use MappingProxyType, with MappingProxyType(...) wrapping at the call site. _PeriodMeta is internal (underscore-prefixed, not in __all__) but declares three Mapping[str, Any] fields and is constructed with plain dicts.

params_df: pd.DataFrame
loglike_kwargs: Mapping[str, Any]
"""Keyword arguments forwarded to ``af_per_obs_loglike_initial`` (if
``is_initial``) or ``af_per_obs_loglike_transition`` otherwise.
"""
parse_kwargs: Mapping[str, Any]
"""Keyword arguments forwarded to ``_parse_initial_params`` or
``_parse_transition_params`` respectively. Used by the Phase 2 chain.
"""
n_components: int
n_factors_joint: int
"""Joint factor count in the initial mixture (state_latent + observed).
Only meaningful for the initial period; zero otherwise.
"""
n_state: int
"""State-factor count (``n_state_latent`` in the initial period;
``n_state_factors`` in transition periods).
"""
n_endog: int
n_shock: int
n_observed_factors: int
state_factor_indices_in_joint: tuple[int, ...]
"""Integer positions within the joint factor vector at which state
factors live (the complement is observed factors). Used to marginalise
the joint cond-dist to its state-factor sub-block.
"""
propagation: Mapping[str, Any] = field(default_factory=dict)
"""Extra JAX-pure bits for propagation of the conditional distribution
through this period's transition. Only populated for transition

model_spec: Any / processed_model: Any with ANN401 suppressions (unscored)

ModelSpec and ProcessedModel are concrete types already in use in this file's imports (indirectly via AFEstimationResult). Using Any + # noqa: ANN401 sidesteps type-safety; TYPE_CHECKING imports would avoid circular-import concerns if that is the motivation.

*,
result: AFEstimationResult,
period_data: dict[int, dict[str, Array]],
model_spec: Any, # noqa: ANN401
processed_model: Any, # noqa: ANN401
af_options: AFEstimationOptions,
observed_factors: tuple[str, ...],
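If circular imports are indeed the motivation, the standard `TYPE_CHECKING` pattern keeps the concrete annotations without a runtime import. A sketch (the module path is assumed, not verified against the repo):

```python
from __future__ import annotations  # annotations stay strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # only the type checker executes this block, so no import cycle can occur
    from skillmodels.process_model import ProcessedModel  # hypothetical path

def describe(processed_model: ProcessedModel) -> str:
    # the class object is never needed at runtime, only its name here
    return type(processed_model).__name__
```

This keeps `ty`/mypy fully informed while the interpreter never touches the import.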

CLAUDE.md: multiple assertions per test (unscored)

AGENTS.md says "One assertion per test". Several tests pack 2-4 independent assertions, e.g.:

def test_af_inference_fixed_entries_have_zero_se(
fitted_result: tuple[AFInferenceResult, pd.DataFrame],
) -> None:
"""Normalization pins (e.g. loadings[m1, skill] == 1) must have SE = 0."""
inference, all_params = fitted_result
se = inference.standard_errors
pinned_loading = ("loadings", 0, "m1", "skill")
assert pinned_loading in all_params.index
assert se.loc[pinned_loading] == 0.0
pinned_intercept = ("controls", 0, "m1", "constant")
assert pinned_intercept in all_params.index
assert se.loc[pinned_intercept] == 0.0

Performance note: jax.hessian on the full flat_super bypasses n_obs_per_batch (unscored)

The n_obs_per_batch memory-control contract in likelihood.py applies only to single-direction reverse mode. jax.hessian materialises a full tape over the gradient, which can scale with O(n_params × n_obs) regardless of n_obs_per_batch. For large models this may OOM at inference time where estimation did not.

jac_full = jax.jacfwd(per_obs_loglike_full)(flat_values)
hess_full = jax.hessian(neg_mean_loglike_full)(flat_values)

score_matrices_full.append(jax.jacfwd(_per_obs_t)(flat_super))
hessian_blocks_full.append(jax.hessian(_neg_mean_t)(flat_super))

Latent inconsistency (25) — conditional_weights never propagated in the JAX chain

_build_prev_dist_arrays always broadcasts uniform mixture_weights. The estimation-time _prepare_transition_inputs instead honours prev_distribution.conditional_weights when it is non-None. In current AF code every estimation path sets conditional_weights=None, so this is latent/defensive only — but it is an asymmetry that will break silently if per-individual posterior weights are ever introduced.

meta_target = metas[target_t]
n_obs = int(meta_target.loglike_kwargs["measurements"].shape[0])
n_components = metas[0].n_components
cond_weights = jnp.broadcast_to(mixture_weights[None, :], (n_obs, n_components))
return {
"cond_weights": cond_weights,
"means": state_means,
"chol_covs": state_chols,

Flagged but confirmed false positives (0):

  • prior_mean = prev_means[0] used for all components' mean-investment — faithfully mirrors transition_period._compute_mean_investment.
  • mixture_weights carried unchanged across propagation — matches _update_conditional_distribution's intentional design (docstring: "compute the new mean and covariance").
  • prev_loading_mask not overridden in the Phase 2 kwargs — structural (boolean mask derived from model spec, not from estimated values), so correct.

🤖 Generated with Claude Code


hmgaudecker and others added 3 commits April 27, 2026 15:32
Match the MATLAB reproduction default. Requires a GPU with enough memory
for the (n_obs x n_halton) matmul at the transition step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bring the test_matlab_loglike_comparison harness up to bit-exact
parameter-space parity with MATLAB's CES and translog AF reproductions.

Data parity:
- load_cnlsy._standardise now uses ddof=1 (sample SD), matching MATLAB's
  std default. Fixes a ~3.5e-4 relative-error mismatch on every
  measurement column. Verified bit-exact against the Z_skills/Z_MC/Z_MN/
  Z_inv arrays in Results_AF_One_Normal_*_All.mat.

Normalisation alignment:
- model_specs.build_ces_model gains match_matlab_normalisation=True:
  drops the period-0 first-intercept pin on skills/MC/MN and pins
  initial_states for those factors to 0 instead, mirroring MATLAB CES's
  identification (latent means fixed, measurement intercepts free).
- model_specs.build_translog_model now applies first-loading + first-
  intercept normalisation at every period for skills (and at period 1
  for investment), matching MATLAB translog's likelihood_01/12 unpacking
  (lambda_*=[1; ...]). Investment-equation constant left free for
  translog (CES pins it to 0).

Translog parameter mapping:
- matlab_mapping._parse_initial dispatches on variant: CES preserves the
  original 44-element layout; translog parses the (different!) 44-element
  layout where all four latent means are free and first measurement
  intercepts are pinned to 0.
- _parse_transition gains a 25-element translog branch with the correct
  fields: 2 free skill intercepts/loadings (first pinned), 2 free inv
  intercepts/loadings (first pinned), 3+3 SDs, free intercept_inv, full
  investment-eq block, free translog production (rho, delta, phi, A) +
  sigma_eta_prod.
- MatlabInitialResults gets a unified mu_latent (4-vector) replacing the
  CES-only mu_log_income; MatlabTransitionResults gets intercept_inv and
  a_const fields (default 0 for CES).
- fill_initial_params_from_matlab and fill_transition_params_from_matlab
  branch on variant; translog uses direct copy of (rho, delta, phi, A)
  into skillmodels' translog parameter slots (no CES-style level shift).

Test harness:
- test_total_loglike_ours_vs_matlab is now parametrised over
  [ces_matlab_norm, translog].
- Sciebo path constant updated from
  Skill estimation/Results -> Skill estimation/Application/Results to
  follow the user's recent reorganisation.

Tooling:
- h5py added (used to inventory the 3.9 GB workspace dump variables; not
  strictly needed at runtime since the .mat is MAT v5).

Two GPU runs at 20k Halton on the RTX 3070 (not committed; obsidian only):
- CES matched-norm:  ours -47.150 vs MATLAB -49.870 (+2.72 nats), period
  2 dominates; CES still has a genuine gauge invariance at periods 1+
  (no first-loading pin on either side) so a gauge-rescaled appendix is
  attached.
- Translog matched-norm:  ours -47.160 vs MATLAB -47.605 (+0.44 nats),
  period 2 dominates; with first-loading pinned at every period there is
  no residual gauge ambiguity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`compute_af_standard_errors` was producing NaN diagonals on every
parameter row whenever the model used the user-supplied `fixed_params`
argument to `estimate_af`. The information matrix at those rows is
zero (the likelihood is flat in pinned coordinates), and
`build_optimagic_inputs` deliberately strips the lb==ub markers on
pinned rows so that optimagic's `FixedConstraintWithValue` machinery
takes over -- which leaves `_free_positions_for_period` unable to
detect them at SE-computation time. `inv` on the rank-deficient
information matrix then produces NaN entries that propagate through
`a_inv @ omega @ a_inv.T` and poison every diagonal.

Switching to `jnp.linalg.pinv(..., hermitian=True)` keeps the SE
finite: identifiable parameters retain their correct value, and
pinned parameters get zero SE (which downstream display layers can
render as "—"). Applied at both the block-diagonal call site
(`_block_diagonal_sandwich_single`) and the full-sandwich call site
(`_compute_full_sandwich`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hmgaudecker and others added 26 commits May 4, 2026 09:59
Without an upper bound the optimizer can drift phi to large positive
values where exp(states * phi) overflows and the gradient becomes NaN,
crashing ~40% of the simulation runs in tests/matlab_ces_repro/. The
lower side is numerically well-behaved via logsumexp, so leave it
unbounded to match MATLAB's (-inf, 1 - c) convention.
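The asymmetry is easy to see numerically — a pure-NumPy illustration of the generic mechanism (values of `phi` and the states are made up; exactly where the real gradient NaNs arise may differ):

```python
import numpy as np

def log_ces_naive(log_gammas, states, phi):
    # direct exponentiation: overflows once states * phi exceeds ~709
    return np.log(np.sum(np.exp(log_gammas + states * phi))) / phi

def log_ces_stable(log_gammas, states, phi):
    # logsumexp trick: subtract the max before exponentiating
    z = log_gammas + states * phi
    m = z.max()
    return (m + np.log(np.sum(np.exp(z - m)))) / phi

log_gammas = np.log(np.array([0.5, 0.5]))
states = np.array([2.0, 3.0])
big_phi = 400.0  # the naive exponentials are inf here; logsumexp is fine
```

The stable form is why the lower side needs no bound, while unguarded `exp(states * phi)` terms motivate the upper one.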

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The full Newey-McFadden A matrix assembled in `_compute_full_sandwich`
is lower-triangular block: period-t rows are taken from period-t's
Hessian, which has non-zero entries in earlier-period columns (period t
LL depends on period (t-1) params via the propagated conditional
distribution) but zero entries in later-period columns (period t LL
does not depend on later params). So A is asymmetric.

The previous code passed `hermitian=True` to `pinv`, which routes
through `eigh` and silently symmetrises the input, producing a wrong
inverse. This made the full-sandwich SE at period 0 differ from the
block-diagonal SE by 5-11% (test_af_inference_full_sandwich_matches_
block_at_period_0 was failing for this reason).

Drop `hermitian=True`; period-0 SEs now match block-diagonal exactly.
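The failure mode reproduces in plain NumPy, which shares `jnp.linalg.pinv`'s `hermitian` semantics: for an asymmetric matrix, `hermitian=True` routes through `eigh`, reads only one triangle, and returns the inverse of the wrong matrix.

```python
import numpy as np

a = np.array([[2.0, 0.0],
              [1.0, 3.0]])  # lower-triangular and asymmetric, like A

good = np.linalg.pinv(a)                 # true pseudo-inverse
bad = np.linalg.pinv(a, hermitian=True)  # silently symmetrises a first
```

`good @ a` recovers the identity; `bad @ a` does not, which is the 5-11% SE discrepancy in miniature.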

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- test_cnlsy_skill_measurements_are_standardised_per_period: assert
  values.std(ddof=1) == 1.0 to match `_standardise`'s ddof=1 (sample
  SD), which itself matches MATLAB's default `std`. The old test used
  numpy's ddof=0 default.
- test_load_matlab_results_translog: MATLAB's translog has a phi
  parameter (the cross-term coefficient `delta * log(theta) * log(I)`)
  loaded into `phi_prod`, contrary to the old test comment claiming it
  was "not present". Assert `phi_prod` and `a_const` are finite for
  translog instead.
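The ddof difference in one line (NumPy defaults to the population SD; MATLAB's `std` and pandas default to the sample SD):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
population_sd = x.std()         # ddof=0, NumPy's default
sample_sd = x.std(ddof=1)       # ddof=1, MATLAB / pandas default
# the sample variance rescales the population variance by n / (n - 1)
```

With n in the hundreds this is exactly the ~1e-3-scale relative error the data-parity fix removed.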

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implement `compute_af_bootstrap_se`, an O(n_boot * n_clusters) score
resampler that re-uses the per-period score matrices and information
matrices already built by the block-diagonal sandwich path. For each
bootstrap replicate it draws caseids with replacement, averages their
scores, and applies a one-step Newton update from the optimum:

    theta_b = theta_hat - A_t^{-1} * bar_g_b

This is the score bootstrap of Kline & Santos (2012); it avoids the
prohibitive cost of B full re-estimations. On a single CPU, 10000
replicates of an n=2000 / 3-period AF model finish in ~18s (versus
~10 hours of full bootstrap re-estimation in the matlab_ces_repro
CNLSY work). Output is asymptotically equivalent to the
block-diagonal sandwich SE; tests verify they agree to within MC
noise.
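The per-replicate update can be sketched in a few lines (names are illustrative; the real implementation re-uses the cached per-period score and information matrices rather than taking `a_inv` as an argument):

```python
import numpy as np

def score_bootstrap(theta_hat, scores, a_inv, n_boot, seed=0):
    """One-step Newton score bootstrap, Kline & Santos (2012) style.

    scores: (n_clusters, n_params) per-cluster scores at the optimum.
    a_inv:  inverse of the negative mean Hessian at the optimum.
    """
    rng = np.random.default_rng(seed)
    n_clusters = scores.shape[0]
    replicates = np.empty((n_boot, theta_hat.size))
    for b in range(n_boot):
        draw = rng.integers(0, n_clusters, size=n_clusters)  # resample caseids
        g_bar = scores[draw].mean(axis=0)                    # mean resampled score
        replicates[b] = theta_hat - a_inv @ g_bar            # theta_b
    return replicates
```

Standard errors are then the column standard deviations of `replicates`; no re-estimation is ever run.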

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skillmodels' `log_ces` is the constant-free variant
`f = (1/phi) * logsumexp(log(gamma) + states * phi)`. MATLAB's AF
sim reference parametrisation has an additional level constant:

    log_skills_{t+1} = log(A_t) + (1/sigma) log(sum gamma_i theta_i^sigma)

When the data come from a DGP with a non-trivial `A` (e.g. AF Sec. 5.1
sims with `A = e`) and the spec pins all skill measurement intercepts
to 0, plain `log_ces` cannot represent the +log(A) level shift on
period-(t+1) skill measurements; the optimiser warps the gammas/phi
to compensate, biasing the production estimates by ~10pp on gammas
and over-inflating the shock SD.

`log_ces_with_constant` adds the explicit constant `A_t` as a free
parameter; when all skill intercepts are pinned this gives a
one-to-one match with MATLAB's parametrisation. Whitelisted in
`af.validate`. Tested via the matlab_ces_repro sim spec.
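A self-contained sketch of the parametrisation, in pure NumPy with a hand-rolled logsumexp (the real implementation lives in skillmodels' transition functions and operates on JAX arrays):

```python
import numpy as np

def _logsumexp(z):
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def log_ces_with_constant(log_states, gammas, phi, log_a):
    # f = log(A) + (1/phi) * log(sum_i gamma_i * theta_i^phi), computed in logs
    return log_a + _logsumexp(np.log(gammas) + phi * np.asarray(log_states)) / phi
```

With all skill intercepts pinned to 0, `log_a` is the only free level parameter, giving the one-to-one match with MATLAB's `log(A_t)` term.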

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Targets the gpu_h100 partition (4× NVIDIA H100 SXM5, 64 cores per node).
Splits the 500-sim translog n=500 panel across the four GPUs (125 each)
and runs the 5-sim n=2000 cell on GPU 0 alongside its slice. Per-sim
wall clock on H100 should be 60-90 s vs ~8 min on a local RTX 3070, so
the full sweep finishes in ~30-45 min instead of multiple days.

README documents the one-time pixi + sciebo data setup on a Snellius
login node and the rsync workflow for pulling pickles back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The translog Snellius batch script now runs both estimators in
parallel: AF (4 workers, 125 sims/GPU) and CHS (4 workers, 125
sims/GPU), against the same MATLAB-simulated datasets and using the
exact same measurement-system normalisations (first loading=1 + ALL
intercepts pinned to 0 at every active period). Output goes to
disjoint directories (`translog_n500_chs/` for CHS) so a downstream
aggregator can diff parameter recovery between the two estimators.

CHS spec lives in `sim_repro/chs_model_spec.py` (mirrors the AF
spec from `sim_model_specs.py` but treats investment as a regular
latent factor with linear transition over (skills, log_income) since
CHS lacks AF's `is_endogenous` notion). Runner in
`sim_repro/sim_sweep_chs.py`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the moment-matched Gaussian-mixture carry-over in
`_update_conditional_distribution` with a chained Halton-driven
importance sample (`samples_per_component`), mirroring MATLAB's
`create_nodes_weights_12`. At each period's end, propagate the
previous-period samples through the just-fitted transition +
investment equation + production shock, and pass the resulting
per-obs sample arrays to the next period's likelihood.

This fixes the structural bug Mario Rothfelder flagged: re-sampling
skills_t fresh after step 1 (instead of using the chained Halton
draws from step 1's estimates) loses the non-Gaussian shape of the
CES-propagated distribution and biases `investment_sds` ~50% downward
on the translog DGP. With the fix, `investment_sds` recovers within
~12-18% on translog n=500 sim 0 (was at the lower bound 0.001).

Also Schur-condition the period-0 sample on observed factors
(log_income), and seed the period-t Halton design with the period
index so successive periods draw independent low-discrepancy
sequences (a shared seed would couple the sample's z's with the
current period's shock z's and ruin the joint integration).
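For orientation, a minimal radical-inverse Halton generator with an index offset standing in for the per-period seeding (the project's actual design construction and seeding scheme may differ):

```python
import numpy as np

_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

def halton(n_draws, n_dims, skip=0):
    """Low-discrepancy points in (0, 1)^n_dims via the radical inverse.

    `skip` offsets the sequence index — one illustrative way to give
    successive periods non-overlapping draws.
    """
    out = np.empty((n_draws, n_dims))
    for d in range(n_dims):
        base = _PRIMES[d]
        for i in range(n_draws):
            k, f, x = skip + i + 1, 1.0, 0.0
            while k > 0:  # digit-reverse k in the given base
                f /= base
                x += f * (k % base)
                k //= base
            out[i, d] = x
    return out
```

Mapping the points through the inverse normal CDF yields the quadrature z's used by the importance sample.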

Files:
- af/types.py: add `samples_per_component` to ConditionalDistribution.
- af/initial_period.py: build per-component, per-obs importance
  sample at end of estimation, with Bayes-rule conditional weights
  given observed factors.
- af/likelihood.py: `_integrate_transition_single_obs` now reads
  `theta_prev = sample[l, j, i]` instead of `mu + L @ z_state`. The
  period-t joint Halton drops the z_state slice (n_shock + n_endog).
- af/transition_period.py: `_update_conditional_distribution`
  replaced with `_chain_one_component`-based forward propagation;
  use `seed=period` for the Halton design.
- af/inference.py: rewrite `_build_initial_state_cond_dist_jax` and
  `_propagate_cond_dist_jax` to operate on samples; add
  `target_idx_in_joint`/`obs_idx_in_joint` to `_PeriodMeta`. Match
  the same period-seeded Halton convention as estimation so scores
  align numerically with the block-diagonal sandwich.

Known follow-up: the production-shock SD (`shock_sds`) for skills
regresses on the translog DGP (period-1 collapses to the lower bound
0.001; period-2 ≈ 0.04 vs true 0.42). The new per-obs Schur sampling
correctly matches MATLAB's `likelihood_01` form, but the optimizer
walks away from the truth warm-start at this parameter — suggests a
likelihood-shape mismatch that needs isolation. Separately tracked.

Tests: full suite green (455 passed) including the AF inference
tests that compare full-sandwich vs block-diagonal SEs.

Mirror of run_translog_sim.slurm but loads `2024` + `Mamba/24.9.0-0`
modules and `conda activate tests-cuda12` instead of pixi. Use this
on clusters where pixi isn't available; the env is created once on
a login node from the workspace's environment.yml (which now pins
optimagic to the probability-allow-fixed-entries branch — see
matching workspace commit).
The params index emits transition entries only for `aug_periods[:-2]`
(or `[:-1]` without endogenous factors), so the last STATES- and
ENDO-typed aug-period for any factor has no transition slots in the
params template. Emitting fixed identity constraints there created
phantom locs that tripped the optimagic selector once the constraint
list was applied to a real model with `is_endogenous=True` skills
(translog, three calendar periods).

Slice `aug_periods_to_constrain[:-1]` for the identity loop, matching
the existing `[:-1]` slicing on the shock-sds loop. Update the
`simplest_augmented_model` test to drop the eight phantom expected
entries (aug 2 fac1, aug 3 fac2) that no longer get emitted.

This unblocks CHS estimation of models that mirror the AF
`is_endogenous=True` investment structure without padding tricks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both scripts already launch AF + CHS workers in parallel across
all four H100s (one AF + one CHS per GPU). Recent runs finished
in well under the previous 8h/24h limits, and a tighter limit
helps with queueing priority on `gpu_h100`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MATLAB's `create_nodes_weights_12` builds the chained-sample importance
weight from skill measurements only -- `prod_inv` is commented out at
lines 1040-1043 of `AF_Application_One_Normal_Translog.m`. Z_inv_est_t
is evaluated only as a current-period measurement at the (t)->(t+1)
step (against the inv generated in that step), never as a prev-period
factor at (t+1)->(t+2). The Python port instead evaluated `prev_full_loadings @ [theta_prev, inv]` where `inv` is the freshly-drawn
inv at the current step, so prev-period inv measurements were being
compared against the wrong inv value.

Fix in `_log_draw_contribution`: restrict the prev-meas factor to the
state-factor columns of `prev_full_loadings`, multiplied by `theta_prev`
only. Inv-loading rows now contribute a per-obs constant offset to the
loglik (residual = data, log-density depends only on data and the fixed
prev-period meas SDs), which is invariant under current-step parameters.

Robust column selection: thread an explicit `state_factor_indices_in_latent`
int array through af_loglike_transition -> af_per_obs_loglike_transition
-> _transition_loglike_per_obs -> _integrate_transition_single_obs so
the slicing doesn't depend on the implicit state-before-endogenous
ordering of `latent_factors`. Built at every call site
(transition_period.py, inference.py, tests/matlab_ces_repro/evaluate.py).
Default `arange(n_state_factors)` preserves the previous behaviour for
unmodified callers.

Regression test `test_prev_period_inv_meas_does_not_affect_transition_loglik_gradient`
in tests/test_af_estimate.py builds a tiny 1-state + 1-endog problem,
perturbs only the inv-meas column of prev_measurements, and asserts
`jax.grad(af_loglike_transition)` is identical at the two data versions.
Verified that reverting just the slicing makes the test fail on all 13
gradient entries (max abs diff 2.35); restoring the fix makes it pass.

Note: this brings the AF likelihood closer to the MATLAB reference but
does not by itself fix the previously-flagged shock_sds collapse on the
translog DGP. The diagnostic at sim_repro/debug_sigma_prod_landscape.py
still shows argmax at sigma=0.001-0.05 vs truth 0.36/0.42 with the same
~0.5-0.9 nat deficit. Separately tracked.

Tests: full suite green (456 passed), ty clean, prek clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…prod).

Replace the static `samples_per_component` carry-over with on-demand
joint-Halton chain rebuild inside the AF transition likelihood,
mirroring MATLAB's `create_nodes_weights_01/12`. The chained sample
theta_0 -> theta_{t-1} is now reconstructed fresh inside each
likelihood call from a single joint Halton design covering all of
(z_state, prior chain shocks, current step shocks).

Why: the previous scheme paired a period-0-seeded chained-sample's
z_state[j] with a period-t-seeded shock z[j] across two INDEPENDENT
Halton sequences at the same index j. The split scheme aliased into
sigma_prod optimization and biased the AF likelihood landscape's
argmax toward 0 (truth 0.36 / 0.42, broken argmax 0.001-0.10, deficit
0.5-2.5 nats per obs at truth). The diagnostic at
`sim_repro/debug_joint_halton.py` shows joint Halton recovers
sigma_prod=truth as the per-obs argmax on sim 0; split Halton has
argmax at 0.10, with the truth scoring 2.5 nats per obs worse. See the obsidian
note `sigma-prod-collapse-2026-05-07.md` for the full analysis.

Implementation:

* `af/types.py`: add frozen `ChainLink` dataclass (registered as a JAX
  pytree so tuples-of-ChainLinks pass through `jax.jit`). Extend
  `ConditionalDistribution` with `chain_links`, per-obs `cond_means`,
  per-component `cond_chols`. `samples_per_component` is retained for
  posterior-state summary statistics but is no longer load-bearing
  for the transition likelihood.
* `af/likelihood.py`: new `_rebuild_chain_at_period` helper does the
  forward iterate theta_0 -> theta_{t-1} from a single joint-Halton
  draw via a static loop over `chain_links`. Restructure
  `_integrate_transition_single_obs` to slice the joint Halton into
  (z_state, z_inv_chain, z_shock_chain, z_inv_t, z_shock_t) and call
  the rebuild helper per mixture component. Public
  `af_loglike_transition` / `af_per_obs_loglike_transition` signatures
  gain `chain_links` and `obs_factor_values_chain` kwargs; the
  `prev_distribution` dict now carries `cond_weights` /
  `cond_means` / `cond_chols` instead of `samples_per_component`.
* `af/initial_period.py`: build per-obs `cond_means` and per-component
  `cond_chols` and stash them on the returned `ConditionalDistribution`.
* `af/transition_period.py`: grow joint Halton dim per step
  (`n_state + (period - 1) * (n_shock + n_endog) + (n_shock +
  n_endog)`); build a fresh `ChainLink` from each just-fitted period and
  append to the carried `chain_links`; `_update_conditional_distribution`
  remains for posterior summary stats only.
* `af/inference.py`: parallel update for the sandwich path. Replace
  `_propagate_cond_dist_jax` with `_extract_chain_link_jax` that parses
  flat params and packs a `ChainLink`; `_build_prev_dist_arrays` now
  returns `(prev_dist_arrays, chain_links_tuple,
  obs_factor_values_chain)` so the cross-period autodiff DAG flows
  through each link's leaves.
* `tests/matlab_ces_repro/evaluate.py`: parallel update of the
  `loglike_kwargs` payload.

Tests:

* New `test_rebuild_chain_at_period_matches_python_forward_pass` and
  `test_rebuild_chain_at_period_empty_chain_returns_period_0` exercise
  the chain-rebuild helper independently of the integrand.
* New `test_af_joint_halton_recovers_sigma_prod_argmax` calls
  `af_loglike_transition` directly on a tiny synthetic translog DGP at
  the period-1 step and asserts ll(sigma_prod=truth=0.36) beats
  ll(sigma_prod=0.09) by at least 1.0 nat per obs.
* New `test_af_joint_halton_recovers_sigma_prod_with_chain_link` runs
  `estimate_af` end-to-end through periods 0/1/2 and asserts the
  recovered sigma_prod_1 estimate is within 30% of truth (0.42).
* `test_prev_period_inv_meas_does_not_affect_transition_loglik_gradient`
  was rewritten against the new signature.
* All existing AF tests (estimation + inference + batching) still
  pass: full CPU suite stays green at 460 tests.

Behavioural verification (`sim_repro/debug_sigma_prod_landscape.py`
on sim 0, n_halton=1000): sigma_prod_0 argmax moves from 0.001 to
exactly 0.36 (truth); sigma_prod_1 argmax moves from 0.05 to exactly
0.42 (truth). Matches MATLAB's recovery on sim 0 (0.376, 0.375).

Tests: 460 passed, ty clean, prek clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CHS estimator on translog DGP has been shown to be biased due to UKF
Gaussianization breakdown for the cross-product term — running it
adds no value to the AF re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Targets the sgpu_short partition (4× A100 80GB per node, 8h limit).
Uses pixi (module load Pixi) instead of the manual conda dance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New module `af/moment_init.py` exposes `spearman_factor_moments`,
`derive_unexplained_sd`, and `seed_beta_from_ols` -- pure-NumPy helpers
that recover loadings, sigma_meas, and latent variances from the
cross-covariances of multi-indicator measurements (standard
Spearman / factor-analysis identification) and seed AF optimizer start
values from data instead of constant defaults (sigma=0.5,
sigma_meas=obs_sd*0.5).

Wired through `initial_period._apply_moment_based_overrides_initial`
(loadings, sigma_meas, per-component Cholesky diagonals at period 0)
and `transition_period._apply_moment_based_overrides_transition`
(loadings, sigma_meas, sigma_shock, sigma_inv, investment-equation beta
via OLS residual variance subtracting sigma_meas^2). Falls back to
legacy defaults when a factor has fewer than two measurements.
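The identification argument in the classic three-indicator case, assuming measurements m_j = λ_j·θ + ε_j with λ_1 = 1 as the anchor (a toy version of what `spearman_factor_moments` presumably computes):

```python
import numpy as np

def three_indicator_moments(m1, m2, m3):
    # cross-covariances satisfy Cov(m_i, m_j) = lam_i * lam_j * Var(theta),
    # so with lam_1 = 1 three covariances pin down everything
    c12 = np.cov(m1, m2)[0, 1]
    c13 = np.cov(m1, m3)[0, 1]
    c23 = np.cov(m2, m3)[0, 1]
    var_theta = c12 * c13 / c23
    lam2, lam3 = c12 / var_theta, c13 / var_theta
    sigma2_meas1 = np.var(m1, ddof=1) - var_theta  # measurement-error variance
    return var_theta, lam2, lam3, sigma2_meas1
```

This is why the helpers need at least two (here three) indicators per factor and fall back to constants otherwise.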

Available behind `AFEstimationOptions.initialization_strategy =
{"constant", "moment_based"}`. Default is "constant" for now: empirical
A/B on the translog DGP at n_halton=1000 across 8 sims shows
moment-based clearly improves sigma_prod and sigma_inv_1 recovery (no
boundary collapses vs 1 with constant; mean closer to truth) and
roughly doubles the rate of reasonable sigma_inv_0 estimates (>=0.05),
but does not eliminate the sigma_inv_0 / sigma_meas_inv_0 ridge
collapse -- about half of sims still drift to the lower bound. Per the
plan in `~/.claude/plans/consider-this-comment-remaining-keen-teapot.md`,
that residual is the target of Phase B (two-stage measurement system).

Tests: 18 new unit tests in `test_af_moment_init.py` and
`test_af_initialization.py`; full AF suite (78 tests) green; pixi run
ty clean; prek run --all-files clean. See
`~/obsidian/Professional/skillmodels/af-sigma-inv-identification-analysis-2026-05-08.md`
for the theoretical rationale (Spearman point-identification + finite-
sample weak-identification on the constant-Var ridge).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New module `af/measurement_first_stage.py` exposes
`estimate_measurement_system(model_spec, data, ...)` -- a Stage-1
factor-analysis pre-step that runs Spearman moment estimation per
(period, factor) and returns a `fixed_params`-shaped DataFrame with
recovered loadings and sigma_meas. The `merge_with_user_fixed_params`
helper merges these with any user-supplied `fixed_params` (user wins on
overlapping indices).

Wired into `estimate_af` behind
`AFEstimationOptions.two_stage_measurement = False` (opt-in). When
enabled, Stage-1 runs before any AF period; the recovered loadings and
sigma_meas are pinned via the existing `FixedConstraintWithValue`
machinery and held fixed throughout AF Stage-2. This eliminates the
sigma_inv / sigma_meas constant-Var(I_meas) ridge directly: with
sigma_meas pinned, sigma_inv is identified by the marginal Var(I_meas)
without optimizer drift along the flat ridge direction.

Empirical verification on the translog DGP at n_halton=1000 across 20
sims:

| metric                     | constant | Phase A | Phase B | MATLAB |
|---------------------------|----------|---------|---------|--------|
| sigma_inv_0 mean          | ~0.03    | ~0.04   | 0.092   | 0.095  |
| reasonable rate (>=0.05)  | 25%      | 50%     | 70%     | 95%    |
| boundary collapse rate    | 50%+     | 50%     | 30%     | <5%    |

Truth = 0.10. Phase B reduces collapse rate by ~40% vs Phase A and
nearly closes the mean gap to MATLAB; residual ~30% collapse comes
from finite-sample weak identification (period-1 ll has only 0.78
nats/obs total range across the full sigma grid even with sigma_meas
pinned -- documented in the obsidian identification analysis note).

Standard-error caveat: the existing AF sandwich treats Stage-1 outputs
as known and therefore under-states variance for Stage-2 parameters
that covary with sigma_meas (notably sigma_inv, sigma_shock, mixture
covariance). Documented in the option's docstring; users wanting
fully-correct SEs should run a parametric bootstrap until a
Murphy-Topel correction lands.

Tests: 7 new unit tests in `test_af_measurement_first_stage.py`
covering known-truth recovery, anchor-loading honoring, user_fixed_params
precedence, single-indicator skip with warning, and small-n skip with
warning. Full CPU suite (485 tests) green; pixi run ty clean; prek
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the analytical Newey-McFadden sandwich path entirely. AF §4.2
explicitly recommends a score bootstrap (Armstrong, Bertanha, Hong
2014) for inference because the closed-form variance ignores
estimation error in earlier-period nuisance parameters
tau_{t-1}, ..., tau_1, and is therefore incorrect for any t >= 1.
With only one inference path, no user can accidentally select a
"wrong" option.

Surface changes:

- `compute_af_standard_errors` is now THE inference function. It
  implements the score bootstrap with `n_boot=10000` default. The
  `method=` kwarg is gone.
- `compute_af_bootstrap_se` is removed (renamed into
  `compute_af_standard_errors`).
- `AFInferenceResult` is reshaped: drops `period_results`, `method`;
  adds `replicate_params`, `n_clusters`, `n_boot`. `vcov` is now
  computed from `replicate_params.cov(ddof=1)` so SEs and vcov are
  internally consistent.
- `AFPeriodInferenceResult` and `AFBootstrapResult` are removed
  from the public API. Internal callers use a private
  `_PeriodScoreInfo` NamedTuple.

Internals removed:

- `_compute_full_sandwich` (the analytical Newey-McFadden assembly)
- `_assemble_full_vcov` (the cross-period vcov assembler)
- `_FreeVcovBlock` dataclass

Internals kept (no longer used today; scaffolding for the Phase B
re-Spearman bootstrap that will land in a follow-up):

- `_period_t_per_obs_loglike_full`
- `_build_initial_state_cond_dist_jax`
- `_extract_chain_link_jax`
- `_extract_prev_meas_info_jax`
- `_build_prev_dist_arrays`

Phase B caveat: when `af_options.two_stage_measurement=True` the
Spearman-stage measurement system is held fixed across replicates;
SEs ignore Stage-1 sampling variance. Documented in the
`compute_af_standard_errors` docstring; users wanting fully-correct
Phase B SEs should run a parametric bootstrap until the per-replicate
re-Spearman extension lands.

Tests: replaced 22 sandwich/bootstrap-comparison tests with 14
focused bootstrap tests covering shape, symmetry, replicate
constancy on pinned params, vcov-vs-SE consistency, sample-size
shrinkage. Full CPU suite (477 tests) green; pixi run ty clean;
prek run --all-files clean.

Net change: relative to main (661 lines), the diff of `af/inference.py`
goes from 661 → 1071 lines to 661 → 597 (the wrap-around docstring
shrinks substantially); a 474-line overall deletion.

Breaking change for unmerged `af-estimator` branch (no external
callers found in skane-struct-bw or health-cognition).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…licit two_stage_measurement.

Two related defaults changes:

1. `AFEstimationOptions.initialization_strategy` now defaults to
   "moment_based" instead of "constant". The Spearman cross-covariance
   moment-based seeding strictly improves σ_prod and σ_inv_1 recovery,
   roughly doubles the σ_inv_0 reasonable rate (≥0.05) on the translog
   DGP, never regresses any parameter, and adds no inference-framework
   complications. Legacy constant init remains available via
   `initialization_strategy="constant"` for regression testing.

2. `two_stage_measurement` no longer has a default. Users must pass it
   explicitly. The trade-off is real and should be confronted:
   * `=True`: Spearman pre-step pins σ_meas. Drops σ_inv_0 boundary
     collapse from ~30-50% to ~15%, brings the mean from 0.04 to 0.092
     vs MATLAB AF's 0.095 (truth 0.10). SE caveat: the score bootstrap
     currently holds Stage-1 outputs fixed, underestimating variance
     for parameters covarying with σ_meas (notably σ_inv, σ_shock).
   * `=False`: σ_meas estimated jointly with the other parameters in
     each period's MLE. Bootstrap captures all variance correctly, but
     the σ_inv ridge causes frequent boundary collapses on
     translog-style DGPs.

`estimate_af` and `compute_af_standard_errors` no longer construct a
default `AFEstimationOptions()` when `af_options=None` — they raise a
TypeError pointing the user at the trade-off, since AFEstimationOptions
itself can no longer be constructed bare.

All existing call sites in tests, sim_repro scripts, and matlab_ces_repro
have been updated with `two_stage_measurement=False` to preserve their
existing behavior. Real-data applications and new tests of Phase B
should pass `=True` explicitly.

Tests: 478 passed (full CPU suite), pixi run ty clean, prek clean.

Breaking change for unmerged `af-estimator` branch (no external callers
on main).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/marvin/run_three_way_translog_n2k.slurm runs all three
estimators on the translog DGP, n=500 panel, 500 sims each, with
n_halton=2000 throughout:

1. AF Stage A only      (`two_stage_measurement=False`):
                        sigma_meas estimated jointly in the AF MLE chain.
2. AF Stage A + Stage B (`two_stage_measurement=True`):
                        Spearman pre-step pins sigma_meas.
3. CHS                  (Kalman-filter MLE, runs on CPU).

Layout on the sgpu_short node:
* GPUs 0,1 → AF Stage A (250 sims each).
* GPUs 2,3 → AF Stage A+B (250 sims each).
* 8 CHS workers on the 16 CPU cores (~63 sims each, JAX_PLATFORMS=cpu).
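The GPU/CPU split above could be expressed in a slurm script roughly as
follows — an illustrative fragment, not the committed script; the CLI
flags and sim counts are taken from this description, everything else is
an assumption:

```shell
#!/bin/bash
#SBATCH --partition=sgpu_short
#SBATCH --time=1:30:00

# AF Stage A on GPUs 0,1 (250 sims each).
CUDA_VISIBLE_DEVICES=0 python sim_sweep.py --no-two-stage-measurement --start 0   --count 250 &
CUDA_VISIBLE_DEVICES=1 python sim_sweep.py --no-two-stage-measurement --start 250 --count 250 &

# AF Stage A+B on GPUs 2,3 (250 sims each).
CUDA_VISIBLE_DEVICES=2 python sim_sweep.py --two-stage-measurement --start 0   --count 250 &
CUDA_VISIBLE_DEVICES=3 python sim_sweep.py --two-stage-measurement --start 250 --count 250 &

# 8 CHS workers pinned to CPU (~63 sims each).
for i in $(seq 0 7); do
    JAX_PLATFORMS=cpu python sim_sweep_chs.py --start $((i * 63)) --count 63 &
done
wait
```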

Output dirs (under SIM_REPRO_OUT):
* translog_n500_stagea_h2000/
* translog_n500_stageab_h2000/
* translog_n500_chs/

Wall-time budget 1:30:00 — well within sgpu_short's 3:30 max. Expected
runtime: AF ~12 min/variant after JIT, CHS ~25 min total. Coverage
report at end shows ok/fail counts per cell.

Companion changes (in the unversioned sim_repro/ working dir, rsynced
to Marvin alongside this script): sim_sweep.py gains a required
--two-stage-measurement / --no-two-stage-measurement flag plus an
optional --out-suffix; sim_sweep_chs.py already supports --start /
--count for parallel CPU workers.
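The paired `--two-stage-measurement` / `--no-two-stage-measurement` form
maps directly onto argparse's `BooleanOptionalAction`; a minimal sketch
of the idea (not the committed sim_sweep.py):

```python
import argparse

parser = argparse.ArgumentParser()
# required=True forces an explicit choice, mirroring the no-default
# AFEstimationOptions field; BooleanOptionalAction auto-generates the
# --no- variant of the flag.
parser.add_argument(
    "--two-stage-measurement",
    action=argparse.BooleanOptionalAction,
    required=True,
)
parser.add_argument("--out-suffix", default="")

args = parser.parse_args(["--no-two-stage-measurement"])
# args.two_stage_measurement is False
```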

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h2000/ subroot.

Avoid colliding with existing translog_n500_chs/ etc. cells that hold
prior n_halton=10000 sweep results.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promotes the AF Spearman / multi-indicator moment helpers to a
top-level skillmodels.moment_init module (with a backwards-compat
shim in skillmodels.af.moment_init), and wires them into a new
skillmodels.start_values.get_moment_based_start_params that fills
the params_template consumed by the CHS pipeline.

The fill is a hybrid:
- Per-period Spearman cross-covariance moments seed loadings,
  meas_sds, and initial_cholcovs diagonals.
- OLS on Bartlett-scored factor proxies seeds transition
  coefficients and shock_sds for linear and translog transition
  functions (the AMN-flavoured starts AF section 7 recommends,
  bootstrapped from the Spearman estimates rather than from a
  separate AMN run).
- Fixed entries pinned by the user / model are preserved.
- Stagemap equality groups are pooled post-fill.
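A minimal numpy sketch of the triad idea behind cross-covariance loading
seeding, assuming three indicators of one factor with λ₁ normalized to 1;
the real helper uses Spearman-based moments and handles general factor
counts, so this is only the algebra in its simplest form:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Simulate three indicators of one latent factor: Y_j = lam_j * theta + eps_j.
theta = rng.normal(0.0, 1.5, n)
true_loadings = np.array([1.0, 0.7, 1.3])  # lam_1 normalized to 1
y = true_loadings[:, None] * theta + rng.normal(0.0, 0.5, (3, n))

c = np.cov(y)  # 3x3 indicator covariance matrix

# Triad identities: cov(Y_i, Y_j) = lam_i * lam_j * Var(theta).
lam2 = c[1, 2] / c[0, 2]
lam3 = c[1, 2] / c[0, 1]
var_theta = c[0, 1] * c[0, 2] / c[1, 2]

# Measurement variance falls out of the diagonal.
meas_var_1 = c[0, 0] - var_theta
```

Loadings, factor variance, and measurement variances are all recovered
from second moments alone, which is what makes them usable as starting
values before any likelihood evaluation.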

A new EstimationOptions.start_params_strategy flag
('moment_based' default, 'none' for the legacy NaN-template
behaviour) controls the wiring through get_maximization_inputs.

Adds an end-to-end test that estimate_af runs the full chain on a
T=5 model spec and produces the expected per-period results plus
chain-link bookkeeping.

Adds a new tests-cuda13 pixi env (jax 0.10 + cuda13 wheels) for
sweeps on hosts running the CUDA-13 driver. Adds Marvin slurm
scripts for the translog 3-way comparison, CHS moment-init GPU
sweep, and AF h=10000 sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`estimate_af` now accepts an optional `constraints=` list of
optimagic Constraint objects. Any `om.EqualityConstraint` whose
selector is the standard `select_by_loc(loc=multi_index)` form is
recognised as an equality group. After each period's MLE, the
helper `_propagate_equality_groups` looks at every equality group:
if any member has just been estimated, every other member that
isn't already pinned in `fixed_params` gets pinned to that value
for subsequent periods.

This closes the gap between AF's per-period sequential MLE and
applications (e.g., skane-struct-bw) that rely on cross-period
equality of shock_sds, transition coefficients, loadings, etc.
Within-period equality is unchanged; the per-period optimagic
problems handle their own internal constraints.

Adds 5 tests (4 unit, 1 end-to-end) covering helper edge cases
and a synthetic T=3 chain that confirms the chain returns
identical shock_sds for two equality-grouped periods.
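An illustrative sketch of the propagation idea (simplified; the real
helper works with optimagic constraints and params DataFrames, and the
function name and tuple-based locations here are hypothetical):

```python
def propagate_equality_groups(equality_groups, estimated, fixed_params):
    """After a period's MLE, pin unestimated members of each equality
    group to the value of a member that was just estimated."""
    for group in equality_groups:
        # Find a member whose value was estimated this period.
        source = next((loc for loc in group if loc in estimated), None)
        if source is None:
            continue
        for loc in group:
            if loc != source and loc not in fixed_params:
                fixed_params[loc] = estimated[source]
    return fixed_params


# Two periods share shock_sds for factor "fac1"; after period 0's MLE,
# period 1's entry is pinned to the period-0 estimate.
groups = [[("shock_sds", 0, "fac1"), ("shock_sds", 1, "fac1")]]
fixed = propagate_equality_groups(
    groups, estimated={("shock_sds", 0, "fac1"): 0.8}, fixed_params={}
)
# → {("shock_sds", 1, "fac1"): 0.8}
```

Members already pinned in `fixed_params` are left untouched, matching the
behaviour described above.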