Skip to content

[wip] hea integration (for GAM fit)#21

Open
huangziwei wants to merge 29 commits into
mainfrom
gam
Open

[wip] hea integration (for GAM fit)#21
huangziwei wants to merge 29 commits into
mainfrom
gam

Conversation

@huangziwei

Copy link
Copy Markdown
Collaborator

in #18 I added hea as dependency for LCRegression, which just a wrapper around lm() and some aesthetics on top. As gam also reached some maturity, it can actually be used as an "automatic harmonic-order selector" (with s(bs='cc')), instead of guessing and hand-rolling how many combos of cos/sin should be used. Plus, hea also offers a full set of tidyverse data frame manipulation + fluent API of ggplot which can be used for I/O, data transformation, and plotting, but that's for later.

I am not sure if this is a good call yet, as hea is still a toy project (mostly Agent-driven, unlike pycircstat2 which was coded by hand since 2022; but yes, I also have been using more and more LLMs on this codebase; the difference is: agents work on the code I wrote before, mostly review and fix bugs; for hea, it's 99% generated). I think I need to put this here to be transparent.

But here's what you can get from hea:

import numpy as np
import polars as pl
import matplotlib.pyplot as plt
from pycircstat2 import load_data

from pycircstat2.regression import LCRegression

df = (
    load_data("lung_deaths", source="pewsey")
        .mutate(θ=(np.pi / 6) * pl.col("month"))
        .rename({"deaths": "y"})
        .filter(~((pl.col("month") == 2) & pl.col("year").is_in([1976, 1979])))
)

lc = LCRegression(formula="y ~ cos(θ) + sin(θ)", data=df)
lc.summary()

which will output:

Linear-Circular Regression

Formula: y ~ cos(θ) + sin(θ)

Residuals:
    Min      1Q Median     3Q    Max
-402.64 -121.16 -15.45 136.53 459.72

Coefficients:
             Estimate  Std. Error  CI[2.5%]  CI[97.5]%  t value  Pr(>|t|)
(Intercept)   2122.26       22.37    2077.6     2166.9    94.86    <2e-16  ***
cos(θ)         451.32       31.41     388.6      514.0    14.37    <2e-16  ***
sin(θ)         597.02       31.87     533.4      660.6    18.73    <2e-16  ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

n = 70, p = 3, Residual SE = 187.024 on 67 DF
R-Squared = 0.8904, adjusted R-Squared = 0.8871
F-statistics = 272.0277 on 2 and 67 DF, p-value: 6.915228e-33

Log Likelihood = -463.9790, AIC = 935.9581, BIC = 944.9521

Harmonic decomposition:
term    amplitude  SE       CI[2.5%, 97.5%]       phase   SE      CI[2.5%, 97.5%] 
----------------------------------------------------------------------------------
θ, k=1  748.4167   32.0873  [685.5267, 811.3066]  0.9235  0.0417  [0.8418, 1.0052]
Phase in radians; SEs and CIs from the delta method on (cos, sin) coefficients.
output

and you can get do a GAM fit:

from hea.models import gam

g= gam("y ~ s(θ, bs='cc')", data=df, knots={"θ": [0., 2 * np.pi]}, method="REML")
g.summary()
Family: gaussian
Link function: identity

Formula: y ~ s(θ, bs='cc')

Parametric coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   2101.04       19.66    106.9    <2e-16  ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
        edf  Ref.df      F  p-value
s(θ)  6.258   8.000  90.11   <2e-16  ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) = 0.913  Deviance explained = 92.1%
-REML = 462.42  Scale est. = 27060  n = 70
output output

The plan for this RP, to the very least, is to integrate gam directly into LCRegression(). It can be easily (likely) extended to CCRegression. But for CLRegression, we also need to work on some refactoring of the distributions (to fit the hea.family standard).

huangziwei added 29 commits June 9, 2026 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant