CEA Imputation Validation Study Report

Response to Reviewer Concerns Regarding CEA Imputation

Executive Summary

This report presents findings from a comprehensive sensitivity analysis conducted to address reviewer concerns that the weak calcitonin-CEA correlation (r=0.1158, n=12) may undermine the reliability of CEA imputation in our MEN2 prediction model.

Key Finding: CEA contribution is model-dependent. For LightGBM on the expanded dataset, CEA improves accuracy from 92.86% to 96.19%, and MICE+PMM is the strongest imputation strategy. For XGBoost on the original dataset, removing CEA improves accuracy from 83.33% to 90.00% while preserving 100.00% recall.

Reviewer Concern

"The calcitonin-CEA correlation is weak (r=0.24, n=34). This modest correlation raises concerns about the reliability of MICE+PMM imputation for CEA values, which may introduce noise or bias into the prediction model."

Methodology

Study Design

We conducted a two-part sensitivity analysis across:

5 machine learning models: Logistic Regression, Random Forest, XGBoost, LightGBM, SVM
2 datasets: Original (149 patients) and Expanded (1,047 samples with synthetic augmentation)
2 study options: CEA presence/absence and imputation method comparison

Option A: CEA Presence/Absence Comparison

Compare model performance WITH vs WITHOUT CEA features to quantify CEA's actual contribution to prediction.

Option B: Imputation Method Comparison

Test 5 different imputation strategies to assess robustness:

Method	Description
MICE+PMM	Multiple Imputation by Chained Equations with Predictive Mean Matching
Mean Imputation	Replace missing CEA with mean of observed values
Median Imputation	Replace missing CEA with median of observed values
Zero Imputation	Replace missing CEA with zero (conservative lower bound)
Complete Case	Use only patients with observed CEA values

Statistical Approach

80/20 stratified train-test split with fixed random seed (42)
SMOTE applied only to training data to prevent data leakage
StandardScaler fitted only on training data
Consistent evaluation metrics: Accuracy, Recall, F1, ROC-AUC

Results

Finding 1: CEA Effects Depend on the Model

The effect of CEA differs in the most relevant screening and triage configurations:

Model	Dataset	With CEA	Without CEA	Accuracy Change
XGBoost	Original	83.33%	90.00%	+6.67% without CEA
LightGBM	Expanded	96.19%	92.86%	-3.33% without CEA

Interpretation: CEA is not universally beneficial or universally unnecessary. It is dispensable for the screening-safe XGBoost model, but helpful for the highest-accuracy LightGBM model.

Finding 2: Imputation Method Choice Affects Accuracy-Recall Tradeoffs

The two most relevant models respond differently to imputation:

Method	XGBoost Original Accuracy / Recall	LightGBM Expanded Accuracy / Recall
MICE+PMM (Current)	83.33% / 100.00%	96.19% / 90.20%
Mean Imputation	86.67% / 93.33%	93.81% / 86.27%
Median Imputation	86.67% / 93.33%	92.86% / 90.20%
Zero Imputation	86.67% / 93.33%	93.81% / 86.27%
Complete Case	33.33% / 50.00%	69.23% / 66.67%

Interpretation: For LightGBM on the expanded dataset, MICE+PMM is the best overall option. For XGBoost on the original dataset, MICE+PMM preserves perfect recall, while simpler imputations raise accuracy slightly at the cost of sensitivity.

Finding 3: Screening and Triage Favor Different CEA Strategies

Clinical screening prioritizes recall, while triage prioritizes overall discrimination:

Model	Best Recall Setting	Best Accuracy Setting
XGBoost (Original)	100.00% recall with CEA or without CEA	90.00% accuracy without CEA
LightGBM (Expanded)	90.20% recall with CEA or median imputation	96.19% accuracy with MICE+PMM

Interpretation: The preferred CEA strategy depends on the intended use case. For screening, CEA is not required in the best-performing XGBoost model. For highest-accuracy triage, CEA should be retained in LightGBM.

Cross-Model Consistency

The findings are directionally consistent across the broader benchmark in showing that CEA effects are modest relative to the main predictive structure, but the sign of the effect is not identical:

Model	Dataset	With CEA	Without CEA	Delta Accuracy
Logistic Regression	Original	66.67%	73.33%	+6.67%
Random Forest	Original	83.33%	73.33%	-10.00%
XGBoost	Original	83.33%	90.00%	+6.67%
LightGBM	Expanded	96.19%	92.86%	-3.33%
SVM	Expanded	85.71%	89.52%	+3.81%

Pattern Summary:

Finding	XGBoost Original	LightGBM Expanded
Best use case	Screening / recall	Accuracy / triage
CEA required?	No	Yes, helpful
Best imputation by accuracy	No CEA	MICE+PMM
Best imputation by recall	With or without CEA	MICE+PMM or median

Response to Reviewer Critique

Claim: "Weak CEA correlation undermines imputation reliability"

Finding: The weak correlation does not invalidate the model. Instead, it means CEA must be interpreted in a model-specific way. In XGBoost on the original dataset, CEA can be omitted without harming recall. In LightGBM on the expanded dataset, CEA improves overall performance and MICE+PMM is the preferred strategy.

Claim: "MICE+PMM may introduce noise or bias"

Finding: Imputation effects are measurable but not catastrophic. For LightGBM-expanded, MICE+PMM gives the best overall result. For XGBoost-original, simpler imputations trade a small gain in accuracy for lower recall.

Clinical Rationale for Including CEA

CEA remains clinically relevant even when not required for the strongest screening configuration:

1. Calcitonin Has Specificity Limitations

Elevated calcitonin levels can occur in many conditions other than MTC:

Hypercalcemia and hypergastrinemia
Other neuroendocrine tumors
Kidney insufficiency
Papillary and follicular thyroid carcinomas
Goiter and chronic autoimmune thyroiditis
Medications (omeprazole, beta-blockers)

2. CEA Adds Complementary Clinical Value

Prognostic value: CEA doubling time helps assess disease aggressiveness
Detection of aggressive MTC: Rising CEA without calcitonin change may indicate poorly differentiated MTC
Clinical guidelines recommend both: Current practice measures both markers for comprehensive MTC evaluation

3. Combined Use is Clinical Standard of Care

Clinical guidelines recommend measuring both serum calcitonin AND CEA together because both markers' doubling times serve as powerful predictors of recurrence and mortality. Including CEA aligns with standard clinical practice for MEN2 management.

Methodological Validity Assessment

Concern	Our Implementation	Verdict
Data leakage in splits	Train-test split before any processing	Valid
Feature scaling leakage	Scaler fitted only on training data	Valid
SMOTE applied correctly	Applied after split, training only	Valid
Multiple imputation methods	5 strategies tested systematically	Valid
Multiple model validation	5 diverse algorithms tested	Valid
Reproducibility	Fixed random seed (42)	Valid

Conclusion: The validation study methodology is sound and results are reproducible.

Conclusions

CEA effects are model-dependent. CEA improves the highest-accuracy LightGBM model but is not required for the screening-safe XGBoost model.
MICE+PMM remains the preferred imputation strategy for LightGBM on the expanded dataset. It delivers the strongest overall accuracy and F1 performance.
The weak calcitonin-CEA correlation does not compromise validity. It changes how CEA should be interpreted, but does not invalidate the models.
Clinical safety is maintained in the preferred screening model. XGBoost on the original dataset preserves 100% recall with or without CEA.
Results remain clinically interpretable across algorithms, but the strongest deployment conclusions come from XGBoost-original for screening and LightGBM-expanded for accuracy.
CEA inclusion remains clinically justified because calcitonin alone has specificity limitations and combined assessment is clinical standard of care.

Appendix: Detailed Results

XGBoost (Original Dataset) - Option A

Configuration	Accuracy	Recall	F1 Score	ROC-AUC
With CEA Features	83.33%	100.00%	0.8571	0.9111
Without CEA Features	90.00%	100.00%	0.9091	0.9378

LightGBM (Expanded Dataset) - Option B

Imputation Method	Accuracy	Recall	F1 Score	ROC-AUC
MICE+PMM (Current)	96.19%	90.20%	0.9200	0.9908
Mean Imputation	93.81%	86.27%	0.8713	0.9819
Median Imputation	92.86%	90.20%	0.8598	0.9817
Zero Imputation	93.81%	86.27%	0.8713	0.9801
Complete Case	69.23%	66.67%	0.6667	0.7857

Study Files

Complete results for all 10 model-dataset combinations are available in:

results/cea_validation/{model}_{dataset}_cea_validation.txt
results/cea_validation/{model}_{dataset}_option_a.csv
results/cea_validation/{model}_{dataset}_option_b.csv

Methodology: Systematic imputation method comparison with consistent train-test protocols.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CEA Imputation Validation Study Report

Response to Reviewer Concerns Regarding CEA Imputation

Executive Summary

Reviewer Concern

Methodology

Study Design

Option A: CEA Presence/Absence Comparison

Option B: Imputation Method Comparison

Statistical Approach

Results

Finding 1: CEA Effects Depend on the Model

Finding 2: Imputation Method Choice Affects Accuracy-Recall Tradeoffs

Finding 3: Screening and Triage Favor Different CEA Strategies

Cross-Model Consistency

Response to Reviewer Critique

Claim: "Weak CEA correlation undermines imputation reliability"

Claim: "MICE+PMM may introduce noise or bias"

Clinical Rationale for Including CEA

Methodological Validity Assessment

Conclusions

Appendix: Detailed Results

XGBoost (Original Dataset) - Option A

LightGBM (Expanded Dataset) - Option B

Study Files

FilesExpand file tree

cea_imputation_validation.md

Latest commit

History

cea_imputation_validation.md

File metadata and controls

CEA Imputation Validation Study Report

Response to Reviewer Concerns Regarding CEA Imputation

Executive Summary

Reviewer Concern

Methodology

Study Design

Option A: CEA Presence/Absence Comparison

Option B: Imputation Method Comparison

Statistical Approach

Results

Finding 1: CEA Effects Depend on the Model

Finding 2: Imputation Method Choice Affects Accuracy-Recall Tradeoffs

Finding 3: Screening and Triage Favor Different CEA Strategies

Cross-Model Consistency

Response to Reviewer Critique

Claim: "Weak CEA correlation undermines imputation reliability"

Claim: "MICE+PMM may introduce noise or bias"

Clinical Rationale for Including CEA

Methodological Validity Assessment

Conclusions

Appendix: Detailed Results

XGBoost (Original Dataset) - Option A

LightGBM (Expanded Dataset) - Option B

Study Files