GA Exploration Analysis: Model Specification and Enhancement
Comprehensive analysis of gestational age modeling issues, enhanced time-based approaches, and respiratory support stratification preview based on PI feedback and statistical findings.
Dataset Summary
Final analysis sample: 228 infants
GA range: 23.0 - 31.0 weeks
Birth weight imputation: 0 missing values (handled by data loader)
Feeding efficiency range: 0.04 - 4.57 days/week GA
Section 1: Model Specification Issues
Problem identified: The current PMA-based regression models include both a postmenstrual age (PMA) predictor and gestational age (GA). Because PMA = GA + time_component/7, GA enters the model twice, which builds multicollinearity into the specification.
Mathematical Relationship
Current problematic specification for PMA-based predictors:
PMA_at_full ~ PMA_at_first + GA + other_covariates
Since PMA_at_first = GA + time_to_first/7, this becomes:
PMA_at_full ~ (GA + time_to_first/7) + GA + other_covariates
PMA_at_full ~ GA + time_to_first/7 + GA + other_covariates # GA appears twice!
Result: Artificially inflated R². GA also sits inside the outcome (PMA_at_full is itself GA plus a time component), so GA is partly predicting itself.
Comparison from existing reports:
- PMA-based outcome models: R² = 0.498 (inflated)
- Time-based outcome models: R² = 0.175 (honest)
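The inflation mechanism can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the study data. It assumes, as a deliberately extreme case, that time to full oral feeding is unrelated to GA; the variable names (`ga`, `time_to_full`, `pma_at_full`) and the `r_squared` helper are hypothetical.

```python
import random

def r_squared(x, y):
    """R² of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Synthetic GA in weeks, spanning the 23-31 week range of the sample.
ga = [rng.uniform(23.0, 31.0) for _ in range(n)]
# Assumption: time to full feeding (days) is pure noise, unrelated to GA.
time_to_full = [max(1.0, rng.gauss(60.0, 20.0)) for _ in range(n)]
# PMA at full feeding is GA plus the elapsed time converted to weeks.
pma_at_full = [g + t / 7 for g, t in zip(ga, time_to_full)]

r2_time = r_squared(ga, time_to_full)  # time-based outcome
r2_pma = r_squared(ga, pma_at_full)    # PMA-based outcome

print(f"time-based outcome R²: {r2_time:.3f}")
print(f"PMA-based outcome R²:  {r2_pma:.3f}")
```

Under this assumption the time-based R² stays near zero while the PMA-based R² is substantial, purely because GA is a component of the PMA outcome.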
Section 2: Enhanced Time-Based Modeling
Testing improved time-based models that avoid multicollinearity while capturing GA-dependent relationships.
Model Performance Comparison
| Model | Adj R² | F p-value | Key Finding |
|---|---|---|---|
| Baseline (current) | 0.116 | 0.000005 | Standard approach |
| + Feeding Efficiency | 0.112 | 0.000013 | FE coef=5.1, p=0.667 |
| Pure Feeding (No GA) | 0.110 | 0.000010 | FE coef=15.3, p=0.094 |
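Because this table reports adjusted R² while other parts of the report quote plain R², the standard conversion between the two is worth having at hand. A minimal sketch: `adjusted_r2` is a hypothetical helper, and the predictor count `p` must be supplied by the analyst, since the number of covariates in each model is not stated here.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R² for a model with p predictors (intercept excluded) fit to n rows."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Illustrative values only (not taken from the report's models).
print(round(adjusted_r2(0.50, n=11, p=2), 3))  # 0.375
```

The penalty grows with `p`. Assuming the feeding efficiency model nests the baseline, this is how adjusted R² can drop from 0.116 to 0.112 even though plain R² cannot decrease when a predictor is added.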
Feeding Efficiency Distribution Values
| Percentile | Value (days/week GA) | Distribution Position |
|---|---|---|
| 10th | 0.59 | 10th percentile |
| 25th | 0.83 | 25th percentile |
| 50th | 1.31 | 50th percentile |
| 75th | 2.05 | 75th percentile |
| 90th | 2.65 | 90th percentile |
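A percentile table of this kind can be reproduced with the standard library. A minimal sketch: `percentile_table` is a hypothetical helper and the input values are placeholders, not study data. The interpolation rule (linear, `method="inclusive"`) is an assumption, since the report does not say which rule was used.

```python
from statistics import quantiles

def percentile_table(values, percentiles=(10, 25, 50, 75, 90)):
    """Return {percentile: value} using linear interpolation between order statistics."""
    # quantiles(..., n=100) returns the 99 cut points for the 1st..99th percentiles.
    cuts = quantiles(values, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percentiles}

# Placeholder data: 0.0, 0.1, ..., 5.0 (not the study's feeding efficiency values).
example = [i / 10 for i in range(51)]
table = percentile_table(example)
for p, v in table.items():
    print(f"{p}th percentile: {v:.2f}")
```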
Feeding Efficiency Visualization
Distribution of feeding efficiency values and relationship with time to full oral feeding.

Section 3: GA Interaction Testing
Testing whether covariate effects vary by gestational age using inverse GA interaction terms.
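One way to build such terms is to multiply each covariate by 1/GA before fitting. The sketch below is an assumption about the construction, not the analysis code: the covariate names are inferred from the reported term names (for example `time_first_x_inv_ga`), and the `ga` key and the `add_inverse_ga_interactions` helper are hypothetical.

```python
def add_inverse_ga_interactions(rows, covariates, ga_key="ga"):
    """Return copies of rows with '<covariate>_x_inv_ga' = covariate * (1 / GA) added."""
    out = []
    for row in rows:
        inv_ga = 1.0 / row[ga_key]  # GA in weeks; 23-31 weeks gives roughly 0.032-0.043
        new_row = dict(row)
        new_row["inv_ga"] = inv_ga
        for name in covariates:
            new_row[f"{name}_x_inv_ga"] = row[name] * inv_ga
        out.append(new_row)
    return out

# Made-up example rows (not study data).
rows = [
    {"ga": 25.0, "time_first": 10.0, "birth_weight": 750.0, "ventilation": 1, "o2_support": 0},
    {"ga": 30.0, "time_first": 3.0, "birth_weight": 1350.0, "ventilation": 0, "o2_support": 1},
]
covariates = ["time_first", "birth_weight", "ventilation", "o2_support"]
augmented = add_inverse_ga_interactions(rows, covariates)
print(f"{augmented[0]['time_first_x_inv_ga']:.3f}")  # 0.400
```

Because 1/GA is only about 0.03 to 0.04 across this GA range, the interaction columns are numerically small. That is one plausible reason fitted interaction coefficients can be large in magnitude without being statistically significant.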
Interaction Testing Results
Model R² with interactions: 0.101
Individual interaction terms:
- time_first_x_inv_ga: coefficient = 15.86, p = 0.3865
- birth_weight_x_inv_ga: coefficient = -226.80, p = 0.7134
- ventilation_x_inv_ga: coefficient = -753.52, p = 0.3798
- o2_support_x_inv_ga: coefficient = 328.32, p = 0.6677
Statistical result: No significant interactions found at α=0.05.
Section 4: Respiratory Support Stratification Analysis
Statistical comparison of model performance between respiratory support groups.
Stratification Results
| Group | N | R² | Feeding Efficiency Coef | P-value |
|---|---|---|---|---|
| With O2 Support | 60 | 0.145 | 19.9 | 0.220 |
| Without O2 Support | 168 | 0.078 | 15.4 | 0.192 |
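A stratified fit of this kind can be sketched as one regression per respiratory support group. This is a simplified illustration under stated assumptions: it regresses the outcome on feeding efficiency alone, whereas the reported models may include further covariates. The keys `o2_support`, `feeding_efficiency`, and `time_to_full`, the helper names, and the data are all made up.

```python
from collections import defaultdict

def simple_ols(x, y):
    """Slope, intercept and R² of a one-predictor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    return {"n": n, "slope": slope, "intercept": mean_y - slope * mean_x,
            "r2": (sxy * sxy) / (sxx * syy)}

def fit_by_group(rows, group_key, x_key, y_key):
    """Fit simple_ols separately within each level of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {g: simple_ols([r[x_key] for r in rs], [r[y_key] for r in rs])
            for g, rs in groups.items()}

# Made-up rows (not study data).
rows = [
    {"o2_support": 1, "feeding_efficiency": 0.5, "time_to_full": 30.0},
    {"o2_support": 1, "feeding_efficiency": 1.0, "time_to_full": 42.0},
    {"o2_support": 1, "feeding_efficiency": 2.0, "time_to_full": 55.0},
    {"o2_support": 0, "feeding_efficiency": 0.5, "time_to_full": 20.0},
    {"o2_support": 0, "feeding_efficiency": 1.5, "time_to_full": 31.0},
    {"o2_support": 0, "feeding_efficiency": 2.5, "time_to_full": 44.0},
]
results = fit_by_group(rows, "o2_support", "feeding_efficiency", "time_to_full")
for group, fit in sorted(results.items()):
    print(group, fit["n"], round(fit["slope"], 2), round(fit["r2"], 3))
```

Comparing slopes or R² across strata this way is descriptive. A formal comparison would need, for example, a group-by-feeding-efficiency interaction term in a pooled model.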
Group Comparison Results
R² difference: 0.067 (86% higher with O2 support, relative to the group without)
Feeding efficiency coefficient difference: 4.5 days
Statistical measures:
- R² values: 0.145 vs 0.078
- Coefficient values: 19.9 vs 15.4
- Sample sizes: n=60 vs n=168
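These summary figures follow directly from the stratification table, and the short check below recomputes them. Reading the 86% figure as the R² gap relative to the lower (without O2 support) value is an inference from the numbers, not something the report states.

```python
# Values copied from the stratification table.
r2_with_o2, r2_without_o2 = 0.145, 0.078
coef_with_o2, coef_without_o2 = 19.9, 15.4

r2_diff = r2_with_o2 - r2_without_o2
relative_diff = r2_diff / r2_without_o2  # gap relative to the lower R²
coef_diff = coef_with_o2 - coef_without_o2

print(f"R² difference: {r2_diff:.3f}")              # 0.067
print(f"Relative difference: {relative_diff:.0%}")  # 86%
print(f"Coefficient difference: {coef_diff:.1f}")   # 4.5
```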
Section 5: Statistical Analysis Summary
Statistical findings from GA exploration analysis:
- Model specification: PMA + GA multicollinearity detected (R² inflation)
- Time-based modeling: Feeding efficiency coefficient not significant at α=0.05 in the models reported here (p=0.667 alongside GA, p=0.094 without GA)
- GA interactions: No significant multiplicative effects at α=0.05
- Respiratory stratification: R² higher with O2 support (0.145, n=60) than without (0.078, n=168)
Model performance metrics:
- Baseline model R²: 0.175
- Feeding efficiency model R²: varies by respiratory status (0.145 with O2 support, 0.078 without)
- Sample size: 228 infants (complete-case analysis)
Statistical tests performed:
- Multiple regression modeling with interaction terms
- Stratified analysis by respiratory support status
- Model assumption checking and multicollinearity assessment