
GA Exploration Analysis: Model Specification and Enhancement

Comprehensive analysis of gestational age modeling issues, enhanced time-based approaches, and respiratory support stratification preview based on PI feedback and statistical findings.

Dataset Summary

Final analysis sample: 228 infants

GA range: 23.0 - 31.0 weeks

Birth weight imputation: 0 missing values (handled by data loader)

Feeding efficiency range: 0.04 - 4.57 days/week GA

Section 1: Model Specification Issues

Problem identified: Current PMA-based regression models include both PMA-based predictors and gestational age (GA). Because PMA (weeks) = GA (weeks) + time_component (days)/7, GA enters these models more than once, which creates structural multicollinearity.
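The collinearity described here can be illustrated with a variance inflation factor (VIF). The sketch below is not the report's analysis code: it simulates GA and time-to-first-feed values (the distributions, the independence of the two variables, and the sample size are all assumptions) and relies on the identity that, with exactly two predictors, VIF = 1 / (1 - r²).

```python
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def vif_two_predictors(x1, x2):
    """With exactly two predictors, each one has VIF = 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

rng = random.Random(0)
n = 2000  # larger than the report's 228 so the illustration is stable (assumption)
ga = [rng.uniform(23.0, 31.0) for _ in range(n)]            # weeks, simulated
time_to_first = [rng.uniform(3.0, 30.0) for _ in range(n)]  # days, simulated independently of GA
pma_at_first = [g + t / 7 for g, t in zip(ga, time_to_first)]

vif_pma = vif_two_predictors(pma_at_first, ga)    # PMA predictor alongside GA
vif_time = vif_two_predictors(time_to_first, ga)  # time-based predictor alongside GA
print(f"VIF, PMA_at_first with GA:  {vif_pma:.2f}")
print(f"VIF, time_to_first with GA: {vif_time:.2f}")
```

In this simulation the PMA predictor shares most of its variance with GA, while the time-based predictor does not. In real data, time to first feed may itself depend on GA, so the time-based VIF would not necessarily be close to 1.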

Mathematical Relationship

Current problematic specification for PMA-based predictors:

PMA_at_full ~ PMA_at_first + GA + other_covariates

Since PMA_at_first = GA + time_to_first/7, this becomes:

PMA_at_full ~ (GA + time_to_first/7) + GA + other_covariates
PMA_at_full ~ GA + time_to_first/7 + GA + other_covariates  # GA appears twice!

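Result: Artificially inflated R². By the same definition, the outcome also contains GA (PMA_at_full = GA + time_to_full/7), so GA on the right-hand side is partly predicting itself.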
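A small simulation can show the inflation mechanism. The sketch below is an illustration under stated assumptions, not a reproduction of the reported models: the two time variables are drawn independently of each other and of GA, so the true time-based R² is zero, and a simple one-predictor regression stands in for the full specification. Any R² in the PMA-based fit then comes only from the GA term shared by predictor and outcome.

```python
import random

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

rng = random.Random(1)
n = 2000  # illustrative sample size, not the report's 228
ga = [rng.uniform(23.0, 31.0) for _ in range(n)]             # weeks, simulated
time_to_first = [rng.uniform(3.0, 30.0) for _ in range(n)]   # days, simulated
time_to_full = [rng.uniform(20.0, 90.0) for _ in range(n)]   # days, independent by construction

# PMA versions add the same GA to both sides of the regression.
pma_at_first = [g + t / 7 for g, t in zip(ga, time_to_first)]
pma_at_full = [g + t / 7 for g, t in zip(ga, time_to_full)]

r2_pma = r_squared(pma_at_first, pma_at_full)
r2_time = r_squared(time_to_first, time_to_full)
print(f"PMA-based R^2:  {r2_pma:.3f}")
print(f"Time-based R^2: {r2_time:.3f}")
```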

Comparison from existing reports:

  • PMA-based outcome models: R² = 0.498 (inflated)
  • Time-based outcome models: R² = 0.175 (honest)

Section 2: Enhanced Time-Based Modeling

Testing improved time-based models that avoid multicollinearity while capturing GA-dependent relationships.

Model Performance Comparison

Model                | Adj R² | F-test p-value | Key Finding
Baseline (current)   | 0.116  | 0.000005       | Standard approach
+ Feeding Efficiency | 0.112  | 0.000013       | FE coef=5.1, p=0.667
Pure Feeding (No GA) | 0.110  | 0.000010       | FE coef=15.3, p=0.094
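Adjusted R² penalizes plain R² for the number of predictors, which is why adding a weak predictor can lower it. The function below is a generic sketch of the standard formula; the number of predictors in each reported model is not stated in this report, so the example call uses a hypothetical value.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof

# Hypothetical inputs: R^2 = 0.15 and 5 predictors are assumed for illustration;
# only n = 228 comes from the report.
example = adjusted_r2(0.15, 228, 5)
print(f"Adjusted R^2: {example:.3f}")
```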

Feeding Efficiency Distribution Values

Percentile | Value (days/week GA)
10th       | 0.59
25th       | 0.83
50th       | 1.31
75th       | 2.05
90th       | 2.65
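Percentile values depend on the interpolation rule, which this report does not state. As a hedged sketch, the helper below uses linear interpolation between order statistics (one common convention); the input values are made up for illustration and are not the study data.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

# Hypothetical feeding-efficiency values (days/week GA), not taken from the dataset.
sample = [0.4, 0.6, 0.9, 1.1, 1.3, 1.6, 2.0, 2.4, 2.9, 3.5]
for q in (10, 25, 50, 75, 90):
    print(f"{q}th percentile: {percentile(sample, q):.2f}")
```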

Feeding Efficiency Visualization

Distribution of feeding efficiency values and relationship with time to full oral feeding.

[Figure: feeding_efficiency_analysis.png]

Section 3: GA Interaction Testing

Testing whether covariate effects vary by gestational age using inverse GA interaction terms.

Interaction Testing Results

Model R² with interactions: 0.101

Individual interaction terms:

  • time_first_x_inv_ga: coefficient = 15.86, p = 0.3865
  • birth_weight_x_inv_ga: coefficient = -226.80, p = 0.7134
  • ventilation_x_inv_ga: coefficient = -753.52, p = 0.3798
  • o2_support_x_inv_ga: coefficient = 328.32, p = 0.6677

Statistical result: No significant interactions found at α=0.05.
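An inverse-GA interaction term can be built by multiplying a covariate by 1/GA and adding the product to the regression. The sketch below shows one way to do this with a small least-squares solver. It is an assumption-laden illustration: the data are simulated, only one covariate is used, the main effects are mean-centered before forming the product (a choice made here for numerical stability, not one confirmed by the report), and the function names are hypothetical. It returns coefficients only; standard errors and p-values would normally come from a statistics library.

```python
import random

def solve_normal_equations(X, y):
    """Solve (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        pivot_row = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def fit_inverse_ga_interaction(ga, time_first, y):
    """Fit y ~ time_first + 1/GA + time_first * (1/GA), centering both main effects."""
    inv_ga = [1.0 / g for g in ga]
    mean_t = sum(time_first) / len(time_first)
    mean_i = sum(inv_ga) / len(inv_ga)
    t_c = [t - mean_t for t in time_first]
    i_c = [v - mean_i for v in inv_ga]
    X = [[1.0, t, v, t * v] for t, v in zip(t_c, i_c)]
    names = ["intercept", "time_first", "inv_ga", "time_first_x_inv_ga"]
    return dict(zip(names, solve_normal_equations(X, y)))

rng = random.Random(2)
n = 228
ga = [rng.uniform(23.0, 31.0) for _ in range(n)]          # weeks, simulated
time_first = [rng.uniform(3.0, 30.0) for _ in range(n)]   # days, simulated
# Simulated outcome with NO true interaction, plus noise.
y = [30 + 1.2 * t + 400 / g + rng.gauss(0, 10) for g, t in zip(ga, time_first)]

coefs = fit_inverse_ga_interaction(ga, time_first, y)
for name, value in coefs.items():
    print(f"{name}: {value:.3f}")
```

Because 1/GA spans a narrow range (about 0.032 to 0.043 for GA between 23 and 31 weeks), interaction coefficients on this scale tend to be numerically large, which is consistent with the magnitudes listed above.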

Section 4: Respiratory Support Stratification Analysis

Statistical comparison of model performance between respiratory support groups.

Stratification Results

Group              | N   | R²    | Feeding Efficiency Coef | P-value
With O2 Support    | 60  | 0.145 | 19.9                    | 0.220
Without O2 Support | 168 | 0.078 | 15.4                    | 0.192

Group Comparison Results

R² difference: 0.067 (0.145 vs 0.078; the group with O2 support is 86% higher relative to the group without)

Feeding efficiency coefficient difference: 4.5 days (19.9 vs 15.4); neither coefficient is significant at α=0.05 (p=0.220 and p=0.192)

Sample sizes: n=60 (with O2 support) vs n=168 (without O2 support)
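The group comparison above reports differences but no formal test of them. As a rough, hedged sketch, the code below reproduces the relative R² difference and then approximates a z-test for the coefficient difference. Standard errors are not reported, so they are backed out from each coefficient and p-value under a normal approximation; that approximation and the assumption of independent groups are choices made here, not part of the original analysis.

```python
from math import erfc, sqrt
from statistics import NormalDist

def se_from_p(coef, p_two_sided):
    """Approximate a standard error from a coefficient and its two-sided
    p-value, using a normal approximation to the t distribution."""
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return abs(coef) / z

def coef_difference_test(b1, se1, b2, se2):
    """z-test for the difference between coefficients from independent groups."""
    z = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal p-value
    return z, p

# Values taken from the stratification table.
r2_with, r2_without = 0.145, 0.078
rel_diff = (r2_with - r2_without) / r2_without

se_with = se_from_p(19.9, 0.220)      # approximate, derived from the p-value
se_without = se_from_p(15.4, 0.192)   # approximate, derived from the p-value
z, p_diff = coef_difference_test(19.9, se_with, 15.4, se_without)

print(f"Relative R^2 difference: {rel_diff:.0%}")
print(f"Coefficient difference z = {z:.2f}, approximate p = {p_diff:.2f}")
```

Under these approximations the 4.5 coefficient difference is small relative to its uncertainty, so it should not be read as evidence that feeding efficiency behaves differently in the two groups.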

Section 5: Statistical Analysis Summary

Statistical findings from GA exploration analysis:

  1. Model specification: PMA + GA multicollinearity detected (R² inflation)
  2. Time-based modeling: Feeding efficiency coefficient not significant at α=0.05 (p=0.667 with GA in the model, p=0.094 without GA)
  3. GA interactions: No significant multiplicative effects at α=0.05
  4. Respiratory stratification: Model performance differences observed between groups

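Model performance metrics:

  • Baseline time-based model R²: 0.175 (from existing reports; the Section 2 baseline has adjusted R² = 0.116)
  • Feeding efficiency model R²: varies by respiratory status (0.145 with O2 support, 0.078 without)
  • Sample: complete-case analysis, n = 228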

Statistical tests performed:

  • Multiple regression modeling with interaction terms
  • Stratified analysis by respiratory support status
  • Model assumption checking and multicollinearity assessment