Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data

Abstract:

Background: Prediction of outcome after injury is fraught with uncertainty and statistically beset by misspecified models. Single-time-point regression gives prediction and inference at only one time, which is of dubious value for continuous prediction of ongoing bleeding. Newer statistical machine learning techniques such as SuperLearner (SL) can make superior predictions at successive time points while evaluating the changing relative importance of each measured variable for an outcome. This can provide a continuously updated prediction of outcome and an evaluation of which clinical variables likely drive a particular outcome.

Methods: PROMMTT data were evaluated using both naive (standard stepwise logistic regression) and SL techniques to develop a time-dependent prediction of future mortality within discrete time intervals. We avoided both underfitting and overfitting by using cross-validation to select an optimal combination of predictors from among candidate predictors/machine learning algorithms. SL was also used to produce robust, interval-specific variable importance measures (VIMs), resulting in an ordered list, by time point, of the variables that have the strongest impact on future mortality.
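
As an illustration of the cross-validated selection step described in the Methods, the following is a minimal sketch of a super-learner-style ensemble for a single time interval. It is not the study's implementation: the data are synthetic (not PROMMTT), the three candidate learners (`fit_mean`, `fit_stump`, `fit_logistic`) are toy stand-ins for whatever library of algorithms the authors used, and choosing the convex weights by grid search is an assumption made to keep the example dependency-free.

```python
import itertools
import math
import random

def fit_mean(X, y):
    """Candidate 1: ignore covariates and predict the outcome prevalence."""
    p = sum(y) / len(y)
    return lambda x: p

def fit_stump(X, y):
    """Candidate 2: split the first covariate at its median."""
    cut = sorted(r[0] for r in X)[len(X) // 2]
    lo = [yi for r, yi in zip(X, y) if r[0] < cut]
    hi = [yi for r, yi in zip(X, y) if r[0] >= cut]
    p_all = sum(y) / len(y)
    p_lo = sum(lo) / len(lo) if lo else p_all
    p_hi = sum(hi) / len(hi) if hi else p_all
    return lambda x: p_hi if x[0] >= cut else p_lo

def fit_logistic(X, y, steps=200, lr=0.5):
    """Candidate 3: main-terms logistic regression by gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for r, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], r))
            err = 1 / (1 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(r):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return lambda x: 1 / (1 + math.exp(-(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))))

def cv_predictions(X, y, learners, folds=5):
    """Out-of-fold predictions: each row is predicted by models that never saw it."""
    n = len(y)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for v in range(folds):
        train = [i for i in range(n) if i % folds != v]
        test = [i for i in range(n) if i % folds == v]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        for k, fit in enumerate(learners):
            model = fit(Xt, yt)
            for i in test:
                Z[i][k] = model(X[i])
    return Z

def mse(pred, y):
    return sum((p - yi) ** 2 for p, yi in zip(pred, y)) / len(y)

def convex_weights(Z, y, m=10):
    """Grid search over weights (multiples of 1/m, summing to 1) minimizing CV risk."""
    best = None
    for combo in itertools.product(range(m + 1), repeat=len(Z[0])):
        if sum(combo) != m:
            continue
        w = [c / m for c in combo]
        risk = mse([sum(wk * zk for wk, zk in zip(w, row)) for row in Z], y)
        if best is None or risk < best[0]:
            best = (risk, w)
    return best

# Synthetic stand-in for one time interval: two covariates, binary outcome.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]
y = [1 if random.random() < 1 / (1 + math.exp(-(1.5 * r[0] - r[1]))) else 0 for r in X]

learners = [fit_mean, fit_stump, fit_logistic]
Z = cv_predictions(X, y, learners)
cv_risk, weights = convex_weights(Z, y)
candidate_risks = [mse([row[k] for row in Z], y) for k in range(len(learners))]

# Refit every candidate on all data and combine with the cross-validated weights.
final_models = [fit(X, y) for fit in learners]

def super_learner(x):
    return sum(w * model(x) for w, model in zip(weights, final_models))
```

Because the weight grid includes each candidate on its own, the ensemble's cross-validated risk in this sketch can never exceed that of the best single candidate. That risk is slightly optimistic, since the same out-of-fold predictions are used to choose the weights; an honest performance estimate would need an outer layer of cross-validation. A time-dependent analysis would repeat the procedure separately within each interval.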

Results: Nine hundred eighty patients had complete clinical and outcome data and were included in the analysis. The prediction of ongoing transfusion with SL was superior to the naive approach for all time intervals (correlations of cross-validated predictions with the outcome were 0.819, 0.789, 0.792 for time intervals 30-90, 90-180, 180-360, >360 minutes). The estimated VIM of mortality also changed significantly at each time point.

Conclusion: The SL technique for prediction of outcome from a complex dynamic multivariate data set is superior at each time interval to standard models. In addition, the SL VIM at each time point provides insight into the time-specific drivers of future outcome, patient trajectory, and targets for clinical intervention. Thus, this automated approach mimics clinical practice, changing form and content through time to optimize the accuracy of the prognosis based on the evolving trajectory of the patient.

Author:

Hubbard, Alan

Munoz, Ivan Diaz

Decker, Anna

Holcomb, John B.

Schreiber, Martin A.

Bulger, Eileen M.

Brasel, Karen J.

Fox, Erin E.

del Junco, Deborah J.

Wade, Charles E.

others

Publication date:

July 1, 2013

Publication type:

Journal Article

Document

School of Public Health