Targeted learning of the mean outcome under an optimal dynamic treatment rule


Suppose we observe n independent and identically distributed observations of a time-dependent random variable consisting of baseline covariates, an initial treatment and censoring indicator, intermediate covariates, a subsequent treatment and censoring indicator, and a final outcome. For example, this could be data generated by a sequentially randomized controlled trial in which subjects are sequentially randomized to a first-line and a second-line treatment, possibly assigned in response to an intermediate biomarker, and are subject to right-censoring. In this article we consider estimation of an optimal dynamic multiple time-point treatment rule, defined as the rule that maximizes the mean outcome under the dynamic treatment, where the candidate rules are restricted to respond only to a user-supplied subset of the baseline and intermediate covariates. This estimation problem is addressed in a statistical model for the data distribution that is nonparametric beyond possible knowledge about the treatment and censoring mechanism, while still providing statistical inference for the mean outcome under the optimal rule. This contrasts with the current literature, which relies on parametric assumptions. For the sake of presentation, we first consider the case in which the treatment/censoring is assigned at only a single time point, and subsequently we cover the multiple time-point case. We characterize the optimal dynamic treatment as a statistical target parameter in the nonparametric statistical model, and we propose highly data-adaptive estimators of this optimal dynamic regimen, utilizing sequential loss-based super-learning of sequentially defined (so-called) blip functions, based on newly proposed loss functions.
We also propose a cross-validation selector (among candidate estimators of the optimal dynamic regimen) based on a cross-validated targeted minimum loss-based estimator of the mean outcome under the candidate regimen, thereby aiming directly to select the candidate estimator that maximizes the mean outcome. We further establish that the mean of the counterfactual outcome under the optimal dynamic treatment is a pathwise differentiable parameter under assumptions, and we develop a targeted minimum loss-based estimator (TMLE) of this target parameter. We establish asymptotic linearity of this TMLE, and statistical inference based on it, under specified conditions. In a sequentially randomized trial, the statistical inference relies essentially only on a second-order difference between the estimator of the optimal dynamic treatment and the optimal dynamic treatment being asymptotically negligible, which may be a problematic condition when the rule is based on multivariate time-dependent covariates. To avoid this condition, we also develop targeted minimum loss-based estimators and statistical inference for data-adaptive target parameters that are defined in terms of the mean outcome under the estimate of the optimal dynamic treatment. In particular, we develop a novel cross-validated TMLE approach that provides asymptotic inference under minimal conditions, avoiding the need for any empirical process conditions. For the sake of presentation, the main part of the article focuses on two time-point interventions, but the results are generalized to general multiple time-point interventions in the appendix.
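
As a rough illustration of the single time-point case described above, the following Python sketch estimates a blip function and the implied treatment rule in a simulated randomized trial without censoring. It is a minimal sketch under stated assumptions, not the estimator developed in the article: the simulated data, the function names fit_blip_rule and rule, the linear working model, and the known treatment probability of 0.5 are all hypothetical choices made for illustration. A plain least-squares fit stands in for super-learning, and a simple inverse-probability-weighted (IPW) average stands in for the TMLE of the mean outcome.

```python
import numpy as np


def fit_blip_rule(V, A, Y, g1):
    """Fit a linear working model for the blip function E[Y(1) - Y(0) | V].

    g1 is the (assumed known) probability of receiving treatment A = 1.
    """
    # IPW-transformed outcome: when g1 is the true treatment probability,
    # its conditional mean given V equals the blip function.
    D = (2 * A - 1) / np.where(A == 1, g1, 1 - g1) * Y
    # Least-squares regression of D on (1, V); a stand-in for super-learning.
    X = np.column_stack([np.ones_like(V), V])
    beta, *_ = np.linalg.lstsq(X, D, rcond=None)
    return beta


def rule(beta, V):
    """Treat (return 1) exactly when the estimated blip is positive."""
    return (beta[0] + beta[1] * V > 0).astype(int)


# Hypothetical simulated trial: the true blip is 2 * V, so the optimal
# rule treats exactly when V > 0.
rng = np.random.default_rng(0)
n = 5000
V = rng.uniform(-1, 1, n)
A = rng.binomial(1, 0.5, n)
Y = 2 * V * A + rng.normal(0, 1, n)

beta = fit_blip_rule(V, A, Y, g1=0.5)
d_hat = rule(beta, V)
d_true = (V > 0).astype(int)

# Fraction of subjects for whom the estimated and true rules agree.
agreement = np.mean(d_hat == d_true)

# Plain IPW estimate of the mean outcome under the estimated rule,
# evaluated on the same data used to fit the rule (hence optimistic).
ipw_value = np.mean((A == d_hat) / 0.5 * Y)

print(f"agreement with true rule: {agreement:.3f}")
print(f"IPW mean outcome under estimated rule: {ipw_value:.3f}")
```

Because the rule is fitted and evaluated on the same sample, the IPW value here is only indicative; the cross-validated TMLE described in the abstract is aimed at exactly this issue.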

M. J. van der Laan
A. R. Luedtke
Publication date: September 3, 2014
Publication type: Journal Article