Suppose that we observe a sample of independent and identically distributed realizations of a random variable, and that a parameter of interest can be defined as the minimizer, over a suitably defined parameter set, of the expectation of a loss function of a candidate parameter value and the random variable. Examples include the squared error loss in regression and the negative log-density loss in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter set may yield an estimator of the parameter of interest that is ill-defined or too variable. In this article, we propose a cross-validated ε-net estimation method, which uses a collection of submodels and, for each submodel, a collection of ε-nets. For each submodel s and each resolution level ε, the minimizer of the empirical risk over the corresponding ε-net is a candidate estimator. We then select among these candidate estimators (i.e., select the pair (s, ε)) by multi-fold cross-validation. We derive a finite sample inequality showing that the resulting estimator is as good as an oracle estimator that uses the best submodel and resolution level for the unknown true parameter. We also address the implementation of the estimation procedure and, in the context of a linear regression model, present results of a preliminary simulation study comparing the cross-validated ε-net estimator to the cross-validated L1-penalized least squares estimator (LASSO) and the least angle regression estimator (LARS).
September 25, 2009
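The selection procedure described above can be illustrated with a minimal sketch for linear regression under squared error loss. It is not the authors' implementation. Several choices are assumptions made only for illustration: submodel s uses the first s covariates, the ε-net is a regular grid of spacing ε on the cube [-bound, bound]^s, and the function names (eps_net, erm_over_net, cv_eps_net) are hypothetical. The exhaustive grid search is feasible only for very small s.

```python
import itertools
import numpy as np

def eps_net(dim, eps, bound=2.0):
    """Regular grid on [-bound, bound]^dim (assumed form of the eps-net).

    The spacing equals eps when 2 * bound / eps is an integer.
    """
    n_points = int(round(2 * bound / eps)) + 1
    axis = np.linspace(-bound, bound, n_points)
    return np.array(list(itertools.product(axis, repeat=dim)))

def erm_over_net(X, y, net):
    """Return the net point with the smallest empirical squared-error risk."""
    residuals = y[:, None] - X @ net.T        # shape (n, number of net points)
    risks = np.mean(residuals ** 2, axis=0)   # empirical risk of each net point
    return net[np.argmin(risks)]

def cv_eps_net(X, y, sizes, eps_values, n_folds=5, seed=0):
    """Select (s, eps) by multi-fold cross-validation, then refit on all data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    all_idx = np.arange(len(y))
    best = None
    for s, eps in itertools.product(sizes, eps_values):
        net = eps_net(s, eps)
        cv_risk = 0.0
        for val_idx in folds:
            train_idx = np.setdiff1d(all_idx, val_idx)
            # Candidate estimator: empirical risk minimizer on the training fold.
            beta = erm_over_net(X[train_idx, :s], y[train_idx], net)
            # Its risk is evaluated on the held-out validation fold.
            cv_risk += np.mean((y[val_idx] - X[val_idx, :s] @ beta) ** 2)
        cv_risk /= n_folds
        if best is None or cv_risk < best[0]:
            best = (cv_risk, s, eps)
    _, s, eps = best
    # Final estimator: minimize the empirical risk over the selected net.
    beta = erm_over_net(X[:, :s], y, eps_net(s, eps))
    return s, eps, beta
```

Calling cv_eps_net(X, y, sizes=[1, 2, 3], eps_values=[1.0, 0.5]) on an (n, p) design matrix X with p at least 3 returns the selected submodel size, the selected resolution level, and the corresponding coefficient vector.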