Abstract:
We consider choosing an estimator or model from a given class by cross-validation that consists of holding a non-negligible fraction of the observations out as a test set. We derive bounds showing that the risk of the resulting procedure is (up to a constant) smaller than the risk of an oracle plus an error term that typically grows logarithmically with the number of estimators in the class. We extend the results to penalized cross-validation in order to control unbounded loss functions. Applications include regression with squared and absolute deviation loss and classification under Tsybakov's condition.
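To make the procedure concrete, the following is a minimal, hypothetical sketch of single-split (hold-out) cross-validation selection with squared-error loss: a non-negligible fraction of the observations is set aside as a test set, each candidate is fit on the remaining data, and the candidate with the smallest held-out empirical risk is selected. The candidate class (polynomial fits of varying degree) and all names are illustrative assumptions, not the paper's setup.

```python
# Illustrative sketch of hold-out cross-validation model selection;
# the candidate class and data-generating process are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Simulated regression data: y = sin(2*pi*x) + noise.
n = 200
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

# Hold out a non-negligible fraction (here one third) as the test set.
n_test = n // 3
test_idx = rng.choice(n, size=n_test, replace=False)
train_mask = np.ones(n, dtype=bool)
train_mask[test_idx] = False
x_tr, y_tr = x[train_mask], y[train_mask]
x_te, y_te = x[~train_mask], y[~train_mask]

# Candidate class: least-squares polynomial fits of degree 1..10.
risks = {}
for d in range(1, 11):
    coef = np.polyfit(x_tr, y_tr, deg=d)      # fit on the training part
    pred = np.polyval(coef, x_te)             # predict on the held-out part
    risks[d] = np.mean((y_te - pred) ** 2)    # empirical squared-error risk

# Cross-validation selects the candidate with the smallest held-out risk;
# an oracle inequality of the kind in the abstract bounds its true risk by
# (a constant times) the best risk in the class plus a term that grows
# logarithmically in the number of candidates.
best_degree = min(risks, key=risks.get)
print(f"selected degree: {best_degree}, held-out risk: {risks[best_degree]:.3f}")
```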
Publication date:
January 1, 2006
Publication type:
Journal Article