To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health


Two modeling approaches are commonly used to estimate the associations between neighborhood characteristics and individual-level health outcomes in multilevel studies (subjects within neighborhoods). Random effects models (or mixed models) use maximum likelihood estimation. Population average models typically use a generalized estimating equation (GEE) approach. These methods are used in place of basic regression approaches because the health of residents in the same neighborhood may be correlated, thus violating independence assumptions made by traditional regression procedures. This violation matters most for the estimated variability (standard errors) of the regression estimates. Though the literature appears to favor the mixed-model approach, little theoretical guidance has been offered to justify this choice. In this paper, we review the assumptions behind the estimates and inference provided by these two approaches. We propose a perspective that treats regression models for what they are in most circumstances: reasonable approximations of some true underlying relationship. We argue in general that mixed models involve unverifiable assumptions about the data-generating distribution, which lead to potentially misleading estimates and biased inference. We conclude that the estimating-equation approach of population average models provides a more useful approximation of the truth.
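The point about correlated outcomes within neighborhoods can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the number of neighborhoods, the cluster size, and the variance components are all illustrative assumptions. It estimates a population mean from clustered data and compares the naive standard error, which assumes independent subjects, with the cluster-robust (sandwich) standard error that an intercept-only GEE with an independence working correlation would report.

```python
import math
import random


def simulate_clustered(n_clusters=200, cluster_size=10,
                       sd_between=1.0, sd_within=1.0, seed=0):
    """Simulate outcomes for subjects nested in neighborhoods (assumed settings)."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        # Shared neighborhood effect induces within-cluster correlation.
        shared = rng.gauss(0.0, sd_between)
        clusters.append([shared + rng.gauss(0.0, sd_within)
                         for _ in range(cluster_size)])
    return clusters


def naive_se(clusters):
    """Standard error of the mean, treating all subjects as independent."""
    values = [y for cluster in clusters for y in cluster]
    n = len(values)
    mean = sum(values) / n
    variance = sum((y - mean) ** 2 for y in values) / (n - 1)
    return math.sqrt(variance / n)


def cluster_robust_se(clusters):
    """Sandwich standard error of the mean, summing residuals within clusters."""
    values = [y for cluster in clusters for y in cluster]
    n = len(values)
    mean = sum(values) / n
    # "Meat" of the sandwich: squared sum of residuals per cluster.
    meat = sum(sum(y - mean for y in cluster) ** 2 for cluster in clusters)
    # "Bread" for an intercept-only model is the total sample size n.
    return math.sqrt(meat) / n


clusters = simulate_clustered()
naive = naive_se(clusters)
robust = cluster_robust_se(clusters)
print(f"naive SE:          {naive:.4f}")
print(f"cluster-robust SE: {robust:.4f}")
```

Under these assumed settings the point estimate is the same in both cases; only the standard error changes, and the naive version understates the uncertainty because it ignores the shared neighborhood effect.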

Hubbard, A. E.
Ahern, J.
Fleischer, N. L.
Laan, M. V.
Lippman, S. A.
Jewell, N.
Bruckner, T.
Satariano, W. A.
Publication date: July 21, 2010
Publication type: Journal Article