Background: In generalized epidemic settings, strategies are needed to prioritize individuals at higher risk of human immunodeficiency virus (HIV) acquisition for prevention services. We used population-level HIV testing data from rural Kenya and Uganda to construct HIV risk scores and assessed their ability to identify seroconversions.
Methods: During 2013-2017, >75% of residents in 16 communities in the SEARCH study were tested annually for HIV. In this population, we evaluated 3 strategies for using demographic factors to predict the 1-year risk of HIV seroconversion: membership in ≥1 known "risk group" (eg, having a spouse living with HIV), a "model-based" risk score constructed with logistic regression, and a "machine learning" risk score constructed with the Super Learner algorithm. We hypothesized that machine learning would identify high-risk individuals more efficiently (fewer persons targeted for a fixed sensitivity) and with higher sensitivity (for a fixed number targeted) than either of the other 2 approaches.
Results: A total of 75 558 persons contributed 166 723 person-years of follow-up; 519 seroconverted. Machine learning improved efficiency. To achieve a fixed sensitivity of 50%, the risk-group strategy targeted 42% of the population, the model-based strategy targeted 27%, and machine learning targeted 18%. Machine learning also improved sensitivity. With an upper limit of 45% of the population targeted, the risk-group strategy correctly classified 58% of seroconversions, the model-based strategy 68%, and machine learning 78%.
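The two evaluation metrics reported above (proportion of the population targeted at a fixed sensitivity, and sensitivity at a fixed proportion targeted) can be illustrated with a minimal sketch. It assumes that individuals are ranked from highest to lowest predicted risk and targeted in that order; the function names and the 10-person toy data are hypothetical, chosen only for illustration, and are not the study's code or data.

```python
import math

def fraction_targeted_for_sensitivity(scores, outcomes, target_sensitivity):
    """Smallest fraction of people, taken in descending score order, that
    captures at least `target_sensitivity` of all events (0 < value <= 1)."""
    n_events = sum(outcomes)
    if n_events == 0:
        raise ValueError("no events: sensitivity is undefined")
    # Small tolerance guards against floating-point noise in the product.
    needed = max(1, math.ceil(target_sensitivity * n_events - 1e-9))
    # Ties in score are broken by input order (Python's sort is stable).
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    captured = 0
    for i, (_, event) in enumerate(ranked, start=1):
        captured += event
        if captured >= needed:
            return i / len(ranked)
    return 1.0

def sensitivity_for_fraction_targeted(scores, outcomes, max_fraction):
    """Share of all events captured when at most `max_fraction` of people,
    taken in descending score order, are targeted."""
    n_events = sum(outcomes)
    if n_events == 0:
        raise ValueError("no events: sensitivity is undefined")
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    k = math.floor(max_fraction * len(ranked) + 1e-9)
    return sum(event for _, event in ranked[:k]) / n_events

# Hypothetical toy data: 10 people, 4 events (1 = seroconversion).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]

print(fraction_targeted_for_sensitivity(toy_scores, toy_outcomes, 0.5))
print(sensitivity_for_fraction_targeted(toy_scores, toy_outcomes, 0.45))
```

In this toy example, targeting the 3 highest-scoring of 10 people captures 2 of the 4 events, so a sensitivity of 50% is reached with 30% of the population targeted. A binary risk-group strategy can be evaluated with the same functions by assigning a score of 1 to risk-group members and 0 to everyone else, although the order among tied scores is then arbitrary.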
Conclusions: Machine learning improved classification of individuals at risk of HIV acquisition compared with a model-based approach or reliance on known risk groups and could inform targeting of prevention strategies in generalized epidemic settings.
Clinical trials registration: NCT01864603.
Keywords: HIV prevention; HIV risk score; PrEP; SEARCH Study; clinical prediction rule.