Using Machine Learning to Identify Subgroups with the Highest Expected Benefit in a Population-based Water, Sanitation, Handwashing, and Nutrition Intervention

Abstract: 

Understanding who benefits most from investments in water, sanitation, and hygiene (WaSH) interventions can elucidate causal pathways, uncover complex interactions between population characteristics and interventions, and inform targeted implementation. We applied machine learning to identify and describe households of children that benefited most from WaSH and nutrition interventions. We used causal forests and baseline characteristics of pregnant women enrolled in a trial in Bangladesh (2013-2015) to test for heterogeneous treatment effects on the primary trial outcomes at two years (length-for-age Z-score [LAZ-score] and diarrhoea prevalence) and one secondary outcome (child development [EASQ Z-score]) for each treatment-outcome combination. We split households into three groups based on predicted treatment effect magnitude and compared characteristics of those that benefited the most (Tercile 3) versus the least (Tercile 1).

Results: Heterogeneity was detected in the effect of Sanitation on EASQ Z-score, compared to Control; children in Tercile 3 were estimated to gain 0.51 SD (95% CI: 0.35, 0.67), whereas children in Tercile 1 were estimated to have no benefit. At baseline, households of children in Tercile 3 were more likely than those in Tercile 1 to report that chickens always entered the house (85% vs. 4%) and to have animal feces observed in the child’s play area (84% vs. 18%). Tercile 3 households also owned less land and fewer assets and lived further from Dhaka, any population center, or a market. We did not detect heterogeneity for any other treatment-outcome comparison.
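The tercile comparison described in the abstract can be illustrated with a minimal sketch. It assumes that a predicted treatment effect is already available for each household (in the study these come from causal forests; here the values are invented for illustration), and it does not fit a forest itself. The function names, the variable `chickens_in_house`, and all numbers are hypothetical and are not taken from the trial data or the authors' code.

```python
from statistics import mean


def assign_terciles(effects):
    """Label each household 1, 2, or 3 by the rank of its predicted effect.

    Tercile 1 holds the smallest predicted effects, Tercile 3 the largest.
    """
    n = len(effects)
    # Indices of households ordered from smallest to largest predicted effect.
    order = sorted(range(n), key=lambda i: effects[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 3 // n + 1
    return labels


def prevalence_in_tercile(labels, flags, tercile):
    """Share of households in one tercile that have a binary characteristic."""
    return mean(f for label, f in zip(labels, flags) if label == tercile)


# Hypothetical predicted effects (in SD units) and a hypothetical binary
# baseline characteristic (1 = chickens always enter the house).
predicted_effect = [0.02, 0.55, 0.30, -0.01, 0.48, 0.25, 0.05, 0.60, 0.33]
chickens_in_house = [0, 1, 0, 0, 1, 1, 0, 1, 0]

terciles = assign_terciles(predicted_effect)
top = prevalence_in_tercile(terciles, chickens_in_house, 3)
bottom = prevalence_in_tercile(terciles, chickens_in_house, 1)
print(f"Tercile 3: {top:.0%}, Tercile 1: {bottom:.0%}")
```

Ranking by predicted effect and then contrasting baseline characteristics between the top and bottom groups is descriptive only; the sketch omits the uncertainty estimates and the formal heterogeneity test reported in the abstract.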

Author: 
Caitlin Hemlock
Laura H Kwong
Lia C. H. Fernald
John M Colford
Fahmida Tofail
Mahbubur Rahman
Sarker Parvez
Stephen P Luby
Andrew Mertens
Publication date: 
June 18, 2025
Publication type: 
Journal Article