Application of machine learning methodology to assess the performance of DIABETIMSS program for patients with type 2 diabetes in family medicine clinics in Mexico



The study aimed to assess the performance of a multidisciplinary-team diabetes care program called DIABETIMSS on glycemic control of type 2 diabetes (T2D) patients, by using available observational patient data and machine-learning-based targeted learning methods.


We analyzed electronic health records and laboratory databases from the year 2012 to 2016 of T2D patients from six family medicine clinics (FMCs) delivering the DIABETIMSS program, and five FMCs providing routine care. All FMCs belong to the Mexican Institute of Social Security and are in Mexico City and the State of Mexico. The primary outcome was glycemic control. The study covariates included: patient sex, age, anthropometric data, history of glycemic control, diabetic complications and comorbidity. We measured the effects of DIABETIMSS program through 1) simple unadjusted mean differences; 2) adjusted via standard logistic regression and 3) adjusted via targeted machine learning. We treated the data as a serial cross-sectional study, conducted a standard principal components analysis to explore the distribution of covariates among clinics, and performed regression tree on data transformed to use the prediction model to identify patient sub-groups in whom the program was most successful. To explore the robustness of the machine learning approaches, we conducted a set of simulations and the sensitivity analysis with process-of-care indicators as possible confounders.


The study included 78,894 T2D patients, from which 37,767patients received care through DIABETIMSS. The impact of DIABETIMSS ranged, among clinics, from 2 to 8% improvement in glycemic control, with an overall (pooled) estimate of 5% improvement. T2D patients with fewer complications have more significant benefit from DIABETIMSS than those with more complications. At the FMC’s delivering the conventional model the predicted impacts were like what was observed empirically in the DIABETIMSS clinics. The sensitivity analysis did not change the overall estimate average across clinics.


DIABETIMSS program had a small, but significant increase in glycemic control. The use of machine learning methods yields both population-level effects and pinpoints the sub-groups of patients the program benefits the most. These methods exploit the potential of routine observational patient data within complex healthcare systems to inform decision-makers.

Yue You
Svetlana V. Doubova
Diana Pinto-Masis
Ricardo Pérez-Cuevas
Víctor Hugo Borja-Aburto
Publication date: 
November 12, 2019
Publication type: 
Journal Article