Exploration of Machine Learning and Statistical Techniques in Development of a Low-Cost Screening Method Featuring the Global Diet Quality Score for Detecting Prediabetes in Rural India.

Nick Birk

; Mika Matsuzaki

; Teresa T Fung

; Yanping Li

; Carolina Batis; Meir J Stampfer; Megan Deitchler; Walter C Willett

; Wafaie W Fawzi

; Sabri Bromage; +3 more... Sanjay Kinra

; Shilpa N Bhupathiraju; Erin Lake; (2021) Exploration of Machine Learning and Statistical Techniques in Development of a Low-Cost Screening Method Featuring the Global Diet Quality Score for Detecting Prediabetes in Rural India. The Journal of nutrition, 151 (12 Sup). 110S-118S. ISSN 0022-3166 DOI: 10.1093/jn/nxab281

Copy

BACKGROUND: The prevalence of type 2 diabetes has increased substantially in India over the past 3 decades. Undiagnosed diabetes presents a public health challenge, especially in rural areas, where access to laboratory testing for diagnosis may not be readily available. OBJECTIVES: The present work explores the use of several machine learning and statistical methods in the development of a predictive tool to screen for prediabetes using survey data from an FFQ to compute the Global Diet Quality Score (GDQS). METHODS: The outcome variable prediabetes status (yes/no) used throughout this study was determined based upon a fasting blood glucose measurement ≥100 mg/dL. The algorithms utilized included the generalized linear model (GLM), random forest, least absolute shrinkage and selection operator (LASSO), elastic net (EN), and generalized linear mixed model (GLMM) with family unit as a (cluster) random (intercept) effect to account for intrafamily correlation. Model performance was assessed on held-out test data, and comparisons made with respect to area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. RESULTS: The GLMM, GLM, LASSO, and random forest modeling techniques each performed quite well (AUCs >0.70) and included the GDQS food groups and age, among other predictors. The fully adjusted GLMM, which included a random intercept for family unit, achieved slightly superior results (AUC of 0.72) in classifying the prediabetes outcome in these cluster-correlated data. CONCLUSIONS: The models presented in the current work show promise in identifying individuals at risk of developing diabetes, although further studies are necessary to assess other potentially impactful predictors, as well as the consistency and generalizability of model performance. In addition, future studies to examine the utility of the GDQS in screening for other noncommunicable diseases are recommended.

Item Type	Article
Elements ID	167108

Explore Further

Dept of Non-Communicable Disease Epidemiology

The Journal of nutrition

picture_as_pdf: nxab281.pdf
subject: Published Version
: Available under Creative Commons: 4.0

View

Download

Atom

BibTeX

OpenURL ContextObject in Span

Multiline CSV

OpenURL ContextObject

Dublin Core

MPEG-21 DIDL

EndNote

HTML Citation

JSON

MARC (ASCII)

MARC (ISO 2709)

METS

MODS

RDF+N3

RDF+N-Triples

RDF+XML

RIOXX2 XML

Reference Manager

Refer

Simple Metadata

ASCII Citation

EP3 XML

Export

Downloads