Application of two machine learning algorithms to genetic association studies in the presence of covariates.

Bareng AS Nonyane; Andrea S Foulkes; (2008) Application of two machine learning algorithms to genetic association studies in the presence of covariates. BMC genetics, 9 (1). 71-. ISSN 1471-2156 DOI: 10.1186/1471-2156-9-71
Copy

BACKGROUND: Population-based investigations aimed at uncovering genotype-trait associations often involve high-dimensional genetic polymorphism data as well as information on multiple environmental and clinical parameters. Machine learning (ML) algorithms offer a straightforward analytic approach for selecting subsets of these inputs that are most predictive of a pre-defined trait. The performance of these algorithms, however, in the presence of covariates is not well characterized. METHODS AND RESULTS: In this manuscript, we investigate two approaches: Random Forests (RFs) and Multivariate Adaptive Regression Splines (MARS). Through multiple simulation studies, the performance under several underlying models is evaluated. An application to a cohort of HIV-1 infected individuals receiving anti-retroviral therapies is also provided. CONCLUSION: Consistent with more traditional regression modeling theory, our findings highlight the importance of considering the nature of underlying gene-covariate-trait relationships before applying ML algorithms, particularly when there is potential confounding or effect mediation.


picture_as_pdf
1471-2156-9-71.pdf
subject
Published Version
Available under Creative Commons: 3.0

View Download

Atom BibTeX OpenURL ContextObject in Span Multiline CSV OpenURL ContextObject Dublin Core Dublin Core MPEG-21 DIDL EndNote HTML Citation JSON MARC (ASCII) MARC (ISO 2709) METS MODS RDF+N3 RDF+N-Triples RDF+XML RIOXX2 XML Reference Manager Refer Simple Metadata ASCII Citation EP3 XML
Export

Downloads