Bayesian modelling in genetic association studies

Jemma Walker (2012) Bayesian modelling in genetic association studies. PhD thesis, London School of Hygiene & Tropical Medicine. DOI: 10.17037/PUBS.01635516

Bayesian model selection approaches are flexible methods that can be used to investigate genetic association studies in greater detail, enabling us to pinpoint more accurately the locations of disease genes in complex regions such as the major histocompatibility complex (MHC), and to investigate possible causal pathways between genes, disease and intermediate phenotypes. This thesis is split into two distinct parts. The first uses a Bayesian Multivariate Adaptive Regression Splines model to search across many highly correlated variants, with the aim of determining which are likely to be the truly causal variants within complex genetic regions and how each of these variants influences disease status. Specifically, I consider the role of genetic variants within the MHC region in systemic lupus erythematosus (SLE). The second part of the thesis aims to model possible disease pathways between genes, disease, intermediate phenotypes and environmental factors using Bayesian networks, focussing in particular on coronary heart disease, numerous blood biomarkers and related genes.
Bayesian Multivariate Adaptive Regression Spline Model: Genetic association studies often find that many genotypes in strong linkage disequilibrium (LD) are associated with the outcome of interest. This makes it difficult to establish which single nucleotide polymorphism (SNP) is actually responsible. The aim of this part of the thesis is to investigate Bayesian variable selection methods in regions of high LD, and in particular to investigate SNPs in the MHC region associated with SLE. Past studies have found several SNPs in this region to be highly associated with SLE, but these SNPs are in high LD with one another. It is therefore desirable to search over all possible regression models in order to find those SNPs that are most important in the prediction of SLE.
The Bayesian Multivariate Adaptive Regression Splines (BMARS) model used should automatically correct for nearby associated SNPs, so that only those directly associated are included in the model. The BMARS approach will also automatically select the most appropriate disease model for each directly associated variant. The analysis suggests that there are three separate SNP signals in the MHC region that show association with SLE. The remaining associations found using simple frequentist tests are likely to be due to LD with the true signals.
Bayesian Networks for Genetic Association Studies: Coronary Heart Disease (CHD) is one of many diseases that result from complicated relationships between genetic and environmental factors. Identifying causal factors and developing new treatments that target these factors is very difficult. Changes in intermediate phenotypes, or biomarkers, could suggest potential causal pathways, although these changes tend to cluster among patients at higher risk of CHD, making it difficult to distinguish independent causal relationships. I aim to model disease pathways allowing for intermediate phenotypes as well as genetic and environmental factors. Statistical methodology was developed using directed acyclic graphs (DAGs). Disease outcomes, genes, intermediate phenotypes and possible explanatory variables were represented as nodes in a DAG. Possible models were investigated using Bayesian regression models, based upon the underlying DAG, in a reversible jump MCMC framework. Modelling the data in this way allows us to distinguish between direct and indirect effects and to explore the possible directionality of relationships. Since different DAGs can belong to the same equivalence class, some directions of association may become indistinguishable, and I am interested in the implications of this.
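The point about equivalence classes can be illustrated with a minimal sketch. It relies on the standard graphical criterion that two DAGs over the same nodes are Markov equivalent exactly when they share the same skeleton and the same v-structures (colliders whose parents are not adjacent). The function names and the three toy graphs are assumptions made for illustration only; they are not the structures fitted in the thesis, and the sketch does not reproduce its reversible jump MCMC search.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG: the set of adjacent node pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Colliders a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """True if two DAGs on the same node set share skeleton and v-structures."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Hypothetical three-node graphs (illustrative node labels only).
chain = [("CETP", "HDL"), ("HDL", "CHD")]           # CETP -> HDL -> CHD
reversed_chain = [("CHD", "HDL"), ("HDL", "CETP")]  # CHD -> HDL -> CETP
collider = [("CETP", "HDL"), ("CHD", "HDL")]        # CETP -> HDL <- CHD

print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, collider))        # False
```

The chain and its reversal imply the same conditional independences, so observational data alone cannot tell them apart, whereas the collider is distinguishable from both.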
I investigated the integrated associations of genotypes with multiple blood biomarkers linked to CHD risk, focusing particularly on relationships between APOE, CETP and APOB genotypes and HDL- and LDL-cholesterol, triglycerides, C-reactive protein, fibrinogen and apolipoproteins A and B.
Overview: I will begin by introducing the topics of genetics, statistics and directed acyclic graphs, with a background on each (Chapters 2, 3 and 4 respectively). Chapter 5 will then detail the analysis and results of the BMARS model. The analysis and results of Bayesian networks for genetic association studies will then be covered in Chapter 6.


Published Version
Available under Creative Commons: NC-ND 3.0
