Multiple Imputation for Individual Patient Data Meta-Analyses.

M Quartagno; (2016) Multiple Imputation for Individual Patient Data Meta-Analyses. PhD thesis, London School of Hygiene & Tropical Medicine. DOI: 10.17037/PUBS.03141186

The term meta-analysis refers to a set of statistical techniques for combining findings from different studies in order to draw more definitive conclusions about a treatment or exposure effect of interest in a particular context. Recently, meta-analyses that combine the individual observations collected in each study, instead of simple summary measures, have been gaining popularity in medical research. The main advantage of these so-called Individual Patient Data Meta-Analyses (IPD-MA) is that they have much more statistical power to investigate heterogeneity of the contributing studies and to explore treatment-covariate effects. Unfortunately, missing data are a common problem that affects nearly every dataset in clinical or epidemiological studies, and therefore also the meta-analyses of such datasets. When not handled properly, missing data can lead to invalid inferences, and a great deal of research has therefore focussed on deriving, implementing and disseminating appropriate methods. The motivation for this thesis comes from two IPD-MA, called INDANA and MAGGIC. The challenges introduced by missing data in these projects include the presence of wholly missing variables in some studies, the variety of types of partially observed variables, and the presence of interactions and non-linearities in the substantive models of interest. In this thesis we propose a Joint Modelling Multiple Imputation (JM-MI) approach to overcome these issues. Motivated by the lack of available software, in the first part of this thesis we develop and describe jomo, a new R package for multilevel MI. A key feature of jomo, compared to other packages for MI, is that it allows for random, or fixed, study-specific covariance matrices in the imputation model, therefore allowing for heteroscedasticity when imputing.
We then use this package to show that our proposed method can perform as well as the standard methods currently used to handle missing data in IPD-MA with partially observed continuous variables. Furthermore, we show how it performs in more challenging situations, namely when imputing missing data in studies with few observations or even with systematically missing variables. We then extend the method to partially observed variables that are not continuous, developing and evaluating a strategy based on latent normal variables to impute categorical data. Finally, we use the methods introduced to impute missing data in the two motivating meta-analyses, INDANA and MAGGIC.