Provenance of "after the fact" harmonised community-based demographic and HIV surveillance data from ALPHA cohorts
Background: Data about data, metadata, for describing Health and Demographic Surveillance System (HDSS) data have often received insufficient attention. This thesis studied how to develop provenance metadata within the context of HDSS data harmonisation - the network for Analysing Longitudinal Population-based HIV/ AIDS data on Africa (ALPHA). Technologies from the data documentation community were customised, among them: A process model - Generic Longitudinal Business Process Model (GLBPM), two metadata standards - Data Documentation Initiative (DDI) and Standard for Data and Metadata eXchange (SDMX) and a data transformations description language - Structured Data Transform Language (SDTL). Methods: A framework with three complementary facets was used: Creating a recipe for annotating primary HDSS data using the GLBPM and DDI; Approaches for documenting data transformations. At a business level, prospective and retrospective documentation using GLBPM and DDI and retrospectively recovering the more granular details using SDMX and SDTL; Requirements analysis for a user-friendly provenance metadata browser. Results: A recipe for the annotation of HDSS data was created outlining considerations to guide HDSS on metadata entry, staff training and software costs. Regarding data transformations, at a business level, a specialised process model for the HDSS domain was created. It has algorithm steps for each data transformation sub-process and data inputs and outputs. At a lower level, the SDMX and SDTL captured about 80% (17/21) of the variable level transformations. The requirements elicitation study yielded requirements for a provenance metadata browser to guide developers. Conclusions: This is a first attempt ever at creating detailed metadata for this resource or any other similar resources in this field. HDSS can implement these recipes to document their data. This will increase transparency and facilitate reuse thus potentially bringing down costs of data management. It will arguably promote the longevity and wide and accurate use of these data.
Item Type | Thesis (Doctoral) |
---|---|
Thesis Type | Doctoral |
Thesis Name | PhD |
Contributors | Zaba, B; Todd, J; Slaymaker, E |
Copyright Holders | Chifundo Kanjala |