Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records.

Gareth Hagger-Johnson; Katie Harron; Tom Fleming; Ruth Gilbert; Harvey Goldstein; Rebecca Landy; Roger C Parslow (2015) Data linkage errors in hospital administrative data when applying a pseudonymisation algorithm to paediatric intensive care records. BMJ Open, 5 (8). e008118. ISSN 2044-6055 DOI: 10.1136/bmjopen-2015-008118

OBJECTIVES: Our aim was to estimate the rate of data linkage error in Hospital Episode Statistics (HES) by testing the HESID pseudonymisation algorithm against a reference standard, in a national registry of paediatric intensive care records.

SETTING: The Paediatric Intensive Care Audit Network (PICANet) database, covering 33 paediatric intensive care units in England, Scotland and Wales.

PARTICIPANTS: Data from infants and young people aged 0-19 years admitted between 1 January 2004 and 21 February 2014.

PRIMARY AND SECONDARY OUTCOME MEASURES: After applying the HESID algorithm to PICANet records, admission records were classified as matches (records belonging to the same patient who had been readmitted) or non-matches (records belonging to different patients). False-match and missed-match rates were calculated by comparing the results of the HESID algorithm with the reference standard, the PICANet ID. The effect of linkage errors on the readmission rate was evaluated.

RESULTS: Of 166,406 admissions, 88,596 were true matches (where the same patient had been readmitted). The HESID pseudonymisation algorithm produced few false matches (n=176/77,810; 0.2%) but a larger proportion of missed matches (n=3609/88,596; 4.1%). The true readmission rate was underestimated by 3.8% due to linkage errors. Patients who were younger, male or from Asian/Black/Other ethnic groups (vs White) were more likely to experience a false match. Missed matches were more common for younger patients, for Asian/Black/Other ethnic groups (vs White) and for patients whose records had missing data.

CONCLUSIONS: The deterministic algorithm used to link all episodes of hospital care for the same patient in England has a high missed-match rate, which leads to underestimation of the true readmission rate and will produce biased analyses. To reduce linkage error, pseudonymisation algorithms need to be validated against good-quality reference standards. Pseudonymisation of data 'at source' does not itself address errors in patient identifiers and the impact these errors have on data linkage.
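
The two error rates reported in the results can be reproduced from the counts given in the abstract. The following is a minimal sketch, not the authors' code: the function name `linkage_error_rates` and its parameter names are hypothetical, and it assumes, as the reported denominators suggest, that the false-match rate is taken over true non-matches (all admissions minus true matches) and the missed-match rate over true matches.

```python
def linkage_error_rates(total, true_matches, false_matches, missed_matches):
    """Return linkage error rates as proportions.

    Assumption (inferred from the abstract's denominators):
    - false-match rate  = false matches / true non-matches
    - missed-match rate = missed matches / true matches
    """
    true_non_matches = total - true_matches
    return {
        "true_non_matches": true_non_matches,
        "false_match_rate": false_matches / true_non_matches,
        "missed_match_rate": missed_matches / true_matches,
    }


# Counts as reported in the abstract.
rates = linkage_error_rates(
    total=166_406,
    true_matches=88_596,
    false_matches=176,
    missed_matches=3_609,
)

print(rates["true_non_matches"])              # 77810
print(f"{rates['false_match_rate']:.1%}")     # 0.2%
print(f"{rates['missed_match_rate']:.1%}")    # 4.1%
```

The sketch covers only the two error rates; the 3.8% underestimate of the readmission rate is not derived here because the abstract does not state how it was calculated.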


Full text: bmjopen-2015-008118.PMC4550723.pdf (Published Version). Available under Creative Commons: 3.0
