Development and validation of a computerized South Asian Names and Group Recognition Algorithm (SANGRA) for use in British health-related studies.

K Nanchahal; P Mangtani; M Alston; I dos Santos Silva (2001) Development and validation of a computerized South Asian Names and Group Recognition Algorithm (SANGRA) for use in British health-related studies. Journal of public health medicine, 23 (4). pp. 278-285. ISSN 0957-4832 DOI: 10.1093/pubmed/23.4.278


BACKGROUND: Studies on ethnic variations in health have played an important role in aetiological and health services research. Most routine datasets, however, do not include information on ethnicity. South Asians, one of the largest minority ethnic groups in Britain, have distinctive names. These names also make it possible to distinguish the main sub-groups, which differ in important ways in their health-related exposures and disease risks.

METHODS: A computerized name recognition algorithm (SANGRA) was developed incorporating directories of South Asian first names and surnames together with their religious and linguistic origin. SANGRA was validated using health-related data with self-ascribed information on ethnicity.

RESULTS: SANGRA was successful in recognizing South Asian origin in reference datasets, with sensitivity of 89-96 per cent, specificity of 94-98 per cent, positive predictive value (PPV) of 80-89 per cent and negative predictive value (NPV) of 98-99 per cent. Religious origin was correctly assigned in the majority of cases: sensitivity, specificity and PPV were 94, 91 and 90 per cent for Hindus; 90, 99 and 98 per cent for Muslims; and 76, 99 and 94 per cent for Sikhs. SANGRA correctly identified 76 per cent of Gujerati names and 70 per cent of Punjabi names. However, only 62 per cent of Gujerati names were sufficiently distinct to be allocated to the Gujerati-only category, and only 53 per cent of Punjabi names were allocated to the Punjabi-only category. Specificity and PPV were nonetheless high for both languages (97 and 93 per cent, respectively, for Gujerati; 99 and 97 per cent for Punjabi).

CONCLUSIONS: SANGRA provides a practical and valid method of ascertaining South Asian origin by name and, with a lesser degree of accuracy, of differentiating between the main religious and linguistic subgroups living in Britain.
This algorithm will be useful in health-related studies where information on self-ascribed ethnicity is not available or is of a limited nature.
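The four validity measures reported above all come from a 2x2 table that cross-classifies the algorithm's name-based assignment against self-ascribed ethnicity. The sketch below shows that arithmetic. The class and function names and the counts in the usage line are hypothetical and for illustration only; they are not data from the study. Unlike sensitivity and specificity, PPV and NPV also depend on the proportion of South Asians in the dataset being classified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical 2x2 table: name-based classification vs self-ascribed ethnicity."""
    tp: int  # classified South Asian by name, self-ascribed South Asian
    fp: int  # classified South Asian by name, not self-ascribed South Asian
    fn: int  # not classified South Asian by name, self-ascribed South Asian
    tn: int  # not classified South Asian by name, not self-ascribed South Asian


def validity(c: Confusion) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100 * c.tp / (c.tp + c.fn),  # true South Asians correctly flagged
        "specificity": 100 * c.tn / (c.tn + c.fp),  # non-South Asians correctly not flagged
        "ppv": 100 * c.tp / (c.tp + c.fp),          # flagged names that are truly South Asian
        "npv": 100 * c.tn / (c.tn + c.fn),          # unflagged names that are truly not South Asian
    }


# Illustrative counts only (not from the SANGRA validation datasets).
metrics = validity(Confusion(tp=90, fp=15, fn=10, tn=885))
print({name: round(value, 1) for name, value in metrics.items()})
# → {'sensitivity': 90.0, 'specificity': 98.3, 'ppv': 85.7, 'npv': 98.9}
```

With these made-up counts, specificity is above 98 per cent but PPV is below 86 per cent. This happens because the false positives are set against a much smaller pool of true positives.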

Item Type	Article
Keywords	Algorithms, Asia, Southeastern/ethnology, Database Management Systems, Directories, Ethnic Groups/classification/statistics & numerical data, Great Britain/epidemiology, Health Status, Human, Language, *Names, Patient Admission, Patient Identification Systems, Religion, Software, Support, Non-U.S. Gov't
ISI	173321300005


Centre for Global Non-Communicable Diseases (NCDs)


Full text not available from this repository.
