
Coverage and accuracy of ethnicity data on three Asian ethnic groups in New Zealand


Correspondence to:
Pauline Norris, School of Pharmacy, University of Otago, Box 56, Dunedin 9054, New Zealand. Fax: 64 3 479 7034; e-mail: pauline.norris@otago.ac.nz


Objective: Detecting and eliminating ethnic disparities in access to and outcomes of healthcare relies on accurate ethnicity recording. Studies have shown that there are inaccuracies in ethnicity data in New Zealand and elsewhere. This study examined coverage and accuracy of ethnicity data for three Asian ethnic groups.

Methods: Student researchers from, or with links to, the ethnic groups concerned worked with communities to recruit participants. Names and dates of birth, length of residence in New Zealand and immigration status were recorded. Names and dates of birth were sent to the New Zealand Health Information Service, which attempted to link them with National Health Index ethnicity data.

Results: Only 72% of participants could be linked to an NHI number, and only 48% of those had their ethnicity recorded accurately. Linkage odds were lower for older people, and accuracy was higher for Chinese people compared to the other ethnicities. Length of residence and immigration status did not affect either coverage or accuracy.

Conclusion: Most participants who could be linked had their ethnicity recorded in the broader category of “Asian”, but accuracy was poor at the sub-group level.

Implications: Extreme caution should be applied when examining data about sub-groups within the ‘Asian’ category.

Numerous studies have found differences in use of health services and in outcomes by ethnicity.1–4 The ability to monitor and address these differences depends on accurate recording of patients' ethnicity. However, in many cases ethnicity is not recorded or recorded inaccurately in administrative datasets.

Individuals are best placed to state their own ethnicity and, therefore, self-identification is regarded as the gold standard. However, it is rarely possible for patients to identify and record their own ethnicity. Administrative datasets at best include the patient's self-identified ethnicity recorded by a healthcare provider, usually an administrative staff member. In reality, staff sometimes infer ethnicity from a patient's appearance or name, or by asking relatives.5 After death, funeral directors may also guess ethnicity through observation or personal knowledge.6 Healthcare staff and funeral directors may find it difficult to ask patients or their families about ethnicity.6,7

Where no ethnicity data has been recorded in administrative datasets, various strategies such as record linkage can be used.8 Names (forenames and surnames) have been used to guess Asian ethnicity,9–12 but this is less useful when inter-marriage between people of different ethnic groups is common. Inter-marriage rates are increasing in many Western countries and the level of inter-marriage between Asians and non-Asians in New Zealand is such that names would be of little use in guessing Asian or non-Asian ethnicity (in 2001, 22% of Asian women and 10% of Asian men had a non-Asian partner).13

Where ethnicity data does exist, a number of studies have identified problems in its completeness and accuracy. Both Gomez et al.5 and West et al.14 compared self-reported ethnicity with medical records and found that while there was a high level of accuracy for ethnicity data overall, particularly for the larger ethnic groups, Native Americans were poorly recorded. Those correctly classified may differ from those misclassified15 and this may make estimates based on existing data inaccurate.

Recording of ethnicity data in New Zealand

Reducing disparities in health status between ethnic groups is a priority in New Zealand health policy.16 Accurate ethnicity data are therefore regarded as important, and ethnicity is now part of the formulae used to fund primary care.17

A unique patient identifier, the National Health Index (NHI), is used to identify patients. All people in New Zealand, except those newly arrived, should have an NHI number. Healthcare providers (including hospitals, primary care providers and midwives) collect ethnicity information from patients. The 2001 Census question is the standard question for the New Zealand Health and Disability sector (Figure 1).20 This should ensure that ethnicity information collected in healthcare facilities is consistent with, and directly comparable to, that collected in the Census.

Figure 1. Standard ethnicity collection question (Source: SNZ, 2001 Census).

Data are sent by healthcare providers to the New Zealand Health Information Service (NZHIS), which holds ethnicity information in records linked to NHI numbers. Problems have been identified with the NHI; in particular, significant under-counting of Māori (the indigenous people of New Zealand) has been found.18,19

Asian ethnicity data in New Zealand

In New Zealand, ‘Asian’ includes peoples from East, South-East and South Asia.21 Asian people now make up 9% of the New Zealand population. There has been a small Chinese community in New Zealand since the mid-nineteenth century, and all three ethnic groups studied here have grown dramatically through recent immigration (see Table 1).

Table 1.  Population in New Zealand and percentage born outside New Zealand for the three groups.22

| Ethnic group | Population | % born outside New Zealand |
|---|---|---|
| Sri Lankan | 8,310 | 87 |

While there is some evidence of undercounting of Asians,18 until now there has been no evidence about the accuracy of lower level ethnicity classifications for specific Asian ethnic groups. This is presumably partly due to methodological problems. Targeting general populations of health service users would be inefficient for accessing members of smaller ethnic groups. Low response rates may also be anticipated from some groups, who might be suspicious about the aim of the research.

The aim of this study was to assess the coverage and accuracy of ethnicity information recorded for Sri Lankan, Korean and Chinese people, and to explore whether this varies by length of residence or immigration status.

Methods

Sampling technique

We developed a novel approach to locating participants: student researchers from, or with links in, particular ethnic communities worked with these communities. This was to increase the acceptability of the research within these communities and to establish trust. Initial informal discussion with community members guided our methods.

Chinese, Korean and Sri Lankan people were approached through friends, family and contacts of friends and family. These ethnicities were chosen because of the ethnicities and social networks of the student researchers. Churches, temples and dancing groups were approached for some ethnicities. Participants from each of the three ethnicities were recruited from a range of cities around New Zealand. Data were collected in 2008.

Data collection form

We developed a short form asking for participants' full name and date of birth (for identifying individuals' NHI records), length of residence in New Zealand, immigration status and any other ethnicities they identified with. Ethical approval was obtained through the University of Otago. Participants were all at least 18 years old. The data collection form, the information sheet and the consent form were translated into Chinese and Korean. After discussions with Sri Lankan people, we decided a form in English would be most acceptable to this community.

Data entry and analysis

Data were independently double-entered into Excel to minimise misspelt names and other recording errors. Ethnicity was coded according to the system used by the NZHIS: the code for Chinese is 42, while Korean and Sri Lankan ethnicities both fall under 44 (‘Other Asian’). Full names and dates of birth were sent to the NZHIS for matching. The NZHIS first used an automated process to match participants to an NHI record and, if that failed, attempted manual matching. Thus the process was initially deterministic, with a probabilistic back-up where the initial attempt was unsuccessful. The NZHIS records up to three ethnicities for each individual in an NHI record. Where the NZHIS was able to find an NHI record for a person, it provided us with all ethnicities recorded for that person, and these were compared with the ethnicity information collected from participants. Ethnicity was considered to be accurately recorded when the primary ethnicity given by a participant (Chinese, Korean or Sri Lankan) matched any one of those recorded in the NHI system.
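The accuracy rule above can be sketched as a small function. This is a minimal illustration, not the procedure used by the study or the NZHIS: it assumes the comparison was made on the codes mentioned above (42 for Chinese, 44 for Korean and Sri Lankan), and the function and constant names are hypothetical.

```python
# Codes described in the text: Chinese = 42; Korean and Sri Lankan = 44 ('Other Asian').
# Assumption: accuracy is judged at this code level only.
EXPECTED_CODE = {"Chinese": 42, "Korean": 44, "Sri Lankan": 44}

MAX_NHI_ETHNICITIES = 3  # an NHI record holds up to three ethnicities


def ethnicity_accurate(self_identified: str, nhi_codes: list[int]) -> bool:
    """Return True if the participant's primary ethnicity matches ANY of the
    (up to three) ethnicity codes on their NHI record.

    Hypothetical helper. An empty list stands for 'no ethnicity stated'.
    """
    if len(nhi_codes) > MAX_NHI_ETHNICITIES:
        raise ValueError("an NHI record holds at most three ethnicities")
    return EXPECTED_CODE[self_identified] in nhi_codes


# Illustrative records
print(ethnicity_accurate("Chinese", [42]))     # True
print(ethnicity_accurate("Chinese", [44]))     # False: recorded as another Asian group
print(ethnicity_accurate("Sri Lankan", []))    # False: no ethnicity stated
```

Under this assumption Korean and Sri Lankan share a code, so the sketch cannot tell a Korean recorded under a different ‘Other Asian’ group from a correctly recorded one.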

Microsoft Excel was used for descriptive analyses, while multiple logistic regression models were fitted using R23 to obtain odds ratios and confidence intervals. The multiple logistic models were fitted with the glm function, with all factors entered into the models.
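Odds ratios from a logistic model relate to the fitted coefficients in a standard way: OR = exp(β), and a Wald-type 95% confidence interval is exp(β ± 1.96·SE). The paper does not state which interval method was used (R can also produce profile-likelihood intervals), so the sketch below assumes Wald intervals; the function names are hypothetical. As a consistency check it back-calculates β and SE from one reported row of Table 3.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(beta: float, se: float) -> tuple[float, float, float]:
    """Wald-type odds ratio and 95% CI from a logistic-regression coefficient."""
    return math.exp(beta), math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se)


def coefficient_from_reported(or_: float, lower: float, upper: float) -> tuple[float, float]:
    """Back-calculate the log-odds coefficient and its standard error from a
    reported OR and 95% CI, assuming the CI is symmetric on the log scale."""
    beta = math.log(or_)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value for H0: beta = 0 using the normal approximation."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2))


# Sri Lankan row of Table 3, model (b): OR 0.41 (0.20-0.84), p = 0.0148
beta, se = coefficient_from_reported(0.41, 0.20, 0.84)
print(round(beta, 3), round(se, 3))       # -0.892 0.366
print(round(wald_p_value(beta, se), 3))   # 0.015
```

The recovered p-value agrees with the reported 0.0148 up to the rounding of the published OR and interval limits.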

Results

Three hundred and fourteen participants were recruited: 97 Chinese, 102 Koreans and 115 Sri Lankans (Table 2). Chinese participants tended to be younger than the Chinese population in New Zealand, but both the Korean and the Sri Lankan age profiles closely matched that of the general population of Koreans and Sri Lankans aged 18 years or more in New Zealand.

Table 2.  Characteristics of participants.

| | Chinese | Korean | Sri Lankan | Total |
|---|---|---|---|---|
| Number of participants | 97 | 102 | 115 | 314 |
| Median age (IQR) | 22 (3) | 34 (27) | 45 (27) | 30 (28) |
| Percentage with citizenship/permanent residence | 87 | 81 | 97 | 89 |
| Mean time in NZ (SD) | 12 (6) | 9 (4) | 11 (6) | 10 (6) |
Percentage who provided another ethnicity3111515

The NZHIS was able to link 72% of participants (225/314) to an NHI record. Linkage rates were higher for Chinese and Sri Lankan (each 79%) than for Koreans (56%).

A multiple logistic regression model was fitted to determine if any of the factors recorded affected the odds of being linked to an NHI record (Table 3).

Table 3.  Odds ratios (95% confidence intervals) for (a) predictors of being linked to an NHI record (n=298) and (b) predictors of having a correct ethnicity recorded in the NHI record if linking was successful (n=219).

| | Odds ratio | 95% CI | p |
|---|---|---|---|
| Model (a): Linked to NHI record (note b) | | | |
| Time in New Zealand | 1.05 | 0.98–1.11 | 0.1497 |
| Immigration status | | | |
| Citizen/Permanent Resident | (reference) | | |
| Student Visa | 0.62 | 0.24–1.64 | 0.3389 |
| Ethnic group | | | |
| Sri Lankan | 1.67 | 0.76–3.67 | 0.2006 |
| Model (b): Correct ethnicity in NHI record (note c) | | | |
| Time in New Zealand | 0.98 | 0.93–1.03 | 0.4294 |
| Immigration status | | | |
| Citizen/Permanent Resident | (reference) | | |
| Student Visa | 1.95 | 0.58–6.49 | 0.2776 |
| Ethnic group | | | |
| Sri Lankan | 0.41 | 0.20–0.84 | 0.0148 (note a) |

Notes:
a) p<0.05.
b) Nine cases were excluded from the model because of missing data for Time in New Zealand.
c) Five cases were excluded from the model because of missing data for Time in New Zealand.

A statistically significant association was found between age and the odds of being linked to an NHI record, with a 3% reduction in odds for each year of age over 18 years. The odds ratio for Koreans approached statistical significance.
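Assuming age entered model (a) as a linear term in years, a 3% reduction in odds per year corresponds to a per-year odds ratio of about 0.97, and the effect compounds multiplicatively over larger age differences. For example, comparing a 48-year-old with an 18-year-old, other factors held constant:

```latex
\mathrm{OR}_{\Delta\,\text{years}} = \left(\mathrm{OR}_{1\,\text{year}}\right)^{\Delta}
\approx 0.97^{30} \approx 0.40
```

that is, roughly 60% lower odds of being linked. The 0.97 figure is inferred from the rounded 3% in the text; the exact coefficient is not reported.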

Ethnicity was accurately recorded for 48% of respondents who had an NHI record matched, i.e. 34% of all participants. Sixty per cent of the Chinese sample, 44% of the Koreans and 38% of the Sri Lankans who were matched to an NHI record had their ethnicity recorded accurately.
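Because the text reports rounded percentages, they can be checked against counts given elsewhere. The sketch below back-calculates approximate numbers linked and accurately recorded in each group from the Table 2 sample sizes and the percentages above; the reconstructed counts are estimates, not figures reported by the study, and the names are hypothetical. They sum to the 225 linked participants and reproduce the 31, 32 and 56 inaccurately recorded Chinese, Korean and Sri Lankan participants described in the paragraphs that follow.

```python
# Sample sizes (Table 2) and rounded rates reported in the text.
N = {"Chinese": 97, "Korean": 102, "Sri Lankan": 115}
LINK_RATE = {"Chinese": 0.79, "Korean": 0.56, "Sri Lankan": 0.79}
ACCURACY_RATE = {"Chinese": 0.60, "Korean": 0.44, "Sri Lankan": 0.38}  # among linked


def approximate_counts() -> dict[str, tuple[int, int, int]]:
    """Return (linked, accurate, inaccurate) per group, rounding each
    reconstructed count to the nearest whole person."""
    out = {}
    for group, n in N.items():
        linked = round(n * LINK_RATE[group])
        accurate = round(linked * ACCURACY_RATE[group])
        out[group] = (linked, accurate, linked - accurate)
    return out


counts = approximate_counts()
for group, (linked, accurate, inaccurate) in counts.items():
    print(f"{group}: linked={linked}, accurate={accurate}, inaccurate={inaccurate}")
print("total linked:", sum(c[0] for c in counts.values()))  # total linked: 225
```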

A multiple logistic regression model was fitted to determine if any of the factors recorded affected the odds of ethnicity being accurately recorded in the NHI record (Table 3).

Ethnic group was significantly associated with accuracy: Koreans and Sri Lankans were 64% less likely than Chinese participants to have their ethnicity recorded accurately. Age was no longer statistically significant.

Of the 31 Chinese people whose ethnicity was not accurately recorded, most were recorded under other Asian categories: ‘Asian not further defined’ (seven), South-East Asian (one) and other Asian (nine). However, five were recorded as European and nine did not have an ethnicity stated.

The 32 Koreans whose ethnicity was not accurately recorded were classified as ‘Asian not further defined’ (12), South-East Asian (six), Chinese (one), Indian (one), other (one) or European not further defined (one). Ten Koreans did not have an ethnicity stated.

The 56 Sri Lankans whose ethnicity was not correctly recorded were classified as ‘Asian not further defined’ (five), South-East Asian (six), Indian (10), Middle Eastern (one), European not further defined (one), Other European (one), other (14). Eighteen Sri Lankans did not have an ethnicity stated.
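The counts in the three paragraphs above can be tallied to check their totals and to derive the share of the whole sample with no ethnicity stated, the basis of the ‘nearly 12%’ figure in the discussion. The category labels are abbreviated, and grouping European and Middle Eastern entries as ‘non-Asian’ is an illustrative choice rather than the study's own classification.

```python
# Misclassification counts transcribed from the three paragraphs above.
MISCLASSIFIED = {
    "Chinese": {"Asian nfd": 7, "South-East Asian": 1, "Other Asian": 9,
                "European": 5, "Not stated": 9},
    "Korean": {"Asian nfd": 12, "South-East Asian": 6, "Chinese": 1, "Indian": 1,
               "Other": 1, "European nfd": 1, "Not stated": 10},
    "Sri Lankan": {"Asian nfd": 5, "South-East Asian": 6, "Indian": 10,
                   "Middle Eastern": 1, "European nfd": 1, "Other European": 1,
                   "Other": 14, "Not stated": 18},
}
TOTAL_PARTICIPANTS = 314
# Illustrative grouping of clearly non-Asian categories
NON_ASIAN = {"European", "European nfd", "Other European", "Middle Eastern"}

totals = {group: sum(c.values()) for group, c in MISCLASSIFIED.items()}
not_stated = sum(c.get("Not stated", 0) for c in MISCLASSIFIED.values())
non_asian = sum(v for c in MISCLASSIFIED.values() for k, v in c.items() if k in NON_ASIAN)

print(totals)                                                   # {'Chinese': 31, 'Korean': 32, 'Sri Lankan': 56}
print(not_stated, f"{not_stated / TOTAL_PARTICIPANTS:.1%}")     # 37 11.8%
print(non_asian)                                                # 9
```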

Discussion

While we attempted to include a diverse range of people in our study, our sampling strategy means we cannot be sure that participants are representative of the populations of interest, nor can we quantify the biases in our sample. People with few social contacts within their own ethnic community were probably less likely to be included. However, we believe our strategy was the most feasible way to engage these communities. In particular, research in the Sri Lankan community can be difficult because of tensions between Tamil and Sinhalese people, and we were fortunate that our study team included a member of each community. Participants handwrote their names, which made interpretation difficult; this may have led us to underestimate the extent of NHI coverage.

Only 72% of our participants could be linked to an NHI record. Failure to link could occur because people did not have an NHI record, because their name was spelt differently, or because they used a different name or date of birth in the study from that recorded by healthcare providers (and therefore in the NHI system). Such differences could arise for cultural reasons, from staff attempting to simplify names, or from misspellings or data entry errors by us or by healthcare staff. These problems may be particularly likely with Asian names, which may be very unfamiliar to healthcare staff. In addition, the ordering of Chinese and Korean names may cause problems. The finding that older people were less likely to be linked than younger people is somewhat surprising, since we assumed that older people would have had more contact with health services. We suspect this is because younger people may be more likely to use a consistent Anglicised name.
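The effect of name ordering on a deterministic match can be illustrated with a toy example. The NZHIS matching rules are not described in this paper, so the functions, the example names and the date of birth below are all hypothetical: an exact comparison fails when a family name is written first on one form and last on another, an order-insensitive comparison recovers that case, and neither helps when a different (for example, Anglicised) given name is used.

```python
from datetime import date


def name_tokens(name: str) -> list[str]:
    """Lower-case a name and split it into tokens, treating hyphens and commas
    as separators (a deliberately simple, hypothetical normalisation)."""
    return name.lower().replace("-", " ").replace(",", " ").split()


def exact_match(name_a: str, dob_a: date, name_b: str, dob_b: date) -> bool:
    """Deterministic match: same tokens in the same order and same date of birth."""
    return name_tokens(name_a) == name_tokens(name_b) and dob_a == dob_b


def order_insensitive_match(name_a: str, dob_a: date, name_b: str, dob_b: date) -> bool:
    """Same tokens in any order and same date of birth."""
    return sorted(name_tokens(name_a)) == sorted(name_tokens(name_b)) and dob_a == dob_b


dob = date(1980, 5, 17)       # illustrative date of birth
study_form = "Kim Min-jun"    # family name written first
nhi_record = "Min-jun Kim"    # family name written last
anglicised = "Jason Kim"      # an adopted English given name

print(exact_match(study_form, dob, nhi_record, dob))              # False
print(order_insensitive_match(study_form, dob, nhi_record, dob))  # True
print(order_insensitive_match(study_form, dob, anglicised, dob))  # False
```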

Among the 72% who had an identifiable NHI record, fewer than half had their ethnicity accurately recorded. Accuracy was higher for the Chinese than for the other groups. This may be because ‘Chinese’ has its own code (42) and is therefore easier to code than the Korean and Sri Lankan ethnicities, which fall under ‘Other Asian’. The misclassifications shed some light on potential sources of error. There seems to be a lack of clarity about the classification of Asian ethnicities: many participants were classified incorrectly but still within the general Asian category. This limits the potential to undertake research investigating differences between Asian populations. However, of most concern is the high proportion of participants for whom no ethnicity was recorded (nearly 12% of the whole sample), and the smaller proportion misclassified as European or Middle Eastern.

Previous comparisons of ethnicity data recorded in different New Zealand datasets have shown significant discrepancies. Bramley and Latimer18 compared children's ethnicities recorded in the National Immunisation Register with those recorded in Primary Health Organisations' registers and found significant misclassification of ethnicity across the Māori, Pacific, European and Asian groups, with the largest discrepancy for Māori. Swan et al.19 compared hospital records with ethnicity data collected during the Barriers to Diabetes Care in Waikato study: 29% of Māori, 11% of European and 33% of Pacific patients had their ethnicity incorrectly recorded in the hospital records. Marshall et al. compared ethnicity recorded in the NHI system with that recorded in a large general practice decision support system and found that 12% of patients did not have an ethnicity recorded in the NHI system; the extent of disagreement between the two sources had significant implications for the measurement of ethnic-specific levels of risk.24 The Health Utilisation Research Alliance (HURA) encouraged primary care practices to collect ethnicity data and found that, of the 5,917 people recorded by practices as Asian, only 39% had Asian as their ethnicity in their NHI record, and many (37%) had no ethnicity recorded in their NHI record.17 Our study is the first to look at finer levels of ethnicity classification, which is likely to be why our results show such poor accuracy of ethnicity recording. In addition, the use of a community-based sample, rather than a sample of healthcare users, may account, at least in part, for the poor level of linkage to NHI records.

Our results suggest an ongoing need to improve ethnicity recording in the health sector. They also suggest that extreme caution is needed when interpreting studies that rely on administrative data to identify members of small ethnic groups.

Acknowledgements

We would like to thank the New Zealand Health Information Service, particularly Simon Ross; all of our participants; Yun (Wanda) Wang, Justin Kim and Bill Lu for their help.