Australian health-related quality of life population norms derived from the SF-6D

Correspondence: Dr Richard Norman, Centre for Health Economics Research and Evaluation (CHERE), University of Technology, Sydney, PO Box 123, Broadway, Sydney, New South Wales 2007; e-mail: richard.norman@chere.uts.edu.au

Abstract

Objective: To investigate population health-related quality of life norms in an Australian general sample by age, gender, BMI, education and socioeconomic status.

Method: The SF-36 was included in the 2009/10 wave of the Household, Income and Labour Dynamics in Australia (HILDA) survey (n=17,630 individuals across 7,234 households), and converted into SF-6D utility scores. Trends across the various population subgroups were investigated employing population weights to ensure a balanced panel, and were all sub-stratified by gender.

Results: SF-6D scores decline with age beyond 40 years, with decreasing education and with higher levels of socioeconomic disadvantage. Scores were also lower at very low and very high BMI levels. Males reported higher SF-6D scores than females across most analyses.

Conclusions: This study reports Australian population utility data measured using the SF-6D, based on a nationally representative sample. These results can be used in a range of policy settings such as cost-utility analysis or exploration of health-related inequality. In general, the patterns are similar to those reported using other multi-attribute utility instruments and in different countries.

As in all developed nations, Australia is expending considerable effort identifying approaches to manage both an ageing population and a rapid proliferation of new and often expensive therapies. In 2008/09, it was estimated that the nation spent 9.0% of GDP (or A$112.8 billion) on healthcare, with government spending representing 69.7% of total expenditure.1 This equates to A$5,190 per Australian and places Australia in the middle of health expenditure relative to other developed OECD nations. How this major investment translates into health outcomes is an important consideration for all nations and jurisdictions, and is the underpinning reason for the increasingly systematic use of health technology assessment, both in Australia and elsewhere.

One of the major areas in which different health outcomes might manifest is that of health-related quality of life (HRQoL). Recently, van den Berg2 has reaffirmed the push in favour of reporting population norms for single index measures of HRQoL. This has a number of potentially valuable applications in public health. First, it can be used to estimate the incremental effect of some intervention when control groups are not available3 or, similarly, to estimate the HRQoL of populations with a condition who return to normal health. Both are important concerns in economic evaluation of healthcare, particularly cost-utility analysis in which the outcome is defined in terms of a composite measure of life expectancy and HRQoL (often the quality-adjusted life year, or QALY). A second use of population norms is in the comparison of different sub-populations (e.g. by gender, education, etc), determining where the greatest areas of disutility exist.4 Finally, it can be used to explore issues relating to equity.5

A range of instruments have been developed to explore HRQoL, many of which are amenable to use in health technology assessment (and, more specifically, economic evaluation). One of these, the SF-6D, has a particular strength in that it can be derived from the SF-36 specifically for use in economic evaluation.6 This means that the large literature reporting SF-36 data can be translated into SF-6D utility weights with relative ease. Both the SF-6D and the SF-36 are generic HRQoL instruments, but the latter is not designed to be an outcome measure for use in economic evaluation, as it does not investigate the trade-offs between dimensions. The SF-36 contains eight dimensions (physical functioning, role-physical, bodily pain, general health, vitality, role-emotional, social functioning and mental health).7 In the physical functioning dimension, for example, there are 10 items, each with three levels corresponding to ‘limited a lot’, ‘limited a little’ and ‘not limited at all’. These are coded as 1, 2 and 3, and summed to provide a score between 10 and 30. This is then rescaled to a 0–100 scale. It is noteworthy that the SF-36 neither explicitly considers the trade-offs between different dimensions (so each dimension is implicitly equally important), nor places health states on the zero-to-one scale required for the construction of a QALY.
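The physical functioning scoring described above can be sketched in a few lines of Python. This is a minimal illustration of the sum-and-rescale step only; the function name is hypothetical, and the handling of missing items (which the text does not describe) is deliberately left out.

```python
def physical_functioning_score(item_responses):
    """Rescale ten SF-36 physical functioning items (each coded 1-3) to 0-100."""
    if len(item_responses) != 10:
        raise ValueError("expected 10 item responses")
    if any(r not in (1, 2, 3) for r in item_responses):
        raise ValueError("each response must be coded 1, 2 or 3")
    raw = sum(item_responses)            # raw score lies between 10 and 30
    return (raw - 10) / (30 - 10) * 100  # linear rescale onto 0-100


# Hypothetical respondents: worst possible, best possible, and a mixed profile.
worst = physical_functioning_score([1] * 10)
best = physical_functioning_score([3] * 10)
mixed = physical_functioning_score([3, 3, 3, 2, 2, 2, 2, 1, 1, 1])
```

Because every item contributes equally to the raw sum, the rescaled score carries no information about which limitations respondents regard as more important, which is the gap the SF-6D valuation work addresses.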

Brazier6 adapted the SF-36, excluding all general health questions and combining role limitation due to physical problems and role limitation due to emotional problems (but retaining the distinction in the labels attached to levels). This work produced the SF-6D, a multi-attribute utility instrument which has been the subject of a series of valuation studies in the UK6,8 and elsewhere.9–12 These valuation studies aim to place all health states within the multi-attribute instrument on a scale where full health is valued at 1 and dead (or health states considered equal to being dead) is valued at 0, with the possibility of health states being valued less than 0. The purpose of these scores is to weight years of life by their quality to produce a quality-adjusted life year (QALY). There is evidence suggesting that the SF-6D has good psychometric characteristics in a range of settings.13,14
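To make the weighting of life years by quality concrete, the following sketch sums utility-weighted durations for a single person. The function name and the health profile are hypothetical, and discounting of future years is omitted for simplicity.

```python
def qalys(health_profile):
    """Return total QALYs for a list of (utility, years) segments.

    Utilities sit on the scale where full health is 1 and dead is 0.
    No discounting is applied (a simplifying assumption).
    """
    return sum(utility * years for utility, years in health_profile)


# Hypothetical profile: 5 years at 0.8, then 2 years at 0.5, then 1 year at 0.
profile = [(0.8, 5.0), (0.5, 2.0), (0.0, 1.0)]
total = qalys(profile)  # 4.0 + 1.0 + 0.0 QALYs over 8 calendar years
```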

In the United States (US), Fryback et al.15 compared population norms using a range of multi-attribute utility instruments, and concluded that there is a striking similarity between the population norms for the six they considered (EQ-5D, HUI2, HUI3, SF-6D, QWB-SA, HALex). Regarding the results from the SF-6D instrument, males reported a slightly higher HRQoL than females, and the values were negatively related to age (from 0.80 in the 35–44 year age range to 0.76 in the 75–89 year age range). Interestingly, they detected higher levels of HRQoL for people aged 65–74 years relative to those in the immediately younger age cohort (55–64). Whether this was an attrition issue or reflective of improved health-related quality of life in this age group is uncertain, and something that could not be explored as the data were cross-sectional. This spike in HRQoL was a common feature in the six instruments they considered, and identifying if this is a phenomenon specific to the US would be of interest. It is, however, important to note that this cross-sectional relationship between instruments does not necessarily translate into a corresponding longitudinal relationship. A recent study by Feeny et al.16 identified a poor longitudinal relationship between various health-related quality of life instruments in patients undergoing cataract surgery or treatment for heart failure.

In Australia, there are existing population norms for the Assessment of Quality of Life (AQoL) instrument.17 In a sample of 2,934 South Australians, the average AQoL utility score is 0.83 (standard deviation 0.20, 95% confidence interval 0.82–0.84). Once the results are subdivided by age category, the scores decrease almost monotonically with age, although there is a very shallow slope before the age of 50. Notably, the trend in the 65–74 year old groups identified by Fryback et al. is not replicated by Hawthorne and Osborne. A number of speculative reasons for this difference might be postulated; however, whether it is something related to the Australian population or to the AQoL instrument (which is not employed by Fryback et al.) is unclear. Employing an instrument used by Fryback et al. in an Australian population might assist in identifying which of the speculative reasons is more sensible.
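As a rough consistency check on the AQoL figures quoted above, the reported interval can be approximated from the mean, standard deviation and sample size. The sketch assumes a normal approximation and simple random sampling, neither of which is confirmed by the text; the function name is hypothetical.

```python
import math


def normal_ci(mean, sd, n, z=1.96):
    """Approximate 95% confidence interval for a mean (normal approximation)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Values quoted in the text: mean 0.83, SD 0.20, n = 2,934.
low, high = normal_ci(0.83, 0.20, 2934)
print(f"{low:.2f}-{high:.2f}")  # prints 0.82-0.84
```

Under these assumptions the approximation agrees with the published interval to two decimal places.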

To the authors’ knowledge, Australian population norms for the other leading instruments (e.g. the EQ-5D, SF-6D, HUI) do not yet exist. The aim of this study is to use a large nationally representative survey to provide a set of population norms from the SF-6D. This will allow some comparison with results internationally, and identify groups for which poor health-related quality of life is a serious health issue.

Methods

This study uses the Household, Income and Labour Dynamics in Australia (HILDA) survey, which collects information on economic and subjective well-being including the SF-36. Details of the survey have been presented elsewhere.18 It was inspired by other large overseas panel surveys, such as the British Household Panel Survey (BHPS), and began with an initial sample in 2001 drawn from all persons residing in private dwellings in Australia (with some minor exceptions). The initial sample was selected using a multistage approach. First, a sample of 488 Census Collection Districts was selected, each of about 200–250 households. This sample was drawn according to State and metropolitan/non-metropolitan region. Within each Census Collection District, 22–34 dwellings were randomly selected, which yielded the initial sample. The sample is extended over time by adding any children, whether born or adopted by members of the sample, or any new household members resulting from composition changes in the original households. The former group enter the survey on a permanent basis, while the latter remain in the survey as long as they reside with an original sample member.

At the time of analysis, the survey had nine waves of data, beginning in 2001/02. It recruits only individuals living in private dwellings (so, for instance, those in nursing homes would not enter the survey, although existing participants can continue providing data if they subsequently move into a non-private dwelling). Wave 9 interviews were conducted between mid-August 2009 and mid-March 2010, using computer-assisted personal interviewing (CAPI) for the first time. Wave 9 consists of 17,630 individuals across 7,234 households. Of these 17,630 individuals, 13,305 were participants in the first wave of the study, suggesting good retention over time. The various methods for ensuring good retention of panel members have been outlined in depth elsewhere.18 The Wave 9 HILDA survey asked respondents to complete the SF-36 (version 1). This was converted into an SF-6D index score where full health (i.e. all dimensions at the best level) was assumed to be valued at 1, and a health state equivalent to death was valued at 0. As yet, there is no published SF-6D algorithm for Australia; therefore, the UK algorithm was employed (i.e. a consistent version of Model 10 in Brazier et al.6).

Analysis was completed using STATA 11.2. Sampling weights were defined as the inverse sampling probability for each participant, and are supplied alongside the Wave 9 data. These were applied to the generation of the summary statistics to allow for the likelihood of an individual being selected as part of the survey. The application of weights to the data causes 94 individuals to be removed from the weighted sample (assigned a weight of 0). In this context, this does not mean that they were assumed to not be in the general population (which is how a weight of 0 would normally be interpreted). There are two reasons for assigning the zero weight to a respondent. First, these individuals were resident in a private dwelling when recruited, but had become a resident in a non-private dwelling at some point in the interim. An example might be an individual who has moved into aged-care facilities during the life of the survey. While the HILDA survey did not recruit people in non-private dwellings (such as aged-care facilities), it does continue to collect data for individuals who had moved while part of the cohort. The second reason for assigning a zero weight is that the individual was resident in a very remote area of Australia. Exclusion of this group of individuals in this way is consistent with Australian Bureau of Statistics policy.
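As an illustration of how sampling weights enter the summary statistics, the sketch below computes a weighted mean and standard deviation in Python and drops zero-weight respondents. It is not the Stata code used in the study: the function name, the toy scores and the weights are hypothetical, and the variance uses the simple weight-normalised form, which is an assumption rather than the estimator the authors applied.

```python
import math


def weighted_mean_sd(scores, weights):
    """Weighted mean and SD; respondents with a weight of 0 are excluded."""
    pairs = [(s, w) for s, w in zip(scores, weights) if w > 0]
    total_weight = sum(w for _, w in pairs)
    mean = sum(s * w for s, w in pairs) / total_weight
    variance = sum(w * (s - mean) ** 2 for s, w in pairs) / total_weight
    return mean, math.sqrt(variance), len(pairs)


# Hypothetical SF-6D scores; the third respondent carries a zero weight
# (for example, someone who has moved into a non-private dwelling).
scores = [0.80, 0.60, 0.90, 0.70]
weights = [1000.0, 3000.0, 0.0, 1000.0]
mean, sd, n_used = weighted_mean_sd(scores, weights)
```

Here the zero-weight respondent contributes nothing to either statistic, and the respondent with the largest weight pulls the mean towards 0.60.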

As an initial exploration, the proportion of individuals reporting no problems (i.e. Level 1) in each of the domains of the SF-6D was calculated, stratified by age and gender. The data were then divided according to four major characteristics of interest, these being age, education, the socioeconomic indexes for areas (SEIFA) measure of relative socioeconomic status by decile, and categories of body mass index (BMI). The education associated with each person was classified in one of nine levels, these being Postgraduate (Masters or Doctorate), Bachelors or Honours, Graduate Diploma or Certificate, Other Diploma, Certificate Level III or IV, Certificate Level I or II, an undefined certificate, Year 12 completion (i.e. completed secondary education), and Year 11 and below. The definition of each of these followed the classification by the Australian Bureau of Statistics.19 The SEIFA measure refers to the level of socioeconomic status of the area in which the respondent lives, so is a broad measure of disadvantage. However, SEIFA is generated using a large number of separately identified geographical areas, the smallest of which are called Census Collection Districts containing approximately 250 households in urban areas, and fewer in rural areas. The categories of BMI followed classification by the World Health Organization,20 defining individuals as underweight (BMI of less than 18.5), normal (18.5–24.9), overweight (25.0–29.9), class I obese (30.0–34.9), class II obese (35.0–39.9) and class III obese (40.0 and higher). For each of these four characteristics, individuals were further stratified by gender, and also presented at the aggregate level. This analysis follows the approach used in the reporting of UK population norms for the EQ-5D developed by Kind et al.21
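The BMI grouping described above can be expressed as a simple classification function. This is a sketch only: the function names are hypothetical, and the half-open cut-points at 25, 30, 35 and 40 are an assumption used to place values that fall between the rounded category limits quoted in the text (for example, a BMI of 24.95).

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Assign a BMI value to one of the six categories used in the analysis."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    if value < 35.0:
        return "class I obese"
    if value < 40.0:
        return "class II obese"
    return "class III obese"


# Hypothetical respondent: 70 kg and 1.75 m gives a BMI of about 22.9.
example = bmi_category(bmi(70.0, 1.75))
```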

Results

The proportion of respondents reporting no problems in each of the six SF-6D dimensions is presented in Table 1. Respondents across all ages and genders appear more likely to report problems in certain dimensions (most notably Vitality). It is noteworthy that the responses to the dimensions do not respond in a uniform way to ageing. While the Physical Functioning dimension demonstrates a considerable negative gradient in older respondents, the pattern is either less clear (for example Social Functioning or Pain) or not apparent at all (Mental Health).

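Table 1. Proportion of respondents reporting no problems in each dimension of the SF-6D, stratified by age and gender.

Males
| Age group | All | 18–30 | 31–40 | 41–50 | 51–60 | 61–70 | 71+ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| n | 5,292 | 1,179 | 853 | 1,003 | 831 | 621 | 484 |
| Physical Functioning | 43.1% | 74.3% | 58.3% | 39.1% | 23.4% | 11.4% | 5.0% |
| Role Limitation | 74.4% | 83.5% | 82.0% | 76.6% | 72.7% | 63.5% | 44.0% |
| Social Functioning | 63.2% | 67.5% | 66.0% | 61.1% | 61.4% | 60.8% | 51.6% |
| Pain | 29.2% | 43.6% | 32.8% | 23.2% | 21.1% | 18.7% | 18.0% |
| Mental Health | 55.7% | 53.3% | 58.9% | 56.4% | 57.8% | 55.3% | 55.5% |
| Vitality | 5.4% | 8.3% | 3.0% | 3.7% | 2.8% | 3.7% | 2.7% |

Females
| Age group | All | 18–30 | 31–40 | 41–50 | 51–60 | 61–70 | 71+ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| n | 6,030 | 1,330 | 982 | 1,116 | 950 | 694 | 622 |
| Physical Functioning | 36.7% | 62.5% | 51.2% | 34.5% | 19.9% | 9.8% | 4.3% |
| Role Limitation | 65.5% | 71.5% | 72.5% | 71.3% | 64.7% | 55.6% | 37.4% |
| Social Functioning | 55.2% | 54.7% | 58.7% | 56.0% | 53.1% | 57.0% | 47.6% |
| Pain | 25.6% | 36.1% | 32.8% | 22.5% | 17.2% | 13.8% | 15.5% |
| Mental Health | 50.8% | 48.7% | 54.5% | 51.4% | 48.4% | 54.0% | 52.5% |
| Vitality | 3.4% | 4.0% | 2.9% | 2.6% | 2.4% | 3.4% | 2.6% |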

The results from the analyses based on single dimensions and gender are presented in Table 2, and in Figures 1 to 4. In almost all sub-groups, males report a higher SF-6D utility value than females. For the entire analysis set, this difference is 0.026. This is statistically significant (p<0.0001).
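The size of the gender difference can be reproduced from the summary values reported in Table 2 (males: mean 0.780, SD 0.118, N 5,082; females: mean 0.754, SD 0.123, N 5,800). The sketch below applies a simple two-sample z-test to those values. The text does not name the test the authors used, so this is an assumption; it also ignores the sampling weights and the clustering of individuals within households, and the function name is hypothetical.

```python
import math


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference in means with an unweighted two-sample z statistic."""
    diff = mean_a - mean_b
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = diff / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal tail probability
    return diff, z, p_two_sided


# Male versus female summary statistics from the "All" row of Table 2.
diff, z, p = two_sample_z(0.780, 0.118, 5082, 0.754, 0.123, 5800)
```

Under these simplifying assumptions the statistic is far into the tail of the normal distribution, which is in line with the reported p<0.0001.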

Table 2. SF-6D Scores by Gender, BMI group, Age, Education and SEIFA decile.

| Characteristic | Category | Both genders: N | Both genders: Mean (SD) | Both genders: CI | Male: N | Male: Mean (SD) | Male: CI | Female: N | Female: Mean (SD) | Female: CI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All | All | 10,882 | 0.766 (0.122) | 0.763–0.769 | 5,082 | 0.780 (0.118) | 0.776–0.784 | 5,800 | 0.754 (0.123) | 0.750–0.758 |
| Age (years) | 18–30 | 2,438 | 0.792 (0.112) | 0.786–0.798 | 1,147 | 0.811 (0.108) | 0.803–0.819 | 1,291 | 0.775 (0.113) | 0.767–0.782 |
| Age (years) | 31–40 | 1,792 | 0.788 (0.112) | 0.782–0.795 | 832 | 0.800 (0.106) | 0.791–0.808 | 960 | 0.778 (0.117) | 0.769–0.787 |
| Age (years) | 41–50 | 2,053 | 0.770 (0.115) | 0.764–0.776 | 968 | 0.779 (0.112) | 0.770–0.787 | 1,085 | 0.762 (0.118) | 0.753–0.770 |
| Age (years) | 51–60 | 1,706 | 0.749 (0.123) | 0.742–0.756 | 791 | 0.763 (0.119) | 0.753–0.773 | 915 | 0.738 (0.126) | 0.727–0.748 |
| Age (years) | 61–70 | 1,253 | 0.737 (0.127) | 0.729–0.746 | 599 | 0.747 (0.126) | 0.736–0.759 | 654 | 0.728 (0.126) | 0.716–0.740 |
| Age (years) | 71+ | 1,015 | 0.703 (0.129) | 0.693–0.713 | 448 | 0.717 (0.130) | 0.703–0.731 | 567 | 0.692 (0.127) | 0.678–0.706 |
| BMI group | <18 | 284 | 0.763 (0.129) | 0.746–0.780 | 82 | 0.770 (0.129) | 0.737–0.803 | 202 | 0.760 (0.129) | 0.740–0.780 |
| BMI group | 18–24.9 | 4,222 | 0.782 (0.117) | 0.777–0.787 | 1,773 | 0.798 (0.111) | 0.792–0.805 | 2,449 | 0.770 (0.120) | 0.764–0.776 |
| BMI group | 25–29.9 | 3,588 | 0.768 (0.120) | 0.763–0.773 | 2,006 | 0.779 (0.119) | 0.773–0.785 | 1,582 | 0.754 (0.120) | 0.746–0.761 |
| BMI group | 30–34.9 | 1,563 | 0.748 (0.121) | 0.741–0.756 | 787 | 0.763 (0.118) | 0.753–0.773 | 776 | 0.732 (0.123) | 0.721–0.743 |
| BMI group | 35–39.9 | 499 | 0.740 (0.125) | 0.727–0.753 | 194 | 0.752 (0.126) | 0.731–0.773 | 305 | 0.732 (0.123) | 0.715–0.749 |
| BMI group | ≥40 | 248 | 0.700 (0.134) | 0.679–0.721 | 85 | 0.720 (0.128) | 0.685–0.755 | 163 | 0.690 (0.136) | 0.664–0.716 |
| Highest qualification | Masters or doctorate | 415 | 0.789 (0.102) | 0.776–0.802 | 220 | 0.797 (0.096) | 0.781–0.812 | 195 | 0.779 (0.109) | 0.758–0.801 |
| Highest qualification | Grad diploma/certificate | 607 | 0.777 (0.110) | 0.766–0.788 | 240 | 0.789 (0.109) | 0.772–0.806 | 367 | 0.768 (0.109) | 0.754–0.782 |
| Highest qualification | Bachelor or honours | 1,449 | 0.791 (0.105) | 0.784–0.798 | 619 | 0.811 (0.098) | 0.801–0.820 | 830 | 0.776 (0.108) | 0.767–0.785 |
| Highest qualification | Adv diploma, diploma | 959 | 0.773 (0.113) | 0.765–0.782 | 457 | 0.781 (0.112) | 0.769–0.792 | 502 | 0.767 (0.114) | 0.755–0.779 |
| Highest qualification | Cert III or IV | 2,150 | 0.764 (0.122) | 0.758–0.771 | 1,325 | 0.773 (0.121) | 0.765–0.781 | 825 | 0.749 (0.123) | 0.739–0.760 |
| Highest qualification | Cert I or II | 163 | 0.755 (0.127) | 0.731–0.779 | 65 | 0.775 (0.127) | 0.734–0.815 | 98 | 0.741 (0.126) | 0.712–0.770 |
| Highest qualification | Cert not defined | 54 | 0.714 (0.117) | 0.680–0.749 | 17 | 0.688 (0.117) | 0.623–0.753 | 37 | 0.724 (0.116) | 0.682–0.766 |
| Highest qualification | Year 12 | 1,727 | 0.776 (0.120) | 0.769–0.783 | 780 | 0.790 (0.113) | 0.780–0.800 | 947 | 0.764 (0.124) | 0.754–0.774 |
| Highest qualification | Year 11 and below | 3,353 | 0.746 (0.132) | 0.740–0.751 | 1,356 | 0.763 (0.129) | 0.755–0.772 | 1,997 | 0.733 (0.132) | 0.725–0.741 |
| SEIFA decile | 1st decile | 856 | 0.734 (0.129) | 0.721–0.746 | 391 | 0.753 (0.123) | 0.739–0.767 | 465 | 0.718 (0.132) | 0.700–0.736 |
| SEIFA decile | 2nd decile | 974 | 0.74 (0.129) | 0.73–0.75 | 442 | 0.751 (0.129) | 0.737–0.765 | 532 | 0.730 (0.129) | 0.717–0.744 |
| SEIFA decile | 3rd decile | 1,191 | 0.752 (0.132) | 0.742–0.761 | 548 | 0.764 (0.128) | 0.751–0.777 | 643 | 0.740 (0.134) | 0.727–0.753 |
| SEIFA decile | 4th decile | 930 | 0.766 (0.119) | 0.757–0.776 | 434 | 0.777 (0.117) | 0.762–0.792 | 496 | 0.756 (0.119) | 0.744–0.768 |
| SEIFA decile | 5th decile | 1,070 | 0.763 (0.122) | 0.754–0.772 | 482 | 0.776 (0.117) | 0.764–0.789 | 588 | 0.752 (0.125) | 0.740–0.764 |
| SEIFA decile | 6th decile | 1,109 | 0.766 (0.122) | 0.756–0.776 | 523 | 0.780 (0.125) | 0.766–0.794 | 586 | 0.753 (0.118) | 0.739–0.766 |
| SEIFA decile | 7th decile | 1,258 | 0.778 (0.115) | 0.771–0.785 | 605 | 0.795 (0.109) | 0.785–0.805 | 653 | 0.762 (0.118) | 0.751–0.773 |
| SEIFA decile | 8th decile | 1,164 | 0.779 (0.114) | 0.771–0.787 | 548 | 0.792 (0.113) | 0.781–0.804 | 616 | 0.767 (0.113) | 0.757–0.778 |
| SEIFA decile | 9th decile | 1,220 | 0.780 (0.114) | 0.772–0.788 | 579 | 0.791 (0.107) | 0.780–0.801 | 641 | 0.770 (0.120) | 0.759–0.781 |
| SEIFA decile | 10th decile | 1,108 | 0.791 (0.111) | 0.784–0.799 | 529 | 0.806 (0.103) | 0.796–0.817 | 579 | 0.778 (0.116) | 0.766–0.789 |

Figure 1. SF-6D Population Norms by Body Mass Index and Gender.

Figure 2. SF-6D Population Norms by Age and Gender.

Figure 3. SF-6D Population Norms by Highest Educational Qualification and Gender.

Figure 4. SF-6D Population Norms by SEIFA Decile and Gender.

Regarding the relationship between SF-6D and age, Table 2 illustrates that, in the pooled sample, scores decrease monotonically as age increases. The SF-6D scores differ by age group, for the pooled sample and also when each gender is considered separately (all p<0.0001). The data demonstrate an inverted U-shape over BMI group, with lower SF-6D scores reported both by those who are underweight and by those who are obese. As with age, the SF-6D scores differ across BMI categories at the aggregate level and for females and males separately (for all three analyses, p<0.0001). The SF-6D scores also differ across the highest educational qualification categories at the aggregate level and for females and males separately (for all three analyses, p<0.0001). As expected, higher educational qualifications are generally associated with higher SF-6D scores. The relationship between SF-6D score and SEIFA decile reflects the a priori assumption that living in an area with relatively less disadvantage is associated with better levels of health-related quality of life. As with the other analyses, the SF-6D scores are clearly not independent of SEIFA deciles at the aggregate level or for females and males separately (for all three analyses, p<0.0001).

Discussion

The results presented here suggest that the health-related quality of life data reported in HILDA reflect trends that would have been expected a priori. Utility scores decline with age, particularly beyond 40 years. They decline with higher levels of socioeconomic disadvantage, and are also lower in the high and low BMI categories. Similarly, scores tend to increase for more highly educated people. Of these trends, the categories with the largest differences by sub-group are age and BMI. However, the pattern is not consistent across dimensions. Notably, Table 1 demonstrates that a consistent proportion of respondents (48%–59%) across all age groups and both genders report no problems in Mental Health. This is in marked contrast to Physical Functioning and Role Limitation, which show a very strong negative association with age.

The results by age and gender reported in Table 2 suggest a slightly different pattern to that reported by Hawthorne and Osborne.17 Indeed, the pattern outlined in the HILDA data is closer to the pattern reported for the SF-6D in the United States,15 possibly suggesting that the choice of instrument has a greater impact on utility scores than does the country of respondents. However, having claimed similarity with the American results, it should be noted that the spike in HRQoL in the 65–74 year old cohort is not replicated in our data; if anything this age range has a more dramatic decline than other age groups. Regarding the comparability of our results and those of Hawthorne and Osborne, it should be noted that the range of scores in the two studies differs. The spread of mean scores across subgroups in the AQoL paper is larger. There are two plausible reasons for this divergence. First, it may be a product of the sensitivity of the two instruments to aspects of HRQoL. Second, it may result from the two valuation studies having followed different approaches to valuing health states.6,22 For the SF-6D, health states are valued using a Standard Gamble.6 In this, the survey respondent identifies a point of indifference between a certain prospect of the health state being valued and a gamble with a probability (p) of receiving full health, and a complementary probability (1-p) of the worst health state in the instrument. The use of this technique has been shown to be a plausible explanation for a narrower range of resultant scores, as risk-aversion inflates scores associated with health states.23 Unfortunately, it is difficult to disentangle the two explanations. This separation of valuation technique and multi-attribute utility instrument is an area with some research24 but is likely to represent a fruitful area of future research.
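The Standard Gamble logic described above implies that, at the point of indifference, the value of the health state equals the expected value of the gamble. A minimal sketch follows; the function name and the example numbers are hypothetical, and the value assigned to the worst state (which in practice has to be anchored on the full health to dead scale in a separate step) is treated here as a given input.

```python
def standard_gamble_utility(p_indifference, u_full=1.0, u_worst=0.0):
    """Utility implied by indifference between a certain state and a gamble.

    The gamble gives full health with probability p and the worst state in
    the instrument with probability 1 - p.
    """
    if not 0.0 <= p_indifference <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return p_indifference * u_full + (1.0 - p_indifference) * u_worst


# Hypothetical respondent indifferent at p = 0.7, with the worst state valued at 0.3.
value = standard_gamble_utility(0.7, u_worst=0.3)
```

A risk-averse respondent demands a higher p before accepting the gamble, which raises the implied value and compresses the range of scores, consistent with the explanation given in the text.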

A major limitation of this work is that these population norms are based on health state valuations derived from a non-Australian, UK sample.6 This is because, at present, there are no Australian preference weights for the SF-6D. It remains unclear how important this is. Broadly, it might be argued that UK population preferences are likely to be similar to Australian preferences. The only evidence relating to this is for an alternative multi-attribute utility instrument, the EQ-5D. The EQ-5D was first valued in the UK,25 and has since been valued in a range of other countries allowing international comparison.26 Recently, an Australian algorithm has been developed.27 In that study, a comparison of Australian and UK weights suggested considerable divergence between the two populations. In particular, the values for health states in the UK sample were almost all lower than in the Australian sample. However, the agreement in the ranking of the health states under the two algorithms was high (Spearman rank correlation coefficient of 0.9752). Therefore, while using the UK algorithm may lead to an inaccurate estimate of the difference in score between two population sub-groups, the direction of the difference is likely to be correct.

With regard to the population norms presented here, the question is then how these results should be used in decision-making. It is clear they identify groups more or less likely to have issues relating to health-related quality of life. This is important as a focus on (for example) mortality alone may lead to a focus on male health, as life expectancy for males is relatively short. However, any focus in this direction may be erroneous as males report higher utility scores in almost every stratum considered, meaning that the relative aggregate measure of ill-health between genders is more difficult to determine.

A possible application of these data is as a reference case in an economic evaluation component of health technology assessment. For example, if a therapy returns a person in the population to full health, assigning the resultant health state for that person a value of 1 overstates their likely health post-therapy. This is because they will still have characteristics (such as age) likely to limit their health in certain domains. This use of population norms is a valid use of the data and is likely to represent a more accurate prediction of health trajectory than simply assigning a value of 1 to an individual without the condition that is under investigation. However, some caution must be taken in this particular application, as causality has not been established under this type of approach.
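The effect of using a population norm rather than 1 as the post-therapy value can be illustrated with the pooled age-group means from Table 2. The sketch below is a simplification: the function names and the patient are hypothetical, the norm is held constant over the time horizon, and no discounting is applied.

```python
# Pooled (both genders) mean SF-6D scores by age group, taken from Table 2.
# Each entry is (lowest age, highest age or None for open-ended, mean score).
AGE_NORMS = [
    (18, 30, 0.792),
    (31, 40, 0.788),
    (41, 50, 0.770),
    (51, 60, 0.749),
    (61, 70, 0.737),
    (71, None, 0.703),
]


def population_norm(age):
    """Look up the pooled SF-6D norm for a given age in whole years."""
    for lower, upper, norm in AGE_NORMS:
        if age >= lower and (upper is None or age <= upper):
            return norm
    raise ValueError("no norm reported for ages under 18")


def qaly_gain(baseline_utility, years, age=None):
    """QALY gain from restoring health to 1 (age=None) or to the age norm."""
    target = 1.0 if age is None else population_norm(age)
    return (target - baseline_utility) * years


# Hypothetical 65-year-old with a baseline utility of 0.60, followed for 5 years.
gain_full_health = qaly_gain(0.60, 5.0)
gain_norm = qaly_gain(0.60, 5.0, age=65)
```

In this hypothetical case, valuing recovery at 1 yields a gain of 2.0 QALYs, whereas valuing it at the age-group norm of 0.737 yields roughly 0.69 QALYs, which shows how much the choice of reference value can matter in a cost-utility analysis.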

This paper reports population norms for the SF-6D in Australia. It is the first study to do so, and identifies sub-populations likely to experience relatively poor health-related quality of life. It does this using the HILDA study, which offers a large population-representative sample of considerable value to researchers. The results are similar to those derived elsewhere, although they would be strengthened through the use of Australian preference weights. With regard to population norms more generally, the challenge remains how best to apply the conclusions to health policy.
