Later retirement, job strain, and health: Evidence from the new State Pension age in the United Kingdom

This paper examines the impact of raising the State Pension age on women's health. Exploiting a UK pension reform that has increased women's State Pension age by up to 6 years since 2010, we show that raising the State Pension age leads to an increase of up to 12 percentage points in the probability of depressive symptoms, alongside an increase in self-reported medically diagnosed depression among women in a lower occupational grade. Our results suggest that these effects are driven by prolonged exposure to high-strain jobs characterised by high demands and low control. Effects are consistent across multiple subcomponents of the General Health Questionnaire (GHQ) and Short-Form-12 (SF-12) scores, and robust to alternative empirical specifications, including “placebo” analyses for women who never worked and for men. Standard errors are clustered by year-and-month-of-birth (152 clusters).


| INTRODUCTION
Over the last decade, most Organisation for Economic Cooperation and Development (OECD) countries have increased their statutory pensionable age (SPA) with the aim of enhancing the financial sustainability of pension systems (OECD, 2016). The rationale behind these reforms is that increased employment opportunities, longer life expectancy, and more years spent in good health will enable older people to work longer and retire later (OECD, 2016;OECD, 2017a). Although evidence suggests that health is the most important cause of early retirement (Munnell, Sanzenbacher, & Rutledge, 2015), understanding how policies delaying retirement influence health is critical to assessing the reforms' total utility (Heller-Sahlgren, 2017).
This study assesses the health impact of a recent reform that gradually increased the SPA from age 60 up to 66 years for women born after March 1950 in the United Kingdom. The effect of an increase in SPA is ambiguous as several mechanisms may be at play. The Grossman model considers health as both an investment good that increases productivity and a consumption good that provides utility (Galama & Kapteyn, 2011;Grossman, 1972). On the one hand, workers may invest more in their health if they expect to retire later, for example, by engaging in healthy behaviours if the benefits of a longer working life induced by better health are higher than the costs of reduced leisure-time due to a shorter retirement period (Bertoni, Brunello, & Mazzarella, 2018). Alternatively, as postponing retirement increases the opportunity cost of time, individuals might trade-off health investments (exercising, cooking healthier food, attending medical visits) for working time (Galama & Kapteyn, 2011). Later retirement may also directly enter the health production function, for example, through higher social engagement, mental wellbeing, cognitive function, and other nonfinancial benefits of work (Mazzonna & Peracchi, 2017). The type of occupation, however, may be critical to understanding the way these mechanisms operate. In particular, later retirement for workers in a low occupational grade may increase exposure to work-related psychological and physical strain, which may result in poorer health as a result of extended exposure to these factors (Galama & Kapteyn, 2011). This is consistent with recent literature suggesting that workers subject to high job strain, as measured by high job demand (physical and psychosocial) and low job control (limited decision authority and intellectual discretion), experience worse health outcomes than workers with occupations subject to low job strain (see Marmot et al. (1991), Karasek (1979), and Ravesteijn, Kippersluis, and Doorslaer (2018)).
Several studies examine how health is affected after retirement, with findings being sensitive to the choice of country, empirical strategy, and health outcome. Some studies find that retirement has a positive effect on mental (Belloni, Meschi, & Pasini, 2016; Eibich, 2015) and physical health (Bertoni, Maggi, & Weber, 2017; Coe & Zamarro, 2011), while other studies report either a negative effect (Behncke, 2012; Bonsang, Adam, & Perelman, 2012; Mazzonna & Peracchi, 2017) or no effect of retirement on health (Coe & Lindeboom, 2008; Coe & Zamarro, 2011). By contrast, few studies have evaluated how recent reforms to the SPA influence the health of older people, and their findings are mixed. A reform that postponed early retirement age by 5 years and reduced early pension replacement rates for civil servants in the Netherlands led to worse mental health (De Grip, Lindeboom, & Montizaan, 2012). In Israel, Shai (2018) found that a 2-year increase in the male SPA led to worsening health, as did Atalay and Barrett (2014) for an Australian reform that raised the female SPA by 5 years (phased in over 20 years). Another study found that delaying the statutory early retirement age improved healthy behaviours and health satisfaction among Italian men in their 40s. Two studies evaluated the long-term impact of reforms lowering SPA, finding positive effects in the Netherlands within 5 years of retirement (Bloemen, Hochguertel, & Zweerink, 2017) and no effects in Norway (Hernaes, Markussen, Piggott, & Vestad, 2013).
Our study makes three important contributions to the literature. First, to our knowledge, this is the first study on the health effects of a unique UK reform that increased the SPA by up to 6 years over a short time window of 10 years. Most earlier studies examined how health changes after retirement by exploiting cross-country variation in SPA (Mazzonna & Peracchi, 2017) or employer-based retirement windows (Behncke, 2012; Bonsang et al., 2012; Eibich, 2015); we instead focus on a national policy change that affected the SPA of a well-defined cohort of women. We implement a difference-in-differences approach comparing the health status of women unable to collect their State Pension because of the SPA change with the health of women of similar age and characteristics who were unaffected by the change by virtue of their birthdate. Second, we are able to examine heterogeneity by job type and job stress in the causal effect of postponing SPA on health. Previous studies focusing on the health transition after retirement found stronger effects on the health of workers of lower socioeconomic status (SES), who face lower life expectancy and income, and more barriers to re-employment and to good-quality care (Belloni et al., 2016; Bertoni et al., 2017; Coe, von Gaudecker, Lindeboom, & Maurer, 2012; Mazzonna & Peracchi, 2017). Yet, there is limited empirical evidence on whether increased exposure to job stressors might explain the impact of an extended work horizon on health. By implementing validated indicators of job demand and job control, we investigate the role of longer exposure to job strain as a result of an increase in the SPA. Third, we are able to estimate nonlinearities in the effect of increases in SPA by comparing cohorts that experienced vastly different SPA extensions. This is possible because of the nature of the reform, which led to a relatively wide range of SPA increases (1 to 60 months) in a short period of a few years.
Using Understanding Society, a nationally representative survey with extensive health measures, we find that SPA increases had a negative impact on health: women aged 60-64 years who are no longer eligible to collect their pension due to the reform exhibit worse mental and physical health scores and a higher prevalence of clinical depression than women of the same age unaffected by the reform. Moreover, longer extensions of SPA led to larger declines in mental health than shorter extensions. Crucially, the negative health effect of SPA postponement is confined to women from lower-grade routine occupations, and it is largely driven by longer exposure to adverse psychological and physical stressors. As a result, the reform had the undesirable consequence of increasing health inequality by occupational grade: evidence points to a 12 percentage-point increase in the probability of depressive symptomatology (General Health Questionnaire [GHQ] scores) for women in lower-grade occupations, which constitutes a clinically and economically meaningful change. Moreover, we find a statistically significant 4.5% decline in physical health scores (PCS) for women in lower-grade occupations, although this is likely to be of less clinical relevance.
In what follows, Section 2 summarises the UK pension reform, Section 3 describes the data and empirical strategy, while Section 4 discusses the results. Section 5 concludes by discussing the implications of our findings.

| STATE PENSION AGE POSTPONEMENT IN THE UNITED KINGDOM
The reform of the female SPA in the United Kingdom, legislated in 1995 and implemented in 2010, affected the minimum age for claiming the Basic State Pension (BSP), which provides an almost-flat level of retirement income, depending on National Insurance contribution years. Although generally low by OECD standards, the BSP corresponds to around 38% of average gross income for retired households in the United Kingdom, thus representing a main income source for a significant proportion of older people (Webber & Mallet, 2017). In 2010, the full annual BSP amounted to £5,077.8 for a single individual and £8,119.8 for a couple (see OECD (2013), PPI (2015) and Lain (2016)). The SPA, which was set at 60 years prior to the reform, was first legislated to rise to 65 between 2010 and 2020 (an effective increase of 1 month every 2 months). Subsequent reforms then legislated for both men's and women's SPA to reach 66 by 2020, 67 by 2028, and 68 by 2046 (see Thurley and Keen (2017) for comprehensive details).
The impact of the reform on pension eligibility, based on birthdate, is substantial ( Figure 1): being born 1 year after March 1950 implies a 1-year delay in SPA. The SPA postponement exceeds 36 months for cohorts born after March 1953. Thus, women born just a few years apart face different eligibility status (above vs. below SPA) at any given age (right panel): for example, a woman aged 60 years in 2009 is above her SPA, while a woman aged 60 years in 2012 is 2 years below her SPA. The Institute for Fiscal Studies estimated that the policy increased female employment rates by 10 percentage points (Cribb & Emmerson, 2018).
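As a rough illustration of how eligibility status follows from birthdate and interview date, the sketch below uses a stylised schedule in which the SPA rises by one month for each month of birth after March 1950, capped at 65. The linear rule, the cap, the function names, and the month-level date handling are all assumptions made for illustration; the statutory timetable (including later accelerations) should be taken from the legislation itself.

```python
from datetime import date

BASE_SPA_MONTHS = 60 * 12  # pre-reform SPA of 60 years
CAP_SPA_MONTHS = 65 * 12   # assumption: stylised cap at 65 years


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def stylised_spa_months(birth: date) -> int:
    """Stylised SPA in months: +1 month per month of birth after March 1950."""
    delay = max(0, months_between(date(1950, 3, 1), birth))
    return min(BASE_SPA_MONTHS + delay, CAP_SPA_MONTHS)


def below_spa(birth: date, interview: date) -> bool:
    """True if age at interview is below the (stylised) individual SPA."""
    age_months = months_between(birth, interview)
    return age_months < stylised_spa_months(birth)


if __name__ == "__main__":
    # Hypothetical respondents: same age band, different birth cohorts.
    print(below_spa(date(1949, 1, 1), date(2009, 6, 1)))  # unaffected cohort
    print(below_spa(date(1952, 6, 1), date(2012, 9, 1)))  # affected cohort
```

Under this stylised rule, a woman born one year after March 1950 has an SPA of 61, in line with the 1-year delay described above.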

| Data
We use data from seven waves (2009-2016) of Understanding Society, an annual survey interviewing household members aged 16+ years in Britain on health, social, and economic subjects (Lynn, 2009). From information on year and month of birth and interview date, we determine whether an individual lies above or below her SPA when interviewed. We employ self-reported information on employment status ("in-paid-work," "unemployed," "retired," "looking after family or home," "long-term sick/disabled"), as well as information on living arrangements, employment history, number of children, educational attainment, and SES (detailed hereafter).

FIGURE 1 Change in women's State Pension age. Note. Authors' calculations based on Pension Act 1995, 2011.
We employ three validated measures of mental and physical health (detailed in Appendix 1). The General Health Questionnaire index (GHQ-12) measures psychological distress on a scale between 0 and 36, with higher values signalling worse health. As a score of 12+ signals the presence of common mental disorders (Goldberg et al., 1997; Goldberg & Williams, 1988), a binary GHQ-caseness indicator is built to identify respondents lying above and below the cutoff. We further disaggregate the GHQ score into three clinically meaningful factors (anxiety/depression, social-dysfunction, and loss-of-confidence), following Graetz (1991). We normalise the GHQ scores between 0 and 100.
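The two GHQ transformations described above can be sketched as follows. The linear rescaling is an assumption (the text states only that scores are normalised between 0 and 100), and the function names are hypothetical.

```python
GHQ_MAX = 36          # Likert GHQ-12 scores range from 0 to 36
CASENESS_CUTOFF = 12  # scores of 12+ signal common mental disorders


def normalise_ghq(score: float) -> float:
    """Rescale a 0-36 GHQ score to 0-100 (assumed linear rescaling)."""
    if not 0 <= score <= GHQ_MAX:
        raise ValueError("GHQ-12 Likert score must lie between 0 and 36")
    return 100.0 * score / GHQ_MAX


def ghq_caseness(score: float) -> int:
    """Return 1 if the raw GHQ score is at or above the 12+ cutoff, else 0."""
    return int(score >= CASENESS_CUTOFF)


if __name__ == "__main__":
    for raw in (0, 9, 12, 36):
        print(raw, normalise_ghq(raw), ghq_caseness(raw))
```

Note that the caseness indicator is defined on the raw 0-36 scale, not on the rescaled score.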
The Short-Form-12 (SF-12, version2) is a generic health-related quality of life instrument which produces a PCS and a mental health score (MCS), each ranging from 0 to 100, with higher values signalling better health. Eight meaningful subfactors are also considered: physical functioning, role-limitations due to (a) physical or (b) emotional problems, mental health, bodily pain, general health, vitality/energy/fatigue, and social functioning (methodological details are available in Ware (2002) and Appendix 1). Both the GHQ and the SF-12 are widely used in the economics literature as generic measures of physical and mental health (Bünnings, Kleibrink, & Weßling, 2017;Clark, 2003;Dustmann & Fasani, 2016;Marcus, 2013;Mitra & Jones, 2017;Schmitz, 2011). In addition, we also employ information on chronic diseases that respondents report to have been diagnosed with by a doctor. 1

| Meaningful changes and effect size
To evaluate the clinical relevance of any impact of the SPA reform on health, we follow two approaches. First, we compare our results for continuous health outcomes to the concept of the Minimally Important Difference (MID), that is, a change large enough to be discernible by patients. The general approach to MID is to compute the "effect-size" (ES), that is, the ratio between the estimated effect and the standard deviation of that outcome (Cohen, 2013). Fayers and Hays (2014) suggest that an ES of 0.2 SD represents the threshold for a MID (see also Farivar, Liu, and Hays (2004)). Cohen (2013) proposed operational definitions of small, medium, and large ES corresponding to 0.2, 0.5, and 0.8 SD, respectively (see also King (2011)). For the PCS outcome, we also adopt the threshold suggested by Schmitt and Di Fabio (2004), who estimated an "anchoring" minimum change in the PCS (6.8 points) which was found to correspond to a unitary change in patients' Global Disability Rating. 2 Indeed, in the absence of an accepted gold standard, and given that neither the effect-size nor the anchoring approach is immune to criticism, the clinical literature recommends using both methods, when feasible, to evaluate MIDs (Fayers & Hays, 2014; Jayadevappa, Cook, & Chhatre, 2017).
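A minimal sketch of the two relevance checks, assuming the thresholds quoted above (0.2, 0.5, and 0.8 SD for the effect size; 6.8 points for the PCS anchor). The function names and the numerical inputs in the usage lines are illustrative assumptions, not estimates from the paper.

```python
MID_THRESHOLD = 0.2      # ES threshold for a minimally important difference
PCS_ANCHOR_POINTS = 6.8  # anchoring threshold for the PCS


def effect_size(estimate: float, sd: float) -> float:
    """Absolute estimated effect divided by the outcome's standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return abs(estimate) / sd


def classify_effect(es: float) -> str:
    """Label an effect size using the 0.2 / 0.5 / 0.8 SD cutoffs."""
    if es < MID_THRESHOLD:
        return "below MID"
    if es < 0.5:
        return "small"
    if es < 0.8:
        return "medium"
    return "large"


def pcs_meets_anchor(change_points: float) -> bool:
    """True if an absolute PCS change reaches the 6.8-point anchor."""
    return abs(change_points) >= PCS_ANCHOR_POINTS


if __name__ == "__main__":
    # Hypothetical inputs: a 3.5-point effect on an outcome with SD 14.0.
    es = effect_size(3.5, 14.0)
    print(es, classify_effect(es))
    print(pcs_meets_anchor(-2.4))
```

An effect can therefore clear the ES threshold while falling short of the anchor, which is why the two criteria are reported side by side.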
Second, we estimate the impact of the reform on the aforementioned GHQ-caseness index.

| Econometric specification
We examine the impact of raising the SPA on women's health using a difference-in-differences approach on a narrow sample of women aged 60 to 64 years between 2009 and 2016. We compare the health status of respondents who, at the time of interview, were ineligible for the State Pension because of the reform with that of women of similar age who were eligible to claim their State Pension and were never affected by the reform (see Shai (2018)). 3 Age groups 60-63 years are observed both below and above SPA, while 64-year-olds are above SPA in all years (Figure 1, right panel). Our identification strategy exploits variation in exposure to the reform by birth and interview dates and relies on a comparison of health trends for age cohorts affected by the change (the 60, 61, 62, and 63-year-olds born after April 1950 became ineligible after 2010, 2012, 2014, and 2016, respectively) relative to health trends in age cohorts never affected by the reform (60-64-year-olds born before April 1950 are always eligible to claim their State Pension).
1 Self-reported diagnosis may be affected by measurement error, as some diseases remain undiagnosed, potentially leading to a zero-effect bias.
2 The physical health score (PCS) anchoring suggested by Schmitt and Di Fabio (2004) is based on a sample of 155 patients with upper extremity diagnoses in Minneapolis, Minnesota.
3 To ensure a "clean" control-group, we exclude respondents above their statutory pensionable age (SPA) who had experienced an earlier postponement in their SPA (i.e., previously affected by the reform). Results do not change in analyses that include these respondents (Appendix 1.5).
Starting from an eligible sample of 13,084 observations (4,925 women) aged 60 to 64 years between 2009 and 2016, we dropped 2,004 observations (15.3% of the eligible sample) of women who never worked (not affected by the SPA increase), proxy interviews (235 observations, 1.8%), entries with missing values in our health outcomes or in other control variables (1,162 observations, 8.9%), and 2,309 observations of women observed past their State Pension age but who had been previously affected by the reform (17.6%). Our working-sample thus comprises 7,374 observations (3,531 women).
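The sample flow above can be checked with a few lines of arithmetic. The counts are those reported in the text; the dictionary labels are shorthand introduced here.

```python
ELIGIBLE_OBS = 13_084  # eligible observations, women aged 60-64, 2009-2016

# Observations dropped at each step, as reported in the text.
exclusions = {
    "never worked": 2_004,
    "proxy interviews": 235,
    "missing outcomes or controls": 1_162,
    "above SPA but previously affected": 2_309,
}

# Working sample after all exclusions.
working_obs = ELIGIBLE_OBS - sum(exclusions.values())

# Share of the eligible sample removed at each step, in percent.
shares = {k: round(100 * v / ELIGIBLE_OBS, 1) for k, v in exclusions.items()}

if __name__ == "__main__":
    print(working_obs)
    for label, pct in shares.items():
        print(f"{label}: {pct}%")
```

The exclusions sum to 5,710 observations, leaving the 7,374 observations of the working sample.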
We estimate the following reduced-form model for health outcome $y_{iat}$, observed at time $t$ for individual $i$ of age $a$, born in year-month $c$:

$$y_{iat} = \beta\,\mathrm{BelowSPA}_{iat} + \gamma_a + \eta_t + \delta_c + X_{iat}'\theta + \varepsilon_{iat} \qquad (3.1)$$

where our main independent variable of interest, $\mathrm{BelowSPA}_{iat}$, is an indicator function for being below SPA, that is, an interaction between the individual's age and the interview date, which captures whether individuals were eligible to claim a State Pension. As age is an important determinant of both State Pension eligibility and vulnerability to health shocks, we incorporate fixed effects for age quarters ($\gamma_a$) to control for age-specific effects. Similarly, we include fixed effects for interview year quarters ($\eta_t$) to capture common time shocks in health outcomes as well as in employment rates. Finally, we include a linear control for year-month of birth ($\delta_c$) to capture cohort effects in health and labour market attachment. These three variables are not collinear as they are included in different units and functional forms; in addition, we observe women of the same age born in different years and measured at different times.

As noted by, for example, Cribb, Emmerson, and Tetlow (2016), such a model assumes that age effects are cohort- and time-constant, cohort effects are time- and age-constant, and time effects are age- and cohort-constant, where the latter hypothesis represents a "common-trends assumption" which we will discuss in Section 4.2.4, alongside further tests for alternative parametric forms for age-time-cohort effects. We control for additional observable individual- or area-specific characteristics ($X_{iat}$) that might confound the analysis (Staubli & Zweimüller, 2013). Due to cohort effects, treatment and control groups might differ on several sociodemographic characteristics. 
As younger cohorts in the United Kingdom have lower fertility (Kneale & Joshi, 2008), and having children could affect both women's employment/retirement decision and the onset of health conditions (Behncke, 2012), we include a categorical variable for having zero, one/two, or at least three children. Similarly, as younger cohorts are also more likely to attain secondary education and be employed in intermediate/managerial occupations than older cohorts (Mazzonna & Peracchi, 2017;OECD, 2015), we add one categorical variable for highest educational attainment (A-level or higher, GCSE-level, less than GCSE) and one for having (having had) a routine-, intermediate-, or managerial-level job (NS-SEC classification). 4 We account for marital status (being single, widowed/separated, or living with someone) to account for cohort differences in family forms and living arrangements (Sobotka & Toulemon, 2008). We control for country dummies (within the United Kingdom) to account for geographical factors that might directly affect health. Conditional on these controls, our coefficient β captures the impact of being below the SPA as a result of the reform, above and beyond the effect of age, year, and cohort.
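We test for health inequality effects of postponing the SPA by estimating a model that includes an interaction term between the "policy variable" and the NS-SEC occupational classification (or, in alternative models, the level of job strain):

$$y_{iat} = \beta\,\mathrm{BelowSPA}_{iat} + \sum_{s} \lambda_s\,(\mathrm{BelowSPA}_{iat} \times \mathrm{SES}_{is}) + \gamma_a + \eta_t + \delta_c + X_{iat}'\theta + \varepsilon_{iat} \qquad (3.2)$$

where $\mathrm{SES}_{is}$ indicates that individual $i$ belongs to occupational group $s$ (with one group as the reference category), $X_{iat}$ collects the control variables, including the occupational main effects, and the remaining terms are as in (3.1).

In all analyses, standard errors are clustered at the month-of-birth level (154 clusters), as the treatment assignment varies by month-of-birth. Findings are robust to individual-level clustering (Section 4.2.4).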

| Effect of different levels of SPA postponement
We test how the health impact of the reform differs with the extent of SPA postponement, which widely differs for women born after March 1950. We modify specification (3.1) as follows:

$$y_{iat} = \sum_{k} \beta_k\,D^{k}_{i} + \gamma_a + \eta_t + \delta_c + X_{iat}'\theta + \varepsilon_{iat} \qquad (3.3)$$

where $D^{k}_{i}$ are dummies for having an SPA increase of 1-6, 7-25, 25-36, ≥36 months (the reference category is "no SPA-postponement"), $X_{iat}$ collects the control variables, and the remaining terms are as in (3.1). The SPA increase is nonlinearly related to birthdate, as the SPA is constant for women born before April 1950 or after September 1954 while it increases nonlinearly depending on month-of-birth for women born between March 1950 and September 1954 (Figure 1, left panel).
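The postponement dummies can be sketched as a simple binning of the SPA increase in months. The exact bin edges below are an assumption: the ranges quoted in the text share endpoints, so the sketch adopts non-overlapping bins (1-6, 7-24, 25-35, 36 or more) purely for illustration, and the function and label names are hypothetical.

```python
# Assumed non-overlapping bins (upper bounds inclusive); edges are illustrative.
BINS = [(1, 6, "1-6"), (7, 24, "7-24"), (25, 35, "25-35")]
TOP_BIN_START = 36
LABELS = ["1-6", "7-24", "25-35", "36+"]


def postponement_bin(months: int) -> str:
    """Map an SPA increase in months to a category label."""
    if months <= 0:
        return "none"  # reference category: no SPA postponement
    if months >= TOP_BIN_START:
        return "36+"
    for low, high, label in BINS:
        if low <= months <= high:
            return label
    raise ValueError(f"unexpected postponement: {months}")


def postponement_dummies(months: int) -> dict:
    """One 0/1 indicator per bin; all zeros for the reference category."""
    chosen = postponement_bin(months)
    return {label: int(label == chosen) for label in LABELS}


if __name__ == "__main__":
    for m in (0, 3, 12, 30, 48):
        print(m, postponement_bin(m), postponement_dummies(m))
```

Each dummy then enters the regression with its own coefficient, so the estimated effect is allowed to differ freely across postponement lengths rather than being forced onto a linear dose-response.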

| Descriptive evidence
Column (i) of Table 1 summarises descriptive statistics for our original eligible sample, while Column (ii) describes the final sample used in the statistical analysis. Around 40% of the sample falls into the manual-routine SES (mostly personal service occupations, sales and customer services, process/plant/machine operatives, and elementary occupations); around 30% belongs to the intermediate SES (mostly administrative and secretarial positions), and 30% falls into the higher SES group (managers and senior officials, health, teaching, and science professionals). 5 As Columns (iii) to (v) show, women ineligible for the pension exhibit higher employment, unemployment, and sick/home-carer rates, as well as worse MCS and GHQ scores. Appendix Table 10 (Data S1) details the sample's age decomposition. This pattern is further illustrated in Figure 2, where we compare employment and health outcomes for a control group always observed above SPA (women aged 62-64 years) and two treatment groups whose SPA status changes over the study period as a result of the reform (60-61 years old). Panels a and b show that retirement rates decrease while working/being sick/caring rates increase for the treated groups when they become ineligible for State Pension, while trends are stable for the control group (see also Cribb et al. (2016) and Staubli and Zweimüller (2013)). Interestingly, Panels d and e show that women observed below SPA exhibit worse GHQ and MCS scores. These trends, however, could reflect cohort effects or omitted variable bias; in the next section, therefore, we turn to our econometric approach to isolate the causal impact of the SPA reform.

Table 2 reports the results for our main specification (3.1), estimated through OLS for the continuous health outcomes (GHQ, MCS, PCS) and through a Linear Probability Model for the GHQ cutoff. In Column 1, we estimate the effect of the reform on female employment rates (measured with a binary indicator for being in paid work). We confirm previous findings (Cribb et al., 2016; Cribb & Emmerson, 2017) of a major shock to the employment of women whose SPA was postponed, estimated at a 10-percentage-point increase. Self-reported alternative labour market outcomes are also affected: respondents are more likely to report being sick (+5%), caring for the home or for someone (+2.5%), or unemployed (+2%), and less likely to report being retired (−21%). Table 11 includes full results. 6

5 Compared with our original eligible sample, the final sample is very similar in terms of average age, marital status, and number of children, yet it exhibits generally higher education levels and employment rates, lower retirement/inactivity rates, and slightly better average health scores. These differences are likely due to the exclusion of women who never worked, who are more likely to be low-educated, retired or inactive, and in worse health.
6 Running models for "being in paid work" with interaction terms between "being below SPA" and education, marital status, number of children, and job-category, we find no statistically significant heterogeneous effects (available upon request), suggesting that the impact of the reform was quite similar across social groups.

TABLE 1 Descriptive statistics for the whole sample and by pension eligibility status at interview date
Note. The final sample includes women aged 60-64 between 2009 and 2016, observed either above-SPA (never affected by the reform) or below-SPA (affected by the reform), having been engaged in paid work in their life. Column (iv) reports the test for the null-hypothesis of mean-equivalence between Columns (ii) and (iii). The status of being above/below SPA is defined by comparing the individual SPA (based on month-year of birth) and the date of interview. The SPA postponement is a distance measured in months between the individual-specific SPA postreform and the prereform threshold of 60 years old. The job classification follows the National Statistics SEC-3 taxonomy. The GHQ cut-off refers to the Likert GHQ scale (range: 0-36) and takes value 1 for scores of 12+ (Goldberg et al., 1997).

Column 2 indicates that being below the SPA due to the reform leads to a significant increase of 1.9 points in GHQ depression scores. Evaluated at the sample-average GHQ score of 30.4, this corresponds to an elasticity of 6.5%. The ES amounts to 14% of the GHQ standard deviation. Although similar ES have been defined as sizeable effects in recent studies in economics (Dustmann & Fasani, 2016), they would not be considered meaningful under the MID rationale (see Section 3.1). On the other hand, we estimate that the SPA reform increases the likelihood of suffering from common mental disorders (GHQ cutoff) by 6.2 percentage points (elasticity of 17%, Column 3). Negative effects (not statistically significant at 10%) are found for both the mental- and physical-health SF-12 scores. Table 3 explores the impact of the length of SPA postponement on health (model (3.3)). 
Confirming previous results, it suggests that a longer postponement of SPA leads to worse GHQ and MCS scores: for example, relative to women unaffected by the reform, the GHQ-score increases by 1.96 points for those with an SPA increase of 6-24 months (elasticity 6.4%) and by 3.1 for those with an increase of 36 months or more (+10.1%). The latter effect exceeds the MID threshold (small effect, ES = 0.21). A similar result is found for the likelihood of common mental disorders (GHQ-cutoff), and for the MCS, although the latter would not constitute a MID. No clear pattern emerges for PCS scores.

| SES heterogeneity
In Table 4 we show the net impact of being below-SPA for routine (Column ii), intermediate (iii) and managerial (iv) workers, based either on their current or last occupation (model (3.2)), alongside the results from the baseline model (3.1) in Column (i), and the sample average for each outcome. The effect of the reform on employment did not differ systematically by SES: among routine SES women, the reform increased employment by 10 percentage points. This effect is not statistically different than the employment effect for intermediate (+13.5) and managerial (+8) SES groups.
The negative effects of the SPA reform on aggregate measures of health are significantly stronger for women in the lowest occupational grade and not statistically significant for higher SES workers in managerial occupations. For routine workers, being below SPA significantly increases GHQ depression scores and reduces physical health SF-12 scores. The effects observed for mental health are of clinical relevance: we estimate an increase of 3.5 points in the GHQ score (elasticity 11.4%), which exceeds the MID cutoff (small effect, ES = 0.25), and a 10.4-percentage-point increase in the probability of depression based on the GHQ cut-off. Moreover, results from the analysis on the specific GHQ and SF-12 factors suggest that the lowest SES group experiences a significant decline in all dimensions of mental health. We also observe a statistically significant 2.4-point reduction in the PCS score (5.5%); although this effect on physical health is clinically relevant under the ES definition (small effect, ES = 0.2), it is not minimally important according to the anchoring approach (cutoff of 6.8 points).

| The wear-and-tear effect
We now investigate whether heterogeneity in the health effect of the reform can be explained by prolonged exposure to jobs characterised by different levels of demand and control at work. This is often referred to as the "wear-and-tear" effect of work, whereby each occupation carries a different level of physical and psychosocial occupational stress (Karasek, 1979; Ravesteijn et al., 2018). We employ a job-exposure matrix (JEM) recently built by Kroll and Lampert (2011), which provides a dichotomous index of job demand summarising five dimensions of occupational burden: Ergonomic Stress, Environmental Pollution, Mental Stress, Social Stress, and Temporal Loads. This measure has been externally validated (see Santi, Kroll, Dietz, Becher, and Ramroth (2013)) and recently applied in economics research (Mazzonna & Peracchi, 2017). We complement this measure with a dichotomous index of high/low job control built by Solovieva et al. (2014) from a large survey of adult Finnish workers. The index summarises the degree of decision authority and skill discretion and has been externally validated. Details are included in Appendix 1.3. We were able to match both indices to 99% of our sample through respondents' current or last 4-digit ISCO code. Around 33% of the sample has (had) a high-strain job, that is, highly physically or psychosocially demanding, with low authority and discretion. This includes, for example, housekeeping and restaurant services, personal carers, salespersons, cleaners, and machine operators. Among them, 85% belong to the routine SES (see Appendix 1.3).

Note. Columns 1 and 3 report Linear Probability Model estimates for being in paid-work (yes/no) and being above the GHQ cutoff of 12+
We estimate model (3.1) interacting the SPA-eligibility, job-demand, and job-control indicators. 7 Results (Table 5) provide evidence that a postponement of SPA has a significant negative impact only on the health of women in high-strain occupations (high-demand, low-control): the GHQ depression score increases by 3.8 points (27% of the variable's standard deviation) with a mean elasticity of 13%, the MCS score drops by 1.1 points (12% of standard deviation, elasticity of 2.1%, only significant at 13%), and the PCS drops by 2.1 points (17% of standard deviation, elasticity of 4.5%). The result for GHQ is a meaningful effect according to the MID criterion, with an ES of 0.27 SD. The probability of being clinically depressed (based on the GHQ cut-off) increases by 12 percentage points (average prevalence 33%). Finally, it is worth noting that the increase in labour market attachment is similar in magnitude and not statistically different between the group that exhibits a health decline and those that do not. Thus, results do not seem to support the hypothesis that the stronger effect on health for women in routine SES (or in high-strain jobs) is due to a stronger incentive to remain employed.

Note. Columns 6, 8, and 9 report OLS estimates for the GHQ, MCS and PCS index (0 = least distressed; 100 = most distressed); Column 2 reports Linear Probability Model estimates for being above the GHQ cutoff of 12+ (0-36 scale; Goldberg et al., 1997). All estimates refer to Model

| Diagnosed diseases
We investigate whether the observed health decline triggered the onset of chronic conditions, which could significantly impact mortality and health care expenditure (Behncke, 2012). Given the recent implementation of the reform, we focus on conditions which can plausibly be affected in the short term (Table 6). For each disease, we exclude respondents who were diagnosed with the condition before entering the study (Moon, Glymour, Subramanian, Avendaño, & Kawachi, 2012). We estimate model (3.1) with a dichotomous dependent variable for having been newly diagnosed with the disease since the previous interview. We interact the pension-eligibility dummy with the job-control and the job-demand dummies. Results in Table 6 show a statistically significant increase only in the probability of a doctor's diagnosis of clinical depression (+1.5 percentage points), only among women with high-demand and low-control jobs (average incidence of new depression diagnoses = 1.1%, average baseline prevalence of diagnosed depression = 5.9%). 8

TABLE 4 Heterogeneous effect of State Pension age postponement by SES

| Placebo tests
We run a falsification test on the male population, whose work status should not be directly affected by the reform. Although men might adjust their retirement decision in response to their wife's retirement age, we would expect to see weaker effects for men than for women. After assigning women's SPA to men, based on their birthdate, we find no effect of pension eligibility on men's employment or health (Table 7). We then focus on women who never engaged in paid work and who are unlikely to be induced to work by the SPA postponement. Hence, the reform should not affect this population's health. Due to a small sample (1,410 women aged 60-64 years), we have limited statistical power. However, results suggest that being below-SPA does not affect any health index among those women (Table 7).

8 Due to data limitations, we cannot evaluate changes in healthy behaviours (e.g., drinking/smoking/exercising).

TABLE 5
Heterogeneous effect of State Pension age postponement by job demand and job control level
Note. We report OLS coefficients from a model based on (3.2), where we added a three-way interaction term between SPA eligibility (below-SPA), high/low job control, and high/low job demand, which are also added as separate covariates. Columns i-iv report the net effect of being below-SPA for routine (and nonroutine) groups with high- or low-demand jobs. Additional controls include socioeconomic status, fixed effects for age (in quarters), interview year (in quarters), living arrangements and marital status (married, widowed/divorced/separated, single), country, number of children (none, one-two, three, or more), education (low, mid, or high degree), and a linear control for year-and-month of birth. The routine classification follows the National Statistics SEC-3 taxonomy. The status of being above/below SPA is defined by comparing the individual SPA (based on month-year of birth) and the date of interview. High job demand was built and validated by Santi et al. (2013); the job control measure was built and validated by Solovieva et al. (2014); both are linked to respondents' ISCO code.

| Income effect
Previous research has found that the SPA-change reduced after-tax individual income for women by 20%, had a smaller effect on household income (−6%), and increased absolute poverty rates by between 6 and 8 percentage points, yet had no impact on material deprivation (Cribb et al., 2016; Cribb & Emmerson, 2018). In our sample, we estimate that the SPA-reform reduced individual and household median after-tax income by 12% and 6.7%, respectively, and increased absolute poverty rates by 6 percentage points (baseline poverty rate among untreated women is 11.7%). Full details are in Appendix 1.5.
A reduction in income may translate into negative mental health consequences, for example, by reducing the ability to afford basic goods or increasing the likelihood of individual and household indebtedness (Keese & Schmitz, 2014). We thus re-estimate our models including household or individual log-income as an additional regressor to examine whether this affects the results (Table 8, Panel 1; models controlling for individual income are available on request). The results are very similar to those in models that do not control for income. Although not a conclusive test, this suggests that the negative effects of the reform on mental and physical health might not only be attributable to the reduction in income.

TABLE 6
Effect of being below-SPA on new diagnosis of chronic conditions, by SES and job demand level
Note. We report OLS coefficients from Model (3.1) where we added a three-way interaction term between SPA eligibility (below-SPA), high/low job control, and high/low job demand, which are also added as separate covariates. Columns i-iv report the net effect of being below-SPA on incidence of new diagnoses, for routine (and nonroutine) groups with high- or low-demand jobs. Additional controls include fixed effects for age (in quarters), interview year (in quarters), living arrangements and marital status (married, widowed/divorced/separated, single), country, number of children (none, one-two, three or more), education (low, mid, or high degree), and a linear control for year-and-month of birth. The routine classification follows the National Statistics SEC-3 taxonomy. The status of being above/below SPA is defined by comparing the individual SPA (based on month-year of birth) and the date of interview. High job-demand is measured through the JEM by Santi et al. (2013); job-control is measured through the JEM by Solovieva et al. (2014); both are linked to respondents' ISCO code. Standard errors are clustered by year-and-month of birth. The sample includes women aged 60-64 years between 2009 and 2016, observed either above-SPA (never affected by the reform) or below-SPA (affected by the reform), having been engaged in paid-work in their life, excluding those reporting to have been diagnosed with the specific disease at baseline. The share of respondents in the main sample (7,374 obs.) who, at baseline, had already been diagnosed with a specific condition is as follows: arthritis 0.311, coronary heart disease 0.012, angina 0.028, heart attack 0.019, liver condition 0.012, diabetes 0.069, high blood pressure 0.251, and clinical depression 0.059. Abbreviations: JEM, job-exposure matrices; SES, socioeconomic status; SPA, State Pension age.

| Econometric specification and common trend assumption
As our sample includes repeated observations for some individuals, we test that our results are robust to clustering the standard errors at the individual level (Table 8, Panel 2). Results are also robust to alternative specifications for age and time, such as adopting (a) a linear specification or (b) a quadratic specification for age-quarter and year-quarter (Table 8, Panels 3 and 4), or dropping the control for year-month-of-birth (upon request). 9 Moreover, implementing alternative GHQ cut-offs (e.g., Aalto, Elovainio, Kivimäki, Uutela, & Pirkola, 2012) leads to very similar results (available upon request). Our results are robust to alternative sample selections, to controlling for long-term conditions diagnosed prior to the policy, and to controlling for partner's age (Appendix 1.5). Furthermore, our effects are similar for respondents who had been diagnosed with a chronic condition and those who were free of a diagnosis prior to baseline (Appendix 1.5).
Our identification approach assumes that treated and control groups would have had similar health trends in the absence of the reform. We test whether birth cohorts treated by the reform (birth cohort 1950-1955) had different health levels and trends relative to cohorts unaffected by the SPA change (birth cohort 1944-1949). We also examine health trends for treated and control groups prior to the reform. As our survey started in 2009, we explore data from the British Household Panel Survey (the precursor of Understanding Society, comparable in both sampling and variables collected), focusing on the GHQ score. We first test whether, in the years 1999-2008, GHQ scores were different between women aged 60-61 and 62-64 years (5,041 observations). Both groups are above-SPA between 1999 and 2008, while 60-61-year-olds are affected by the SPA change after 2010. GHQ levels and trends were virtually identical for treated and control groups (see Figure 3 Panel (i) and Appendix 1.4 for details). Furthermore, we show that the two groups do not differ in the prevalence of functional limitations, using the English Longitudinal Study of Ageing (years 2002-2008, see Appendix 1.4).

Columns ii-v report the net effect of being below-SPA for women with high or low job control and with high or low job demand. Abbreviations: GHQ, General Health Question; MCS, mental health score; PCS, physical health score; SPA, State Pension age. *p < 0.10. **p < 0.05. ***p < 0.01.

Our approach also assumes that age effects are cohort-constant. This assumption may be violated if there is compression of morbidity, that is, cohorts born later have better health at any given age due to postponement of disease or disability (Jagger et al., 2016). This pattern, however, is unlikely to explain our results for several reasons.
First, if there was compression of morbidity, our findings would likely be lower-bound estimates because we find that later born cohorts (exposed to the pension reform) have worse health than cohorts born earlier (unexposed to the reform). If anything, compression of morbidity would lead to underestimation of the true causal effect of the reform. Second, our sample comprises a narrow range of age and cohort groups. This makes the assumption of cohort-constant age effects plausible, as compression of morbidity typically occurs over long time frames. Appendix Figure 5 illustrates the maximum range of birth-years that we observe for each age-group in the sample. As an example, among women aged 60 years, we observe women born between 1948 and 1956. Among women aged 63 years, the birth cohorts range between 1945 and 1950. Hence, the largest cohort differences are at most 8 or 5 years. It is unlikely that such differences would substantially impact the health conditions of women at a given age. We also note that there appears to be no consensus on whether the United Kingdom and other high-income countries are experiencing compression of morbidity. The evidence is, at best, mixed and strongly dependent on method and outcome (Crimmins & Beltrán-Sánchez, 2011; Gondek, Bann, Ning, Grundy, & Ploubidis, 2019; Rechel et al., 2013). For example, a recent study for England covering a 20-year interval reports compression of morbidity for cognitive impairment and self-perceived health, but dynamic equilibrium for disability (i.e., less severe disability is increasing but more severe disability is not; Jagger et al., 2016).
Although not a conclusive test of morbidity compression, we followed Cribb et al. (2016) and performed an additional robustness test by allowing the age effects to vary by cohort. This is obtained by adding an interaction term between age dummies (in quarters) and birth-year dummies to a set of controls which included dummies for age (in quarters), dummies for time (in quarters), and dummies for birth-year. Because cohort effects are included using birth-year while age and time are measured in quarters, there is no perfect collinearity between the three sets of variables. When adding the interaction term between age-quarter and birth-year, identification relies on the fact that women born in the same year may have different pension eligibility depending on their age-quarter at the time of interview: a person aged 62q1 born in 1952q1 would be eligible for the pension, while a person aged 62q1 born in 1952q3 would not be eligible. Our results are robust and confirmed under this specification (results available upon request).
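The identification argument above can be illustrated with a small sketch showing that eligibility still varies within a single age-quarter by birth-year cell. The SPA rule used here is a stylised assumption (SPA age rising by one quarter per birth quarter from 1950q2), chosen only because it reproduces the 62q1 example in the text; it is not the statutory timetable, and all function names are hypothetical.

```python
def quarter_index(year: int, quarter: int) -> int:
    """Map (year, quarter) to a running quarter count."""
    return year * 4 + (quarter - 1)


# Stylised assumptions, not the statutory schedule:
REFORM_START = quarter_index(1950, 2)  # first affected birth quarter
OLD_SPA_QUARTERS = 60 * 4              # pre-reform SPA of 60 years, in quarters


def spa_age_quarters(birth_q: int) -> int:
    """Assumed SPA age: one extra quarter per birth quarter after the reform start."""
    return OLD_SPA_QUARTERS + max(0, birth_q - REFORM_START)


def eligible(age_quarters: int, birth_q: int) -> bool:
    """Above-SPA (eligible) if age has reached the assumed SPA age."""
    return age_quarters >= spa_age_quarters(birth_q)


def cell_eligibility(age_quarters: int, birth_year: int) -> dict:
    """Eligibility by birth quarter inside one (age-quarter, birth-year) cell."""
    return {
        q: eligible(age_quarters, quarter_index(birth_year, q))
        for q in (1, 2, 3, 4)
    }


# Women aged 62q1 (248 quarters) and born in 1952: the cell is fixed,
# yet eligibility differs by quarter of birth.
cell = cell_eligibility(62 * 4, 1952)
```

Under these assumptions, women born in the first half of 1952 are above SPA at age 62q1 while those born in the second half are not, so the below-SPA coefficient remains identified after the age-quarter by birth-year interaction absorbs the cell mean.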

| Persistency of the effect for affected women
Our models might capture a very short-term increase in depression and anxiety for being "forced" to work longer in high-strain jobs to qualify for the pension (with a lower lifetime income). Data availability prevents us from establishing whether these effects are long-lasting: the first affected cohorts have only been exposed for a few months/years, while the most affected cohorts reach their SPA in 2017 or later. However, we show that the observed health effects do not dissipate over time for women who are still below SPA. We build a categorical variable capturing the amount of time a person has spent below SPA at the time of interview, with four levels: having been "below SPA" for 1, 2, or 3 or more years, with "being above SPA" as the reference. We then estimate model (3.1), interacting the exposure variables with a simplified SES measure (low-SES vs. mid-high SES) as follows:

$$
\begin{aligned}
y_{iat} = {} & \alpha + \beta_1 \cdot (\text{belowSPA for 1 year})_{iat} + \beta_2 \cdot (\text{belowSPA for 2 years})_{iat} + \beta_3 \cdot (\text{belowSPA for 3+ years})_{iat} \\
& + \beta_4 \cdot (\text{belowSPA for 1 year})_{iat} \times \text{routine} + \beta_5 \cdot (\text{belowSPA for 2 years})_{iat} \times \text{routine} + \beta_6 \cdot (\text{belowSPA for 3+ years})_{iat} \times \text{routine} \\
& + \beta_7 \cdot \text{routine} + \gamma_a + \eta_t + \delta_c + X'_{iat}\varphi + \varepsilon_{iat}.
\end{aligned}
$$
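The exposure variable entering this specification could be built along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: the function name is hypothetical, time below SPA is assumed to start at the 60th birthday (the pre-reform SPA), and the bins are assumed to be under 12 months, 12 to 23 months, and 24 months or more.

```python
from datetime import date


def years_below_spa_category(birth: date, spa_date: date, interview: date,
                             old_spa_age: int = 60) -> str:
    """Classify a respondent by time spent below SPA at the interview date.

    Assumptions (illustrative): the exposure clock starts at the 60th
    birthday, and completed months since then are binned into
    "1 year" (<12), "2 years" (12-23), and "3+ years" (>=24).
    """
    if interview >= spa_date:
        return "above SPA"  # reference category

    # Completed months since the respondent turned old_spa_age.
    months = ((interview.year - birth.year - old_spa_age) * 12
              + (interview.month - birth.month)
              - (1 if interview.day < birth.day else 0))
    if months < 0:
        raise ValueError("respondent is younger than the pre-reform SPA")

    if months < 12:
        return "1 year"
    if months < 24:
        return "2 years"
    return "3+ years"
```

Each non-reference category would then be expanded into a dummy and interacted with the routine indicator, matching the six exposure terms in the equation.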

Results in Table 9 (and Figure 4) suggest that the health effects are persistent for women in low SES, and, therefore, do not seem to reflect only a "surprise" effect.

TABLE 9
Persistency of health effects for women below SPA

| DISCUSSION AND CONCLUSIONS
Our results show that the increase in the female State Pension age in the United Kingdom widened health disparities between 2009 and 2016. Specifically, women in lower socioeconomic groups affected by the reform suffered declines in mental health that are economically and clinically relevant (a MID), as well as an increased prevalence of self-reported clinical depression diagnosis. We also found a statistically significant negative effect on physical health for women in the lower socioeconomic group. However, this effect is likely of less clinical relevance.
Our findings are consistent with evidence on more limited SPA reforms in the Netherlands (De Grip et al., 2012) and Israel (Shai, 2018), and further suggest that the main mechanism for the observed health deterioration is longer exposure to jobs with high demand and low control at work. This can be interpreted in light of theoretical and empirical literature linking straining occupations with health capital degradation, lower productivity, and worse health (Barnay, 2016; Bildt & Michélsen, 2002; Chandola & Zhang, 2017; Fischer & Sousa-Poza, 2009; Paccagnella, 2016; Ravesteijn et al., 2018).
A possible alternative explanation may lie in the detrimental impact of the reform on after-tax individual and household income and on absolute poverty rates (see also Cribb and Emmerson (2018)). This may be partly because rates of reemployment are relatively low for older workers, owing to their higher skill specificity and the risk of age discrimination (OECD, 2017b), especially for routine and manual workers, who are most affected by computerisation and offshoring (Autor & Dorn, 2009). Moreover, the reduction in income is stronger for women in lower socioeconomic groups, who are also those whose health has been most affected by the SPA reform. However, our results are not affected by the inclusion of income measures as covariates, suggesting that the health deterioration might not only be attributable to changes in income.
Our findings may also be attributable to women's lack of awareness of the reform, as a late and unexpected SPA-change may disrupt pension plans, thus generating anxiety and depression (De Grip et al., 2012; Falba, Sindelar, & Gallo, 2009; van Solinge & Henkens, 2017). However, although before 2010 women in lower SES were indeed less aware of the SPA increase than higher SES women (Clery, Humphrey, & Bourne, 2009; MacLeod et al., 2012), recent evidence from Holman, Foster, and Hess (2018) shows that the SPA-change was "almost common knowledge" across all socioeconomic groups by the time the reform came into effect (2010/2011), partly due to an extensive public information campaign. Our own findings (Table 3) suggest that the reform's health effects are larger for women facing larger postponements, who turned 60 years old from 2013 onwards, when SPA knowledge was widespread. All in all, this evidence does not seem to support lack of awareness as the main explanation for the increase in mental health problems.
Our results estimate the impact of an SPA-increase on health in the months or first years following the introduction of the reform. We are not able to establish whether these effects are long-lasting, as the first cohorts affected by the reform have only been exposed for a short time. Likewise, our models compare outcomes between cohorts exposed to the old-SPA regime and those exposed for the first time to the new-SPA regime. One might argue that subsequent cohorts might react differently to an increase in the SPA, a possibility that can only be tested in future waves of Understanding Society.
Our results are robust to several sensitivity tests, including a falsification test for women who never worked and for men, as well as tests for the common-trends assumption. We also show that violation of the assumption of cohort-constant age effects is unlikely in our sample and would not provide a credible explanation for our results.
Although the State Pension age will continue to increase in OECD countries (OECD, 2017a), our findings suggest that negative health consequences of these reforms have been overlooked and should be considered in cost-effectiveness policy evaluations, as they might outweigh some of the potential benefits from later retirement. Mental illness impacts directly on health-care costs, disability benefit payments and service use, as well as indirectly, for example, by imposing a burden on caregivers and their productivity. Mental disorders hamper labour market productivity (Bubonya, Cobb-Clark, & Wooden, 2017) and are a leading cause of disability worldwide, being associated with the onset of other physical health conditions such as cancer, CVD, and diabetes, which may exacerbate their impact on productivity and the economy (Prince et al., 2007). As a result, mental illness constitutes a major burden for public budgets, estimated at more than 4% of GDP in the United Kingdom and in OECD countries (Arends, Baer, Miranda, Prinz, & Singh, 2014; OECD/EU, 2018; WHO, 2013).
There are two possible policy implications from our findings. First, the fact that the negative health effects of the reform are confined to women from highly demanding occupations raises potential questions about fairness, and whether eligibility rules should consider occupation as a potential criterion for SPA (Wester & Wolff, 2010). Second, national policies that increase the State Pension age may need to consider strategies to prevent negative health consequences for women in manual and routine occupations, for example, through inclusive labour market policies that facilitate a smooth transition to retirement (OECD, 2017b).