Population‐based Clinical Practice Research Datalink study using algorithm modelling to identify the true burden of hidradenitis suppurativa



Summary
Background Epidemiology data regarding hidradenitis suppurativa (HS) are conflicting and prevalence estimates vary 80-fold, from 0·05% in a population-based study to 4%.
Objectives To assess the hypothesis that previous population-based studies underestimated true HS prevalence by missing undiagnosed cases.
Methods We performed a population-based observational and case-control study using the U.K. Clinical Practice Research Datalink (CPRD) linked to hospital episode statistics data. Physician-diagnosed cases in the CPRD were identified from specific Read codes. Algorithms identified unrecognized 'proxy' cases, with at least five Read code records for boils in flexural skin sites. Validation of proxy cases was undertaken with general practitioner (GP) questionnaires to confirm criteria-diagnosed cases. A case-control study assessed disease associations.
Results On 30 June 2013, 23 353 physician-diagnosed HS cases were documented in 4 364 308 research-standard records. In total, 68 890 proxy cases were identified, reduced to 10 146 criteria-diagnosed cases after validation, extrapolated from 107 completed questionnaires (61% return rate). Overall point prevalence was 0·77% [95% confidence interval (CI) 0·76-0·78%]. An additional 18 417 cases had a history of one to four flexural skin boils. In physician-diagnosed cases, odds ratios (ORs) for current smoker and obesity (body mass index > 30 kg m⁻²) were 3·61 (95% CI 3·44-3·79) and 3·29 (95% CI 3·14-3·45). HS was associated with type 2 diabetes, Crohn disease, hyperlipidaemia, acne and depression, and not associated with ulcerative colitis or polycystic ovary syndrome.
Conclusions Contrary to results of previous population-based studies, HS is relatively common, with a U.K. prevalence of 0·77%, one-third being unrecognized, criteria-diagnosed cases using the most stringent disease definition. If individuals with probable cases are included, HS prevalence rises to 1·19%.
What's already known about this topic?
• Previous population-based studies suggest that the prevalence of hidradenitis suppurativa (HS) may be as low as 0·05% but existing studies have not quantified undiagnosed cases.
What does this study add?
• Analysis of 4·3 million research-standard records in the U.K. Clinical Practice Research Datalink demonstrates a HS prevalence of 0·77% using the most stringent disease definition [95% confidence interval (CI) 0·76-0·78%], one-third being undiagnosed cases, rising to 1·19% (95% CI 1·18-1·20%) if probable cases are included.
*Plain language summary available online. DOI 10.1111/bjd.16101
Hidradenitis suppurativa (HS), also known as acne inversa, is an inflammatory skin disease producing functional impairment similar in magnitude to cardiovascular disease, type 2 diabetes and renal failure, because of pain, purulent discharge and scarring. 1,2 HS is defined clinically by characteristic lesions occurring chronically in flexural locations. 3,4 Consensus disease definitions differ slightly in the chronicity element, ranging from a history of two to five boils in the axillae or groins. 5,6
The prevalence of HS remains controversial, with estimates ranging from 0·05% from an analysis of patient insurance claims in the U.S.A. 7 to 4% when young adult women were examined in person. 8 An accurate prevalence figure is required to quantify the unmet need regarding care of people with HS, to assess disease burden and to plan healthcare provision. In particular, severe disease requires radical surgery or biological therapies, which are costly and often require multidisciplinary teams; for example, in the U.K., the National Institute for Health and Care Excellence recently approved adalimumab for moderate-to-severe HS. 9 One explanation for the 80-fold difference in prevalence estimates currently available is lack of recognition of HS, in the context that the average diagnostic delay is 7 years. 10
The aim of our study was to provide the most accurate estimate to date of HS prevalence by quantifying physician-diagnosed cases, and also unrecognized HS cases meeting the most stringent consensus disease definition for HS. A primary care data source was chosen because the majority of HS care takes place within a primary care setting. Our second aim was to assess HS disease associations with the largest case-control study performed to date. Accurate epidemiological data are critical at a time when treatment for HS is dramatically changing and now includes biological therapies.

Patients and methods
Our manuscript was prepared in accordance with RECORD (Reporting of Studies Conducted using Observational Routinely-collected Data) 11 (for the RECORD checklist see Appendix S1 in the Supporting Information). The Clinical Practice Research Datalink (CPRD) contains routinely recorded, pseudonymized data from participating primary care practices throughout the U.K. 12 Most diagnoses made in secondary care are transcribed into the CPRD by primary care staff. As of January 2015, it contained records from more than 15·6 million patients registered at 684 practices. Patient data are labelled by the CPRD as being of acceptable research quality if the patient has been permanently registered at their general practice and has internally consistent records with regard to age, sex, registration and event dates. Approximately 75% of contributing practices based in England (representing 60% of the entire dataset) participate in a CPRD linkage scheme by which patient-level data are linked to the hospital episode statistics (HES) for England. The National Health Service (NHS) number, date of birth, sex and postal code of nondissenting patients are conveyed from the practice to a trusted third party and matched with equivalent identifiers in the target dataset. Only those linkages involving at least two exact matches, which must include an exact match on NHS number, are retained.
The study population was selected from patients with data of acceptable research quality. Physician-diagnosed cases of HS were captured by two specific diagnostic Read codes, M25y100 (Hidradenitis) and M25y111 (Hidradenitis suppurativa), or, in HES data, by the International Classification of Disease (ICD)-10 code L73·2 (Hidradenitis suppurativa). To capture undiagnosed cases, CPRD Read code algorithms were created to identify patients attending primary care for multiple skin boil consultations (Table 1). An algorithm hierarchy was used with different degrees of stringency for each subalgorithm, based on the HS disease definition (Fig. 1). Some primary care practitioners may not record every skin boil consultation, particularly if the patient also attends with other health issues. To capture these potential cases, treatment codes were used to identify patients receiving multiple short courses of skin-directed antibiotic therapy, such as flucloxacillin, in combination with skin boil Read codes and in the absence of any other indication for the antibiotic, such as eczema, cellulitis or skin ulcers. Prior to validation, the cases of these patients were referred to as 'proxy cases'. HES data were not used to identify proxy cases to avoid double-counting boils recorded in both data sources.
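The most stringent proxy rule described above (at least five Read code records for boils in flexural skin sites) might be sketched as follows. This is an illustration only: the record layout, the placeholder code strings and the function name are assumptions and do not reproduce the Table 1 code lists or the full algorithm hierarchy. Counting distinct consultation dates is also an assumption about how separate records would be defined.

```python
from collections import defaultdict
from datetime import date

# Hypothetical placeholder codes; the study's actual Read code lists are in its Table 1.
FLEXURAL_BOIL_CODES = {"BOIL_AXILLA", "BOIL_GROIN"}
MIN_RECORDS = 5  # most stringent definition: at least five flexural skin boils


def find_proxy_cases(records, min_records=MIN_RECORDS):
    """Return IDs of patients with at least `min_records` flexural boil records.

    `records` is an iterable of (patient_id, event_date, code) tuples.
    Records on the same date are counted once (an assumption of this sketch).
    """
    dates_by_patient = defaultdict(set)
    for patient_id, event_date, code in records:
        if code in FLEXURAL_BOIL_CODES:
            dates_by_patient[patient_id].add(event_date)
    return {pid for pid, dates in dates_by_patient.items() if len(dates) >= min_records}


# Toy data: patient "A" has five qualifying consultations, patient "B" only two.
example_records = (
    [("A", date(2010, month, 1), "BOIL_AXILLA") for month in range(1, 6)]
    + [
        ("B", date(2011, 1, 1), "BOIL_GROIN"),
        ("B", date(2011, 6, 1), "BOIL_GROIN"),
        ("B", date(2011, 7, 1), "OTHER_CODE"),
    ]
)
proxy_cases = find_proxy_cases(example_records)
```

In this toy example only patient "A" meets the threshold. Lowering `min_records` would mimic a less stringent subalgorithm.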

Validation
Validation of proxy cases was carried out by sending questionnaires to the primary care physicians of a subset of patients identified from the algorithms. The questionnaire (Table 2) requested confirmation of characteristic skin lesions in flexural locations on at least five separate occasions, in keeping with the most stringent consensus definition of the disease. 6 A total of 176 questionnaires were sent, with the overall number being determined by our study budget in the context that CPRD charges a fee for each questionnaire issued. The number of questionnaires generated to validate each subalgorithm was determined by the number of additional proxy cases identified through each, with a minimum of 10 questionnaires being issued for subalgorithms that captured relatively few cases.

Analysis
Point prevalence on 30 June 2013 was calculated using physician-diagnosed cases and criteria-diagnosed cases as the numerator and the 2013 mid-year population of live patients with research-standard records in CPRD as the denominator. The number of criteria-diagnosed cases was calculated by multiplying the number of proxy cases identified by each subalgorithm by the proportion of cases validated from the relevant set of GP questionnaires.
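The extrapolation step described above can be sketched as a sum over subalgorithms of proxy cases multiplied by the validated proportion. The function name and the input numbers below are hypothetical and chosen only for easy arithmetic; the per-subalgorithm figures actually used are those reported in Table 3.

```python
def extrapolate_criteria_cases(subalgorithms):
    """Estimate criteria-diagnosed cases from per-subalgorithm validation results.

    `subalgorithms` is an iterable of (proxy_cases, returned, confirmed) tuples:
    proxy cases captured by the subalgorithm, completed questionnaires returned,
    and questionnaires confirming at least five flexural skin boils.
    """
    total = 0.0
    for proxy_cases, returned, confirmed in subalgorithms:
        if returned <= 0:
            raise ValueError("each subalgorithm needs at least one returned questionnaire")
        total += proxy_cases * confirmed / returned
    return total


# Hypothetical inputs, NOT the study's Table 3 values.
hypothetical = [
    (1000, 20, 10),  # 50% validated -> 500 estimated cases
    (500, 10, 2),    # 20% validated -> 100 estimated cases
]
estimated_cases = extrapolate_criteria_cases(hypothetical)
```

Because each subalgorithm is weighted by its own validation proportion, a subalgorithm that captures many proxy cases but validates poorly contributes proportionally fewer criteria-diagnosed cases.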
Annual incidence rates for physician-diagnosed cases were calculated from 1988, when up-to-research-standard practices first contributed to CPRD, until 2013. We used the Brameld-Holman backcasting method, which implements an actuarial retrograde survival curve to derive correction factors for the over-ascertainment of first events for individuals with only a limited duration of CPRD data collection prior to the event. 13 The Brameld-Holman method accounts for the prevalent pool effect without the need for a clearance period and avoids data being discarded.

Case-control
Cases and controls were matched in a 1 : 1 ratio based on age, sex and registration at the same primary care practice. The associations between HS and environmental factors and other medical conditions were estimated using conditional logistic regression techniques. Two separate association analyses were conducted for physician-diagnosed cases compared with matched controls and for proxy cases (including criteria-diagnosed HS cases and those proxy cases who did not meet the validation criteria) compared with a further set of matched controls. Univariable associations were reported, and also multivariable analyses that examined combinations of variables demonstrating association in the univariable analysis with a P-value < 0·1. In a post hoc analysis we also tested whether there was evidence that the association between Crohn disease and HS was altered by smoking status because smoking is known to be a risk factor for both conditions. Our study protocol (Appendix S2; see Supporting Information), 15_020R, was prospectively approved by the CPRD Independent Scientific Advisory Committee. No additional institutional review board approval is required for CPRD studies.
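For intuition about the matched analysis, consider the simplest setting: 1 : 1 matching with a single binary exposure and no other covariates. There the conditional maximum-likelihood odds ratio reduces to the ratio of discordant pairs. The sketch below illustrates that special case only and is not the regression software used in the study. The function name and the pair counts are hypothetical, and the Wald interval on the log scale is an assumed choice.

```python
import math


def matched_pair_odds_ratio(case_only_exposed, control_only_exposed, z=1.96):
    """Odds ratio and Wald CI from discordant pairs in a 1:1 matched design.

    case_only_exposed: pairs where only the case is exposed.
    control_only_exposed: pairs where only the control is exposed.
    Concordant pairs carry no information in the conditional analysis.
    """
    if case_only_exposed <= 0 or control_only_exposed <= 0:
        raise ValueError("both discordant pair counts must be positive")
    odds_ratio = case_only_exposed / control_only_exposed
    # Standard error of log(OR) for discordant-pair counts.
    se_log_or = math.sqrt(1 / case_only_exposed + 1 / control_only_exposed)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study.
odds_ratio, ci_lower, ci_upper = matched_pair_odds_ratio(300, 100)
```

With these made-up counts the odds ratio is 3·0. Multivariable models, as used in the study, require a full conditional logistic regression fit rather than this shortcut.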

Results

Point prevalence
On 30 June 2013 there were 21 575 patients with physician-diagnosed HS in the CPRD dataset and a further 1778 from the HES dataset; a total of 23 353 individuals with physician-diagnosed HS. Our Read and treatment code algorithms captured a further 68 890 proxy cases in the CPRD; the breakdown for each subalgorithm is shown in Table 3. Of the 176 GP validation questionnaires sent, 107 (61%) were fully completed and returned; completion rates for each subalgorithm are in Table 3.
The proportions of questionnaires with 'yes' responses to the first three questions, satisfying the most stringent disease definition of at least five flexural skin boils, are documented in Table 3; from these, 10 146 criteria-diagnosed cases were extrapolated. Table 3 also documents the proportions within each subalgorithm confirmed to have had one to four flexural boils with 'yes' responses to questions one and two only, representing a group of probable HS cases. Overall, only one validation questionnaire confirmed the presence of scarring with a 'yes' answer to question four.
Using the denominator of 4 364 308 research-standard patients in the CPRD, the prevalence of physician-diagnosed and criteria-diagnosed HS cases on 30 June 2013 was 7·7 per 1000 [95% confidence interval (CI) 7·6-7·8]. When the 18 417 patients with one to four flexural skin boils were included, the upper limit of probable HS prevalence was 11·9 per 1000 (95% CI 11·8-12·0).
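As an arithmetic check, the prevalence figures above can be recomputed from the reported counts. The sketch assumes a normal-approximation (Wald) confidence interval, because the method used for the published intervals is not stated here. The function and variable names are illustrative.

```python
import math


def prevalence_per_1000(cases, population, z=1.96):
    """Point prevalence per 1000 with a Wald (normal-approximation) CI.

    Returns (estimate, lower, upper), each rounded to one decimal place.
    """
    p = cases / population
    se = math.sqrt(p * (1 - p) / population)
    return tuple(round(1000 * value, 1) for value in (p, p - z * se, p + z * se))


# Counts reported in the text.
DENOMINATOR = 4_364_308         # research-standard patients on 30 June 2013
PHYSICIAN_DIAGNOSED = 23_353
CRITERIA_DIAGNOSED = 10_146     # extrapolated from validated proxy cases
PROBABLE = 18_417               # one to four flexural skin boils

stringent = prevalence_per_1000(PHYSICIAN_DIAGNOSED + CRITERIA_DIAGNOSED, DENOMINATOR)
upper_limit = prevalence_per_1000(
    PHYSICIAN_DIAGNOSED + CRITERIA_DIAGNOSED + PROBABLE, DENOMINATOR
)
```

Under this assumption `stringent` evaluates to (7.7, 7.6, 7.8) and `upper_limit` to (11.9, 11.8, 12.0), matching the reported values per 1000.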
The demographic profile of patients with physician-diagnosed HS is shown in Figure 2. Peak prevalence of 15·1 per 1000 occurred in the fifth decade of life and the mean female : male ratio across all the age groups was 2·9 : 1.

Incidence of physician-diagnosed cases
The Brameld-Holman actuarial retrograde survival curve used for the backcasting correction for incidence rates is in Figure S1 (see Supporting Information), with the resulting correction factors reducing the number of new physician-diagnosed cases prior to 1994. The mean annual incidence rate for physician-diagnosed cases from 1996 to 2013 was 28·3 per 100 000 person-years (Table S1; see Supporting Information). Prior to 1995, despite the backcasting correction, annual incidence rates became progressively higher, ranging from 35·7 per 100 000 person-years in 1995 to 122·1 in 1988.

Case-control
Descriptive statistics for the occurrence of environmental factors and other medical conditions in cases and controls are in Table 4. Obesity [body mass index (BMI) > 30 kg m⁻²], current smoking and type 2 diabetes mellitus were strongly linked with physician-diagnosed cases of HS, with odds ratios (ORs) of 3·29 (95% CI 3·14-3·45), 3·61 (95% CI 3·44-3·79) and 3·39 (95% CI 3·09-3·71), respectively, in the univariable analysis (Table 4). There was a significant association between HS and Crohn disease but not ulcerative colitis. The link with Crohn disease was not stronger for smokers than nonsmokers (P = 0·26 for the interaction test between Crohn disease and never or past/current smoker). Associations were also found between HS and hyperlipidaemia, pilonidal sinus, acne vulgaris, depression and hypertension, but not with inflammatory arthritis, polycystic ovary syndrome, psoriasis or Alzheimer disease (Table 4). Multivariable analysis confirmed the univariable analysis results in all cases (Table S2; see Supporting Information).
a Criteria-diagnosed cases are defined as those individuals confirmed by their primary care physician to have a history of at least five flexural skin boils; b additional probable cases are defined as those individuals confirmed by their primary care physician to have a history of between one and four flexural skin boils.

Discussion
Our HS prevalence figure of 0·77% is derived from a population of 4·3 million U.K. residents with research-standard medical records. Importantly, our study has identified patients with previously undiagnosed HS using the most stringent diagnostic criteria available. We have also identified an upper limit of probable HS prevalence of 1·19%, if those with a history of one to four flexural skin boils on separate occasions are included. This is in the context that patients may not see their GP for every new skin boil, GPs may not record each skin boil, particularly if the consultation also covers other medical issues, and HS disease definitions vary in the number of flexural skin boils required, from two to five. 5,6
Our two prevalence figures span the 0·97% estimate for European prevalence derived from a self-reported questionnaire study of 10 000 members of the French population that is often cited as the most accurate current estimate. 14 In the French study, cases were identified by positive responses to one question within a general skin disease survey and the morphology of skin lesions was not assessed by healthcare professionals. One of the largest population-based HS investigations performed prior to our study used data from the Rochester Epidemiology Project (REP), which maintains a centralized electronic medical record database for the 144 000 people living in Olmsted County, Minnesota, U.S.A. 15,16 Identification of HS cases used codes including synonyms of 'hidradenitis', and also the terms 'infection, sweat gland or inflammation, sweat gland'. Validation by chart review identified 178 HS cases, giving a prevalence of 0·13%. More recently, interrogation of pooled insurance and self-pay databases covering 48 million people in the U.S.A. using ICD-9 coding for HS gave a point prevalence of 0·10%. 17 In both studies, identification of undiagnosed cases by considering multiple flexural skin boil presentations was not attempted. Part of the explanation for the U.S.A. prevalence figures being lower than ours is failure to capture undiagnosed cases. In addition, recognition and coding of HS by healthcare professionals will vary from region to region and variations in health systems may influence whether people with HS persevere with seeking medical attention for their condition, an issue requiring further investigation.
Based on 93 869 control individuals, 69 842 proxy cases including criteria-diagnosed hidradenitis suppurativa (HS) cases and those who did not meet the validation criteria, and 24 027 physician-diagnosed cases of HS. Some non-prevalent cases, meeting diagnostic criteria after 30 June 2013, are included. Smoking status was missing from a total of 10 055 records, including 2820 proxy cases and 592 physician-diagnosed cases, and missing data are included in the denominator for calculation of percentages. Body mass index data were missing from 12% of all records and, again, missing data are included in the denominator for calculation of percentages.
Our finding of peak HS prevalence in women in their fourth and fifth decades of life is in agreement with the French self-reported questionnaire study, in which the female : male ratio was 2·7 : 1 and the mean age of patients was 43·2 years. 14 Our peak prevalence finding of 1·51% in women in their fifth decade is lower than the 4% prevalence found by examination of a group of female healthcare professionals and those attending for benign skin lesions in Denmark, but the study had a relatively low sample size of 100, in which there were four cases. 8
Our mean annual incidence of physician-diagnosed HS cases of 28·3 per 100 000 person-years from 1996 to 2013 is more than four times higher than the 6·0 per 100 000 person-years figure found using REP data for Olmsted County, Minnesota. 16 The difference is in keeping with the higher prevalence figure for physician-diagnosed cases found in our study. The more recent study of the U.S.A. population using pooled insurance and self-pay databases found a HS incidence of 11·4 (95% CI 11·1-11·8) per 100 000 population, double the incidence of the REP-derived estimate. 18 Our calculated incidence rates were much higher in the first few years of available CPRD data, despite the backcasting correction to negate any prevalent pool effect. The observation is likely explained by bulk computerization of patient paper records during this period, producing an artefact in this result.
Our case-control study is entirely consistent with other, smaller studies previously performed. A recent systematic review of nine relevant studies containing 6174 patients and 24 993 controls confirmed an association between HS and obesity (OR 3·45, 95% CI 2·20-5·38), in addition to an association with diabetes (OR 2·85, 95% CI 1·34-6·08), current smoking (OR 4·34, 95% CI 2·48-7·60) and past smoking (OR 6·34, 95% CI 2·41-16·68). 19 Our finding that HS is associated with Crohn disease but not ulcerative colitis is in agreement with other studies, including a cross-sectional investigation of 3267 patients with HS 20 and, in addition, we showed that the association between HS and Crohn disease is not modified by smoking status. The higher rates of smoking, type 2 diabetes mellitus, hypertension and hyperlipidaemia confirmed by our study are in keeping with the nearly doubled risk of cardiovascular-associated death in Danish patients with HS compared with controls (adjusted incidence rate ratio 1·95, 95% CI 1·42-2·67). 21
The strongest association demonstrated by our case-control study was between HS and pilonidal sinus, with an odds ratio of 5·61. The result is expected because both conditions are members of the 'follicular occlusion tetrad' and there is ongoing debate about whether pilonidal sinus may be a phenotypic variant of HS. We confirmed a link between HS and depression, with an odds ratio of 1·55, which is in keeping with a chronic, painful and socially isolating disease and in agreement with a study of psychiatric comorbidities in 3207 patients with HS (adjusted OR for depression 1·7, 95% CI 1·3-2·1). 22 However, we did not assess the timing of disease associations in our study and so cannot comment further on possible causation.
Debate continues regarding a possible link between HS and inflammatory arthritis. A syndrome known as PAPASH (pyoderma gangrenosum, acne, psoriasis, arthritis and suppurative hidradenitis) has been proposed but we did not find evidence of an association between HS and inflammatory arthritis or psoriasis. 23 Our finding of no link between HS and polycystic ovary syndrome disagrees with a case-control study based on 2292 patients in the U.S.A. with at least one billing code for HS, which found an odds ratio of 13·7 (95% CI 4·00-47·3). 24 However, the study found a HS prevalence of only 0·08%, reflecting limited case-capture from billing codes, and so case ascertainment bias may have affected the results. We specifically checked for any link between HS and Alzheimer disease because of reports of loss of function gamma secretase gene mutations in some Han Chinese patients with HS and a small proportion of European families with HS. 25,26 Missense gamma secretase gene mutations are implicated in familial Alzheimer disease. 27 We found no association between HS and Alzheimer disease, in agreement with a recent case-control study from Denmark, 28 although it is possible that our relatively young cohort of patients might develop dementia later in life.
Considering our study limitations, some relevant data are missing within the CPRD and they are unlikely to be missing at random: records for cases are likely to be more complete because the condition increases GP contacts, providing more opportunity for the data items to be recorded. Coding imperfections may affect results, mitigated by our validation of undiagnosed cases using the GP questionnaire. Study budget limitations, rather than statistical considerations, restricted the number of validation questionnaires that were sent and so a sampling error is possible.
We did not validate physician-diagnosed HS cases because false positives are unlikely in the context that HS is a clinically diagnosed, characteristic skin disease with a low chance of being confused with other skin conditions. Review of medical records for patients who had received at least two ICD-9 705·83 codes for 'hidradenitis' in the Massachusetts General Hospital database was previously conducted, in the context that the 705·83 code includes neutrophilic eccrine hidradenitis, and recurrent palmoplantar hidradenitis, in addition to HS. 29 Confirmation of HS was obtained in 89·6% of cases, while inclusion of the terms 'hidradenitis' or 'HS' in the medical record gave positive predictive values of 99·6% (95% CI 98·9-99·9%) and 100% (95% CI 95·8-100%), respectively, for identification of chart-verified HS. To minimize any risk of false positives we used only the Read codes for 'hidradenitis' and 'hidradenitis suppurativa' in our search for physician-diagnosed cases in the CPRD and used the specific ICD-10 code for 'hidradenitis suppurativa', L73·2, when searching the HES data.
Validation in person would have been ideal but is not possible within the ethical parameters of the CPRD. Validation using a GP questionnaire to check for the consensus definition of five flexural skin boils would be prone to false negatives. This is because five skin boil consultations may not be recorded as separate entries in the medical record in those with a known diagnosis. For the same reason, we did not analyse the performance of our algorithms in detecting physician-diagnosed cases. Descriptive statistics comparing controls with proxy cases and diagnosed cases support a conclusion that the proxy group contains mainly patients with HS and also some individuals who do not have HS, whereas the physician-diagnosed HS group is representative of a HS population. For example, smoking is a known HS risk factor and our physician-diagnosed group contained 51% active smokers, compared with 40% in the proxy group and 24% in controls.
In conclusion, analysis of the 4·3 million U.K. people with research-standard medical records in the CPRD has permitted the most comprehensive estimate of HS prevalence so far obtained, identifying individuals with physician-diagnosed and criteria-diagnosed cases in primary care. Prevalence of HS in the U.K. is 0·77% using the most stringent validation criteria of five flexural skin boils, rising to a maximum possible prevalence of 1·19% if patients with a history of one to four flexural skin boils are included. Our results demonstrate that HS is a common condition, highlighting the relative lack of HS clinical trial evidence to date. For example, the recent HS Cochrane review included only 12 randomized controlled trials, involving only 615 participants, one-seventh of the number of trials and participants included in the updated Cochrane review of vitiligo, a condition with a similar prevalence. 30,31 Availability of an accurate prevalence figure is one of the key foundations to drive further research in the field of HS.

Fig 1. Algorithm hierarchy used to capture unrecognized cases of hidradenitis suppurativa. Schematic representation of subalgorithms used to detect unrecognized hidradenitis suppurativa with differing degrees of stringency.

Fig 2. Demographics of physician-diagnosed hidradenitis suppurativa cases. Line graph of the prevalence rate of physician-diagnosed hidradenitis suppurativa in U.K. males, females and in total, subdivided by decade of age. Superimposed is a histogram of the female : male ratio for each decade of age. Prev, prevalence.

Table 2
Validation questionnaire sent to the primary care physicians of a subset of proxy hidradenitis suppurativa cases
'rope-like' scars, atrophic (depressed) scars or sinus tracts in affected sites?

Table 3
Subalgorithm validation from completed questionnaires returned by primary care practitioners

Table 4
Case-control study univariable analysis a