Keywords:

  • human papilloma virus;
  • cervical cancer;
  • prediction modeling

Abstract

Knowledge of a country's cervical cancer (CC) burden is critical to informing decisions about resource allocation to combat the disease; however, many countries lack cancer registries to provide such data. We developed a prognostic model to estimate CC incidence rates in countries without cancer registries, leveraging information on human papilloma virus (HPV) prevalence, screening, and other country-level factors. We used multivariate linear regression models to identify predictors of CC incidence in 40 countries. We extracted age-specific HPV prevalence (10-year age groups) by country from a meta-analysis in women with normal cytology (N = 40) and matched to most recent CC incidence rates from Cancer Incidence in Five Continents when available (N = 36), or Globocan 2008 (N = 4). We evaluated country-level behavioral, economic, and public health indicators. CC incidence was significantly associated with age-specific HPV prevalence in women aged 35–64 (adjusted R-squared 0.41) (“base model”). Adding geographic region to the base model increased the adjusted R-squared to 0.77, but the further addition of screening was not statistically significant. Similarly, country-level macro-indicators did not improve predictive validity. Age-specific HPV prevalence at older ages was found to be a better predictor of CC incidence than prevalence in women under 35. However, HPV prevalence could not explain the entire CC burden as many factors modify women's risk of progression to cancer. Geographic region seemed to serve as a proxy for these country-level indicators. Our analysis supports the assertion that conducting a population-based HPV survey targeting women over age 35 can be valuable in approximating the CC risk in a given country.


Introduction

The successful introduction of cervical cancer (CC) screening has reduced the CC burden dramatically in developed countries. However, CC is still estimated to be one of the leading cancers among women worldwide, with approximately 530,000 new cases and 275,000 deaths each year.1 An estimated 85% of the world's CC cases occur in developing countries.2 CC is a paradigm of global health disparity, disproportionately affecting young women from the poorest countries and the most disadvantaged populations.

Quantifying CC rates is a crucial first step towards prevention as it provides important information to policy makers when determining the resources necessary to combat the disease. The most accurate measure of CC incidence can be derived from population-based registries, which provide estimates of disease occurrence in a well-defined population.3 Quality and completeness of data collection as well as accurate measurement of population denominators are essential components of cancer registries. Unfortunately, creating and maintaining accurate cancer registries takes a great deal of resources, and infrastructure for case finding and reporting is often lacking in developing countries. In addition, many cases of CC may go undiagnosed, and therefore unreported. Lack of census data in low-resource countries often prevents the estimation of accurate population denominators.4 Population-based cancer registries cover only 21% of the world's population, and just 8% is covered by registries considered to be of high quality and unbiased enough to produce accurate estimates, such as those included in Cancer Incidence in Five Continents, a peer-reviewed book series of registry data; just 4% of these registries are located in Asia and only 1% in Africa.2 Although registries are frequently national in developed countries, developing countries often have registries that only cover small subpopulations in mainly urban areas that may not be representative of the total population. In the absence of accurate registry data for many countries, prediction models are often utilized to estimate cancer rates. These models are based on historical, usually local, data on cancer incidence and mortality that are linearly projected to estimate current incidence. When historical registry data are unavailable, regional estimates from neighboring countries may be used.2

In contrast to many other types of cancer, CC has an identified necessary cause, namely infection with oncogenic human papillomavirus (HPV). Estimating HPV prevalence in a population via a cross-sectional study is relatively easy compared to the enormous financial costs and logistics associated with creating and maintaining a CC registry. Therefore, a unique opportunity may arise to estimate CC burden in countries without population-based registries using HPV prevalence as a proxy. However, HPV is not a sufficient cause of CC as other independent factors can impact a woman's progression to cancer. Screening, for example, is an important modifier of CC risk. Screening coverage and quality can be difficult to quantify, particularly in developing countries without organized CC screening programs.5

The aim of this analysis is to develop a prognostic model to estimate CC rates in countries without population-based cancer registries as a function of HPV infection detected in the general population of women, geographic region, and other potentially relevant country-level factors. Estimating CC burden is of particular importance now that many countries may be developing strategies to implement HPV vaccination and novel low-cost CC screening techniques.

Material and Methods

Data collection

We identified 41 countries where age-specific prevalence data on HPV infection were available, namely: Algeria, Argentina, Belgium, Brazil, Canada, Chile, China, Colombia, Costa Rica, Egypt, Finland, France, Greece, Honduras, India, Indonesia, Ireland, Italy, Japan, Kenya, South Korea, Lithuania, Mexico, Mongolia, Morocco, Mozambique, Netherlands, Nigeria, Paraguay, Peru, Philippines, Poland, Senegal, South Africa, Spain, Switzerland, Taiwan, Thailand, United States, United Kingdom, and Vietnam.

Data were obtained as follows:

HPV prevalence

Prevalence of HPV infection was derived from a meta-analysis of age-specific HPV prevalence in women with normal cytology; methods are described in detail elsewhere.6 Briefly, a systematic literature review of PubMed using the keywords “human papillomavirus,” “cervical cancer,” or “normal cytology” was conducted. Studies were restricted to those published between 1995 and 2010 that used polymerase chain reaction or Hybrid Capture 2 for HPV detection. Estimates in women with normal cytology were utilized whenever possible to provide comparable estimates across countries and avoid overrepresentation of women with abnormal cytology obtained from convenience samples. For countries with multiple studies, weighted regression models were used to provide a composite estimate of HPV prevalence. We calculated age-specific prevalence in 10-year age groups as well as age-standardized prevalence for women aged 25–64 using the WHO standard population.7 For both estimates, we calculated the prevalence of HPV-16 alone and a composite of all types (high and low risk).

Cervical cancer incidence

CC incidence was obtained from the International Agency for Research on Cancer (IARC) Cancer Incidence in Five Continents, which provides age-specific cancer incidence and population size from national or regional registries.8 Registry data from the most recent year available were used. For countries where IARC registry data were not available (n = 4), we utilized GLOBOCAN 2008 estimates.9 Age-standardized CC incidence rates in women between 25 and 64 years of age were calculated using the WHO standard population.7 Crude and age-specific CC incidence (in 10-year age groups starting at age 25) was also calculated.
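
The direct age standardization described above, applied to both HPV prevalence and CC incidence, can be sketched as a weighted average of age-specific rates. This is a minimal illustration only: the function name, the example rates, and the weights are assumptions made for this sketch, and the weights are not the actual WHO standard population values, which should be taken from the cited reference.

```python
def age_standardized_rate(age_specific_rates, standard_weights):
    """Directly standardize age-specific rates to a standard population.

    Both arguments are dicts keyed by age group, e.g. "25-34".
    """
    if set(age_specific_rates) != set(standard_weights):
        raise ValueError("rates and weights must cover the same age groups")
    total_weight = sum(standard_weights.values())
    weighted_sum = sum(
        age_specific_rates[group] * standard_weights[group]
        for group in age_specific_rates
    )
    return weighted_sum / total_weight


# Hypothetical rates per 100,000 women and hypothetical weights
# (NOT the WHO standard population; for illustration only).
example_rates = {"25-34": 5.0, "35-44": 15.0, "45-54": 25.0, "55-64": 30.0}
example_weights = {"25-34": 4.0, "35-44": 3.0, "45-54": 2.0, "55-64": 1.0}

print(age_standardized_rate(example_rates, example_weights))  # prints 14.5
```

Because the weights are normalized inside the function, they can be supplied either as population counts or as proportions.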

Other risk factors

Each country was categorized into geographic region based on UN classification.10 When available, estimates of country-level screening coverage were obtained from National Reproductive Health Surveys, which were extracted from the WHO/ICO HPV Information Centre database.11 These surveys were conducted on a representative sample of adult women. For countries without survey data, population-based cohort studies and cross-sectional surveys of women attending screening were utilized. Because of the heterogeneity of the survey data, which differed in age range of women surveyed, lifetime screening frequency, and screening method, we created a categorical variable of low screening (less than 20% coverage), medium screening (20–50% coverage), and high screening (over 50% coverage).
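
The three screening categories defined above can be expressed as a small helper. This is a sketch under stated assumptions: the function name and the numeric ordinal codes are hypothetical, and the boundary handling (exactly 20% and exactly 50% both fall in "medium") follows a literal reading of the cut-offs in the text.

```python
# Hypothetical ordinal coding for use as a regression predictor.
SCREENING_ORDINAL = {"low": 0, "medium": 1, "high": 2}


def screening_category(coverage_percent):
    """Map screening coverage (0-100) to low / medium / high."""
    if not 0 <= coverage_percent <= 100:
        raise ValueError("coverage must be between 0 and 100")
    if coverage_percent < 20:
        return "low"      # less than 20% coverage
    if coverage_percent <= 50:
        return "medium"   # 20-50% coverage
    return "high"         # over 50% coverage


for value in (12.0, 20.0, 50.0, 73.5):
    label = screening_category(value)
    print(value, label, SCREENING_ORDINAL[label])
```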

Country-level indicators were obtained from several published sources to evaluate potential correlates of CC, including sexual and reproductive behavior, health, economic, demographic, and development indicators. Major sources of data included the UNDP World Development Indicators,12 The World Health Report 2006,13 World Bank Development Indicators,14 The WHO World Health Statistics,15 and UNICEF's The State of the World's Children 2000.16 Whenever possible, indicators from the years 2006 to 2008 were utilized. We used the United States CIA World Factbook to determine each country's predominant religion.17

Statistical analysis

Statistical analyses were completed using Stata Version 9.0.18 We used multivariate linear regression to estimate the association between predictors and outcome. CC incidence was explored as a crude, age-specific, and age-standardized rate to determine the best fit to the data. All continuous variables were explored for linear fit, abnormal patterns, and residual distribution. Possible interactions between exposure variables were evaluated. A model with HPV prevalence predicting CC incidence was considered the “base model,” and other variables were evaluated in terms of their contribution to the base model using the likelihood ratio test. We chose to develop several models with differing numbers of predictors to determine the optimal model. Models were developed both manually and using a stepwise procedure retaining predictors that had a p-value of ≤ 0.20. The models were tested for heteroskedasticity of residuals, leverage, sensitivity to outliers, omitted variables, and overall goodness of fit.19 Sensitivity analyses using countries with IARC registries only were completed. Model performance was assessed using the adjusted R-squared as an estimate of the variability explained by each model. China was removed as an outlier, since it was shown to have high leverage and large residuals because of its unusually low CC incidence rates, despite high HPV prevalence and low screening levels.
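
Two of the quantities used above, the adjusted R-squared and the likelihood ratio statistic for comparing nested linear models, follow standard formulas and can be sketched in Python as below. The analysis itself was run in Stata; the function names and example numbers here are illustrative assumptions, not values from the study.

```python
import math


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a linear model that includes an intercept."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("not enough observations for this many predictors")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


def likelihood_ratio_statistic(rss_reduced, rss_full, n_obs):
    """LR statistic for nested Gaussian linear models fit to the same data.

    rss_reduced and rss_full are residual sums of squares. The statistic is
    compared against a chi-square distribution whose degrees of freedom
    equal the number of predictors added in the full model.
    """
    return n_obs * math.log(rss_reduced / rss_full)


# Hypothetical inputs, chosen only to show the calls.
print(adjusted_r_squared(0.50, n_obs=40, n_predictors=3))
print(likelihood_ratio_statistic(rss_reduced=20.0, rss_full=10.0, n_obs=40))
```

The adjustment penalizes each added predictor, which is why it is a more cautious summary than the raw R-squared when only 40 observations are available.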

Model validation and country-specific examples

Internal validation was assessed using a split sample procedure. Twenty-five percent, or 10 of the 40 countries, were removed using stratified random sampling within each geographic region. The models were estimated on the remaining 30 countries and used to predict CC incidence rates in the 10 removed countries. Models found to have a considerably lower predictive ability (as measured by their R-squared) in the 10 removed countries compared to their model R-squared were thought to have poor internal validity. A limited external validation was conducted by predicting CC incidence rates in five countries that were not used to develop the model because of incomplete available data on HPV prevalence. Each country's age-standardized HPV prevalence was used in place of missing values of age-specific HPV prevalence. Predicted CC incidence from the external validation of the model containing HPV prevalence and geographic region was then graphed against reported incidence rates to create a visual representation of model performance.
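
The split-sample step, holding out roughly a quarter of countries by stratified random sampling within each region, might be sketched as follows. The function name, the seed, and the placeholder region and country labels are assumptions for illustration; the actual regional grouping follows the UN classification cited in the text and is not reproduced here.

```python
import random


def stratified_holdout(countries_by_region, fraction=0.25, seed=0):
    """Split countries into (training, holdout), sampling within each region.

    countries_by_region maps a region name to a list of country names.
    The per-region holdout size is rounded with Python's round(), so the
    overall holdout share is only approximately `fraction`.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    training, holdout = [], []
    for region in sorted(countries_by_region):
        members = sorted(countries_by_region[region])
        n_holdout = round(len(members) * fraction)
        chosen = set(rng.sample(members, n_holdout))
        holdout.extend(c for c in members if c in chosen)
        training.extend(c for c in members if c not in chosen)
    return training, holdout


# Placeholder labels, not the study's actual regions or countries.
example_regions = {
    "Region A": ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"],
    "Region B": ["B1", "B2", "B3", "B4"],
}
train, test = stratified_holdout(example_regions)
print(len(train), len(test))  # prints 9 3
```

The model would then be estimated on the training countries only and its predictions compared with reported incidence in the held-out countries.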

Results

Correlation between HPV prevalence and CC incidence

After a comprehensive assessment of several possible methods of estimating CC incidence and HPV prevalence, age-standardized CC incidence rates in women between the ages of 25 and 64 and HPV prevalence in age groups 35–44, 45–54, and 55–64 were selected as the indicators providing the best fit. CC incidence rates were transformed via square-root to create a normally distributed outcome. Composite HPV prevalence was found to be a better predictor of CC incidence than HPV 16 alone (data not shown). Figure 1 shows the overall association between age-standardized HPV prevalence and CC incidence, which had an R² = 0.30. The relation between age-specific CC incidence and HPV prevalence showed slightly higher correlations as age group increased (Supporting Information Appendix, Table 1 and Figs. 1–5), with a jump in correlation in the 55–64 age category, which showed the strongest association: R² = 0.41 (Fig. 2).
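
A one-predictor version of the square-root regression described above can be sketched in plain Python. This is an illustrative simplification: the study's models used several predictors and were fit in Stata, the function names are hypothetical, and the toy data are constructed to lie exactly on a line. Squaring a prediction to return to the incidence scale is a naive back-transformation that ignores retransformation bias.

```python
import math


def fit_sqrt_incidence(hpv_prevalence, cc_incidence):
    """Fit sqrt(incidence) = a + b * prevalence by ordinary least squares.

    Returns (intercept, slope, r_squared) on the square-root scale.
    """
    x = list(hpv_prevalence)
    y = [math.sqrt(v) for v in cc_incidence]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


def predict_incidence(intercept, slope, prevalence):
    """Back-transform a prediction from the square-root scale by squaring."""
    return (intercept + slope * prevalence) ** 2


# Toy data (not from the study): sqrt(incidence) equals 1 + prevalence.
toy_prevalence = [0.0, 1.0, 2.0, 3.0]
toy_incidence = [1.0, 4.0, 9.0, 16.0]
a, b, r2 = fit_sqrt_incidence(toy_prevalence, toy_incidence)
print(a, b, r2, predict_incidence(a, b, 4.0))
```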

Figure 1. Association between a country's age-standardized HPV prevalence and its square-root age-standardized CC incidence, both in women ages 25–64 (N = 40 countries).

Figure 2. Association between a country's HPV prevalence in women ages 55–64 and its square-root age-standardized CC incidence in women ages 25–64 (N = 40 countries).

Statistical models

Three different models were ultimately selected (Table 1): Model 1, the base model containing HPV prevalence only; Model 2, containing the base model with the addition of screening coverage; and Model 3, which contained the base model and geographic region.

Table 1. Multivariate association between square-root CC incidence, HPV prevalence and other predictors (N = 40)

The base model (Model 1, Table 1), which contains HPV prevalence in three age categories from 35 to 64, had an adjusted R-squared of 0.41, indicating that 41% of the variability in CC could be explained by HPV prevalence in these ages. HPV prevalence in the under 25 and 25–34 age categories was removed using stepwise regression, as it was not a strong predictor of CC when compared to prevalence at later ages. HPV prevalence was also found to be a stronger predictor of CC incidence in developing compared to developed countries (Supporting Information Appendix, Table 1).

Adding screening coverage to the base model (Model 2, Table 1) increased the amount of variability explained to 50%. Screening was explored as both a dummy and ordinal variable, but showed better fit as an ordinal variable categorized as low, medium, and high. Unlike HPV prevalence, screening was shown to be a stronger predictor in developed compared to developing countries (Supporting Information Appendix, Table 1).

The addition of geographic region to the HPV base model (Model 3, Table 1) yielded a model explaining 77% of the variability in CC incidence. However, adding screening to this model was not statistically significant (Supporting Information Appendix, Table 2, Model 4), and the likelihood ratio tests comparing the two models showed that the addition of screening did not explain more variability than would be expected by chance.

We explored additional models which contained all of the previously identified variables, along with established cofactors of CC incidence (parity, HIV prevalence, tobacco use, oral contraceptive use) and socioeconomic and health-related indicators including predominant religion, GDP per capita, maternal mortality ratio, neonatal mortality ratio, measles immunization, female literacy, female and male health-adjusted life expectancy, births attended by skilled health attendants, number of doctors per 100,000 population, and prevalence of tuberculosis. We used manual and stepwise selection to choose a model with indicators that provided the best correlation (Supporting Information Appendix, Table 2, Model 5). Although this model had a higher adjusted R-squared than those previously developed, it showed poor internal validity, indicating overfitting (Supporting Information Appendix, Table 3).

Model validation and country-specific examples

Results from the internal validation found that Models 1 through 3 had increasing R-squared values, explaining 48, 71, and 84% of the variation in CC incidence, respectively (Supporting Information Appendix, Table 3), displaying strong internal validity. The limited external validation demonstrated that rank order of the models remained the same as that of the internal validation, with Model 3 demonstrating the best fit (Supporting Information Appendix, Figs. 6–10). Figure 3 provides an illustration of Model 3 performance in predicting CC burden for five countries. Reported incidence in four of the five countries fell within the 95% confidence interval of model estimates.

Figure 3. Comparison of Model 3 (model containing HPV prevalence and geographic region) CC incidence estimates and 95% confidence intervals with reported CC incidence rates in five countries not used to develop the model.

Discussion

Our ecological analysis provides a comprehensive assessment of a number of factors that may be used to predict a country's CC burden. We found that HPV prevalence in women with normal cytology is a strong predictor of CC incidence. Indeed, age-specific HPV prevalence in women ages 35–64 explained 41% of the variation in CC burden. After assessing a number of different methods to estimate HPV prevalence, we found that a composite value of all HPV types generated a better correlation than HPV 16 alone, which may be because composite HPV served as a proxy indicator for sexual mixing. Furthermore, a number of other HPV types besides HPV-16 cause CC. HPV prevalence was a better predictor of CC incidence in developing compared to developed countries, indicating that there may be more modifying factors in developed countries (e.g., better treatment for women with precancerous lesions, higher quality screening, or greater screening coverage). HPV prevalence at later ages was shown to be a better predictor of CC incidence compared to that of women under 35, with prevalence in the oldest age (55–64) providing the strongest correlation by far (Supporting Information Appendix, Table 1 and Figs. 1–5). These results are similar to those observed in previous studies.20 HPV prevalence in women at older ages is more likely to be the result of persistent infections, unlike incident infections generally found in younger women, which often resolve without developing cervical neoplasia. Studies have shown that persistent detection of high-risk HPV among cytologically normal women drastically increases their risk of future CC.21–23 Therefore, HPV prevalence in older women is more indicative of actual CC risk.

Although we found that HPV prevalence alone explained 41% of the variation in CC burden, it cannot explain the entire variability in CC, as many factors influence a woman's progression to disease. Geographical region serves as a proxy to control for country-level differences contributing to CC risk, such as screening coverage and quality, health infrastructure, and access to treatment for precancerous lesions. The model containing geographical region and HPV prevalence in women aged 35–64 accounted for 77% of the variation in CC burden and had strong predictive ability while maintaining parsimony. Although CC screening category was a good predictor in a model of only HPV prevalence and screening, it was no longer statistically significant when added to the model of HPV prevalence and geographic region (Supporting Information Appendix, Table 2, Model 4), possibly because both variables explain the same variability in CC. In addition, CC screening was found to be a better predictor of CC incidence in developed countries, likely because many developing countries lack organized screening programs (Supporting Information Appendix, Table 1). Nevertheless, CC screening is an important modifier of CC risk, as it allows for the identification and treatment of precancerous lesions to prevent the progression to CC. However, screening is difficult to quantify, as the type, quality, and referral to treatment vary greatly among countries. It is possible that the geographic region categories accounted for the differences in screening practices among countries, so that the screening variable was no longer necessary. In addition, the categories of low, medium, and high may be too broad to truly capture the effect of screening. More research is needed to accurately describe the quality and frequency of screening, as well as the access to treatment for women found to have precancerous lesions.

To account for the heterogeneity within each screening category, we created an additional model with several macrolevel indicators that were shown to correlate with CC incidence in previous studies and may serve as a proxy for screening quality and access to treatment24 (Supporting Information Appendix, Table 2, Model 5). Although this model explained a large amount of the variability in CC incidence, it displayed poor internal validity, indicating overfitting. In our analysis, we found that HPV prevalence together with geographical region was the most effective and parsimonious combination of predictors of CC burden. However, as a result of the limited information available on age-specific HPV prevalence in the literature, our models contained only 40 country-level observations, which does not allow a thorough analysis of macrolevel indicators. Therefore, we cannot rule out the predictive ability of these variables; a study with a larger dataset may be able to more accurately assess their contribution.

Figure 3 shows that reported CC incidence in Norway, Sweden, Croatia, and The Russian Federation all fell within the 95% confidence intervals of model estimates, while incidence in Denmark was higher than estimated. This result may be due to the low HPV prevalence estimates for Denmark, which were similar to those of Sweden, although the reported CC incidence rates for Denmark were higher. Model prediction was generally conservative, with lower projected estimates than those reported. The exception was The Russian Federation, where model estimates were higher than reported incidence, largely due to the high HPV prevalence estimates we obtained for that country. These results demonstrate the sensitivity of the model to HPV prevalence estimates and highlight the need for accurate prevalence data to inform model predictions. Because HPV prevalence estimates were incomplete for the countries used for external validation, with missing data on HPV prevalence in the 55–64 age group (the strongest predictor of CC incidence) in all countries, the results are likely a lower bound of model performance.

An ecological study of this type has several limitations. First, current CC incidence rates are the result of exposure from decades prior, not of current HPV prevalence as was used in our model, and the number of years between reported CC incidence and measured HPV prevalence varied by country. Second, CC incidence and HPV prevalence data were not always estimated in the same region of the country, and rates for both may vary by region. Indeed, heterogeneity in CC burden throughout the country may have been partly responsible for China's poor model fit. HPV prevalence data were largely obtained from rural areas of China where CC incidence is high,25 while registry data on CC incidence were from cities where the burden is lower. Future analyses may want to consider evaluating the ability of region-specific HPV prevalence in a country to predict that same region's CC incidence. Third, HPV prevalence was estimated from a meta-analysis of numerous studies, many of which used different detection assays (HC2 or PCR) and included a different spectrum of HPV types. This may introduce heterogeneity in the prevalence data that makes it harder to capture true geographic differences. Finally, HPV prevalence data in developing countries are limited, so our model included both developed and developing countries. It may be difficult to generalize the effect of screening and treatment in developed countries to developing countries. When more data are available on HPV prevalence and CC incidence in developing countries, this analysis should be revisited.

CC carries an enormous cost of lost life years for women in developing countries, and this burden is projected to increase in the absence of adequate intervention.5 Because countries without cancer registries are often the ones with the highest CC burden, it is essential to develop tools to quantify CC risk. Models providing a general estimate of CC incidence can be useful to policymakers as they decide how to best allocate resources to prevent CC. Our analysis supports the assertion that conducting a population-based HPV survey targeting women over age 35 may provide valuable information in approximating the CC risk in a given country. In the absence of accurate national registry data, prediction models are a promising method to quantify CC burden.

Acknowledgements

The study sponsor had no role in the study design; the collection, analysis, or interpretation of data; the writing of the report; or the decision to submit the paper for publication. FXB, XC, SS, and LB have all received institutional support from HPV vaccine trials and epidemiological studies sponsored by GlaxoSmithKline, Merck, and Sanofi Pasteur MSD, and from screening and HPV testing trials partially supported by Qiagen. They have also occasionally received personal support, in the form of travel grants to scientific meetings and honoraria for consultancy, from GlaxoSmithKline, Merck, Sanofi Pasteur MSD, Roche, or Qiagen.

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Filename: IJC_27835_sm_SuppInfo.docx
Size: 71K
Description: Supporting Information Tables
