National Breast and Cervical Cancer Early Detection Program data validation project
This article has been contributed to by US Government employees and their work is in the public domain in the USA.
The findings and conclusions in this article are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
The objectives of this study were to evaluate the quality of national data generated by the National Breast and Cervical Cancer Early Detection Program (NBCCEDP); to assess variables collected through the program that are appropriate to use for program management, evaluation, and data analysis; and to identify potential data-quality issues.
Information was abstracted from 5603 medical records randomly selected from 6 NBCCEDP-funded state programs, and 76 categorical variables and 11 text-based breast and cervical cancer screening and diagnostic variables were collected. Concordance was estimated between abstracted data and the data collected by the NBCCEDP. Overall and outcome-specific concordance was calculated for each of the key variables. Four screening performance measures also were estimated by comparing the program data with the abstracted data.
Basic measures of program outcomes, such as the percentage of women with cancer or with abnormal screening tests, had a high concordance rate. Variables with poor or inconsistent concordance included reported breast symptoms, receipt of fine-needle aspiration, and receipt of colposcopy with biopsy.
The overall conclusion from this comprehensive validation project of the NBCCEDP is that, with few exceptions, the data collected from individual program sites and reported to the CDC are valid and consistent with sociodemographic and clinical data within medical records. Cancer 2014;120(16 suppl):2597-603. © 2014 American Cancer Society.
The Breast and Cervical Cancer Mortality Act passed by Congress in 1990 guided the Centers for Disease Control and Prevention (CDC) in establishing the National Breast and Cervical Cancer Early Detection Program (NBCCEDP). This legislation authorized the CDC to establish the first national federally supported program to increase access to and use of breast and cervical cancer screening and diagnostic services for low-income women who are uninsured or underinsured. The NBCCEDP is administered through cooperative agreements with state and territorial health departments, tribes, and tribal organizations. This program currently funds all 50 states, the District of Columbia, 5 US territories, and 12 American Indian and Alaska Native tribes or tribal organizations. Since their inception in 1991, NBCCEDP-funded programs have screened over 4.5 million women and have diagnosed 62,121 breast cancers, 3458 invasive cervical cancers, and 163,548 premalignant cervical lesions (available at: http://www.cdc.gov/cancer/nbccedp/about.htm; Accessed May 28, 2014).
To monitor and assess program performance, all NBCCEDP programs are required to submit surveillance data to the CDC for women who received services within the program. Clinical services are provided to eligible women through program-established networks of more than 22,000 providers across different settings. Programs collect clinical data from these providers through a variety of systems and standardize the data to report to the CDC. These data, referred to as the Minimum Data Elements (MDEs), provide valuable information for understanding who uses the program and for assuring the provision of complete and timely service for women screened. The CDC reviews the MDEs regularly to monitor and evaluate the program; to report results to Congress, program partners, and the public; and to analyze the data for publication in reports and professional journals.[2-8] Although data are routinely assessed at the program level, we conducted a supplemental evaluation of data quality to assess the accuracy of the NBCCEDP MDE data maintained by the CDC. For a sample of records, we abstracted clinical information directly from the medical records and compared those data with the information submitted to the CDC in the MDEs by NBCCEDP-funded programs.
MATERIALS AND METHODS
Procedures for protecting human subjects were completed at the CDC as well as within the participating programs. This project evaluated data that are collected for screening and diagnostic services provided to NBCCEDP-eligible women through many service delivery mechanisms. Primarily, care was delivered by participating physicians, clinics, and hospitals or through a combination of these providers and county health departments. Trained medical record abstractors collected data from sampled medical records. A computer-based tool was used to collect the information from the medical chart. Specific rules were developed for data abstraction to ensure consistency in the data-collection process.
We selected 6 of the largest state programs for the study; together, these programs contributed more than 30% of the MDE data for the relevant period, which allowed us to validate the largest proportion of the MDE data in the most efficient way. We targeted programs with larger screening volumes to ensure sufficient overall sample sizes without excessive burden on individual providers. After selecting the participating programs, we selected 181 unique providers from a list of participating NBCCEDP providers. We determined the record type by the screening test type (either mammogram or Papanicolaou [Pap] test), whether the screening test led to additional diagnostic follow-up, and whether the diagnostic follow-up led to a cancer diagnosis (Table 1). In total, we sampled 5603 breast and cervical cancer screening records from the 6 state programs and included screenings that took place between late 1996 and the end of 2004. We selected the case records using a 2-stage, stratified, random sample design. We selected program providers at the first stage, selecting the largest providers with certainty. At the second stage, we stratified the program records of the selected providers into 6 types: normal, abnormal, or cancer screening outcomes for both mammography and Pap tests. To ensure sufficiently powered samples of all record types, we selected cancer screening outcomes with certainty.
We assessed data quality by comparing screening, diagnostic, and final diagnosis data from the CDC MDE database with data obtained from the medical records.
Table 1. The Number of Abstracted Medical Records by Record Type: 1996 to 2004
|Record type||No. of records|
|Mammography screening|| |
|No diagnostic follow-up||1076|
|Diagnostic follow-up with no diagnosis of cancer||1069|
|Diagnostic follow-up with a diagnosis of cancer||816|
|Pap test screening|| |
|No diagnostic follow-up||1074|
|Diagnostic follow-up with no diagnosis of cancer||887|
|Diagnostic follow-up with a diagnosis of cancer||681|
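The 2-stage design described above can be sketched in a few lines of Python. This is an illustration only, not the study's sampling program: the provider volumes, the certainty cutoff, the per-stratum sample size, and the function names `select_providers` and `select_records` are all hypothetical, and the sketch assumes simple random sampling for the providers and records that were not taken with certainty, which the text does not specify.

```python
import random

# The 6 record types named in the text: 3 outcomes for each of 2 screening tests.
RECORD_TYPES = [(test, outcome)
                for test in ("mammogram", "pap")
                for outcome in ("normal", "abnormal", "cancer")]

def select_providers(volumes, certainty_cutoff, n_other, rng):
    """Stage 1: take the largest providers with certainty, then sample the rest.

    The cutoff and the simple random sample of the remainder are assumptions.
    """
    certain = [p for p, v in volumes.items() if v >= certainty_cutoff]
    rest = sorted(p for p in volumes if p not in certain)
    return certain + rng.sample(rest, min(n_other, len(rest)))

def select_records(records, providers, n_per_stratum, rng):
    """Stage 2: stratify the selected providers' records into the 6 types.

    Cancer records are all kept (selected with certainty); the other
    strata are sampled at random.
    """
    selected = set(providers)
    chosen = []
    for test, outcome in RECORD_TYPES:
        stratum = [r for r in records
                   if r["provider"] in selected
                   and r["test"] == test and r["outcome"] == outcome]
        if outcome == "cancer":
            chosen.extend(stratum)
        else:
            chosen.extend(rng.sample(stratum, min(n_per_stratum, len(stratum))))
    return chosen

# Hypothetical frame: 4 providers, 5 records of each type per provider.
rng = random.Random(0)
volumes = {"A": 900, "B": 400, "C": 120, "D": 80}
records = [{"provider": p, "test": t, "outcome": o, "id": f"{p}-{t}-{o}-{i}"}
           for p in volumes for t, o in RECORD_TYPES for i in range(5)]
providers = select_providers(volumes, certainty_cutoff=400, n_other=1, rng=rng)
sample = select_records(records, providers, n_per_stratum=2, rng=rng)
print(len(providers), len(sample))  # 3 providers; 30 cancer + 8 other records = 38
```

Because cancer records are kept with certainty while the other strata are subsampled, the record types have different selection probabilities, which is why the concordance estimates later need sampling weights.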
Medical chart abstractions began in May 2004 and continued through July 2006. We abstracted all sampled records within each state over a period of 6 to 8 weeks. Medical records were located in various provider settings, including hospitals, local and state health departments, screening centers, and mobile facilities.
We developed a customized, laptop-based data abstraction tool for the collection of 76 categorical variables and 11 text-based variables, which captured text from all diagnostic test reports and pathology reports. The variables selected for evaluation were those that were considered most relevant for assessing the quality and timeliness of program services. Text fields were added to collect the verbatim results of all medical tests, including the radiology and pathology reports of mammograms and biopsies, in addition to the information captured as categorical variables. Demographic variables collected included date of birth, race, and ethnicity.
Screening variables that were collected for breast cancer early detection included previous mammogram (yes or no) with date, symptoms (yes or no), clinical breast examination (CBE) performed and results (normal or abnormal), and screening mammogram result (Breast Imaging Reporting and Data System [BI-RADS] codes: normal, benign, probably benign, suspicious abnormality, highly suggestive of malignancy, assessment incomplete). In the overall and outcome-specific concordance estimates for screening mammogram results, we excluded 686 records (23.2%) that did not have a code collected directly from the radiology report. These excluded records were primarily from before 1999, before the full adoption of BI-RADS coding by radiologists. Screening variables that were collected for cervical cancer early detection included previous Pap test (yes or no) with date and Pap result (1991 Bethesda system categories: normal, infection/reaction, atypical squamous cells [ASC] of undetermined significance [ASCUS], low-grade squamous intraepithelial lesion, high-grade squamous intraepithelial lesion, or invasive squamous cell cancer; the 2001 Bethesda system subdivided ASCUS into 2 categories: ASC of undetermined significance [ASC-US] and ASC-cannot exclude high-grade intraepithelial lesions [ASC-H]).
Diagnostic variables that were abstracted both for breast cancer and cervical cancer included the types of procedures performed, final diagnosis, and status of treatment. At least 10% of the NBCCEDP records were reabstracted by a second abstractor to monitor inter-rater reliability throughout the abstraction process, maintaining an average inter-rater reliability of 95%.
MDE data were collected from 1996 through 2004. To compare results between the abstracted data, which we considered the “gold standard,” and the corresponding MDE data, we estimated 4 performance measures: abnormal mammogram screening and abnormal Pap test rates and breast and cervical cancer detection rates. These outcome-specific screening rates are routinely used in analyzing the MDEs.
We estimated overall concordance as the percentage agreement between the values of the corresponding abstracted and MDE variables. Because many of the outcomes are relatively rare, overall concordance has limited utility for assessing the MDE data quality. Therefore, to evaluate MDE data quality more thoroughly, we calculated outcome-specific concordance for each outcome level of the evaluated variables. For each outcome-specific concordance, we identify the abstracted records with the given outcome, then link these abstracted records to the corresponding MDE records. The outcome-specific concordance is defined as the percentage of these linked MDE records that also have the given outcome. For example, in the screening mammography results evaluation variable, we identify the abstracted records with a “normal” outcome (ie, no diagnostic follow-up) value and link these abstracted records to the corresponding MDE records. The outcome-specific concordance is the percentage of these linked MDE records that also indicates a “normal” mammography screening value. In the assessment of both overall and outcome-specific concordance, we excluded records with missing abstracted values because of archived or otherwise unavailable medical charts for the corresponding variable.
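The two definitions above can be made concrete with a minimal, unweighted sketch. The function names and the example records are hypothetical; the study's actual estimates were weighted for the sample design, which this illustration omits.

```python
from collections import defaultdict

def overall_concordance(pairs):
    """Percentage of linked records whose abstracted and MDE values agree.

    `pairs` holds (abstracted_value, mde_value) tuples. Records with a
    missing abstracted value (None) are excluded, as described in the text.
    """
    usable = [(a, m) for a, m in pairs if a is not None]
    if not usable:
        raise ValueError("no records with an abstracted value")
    agree = sum(1 for a, m in usable if a == m)
    return 100.0 * agree / len(usable)

def outcome_specific_concordance(pairs):
    """For each abstracted outcome, the percentage of linked MDE records
    that carry the same outcome."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for a, m in pairs:
        if a is None:
            continue  # abstracted value unavailable: excluded
        totals[a] += 1
        if a == m:
            matches[a] += 1
    return {outcome: 100.0 * matches[outcome] / totals[outcome]
            for outcome in totals}

# Hypothetical linked records: (abstracted value, MDE value).
pairs = ([("normal", "normal")] * 8
         + [("abnormal", "normal"), ("abnormal", "abnormal"), (None, "normal")])

print(overall_concordance(pairs))           # 9 of 10 usable records agree: 90.0
print(outcome_specific_concordance(pairs))  # normal: 100.0, abnormal: 50.0
```

In this toy example the overall figure looks reassuring even though only half of the abstracted abnormal results are reproduced in the MDE values, which is the limitation of overall concordance noted above.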
We report 95% confidence intervals (CIs) for both overall and outcome-specific concordance estimates. We weighted all concordance estimates to account for the differential sampling rates by screening record type and outcome. We adjusted the weights to account for providers who had either refused to participate or had closed and for missing or archived sample records that we could not replace and had to exclude from the project. To estimate the standard errors of all weighted concordance estimates appropriately, we used SUDAAN statistical software (RTI International, Research Triangle Park, NC) to account for the complex sample design.
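As a rough illustration of what a design-based estimate involves, the sketch below computes a weighted concordance proportion, a Taylor-linearized standard error that treats providers as with-replacement primary sampling units within strata, and a logit-transformed confidence interval. These choices are assumptions made for illustration: the text states only that SUDAAN was used, not which variance or interval options were applied, and the function name, weights, and records here are hypothetical. Certainty providers would normally be handled as their own strata; this sketch simply skips strata with a single provider.

```python
import math
from collections import defaultdict

def weighted_concordance(records, z=1.96):
    """Weighted proportion of agreeing records with a design-based CI.

    `records` holds (stratum, provider, weight, agree) tuples, where
    `agree` is 1 if the MDE value matches the abstracted value, else 0.
    Returns (estimate, lower, upper) as proportions.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    p = sum(w * y for _, _, w, y in records) / total_w

    # Linearized score of each record, summed to provider totals by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, provider, w, y in records:
        psu_totals[stratum][provider] += w * (y - p) / total_w

    variance = 0.0
    for providers in psu_totals.values():
        n = len(providers)
        if n < 2:
            continue  # single-provider stratum adds no variance in this sketch
        mean = sum(providers.values()) / n
        variance += n / (n - 1) * sum((t - mean) ** 2
                                      for t in providers.values())
    se = math.sqrt(variance)

    if p in (0.0, 1.0) or se == 0.0:
        return p, p, p
    # Interval built on the logit scale, then transformed back.
    logit = math.log(p / (1 - p))
    half_width = z * se / (p * (1 - p))
    lower = 1 / (1 + math.exp(-(logit - half_width)))
    upper = 1 / (1 + math.exp(-(logit + half_width)))
    return p, lower, upper

# Hypothetical records: one stratum, three providers, unequal weights.
records = [
    ("S1", "p1", 10.0, 1), ("S1", "p1", 10.0, 1),
    ("S1", "p2", 10.0, 1), ("S1", "p2", 10.0, 0),
    ("S1", "p3", 5.0, 1), ("S1", "p3", 5.0, 1),
]
estimate, lower, upper = weighted_concordance(records)
print(round(estimate, 3))  # weighted agreement 40/50 = 0.8
```

An interval built on a transformed scale is asymmetric around the estimate, which is at least consistent with the asymmetric intervals in the tables, although the source does not state how the reported intervals were computed.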
RESULTS
We analyzed the data abstracted from 5603 medical charts for 1996 through 2004 and compared those data with the MDE data reported in the NBCCEDP. To assess whether the differences between the abstracted/true values and the MDE data values altered program outcome measures significantly, we estimated 4 outcome rates for the abstracted data and the MDE data (Table 2). There were no significant differences between the abstracted and MDE data in these performance measures.
Table 2. Outcome Performance Measures in Abstracted Data and Minimum Data Elements: 1996 to 2004
|Abnormal mammogram|| || |
|Breast cancer detection|| || |
|Abnormal Pap|| || |
|CIN 2 or worse|| || |
Demographic variables were analyzed to assess potential differences between abstracted and MDE data among patient-specific variables. These variables included date of birth and race and ethnicity (Table 3). Date of birth was recorded very accurately (97.9%). Race and ethnicity were often missing from the abstracted medical records and from a substantial proportion of reports in the MDE data. Despite missing race and ethnicity reports from 5.6% of all medical records in the data abstraction, there was a high overall concordance of available race and ethnicity data, indicating that a result entered into the MDEs for race and ethnicity was usually accurate. Among records with reported racial and ethnic data, concordance ranged from 98.4% among non-Hispanic whites to 82.4% among non-Hispanic Asians or Pacific Islanders. Conversely, much of the race and ethnicity discordance (288 records) occurred among those records classified as undocumented in the abstraction.
Table 3. Overall and Outcome-Specific Concordance for Demographic Variables: 1996 to 2004
|Variable||No. of records||Concordance, % (95% CI)|
|Non-Hispanic white||2444||98.4 (97.0-99.1)|
|Non-Hispanic black||999||94.5 (90.0-97.0)|
|Non-Hispanic Asian/Pacific Islander||75||82.4 (37.6-97.3)|
|Non-Hispanic Alaskan Native/American Indian||175||85.3 (69.7-93.6)|
|Date of birth: Same date||5602||97.9 (96.9-98.5)|
Data items that were validated to assess the recording of medical procedures related to the screening examinations for both breast and cervical cancer are presented in Table 4. Overall concordance for whether a woman had had a previous mammogram was 92.8%. MDE data were more accurate for determining whether a woman did have a previous mammogram than whether she had not had a previous mammogram. Overall concordance of the MDE and abstracted data for the date of the previous mammogram was approximately 90%, whereas breast symptoms had an overall concordance of approximately 88%. Breast symptoms were noted in the medical chart about 33% of the time where the MDE data reported no symptoms. However, if abstraction verified that no symptoms were reported, then concordance of the MDE data was 97.1% (95% CI, 95.4%-98.2%). The overall concordance for whether or not a CBE was performed was 86.5% (95% CI, 72.3%-94%). However, if a CBE was identified from the medical record during abstraction, then, in almost all cases, the MDE data also indicated that a CBE had been done. The overall concordance for screening mammography result was 91.6% (95% CI, 87.6%-94.4%).
Table 4. Overall and Outcome-Specific Concordance for Cancer Screening Variables: 1996 to 2004
|Variable||No. of records||Concordance, % (95% CI)|
|Breast cancer screening variables|| || |
|Previous mammogram||2716||92.8 (89.4-95.2)|
|Month of previous mammogram||1726||88.3 (84.6-91.2)|
|Year of previous mammogram||1964||90.7 (86.3-93.8)|
|Breast symptoms||1962||87.9 (84.1-90.9)|
|Clinical breast examination performed||2961||86.5 (72.3-94.0)|
|Clinical breast examination result||2813||84.5 (68.9-93.1)|
|Screening mammogram results||2275||91.6 (87.6-94.4)|
|Probably benign||57||87.6 (69.0-95.7)|
|Suspicious abnormality||200||92.1 (84.9-96.0)|
|Highly suggestive of malignancy||175||83.1 (67.1-92.2)|
|Assessment incomplete||950||77.6 (62.6-87.7)|
|Cervical cancer screening variables|| || |
|Previous Pap test||2642||81.0 (74.3-86.2)|
|Month of previous Pap test||1438||80.4 (74.7-85.0)|
|Year of previous Pap test||1683||86.3 (82.1-89.6)|
|Pap test results||2642||89.9 (86.6-92.5)|
Whereas the concordance for diagnostic mammography was 97%, the accuracy of the MDEs in reporting whether a diagnostic mammogram was conducted was 70.2% (Table 5). The overall concordance rate for fine-needle aspiration (FNA) was high (99.6%), because the proportion of women who had an FNA was very low. The outcome-specific concordance for having an FNA was low at 58.2%. The concordance both for ultrasound and for breast biopsy was approximately 99%. Among cervical cancer diagnostic variables, the overall concordance for colposcopy with or without biopsy was very high (98.8% and 99.7%, respectively). However, the outcome-specific concordances for having had a colposcopy with or without biopsy were <65%.
Table 5. Overall and Outcome-Specific Concordance for Cancer Diagnostic Variables: 1996 to 2004
|Variable||No. of records||Concordance, % (95% CI)|
|Breast cancer diagnostic variables|| || |
|Diagnostic mammogram performed||2961||97.0 (96.0-97.8)|
|Fine-needle aspiration||2961||99.6 (99.2-99.8)|
|Repeat clinical breast examination and surgical consult||2961||94.5 (92.7-95.9)|
|Breast ultrasound||2961||99.0 (98.5-99.3)|
|Breast biopsy||2961||99.6 (99.3-99.7)|
|Breast cancer diagnosis||2961||99.3 (98.9-99.5)|
|Not diagnosed/refused/lost||2116||99.96 (99.93-99.98)|
|Breast cancer||774||95.6 (84.5-98.9)|
|Treatment for breast cancer started||774||87.6 (78.3-93.2)|
|Cervical cancer diagnostic variables|| || |
|Colposcopic biopsy||2642||99.7 (99.3-99.9)|
|Colposcopy with biopsy||2642||98.8 (97.3-99.4)|
|Cervical cancer diagnosis||2642||99.6 (99.2-99.8)|
|Not cancer/undocumented||1951||99.91 (99.86-99.94)|
|CIN 2 or worse||648||86.8 (79.9-91.6)|
|Treatment for cervical cancer started||648||86.5 (81.9-90.1)|
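The gap between a very high overall concordance and a low outcome-specific concordance for rare procedures such as FNA follows directly from the arithmetic. The sketch below uses hypothetical, unweighted counts chosen only to show the effect; they are not the study data, and the function name is an assumption.

```python
def concordance_from_counts(both_yes, abstracted_yes_only, mde_yes_only, both_no):
    """Overall and outcome-specific concordance from a 2x2 agreement table.

    Outcome-specific concordance is conditioned on the abstracted value,
    so the "yes" denominator is every record abstracted as "yes".
    """
    total = both_yes + abstracted_yes_only + mde_yes_only + both_no
    overall = 100.0 * (both_yes + both_no) / total
    specific_yes = 100.0 * both_yes / (both_yes + abstracted_yes_only)
    specific_no = 100.0 * both_no / (both_no + mde_yes_only)
    return overall, specific_yes, specific_no

# Hypothetical counts: the procedure appears in only 20 of 1000 charts.
overall, specific_yes, specific_no = concordance_from_counts(
    both_yes=12,             # chart and MDE both record the procedure
    abstracted_yes_only=8,   # chart records it, MDE does not
    mde_yes_only=4,          # MDE records it, chart does not
    both_no=976,             # neither records it
)
print(round(overall, 1), round(specific_yes, 1), round(specific_no, 1))
# 98.8 60.0 99.6
```

Because almost every record is a true "no," agreement on those records dominates the overall figure, and the 8 missed procedures barely move it.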
The concordance for whether or not a woman had had a previous Pap test was about 80% (Table 4). The outcome-specific concordance for having had a previous Pap test was higher, at 88.3%. However, the accuracy of the MDEs in identifying whether a woman did not have a previous Pap test was far lower (69.7%). This indicates that the MDEs report a previous Pap in instances where the medical record did not provide evidence of a previous screening. The concordance for Pap test results was 89.9% (95% CI, 86.6%-92.5%).
DISCUSSION
Participating programs in the NBCCEDP have collected and submitted to the CDC data from breast and cervical cancer screening and diagnostic services for over 4.5 million women screened. These data provide valuable information for monitoring and assessing program performance for the underserved women who are eligible for the program. Programs are required to conduct data management, quality-assurance, and evaluation activities. A CDC data management contractor provides technical assistance to programs, including software tools for data collection and validation, feedback reports, and systematic data-quality reviews conducted twice a year. Programs use these and other tools to identify invalid, incomplete, or unexpected variations within their local data systems and to produce provider feedback reports for assessment and data correction. Programs engage in other quality-assurance activities, including record audits, provider training on data reporting systems, and routine use of data for program monitoring. Although the program employs data-quality systems to monitor program services, a comprehensive evaluation of the validity of the national aggregate data at CDC has not been conducted previously. Consequently, the purpose of this project was to assess the correlation between data maintained in CDC's MDE database and patients' medical records.
There were protocol inconsistencies between the validation abstraction and the NBCCEDP in the capture of “other” race, multiple races, and missing race or ethnicity data. Validation protocols dictate that all data elements are abstracted as they appear in the medical records, such that abstracted race and ethnicity data were classified as undocumented if not reported on the medical record. In contrast, NBCCEDP administrative records were a valid source of these data for the MDEs.
We reviewed several other problematic variables to determine how data quality could be improved. The MDE data often did not report any breast symptoms although they were noted in the medical charts. During data abstraction, we determined that 1 program had not collected symptom information but assumed a result of “no symptoms.” On the basis of this finding, the CDC has directed all programs to report only information that is substantiated in the medical records.
Among the screening mammography variables, the outcome-specific concordance for “assessment incomplete” was lower than that for other mammography outcomes. This lower concordance is likely related to confusion about capturing the BI-RADS code when mammography is immediately followed by additional imaging studies. In many instances, additional imaging studies are performed on the same day as the initial mammogram, and the radiologist assigns a final BI-RADS code on the radiology report after additional tests are complete. The use of diagnostic mammography and ultrasound before assignment of a final BI-RADS code is more frequent in the United States than that reported in other countries. The common radiologist practice of recording only a final BI-RADS code makes it difficult to capture the results of more than 1 test conducted on the same day, and it was not uncommon for the final BI-RADS code to be reported as the screening mammography outcome. In this circumstance, according to MDE guidelines, providers are directed to code these mammograms as “assessment incomplete” and not to report the final BI-RADS code of the diagnostic mammography as the screening result. Assessing the complete diagnostic workup leading to the final imaging result for the women within the program is an important measure of service quality and cost. To reflect diagnostic workup more accurately in the MDE data, updates to the NBCCEDP protocols were made after this study to capture all diagnostic testing procedures and results.
The low concordance for certain Pap test results may be attributable to the change in the reporting system during the period from which the records were abstracted. The Bethesda system used for reporting specific Pap results changed in 2001, adding new categories, such as ASC-H, that were not part of the old system. Abstracted data included results from both the previous 1991 Bethesda system and the updated 2001 reporting system.
The outcome-specific concordance for undergoing a colposcopy with biopsy was <75%. Pathology reports were available for standard biopsies, endocervical curettage (ECC), or loop electrosurgical excision procedures. Confusion about what was truly considered a biopsy and whether the biopsy was done in conjunction with a colposcopy could explain the low outcome-specific concordance. ECC was a common procedure noted in the abstracted medical records, in which the final diagnosis was often determined based on the pathology reports associated with the ECC. If a standard cervical biopsy and an ECC were conducted at the same time, then the structure of the MDE data fields could not accommodate separate ECC and biopsy pathology report results. The recommendation to the program was to clarify what is meant by a cervical biopsy and to collect separate information about the occurrence and resulting pathology from an ECC. In 2009, the MDE was revised to report ECC, loop electrosurgical excision procedures, and cold knife conization as separate procedures in addition to colposcopy with biopsy.
To assess the validity of the MDE data, we assumed that the medical chart abstraction was the “gold standard.” Although this is a common assumption, there are limitations to this approach. There were instances in which the medical record did not clearly indicate what took place or specific information was missing. Because we excluded records with missing abstracted values from the concordance calculations, the extent of missing data also must be taken into consideration when using concordance measures to assess MDE data validity. Translating medical data into specific variable definitions is not always straightforward, so that different interpretations are sometimes possible for the same results. Protocol inconsistencies between the validation abstraction and the NBCCEDP in the capture of “other” race, multiple races, and missing race or ethnicity data resulted in conservative other/multiracial concordance estimates. In addition, because the program accessed administrative records to capture race or ethnicity data that may have been missing from the medical records, the undocumented outcome-specific concordance estimates are not meaningful.
In addition, records are archived based on individual program protocols or provider needs, so that the sampled records were not always available. Reports from providers of diagnostic services were not always available in the referral providers' charts. For example, if a woman had an abnormal screening test and was referred to a specialty provider for diagnostic follow-up, then there were instances in which the specialist's notes or consultation report were missing from the referral provider's records. The absence of information from the medical chart could mean that either a test or procedure was not done or that data were missing from the medical chart. Also noteworthy is the possibility that the results may not be generalizable to lower volume providers, because the sampling frame excluded these providers to guarantee a sufficient sample of records.
Currently, as the only national cancer screening and early detection program in the United States, the NBCCEDP has screened over 4.5 million women. The NBCCEDP MDE database captures screening and diagnostic services data across a variety of clinic settings, containing a wealth of information regarding patterns of care, screening and follow-up, and outcome measures. Although the NBCCEDP employs data-quality systems to monitor program services, a comprehensive evaluation of the validity of the data had not been previously conducted. The purpose of this project was to assess the correlation between data maintained in the CDC's national MDE database and patient medical records. On the basis of the results from the analysis, a list of recommendations was developed to address issues that were identified for particular variable outcomes. Primarily, these recommendations focused on further clarification of variable definitions to improve the consistency of data collected from the various programs. These recommendations led to the following changes implemented in 2009 to the MDE reporting requirements for the national program: 1) report cancer stage from the cancer registry, 2) report the final imaging assessment, and 3) report more detailed information from the cervical biopsies. These reporting changes will improve the quality and completeness of the MDEs.
The overall conclusion from this comprehensive validation project of the NBCCEDP database is that, with few exceptions, the data collected from individual program sites and reported to CDC as MDEs are valid and consistent with sociodemographic and clinical data within medical records.
This Supplement edition of Cancer has been sponsored by the US Centers for Disease Control and Prevention (CDC), an Agency of the Department of Health and Human Services, under the Contract #200-2012-M-52408 00002. This work was supported in part by contract MTS2002-Q-000618 from the CDC.
CONFLICT OF INTEREST DISCLOSURES
The authors made no disclosures.