  • breast cancer;
  • electronic medical records;
  • bioinformatics;
  • Surveillance, Epidemiology, and End Results (SEER) registry;
  • data linkage;
  • outcomes research;
  • comparative effectiveness




Understanding of cancer outcomes is limited by data fragmentation. In the current study, the authors analyzed the information yielded by integrating breast cancer data from 3 sources: electronic medical records (EMRs) from 2 health care systems and the state registry.


Diagnostic test and treatment data were extracted from the EMRs of all patients with breast cancer treated between 2000 and 2010 in 2 independent California institutions: a community-based practice (Palo Alto Medical Foundation; “Community”) and an academic medical center (Stanford University; “University”). The authors incorporated records from the population-based California Cancer Registry and then linked EMR-California Cancer Registry data sets of Community and University patients.


The authors initially identified 8210 University patients and 5770 Community patients; linked data sets revealed a 16% patient overlap, yielding 12,109 unique patients. The percentage of all Community patients, but not University patients, treated at both institutions increased with worsening cancer prognostic factors. Before linking the data sets, Community patients appeared to receive less intervention than University patients (mastectomy: 37.6% vs 43.2%; chemotherapy: 35% vs 41.7%; magnetic resonance imaging: 10% vs 29.3%; and genetic testing: 2.5% vs 9.2%). Linked Community and University data sets revealed that patients treated at both institutions received substantially more interventions (mastectomy: 55.8%; chemotherapy: 47.2%; magnetic resonance imaging: 38.9%; and genetic testing: 10.9% [P < .001 for each 3-way institutional comparison]).


Data linkage identified 16% of patients who were treated in 2 health care systems and who, despite comparable prognostic factors, received far more intensive treatment than others. By integrating complementary data from EMRs and population-based registries, a more comprehensive understanding of breast cancer care and factors that drive treatment use was obtained. Cancer 2014;120:103–111. © 2013 American Cancer Society.



Advances in breast cancer diagnosis and treatment[1-4] offer many effective options, and raise questions about the comparative effectiveness of different care pathways.[5-7] National initiatives prioritize comparing the effectiveness of treatments in diverse practice settings,[8-10] requiring demographic and long-term follow-up data from their populations.[11-13] Studies of real-world cancer outcomes, outside of clinical trials, have been limited by the fragmentation and lack of detail in available data. Population-based registries such as the Surveillance, Epidemiology, and End Results (SEER) Program excel at tracking demographics and incidence, but lack essential details regarding treatments and diagnostic tests.[14, 15] Institutional electronic medical records (EMRs) contain extensive treatment information; however, they are subject to a measurement bias of unknown magnitude, namely the underreporting of care delivered outside the institution and its outcomes.

Linking EMR-derived data across health care systems offers the promise of more complete information, as well as the challenge of disagreement between institutions, which may require laborious review of patients' charts for resolution. We linked data from the EMRs of an academic medical center and a multisite community practice in the same catchment region. To provide a gold standard for patient identification and treatment summaries, we also linked to the statewide population-based California Cancer Registry (CCR; a SEER component).[16] Our hypothesis was that this 3-way data linkage would offer a practical and scalable approach for identifying patients treated in more than 1 health care system, and would provide information concerning variability in cancer care that could not be obtained otherwise.



Data Resource Environment

Our project (Oncoshare) began in 2009 to integrate data from EMRs of Stanford University Hospital (SU) and the Palo Alto Medical Foundation (PAMF). SU is an academic medical center; PAMF is a multisite community practice located in Alameda, San Mateo, Santa Clara, and Santa Cruz counties in California. SU (“University”) is within 1 mile of the nearest PAMF (“Community”) site. Community patients have health maintenance organization and fee-for-service insurance; University patients have various insurance plans, including Medicaid. Although inpatient care provided by Community physicians sometimes occurs in University facilities, the institutions are legally and financially separate, with nonoverlapping staff. All research was approved by University and Community Institutional Review Boards and the State of California Institutional Review Board (for use of CCR data).

Clinical Data Extraction

We extracted data from University and Community EMRs (Epic Systems, Verona, Wis) and from a University warehouse for clinical data collected before Epic implementation in 2007. All University clinical systems data since the mid-1990s reside in the Stanford Translational Research Integrated Database Environment (STRIDE), a warehouse and integration platform for research data extraction and analysis.[17] Real-time electronic data feeds supply clinical information to STRIDE via Health Level Seven International (HL7) technology; extract, transform, and load processes out of Epic and into STRIDE occur daily. STRIDE contains 1 terabyte of data in the form of transcribed dictations and physicians' text notes, billing codes, laboratory and pharmacy orders, medication and radiotherapy administration records, laboratory results, and radiology and pathology reports. University chemotherapy data have been available from the Epic Beacon provider order entry system since 2008. Community clinical data are housed in 3 EMR systems: Epic for everything except chemotherapy orders; IDX medical billing software (GE Healthcare, Pittsburgh, Pa) for billing information; and IntelliDose (IntrinsiQ, AmerisourceBergen Specialty Group, Burlington, Mass), an ancillary computer system dedicated to chemotherapy and used since 2000. To ensure uniform coding, chemotherapy data elements in each EMR were mapped to RxNorm,[18] a standardized drug lexicon, and diagnostic test data elements were mapped to National Cancer Institute codes.[19] We identified clinically important interventions, including surgery, chemotherapy, and radiotherapy, and emerging diagnostic tests (breast magnetic resonance imaging [MRI], positron emission tomography [PET], and genetic testing for BRCA1 and BRCA2 [BRCA1/2] mutations). We excluded interventions occurring > 90 days before cancer diagnosis.
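The exclusion rule at the end of this paragraph can be sketched as a simple date filter. This is only an illustration, not the Oncoshare code; the record layout, the example dates, and the function name `keep_intervention` are hypothetical.

```python
from datetime import date, timedelta

# From the text: interventions more than 90 days before diagnosis are excluded.
MAX_DAYS_BEFORE_DIAGNOSIS = 90

def keep_intervention(intervention_date: date, diagnosis_date: date) -> bool:
    """Return True unless the intervention precedes diagnosis by more than 90 days."""
    earliest_allowed = diagnosis_date - timedelta(days=MAX_DAYS_BEFORE_DIAGNOSIS)
    return intervention_date >= earliest_allowed

# Hypothetical records for one patient: (intervention label, date).
diagnosis = date(2005, 6, 1)
interventions = [
    ("breast MRI", date(2005, 1, 15)),    # 137 days before diagnosis: excluded
    ("breast MRI", date(2005, 5, 10)),    # 22 days before diagnosis: kept
    ("chemotherapy", date(2005, 7, 20)),  # after diagnosis: kept
]
kept = [(name, d) for name, d in interventions if keep_intervention(d, diagnosis)]
```

An intervention exactly 90 days before diagnosis is kept under this reading, because the text excludes only those occurring more than 90 days earlier.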

CCR Data Addition

We requested CCR records, with all data fields including age, race/ethnicity, tumor stage, tumor grade, histology, receptors (estrogen receptor [ER], progesterone receptor [PR], and human epidermal growth factor receptor 2 [HER2]), and treatment summaries (comprising reports from any California institution of a patient having received surgery, chemotherapy, and/or radiotherapy) for all patients with breast cancer treated at University and/or Community facilities from 2000 through 2010. Census block groups were geocoded based on patients' residential addresses at the time of diagnosis. The 3% of individuals whose address could not be precisely geocoded were assigned to a census block group within their county of residence. We assigned neighborhood socioeconomic status (SES) using a previously developed and widely used index that incorporates 2000 US Census data regarding education, income, occupation, and housing costs, based on selection via principal components analysis.[20] We categorized this measure by quintiles based on the distribution of the composite SES index across California. CCR and EMR records were linked using names, social security numbers, medical record numbers, and birthdates. All personal identifying information was removed, and clinical encounter dates were randomly offset by 30 days before research use of the data.[21]
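The linkage and date-offset steps can be illustrated with a minimal sketch. It is not the validated algorithm cited in the text: it assumes exact matching on a social security number or on a normalized name plus birthdate, and it assumes the offset is a single random shift of up to 30 days per patient, so that intervals between encounters are preserved. All field names and records below are hypothetical.

```python
import random
from datetime import date, timedelta

def link_keys(record: dict) -> list:
    """Return every comparison key that a record can support."""
    keys = []
    if record.get("ssn"):
        keys.append(("ssn", record["ssn"].replace("-", "")))
    if record.get("name") and record.get("birthdate"):
        normalized = " ".join(record["name"].lower().split())
        keys.append(("name_dob", normalized, record["birthdate"]))
    return keys

def link_records(emr_records: list, ccr_records: list) -> list:
    """Pair each EMR record with the first CCR record that shares any key."""
    index = {}
    for ccr in ccr_records:
        for key in link_keys(ccr):
            index.setdefault(key, ccr)
    pairs = []
    for emr in emr_records:
        match = next((index[k] for k in link_keys(emr) if k in index), None)
        if match is not None:
            pairs.append((emr, match))
    return pairs

def shift_dates(encounter_dates: list, rng: random.Random, max_days: int = 30) -> list:
    """Apply one random offset to all of a patient's dates (assumed +/- max_days)."""
    offset = timedelta(days=rng.randint(-max_days, max_days))
    return [d + offset for d in encounter_dates]

# Hypothetical records, invented for illustration only.
emr = [{"mrn": "U-001", "name": "Jane  Doe", "birthdate": date(1950, 2, 3), "ssn": None}]
ccr = [{"ccr_id": "C-9", "name": "jane doe", "birthdate": date(1950, 2, 3),
        "ssn": "123-45-6789"}]
pairs = link_records(emr, ccr)

original = [date(2005, 6, 1), date(2005, 7, 15)]
shifted = shift_dates(original, random.Random(0))
```

Using one offset per patient, rather than one per encounter, is a design choice of this sketch; it keeps the time between a diagnosis and a later treatment unchanged after de-identification.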

Patient Cohort Identification

We defined cohorts representing all patients treated for breast cancer at Community and/or University facilities from January 1, 2000 through January 1, 2010. Eligible patients were female, aged ≥ 18 years, and met at least 1 of the following criteria within the period: 1) the CCR reported a breast cancer diagnosis and/or treatment at Community and/or University facilities; or 2) University and/or Community billing records included a diagnostic code for breast cancer or ductal carcinoma in situ (International Classification of Diseases, 9th revision [ICD-9] codes 174.9 or 233.0), billed by a breast cancer specialist (defined as a surgeon, medical oncologist, or radiation oncologist). The treating institution was based on clinician affiliation, not location; a Community surgeon operating at the University was coded as Community. Institution was determined first by EMR-based billing records. Patients who had University records of undergoing breast cancer-specific interventions (surgery, chemotherapy, and radiotherapy) were coded as University, and likewise for Community, as confirmed by the CCR. For patients lacking treatment records, the institution was defined by billing records for cancer-related diagnostic tests including PET and genetic testing, and if there were no such records, by presence in University or Community internal tumor registries, which report to the CCR. MRI was not used to determine treating institution because before 2006 some Community patients visited the University for MRI only. After generating separate University and Community cohorts (defined hereafter as “EMR-CCR cohorts”), we linked these 2 EMR-CCR cohorts to identify patients treated at both institutions.
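The tiered rule for assigning a treating institution can be read as a fall-through over three kinds of evidence. The sketch below encodes one such reading; the tier names, the `evidence` layout, and both function names are hypothetical, and the real assignment also relied on CCR confirmation that is not modeled here.

```python
from typing import Optional

# Evidence tiers in the order described in the text. MRI is deliberately
# not part of the diagnostic-test tier.
TIERS = ("treatment", "diagnostic_test", "tumor_registry")

def treating_institutions(evidence: dict) -> set:
    """Return the institutions named in the first tier that has any records.

    `evidence` maps a tier name to the set of institutions with records
    in that tier (a hypothetical layout).
    """
    for tier in TIERS:
        found = evidence.get(tier, set())
        if found:
            return set(found)
    return set()

def cohort_label(institutions: set) -> Optional[str]:
    """Collapse the institution set into University, Community, or Both."""
    if institutions == {"University", "Community"}:
        return "Both"
    if len(institutions) == 1:
        return next(iter(institutions))
    return None

# Treatment records outrank diagnostic-test records.
patient_a = {"treatment": {"University"}, "diagnostic_test": {"Community"}}
# No treatment records: fall through to diagnostic tests (PET, genetic testing).
patient_b = {"treatment": set(), "diagnostic_test": {"Community"}}
# Treated at both institutions.
patient_c = {"treatment": {"University", "Community"}}

labels = [cohort_label(treating_institutions(p)) for p in (patient_a, patient_b, patient_c)]
```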

Quality Assurance and Analytical Cohort Development

We validated and applied an algorithm to link records across data sources.[21, 22] To ensure subjects' eligibility, we developed analytical cohorts, from which we excluded patients lacking data regarding all of the following (considered essential for analyzing breast cancer care): stage of disease, tumor receptors (ER, PR, and HER2), and any diagnostic or treatment intervention. We applied more stringent inclusion criteria for patients identified in EMRs only but not in the CCR, because review of physicians' notes and pathology reports in EMRs revealed that many such patients had received breast cancer ICD-9 codes erroneously, often coincident with prophylactic mastectomy or tamoxifen used for breast cancer risk reduction. These stringent inclusion criteria were cancer-specific pathology data (stage and/or tumor receptors) and treatments (chemotherapy and/or radiotherapy). This algorithm was applied within each institution before linking EMR-CCR cohorts, and to the overall cohort after linkage.
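One reading of these eligibility rules is sketched below. It assumes that CCR-identified patients are excluded only when stage, receptors, and interventions are all missing, and that EMR-only patients need both pathology data and chemotherapy or radiotherapy. The field names and the function `is_eligible` are hypothetical.

```python
def is_eligible(patient: dict) -> bool:
    """Apply the analytical-cohort criteria under the assumptions stated above."""
    has_pathology = bool(patient.get("stage")) or bool(patient.get("receptors"))
    interventions = set(patient.get("interventions", ()))
    if patient.get("in_ccr"):
        # Excluded only when stage, receptors, and interventions are all absent.
        return has_pathology or bool(interventions)
    # EMR-only patients: stricter rule to screen out erroneous ICD-9 codes.
    return has_pathology and bool(interventions & {"chemotherapy", "radiotherapy"})

# Hypothetical patients, invented for illustration only.
ccr_stage_only = {"in_ccr": True, "stage": "II"}
ccr_no_data = {"in_ccr": True}
emr_prophylactic = {"in_ccr": False, "interventions": ["mastectomy", "tamoxifen"]}
emr_treated = {"in_ccr": False, "receptors": {"ER": "+"}, "interventions": ["chemotherapy"]}

results = [is_eligible(p) for p in (ccr_stage_only, ccr_no_data, emr_prophylactic, emr_treated)]
```

The third patient mirrors the case described in the text: a breast cancer code recorded alongside prophylactic mastectomy or tamoxifen, with no cancer-specific pathology, does not qualify.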

Statistical Analysis

Patient characteristics, receipt of treatments, and diagnostic tests were tabulated before and after linkage of University and Community EMR-CCR cohorts. After linkage, measures for patients treated at University, Community, and both institutions (“Both”) were compared using the chi-square statistic. All P values were 2-sided.
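As an illustration of the 3-way comparison, the sketch below computes a Pearson chi-square statistic for mastectomy use from the post-linkage counts in Table 2. It uses only the standard library; the function name is hypothetical, and the closed-form tail probability applies only because the table has exactly 2 degrees of freedom.

```python
import math

def pearson_chi_square(users: list, totals: list) -> float:
    """Pearson chi-square for k groups by 2 outcomes (received vs not received)."""
    overall_rate = sum(users) / sum(totals)
    statistic = 0.0
    for used, n in zip(users, totals):
        for observed, rate in ((used, overall_rate), (n - used, 1 - overall_rate)):
            expected = n * rate
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Mastectomy counts after linkage (Table 2): University-only, Community-only, Both.
mastectomy_users = [2510, 1187, 1062]
group_totals = [6321, 3886, 1902]

statistic = pearson_chi_square(mastectomy_users, group_totals)
degrees_of_freedom = len(group_totals) - 1  # (3 groups - 1) x (2 outcomes - 1)
# With exactly 2 degrees of freedom, the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square = {statistic:.1f}, df = {degrees_of_freedom}, P < .001: {p_value < 0.001}")
```

For reference, the critical value for P = .001 at 2 degrees of freedom is about 13.8, so any statistic above that threshold corresponds to P < .001.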



Analytical Cohorts

We identified a maximally inclusive University cohort of 8892 patients. Applying our eligibility criteria left 8210 patients (92.3%) in the University analytical cohort. Repeating these steps, we identified a maximally inclusive Community cohort of 6304 patients, and retained 5770 (91.5%) in the Community analytical cohort; adding these cohorts produced an apparent total of 13,980 patients. Linked records from the University and Community EMR-CCR cohorts yielded a maximally inclusive cohort of 13,238 unique patients, of whom we retained 12,109 (91.5%) in the combined analytical cohort (Fig. 1).
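The cohort arithmetic in this paragraph can be checked directly. The sketch below recomputes the retention percentages and the apparent double counting; variable names are illustrative, and the per-group totals come from Table 2.

```python
# Cohort sizes reported in the text: (maximally inclusive, analytical).
cohorts = {
    "University": (8892, 8210),
    "Community": (6304, 5770),
    "Combined": (13238, 12109),
}

# Percentage of each maximally inclusive cohort retained after eligibility criteria.
retention = {name: round(100 * kept / total, 1) for name, (total, kept) in cohorts.items()}

# Adding the two institutional cohorts counts shared patients twice.
apparent_total = cohorts["University"][1] + cohorts["Community"][1]
counted_twice = apparent_total - cohorts["Combined"][1]

# Post-linkage group totals from Table 2.
university_only, community_only, both = 6321, 3886, 1902
share_in_both = round(100 * both / (university_only + community_only + both), 1)
```

The three post-linkage groups sum to the 12,109 unique patients, and the Both group is about 16% of that total, matching the overlap reported in the abstract.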


Figure 1. Patient identification and inclusion are shown in analytical cohorts for (Top) University, (Middle) Community, and (Bottom) University and Community combined.


Patient Characteristics Before and After EMR-CCR Cohort Linkage

Before linking University and Community EMR-CCR cohorts, University patients appeared to be younger, with a lower SES and worse cancer prognostic factors than Community patients (Table 1). Linked EMR-CCR cohorts identified a third group of patients who were treated at both institutions (defined hereafter as “Both”). Both patients were significantly more likely to be Asian (University-only, 14%; Community-only, 13.9%; and Both, 17.2%) and of highest-quintile SES (University-only, 49.2%; Community-only, 64.6%; and Both, 75.2%). Both patients had intermediate prognostic factors, including age (< 40 years: University-only, 10.9%; Community-only, 3.7%; and Both, 10%), stage (III or IV: University-only, 13.6%; Community-only, 6.8%; and Both, 10.2%), tumor receptor subtype (for the poor-prognosis subtypes,[23] HER2-positive or ER-negative, PR-negative, and HER2-negative: University-only, 29.1%; Community-only, 14.5%; and Both, 25.9%), and grade (grade 3: University-only, 32.3%; Community-only, 19.8%; and Both, 29.5% [P < .001 for each reported 3-way comparison]). As prognostic factors worsened, including decreasing age, increasing stage of disease, increasing grade, and less favorable receptor subtype,[24-26] an increasing percentage of Community patients (but not University patients) fell into the Both category.

Table 1. Patient Characteristics, Ascertained Before and After Linking University and Community EMR Data

| Characteristic | Before linking: University, No. (%) | Before linking: Community, No. (%) | After linking: University only, No. (%) | After linking: Community only, No. (%) | After linking: Both, No. (%) | Percentage in Both: University | Percentage in Both: Community |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Age at diagnosis, y (a) | | | | | | | |
| Year of breast cancer diagnosis (a) | | | | | | | |
| Socioeconomic status (a) | | | | | | | |
| Lowest quintile | 299 (3.6%) | 37 (0.6%) | 293 (4.6%) | 31 (0.8%) | 6 (0.3%) | 2.0% | 16.2% |
| Second quintile | 652 (7.9%) | 162 (2.8%) | 603 (9.5%) | 112 (2.9%) | 51 (2.7%) | 7.8% | 31.3% |
| Third quintile | 916 (11.2%) | 300 (5.2%) | 825 (13.1%) | 207 (5.3%) | 93 (4.9%) | 10.1% | 31.0% |
| Fourth quintile | 1487 (18.1%) | 1002 (17.4%) | 1193 (18.9%) | 714 (18.4%) | 294 (15.5%) | 19.8% | 29.2% |
| Highest quintile | 4533 (55.2%) | 3930 (68.1%) | 3112 (49.2%) | 2511 (64.6%) | 1430 (75.2%) | 31.5% | 36.3% |
| Tumor receptor subtype (stages I-IV) (a) | | | | | | | |
| Missing data for any receptor | 2349 (32.1%) | 2300 (44%) | 1346 (26.4%) | 1450 (45.7%) | 334 (21.8%) | 19.9% | 18.7% |
| HR-positive, HER2-negative (b) | 2070 (42.1%) | 2070 (39.6%) | 2275 (44.6%) | 1266 (39.9%) | 804 (52.4%) | 26.1% | 38.8% |
| HR-negative and HER2-negative (triple-negative) (b) | 292 (10%) | 292 (5.6%) | 597 (11.7%) | 156 (4.9%) | 136 (8.9%) | 18.6% | 46.6% |

Abbreviations: Community, community-based practices; EMR, electronic medical record; HER2, human epidermal growth factor receptor 2; HR, hormone receptor (estrogen receptor [ER] and progesterone receptor [PR]); University, university-based practices.

(a) P value was derived using the chi-square statistic (<.001) for comparison between patients from university-based practices, community-based practices, and both after EMR data linkage.

(b) HR-positive tumors were positive for both ER and PR and HR-negative tumors were negative for both ER and PR. Receptor subtype was not available for patients with stage 0 disease, because HER2 was not tested.
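The "Percentage in Both" columns for socioeconomic status can be reproduced from the post-linkage counts in Table 1. The sketch below assumes, as an inference from the reported numbers rather than a stated definition, that each percentage is the Both count divided by the sum of the institution-only and Both counts.

```python
# (University only, Community only, Both) counts by SES quintile, from Table 1.
ses_counts = {
    "Lowest": (293, 31, 6),
    "Second": (603, 112, 51),
    "Third": (825, 207, 93),
    "Fourth": (1193, 714, 294),
    "Highest": (3112, 2511, 1430),
}

def percent_in_both(only: int, both: int) -> float:
    """Share of an institution's patients who were also treated at the other one."""
    return round(100 * both / (only + both), 1)

# Quintile -> (University percentage in Both, Community percentage in Both).
percent_both = {
    quintile: (percent_in_both(university, both), percent_in_both(community, both))
    for quintile, (university, community, both) in ses_counts.items()
}
```

Under this assumption the computed values match the table's last two columns for every quintile, with the share in Both highest in the top SES quintile at both institutions.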

Treatments and Diagnostic Tests, Before and After EMR-CCR Cohort Linkage

Treatment information was most often available from the CCR, but diagnostic test information was available only from EMRs, through providers' notes and billing (Table 2). For example, CCR data identified approximately 95% of all women with evidence from any source of having undergone mastectomy, but institution-specific data identified only 25% to 50% of these cases. For women in the Both category, the institution-specific data performed better, reflecting a greater yield from combining EMR-derived data from 2 institutions. For chemotherapy, Community billing data offered somewhat more complete case finding than those from the University. Linked University and Community EMR-CCR cohorts revealed that the use of all interventions was highest among the Both patients. For example, mastectomy use was 39.7% for University-only, 30.5% for Community-only, and 55.8% for Both, and these data were similar for bilateral mastectomy (University-only, 8%; Community-only, 5.2%; and Both, 13.2%). Figure 2 illustrates the differential use of MRI among patients in the University-only (32.9%), Community-only (32.8%), and Both (66%) categories by 2009 (P < .001 for each 3-way comparison).

Table 2. Diagnostic Test and Treatment Use, Ascertained Before and After Linking University and Community EMR Data

Intervention rows give No. (%) of all patients in the column. Rows beginning "Users identified by" give No. (%) of the patients who received that intervention.

| Intervention or data source | Before linkage: University | Before linkage: Community | After linkage: University-only | After linkage: Community-only | After linkage: Both |
| --- | --- | --- | --- | --- | --- |
| Total | 8210 | 5770 | 6321 | 3886 | 1902 |
| Mastectomy (a) | 3545 (43.2%) | 2172 (37.6%) | 2510 (39.7%) | 1187 (30.5%) | 1062 (55.8%) |
| Users identified by EMR: physician billing records | 904 (25.5%) | 821 (37.8%) | 732 (29.2%) | 409 (34.5%) | 581 (54.7%) |
| Users identified by EMR: facility billing records | 1845 (52%) | 1000 (46%) | 1115 (44.4%) | 499 (42%) | 904 (85.1%) |
| Users identified by CCR | 3367 (95%) | 2076 (95.6%) | 2390 (95.2%) | 1137 (95.8%) | 983 (92.6%) |
| Unilateral mastectomy (b) | 2615 (31.9%) | 1637 (28.4%) | 1887 (29.9%) | 935 (24.1%) | 731 (38.4%) |
| Bilateral mastectomy (b) | 752 (9.2%) | 439 (7.6%) | 503 (8%) | 202 (5.2%) | 252 (13.2%) |
| Chemotherapy (a) | 3426 (41.7%) | 2021 (35%) | 2624 (41.5%) | 1169 (30.1%) | 897 (47.2%) |
| Users identified by EMR: facility billing records | 133 (3.9%) | 404 (20%) | 114 (4.3%) | 229 (19.6%) | 188 (21%) |
| Users identified by EMR: drug administration records | 822 (24%) | 1115 (55.2%) | 659 (25.1%) | 662 (56.6%) | 596 (66.4%) |
| Users identified by CCR | 3235 (94.4%) | 1707 (84.5%) | 2468 (94.1%) | 951 (81.4%) | 778 (86.7%) |
| Radiotherapy (a) | 4284 (52.2%) | 2661 (46.1%) | 3340 (52.8%) | 1748 (45%) | 1028 (54%) |
| Users identified by EMR: facility billing records | 2022 (47.2%) | 1468 (55.2%) | 1653 (49.5%) | 1008 (57.7%) | 802 (78%) |
| Users identified by CCR | 3845 (89.8%) | 2377 (89.3%) | 2972 (89%) | 1556 (89%) | 877 (85.3%) |
| MRI (a, c) | 2402 (29.3%) | 576 (10%) | 1777 (28.1%) | 414 (10.7%) | 740 (38.9%) |
| MRI: diagnostic (<1 y from diagnosis) | 1944 (23.7%) | 412 (7.1%) | 1438 (22.7%) | 306 (7.9%) | 601 (31.5%) |
| MRI: screening (>1 y from diagnosis) | 930 (11.3%) | 217 (3.8%) | 692 (10.9%) | 147 (3.8%) | 299 (15.7%) |
| PET (a, c) | 440 (5.4%) | 296 (5.1%) | 353 (5.6%) | 163 (4.2%) | 216 (11.4%) |
| BRCA1/2 genetic testing (a, c) | 755 (9.2%) | 145 (2.5%) | 585 (9.3%) | 101 (2.6%) | 208 (10.9%) |

Abbreviations: CCR, California Cancer Registry; Community, community-based practices; EMR, electronic medical record; MRI, magnetic resonance imaging; PET, positron emission tomography; University, university-based practices.

(a) P value <.001 for comparison between patients from university-only practices, community-only practices, and both after EMR data linkage.

(b) Available from CCR only.

(c) Available from EMR only.
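The per-source percentages in Table 2 are shares of the patients known from any source to have received an intervention. A short sketch for mastectomy, using counts from the table, shows how they are derived; the dictionary layout and function name are illustrative only.

```python
# Mastectomy users by cohort, with counts identified by each data source (Table 2).
mastectomy = {
    "University": {"users": 3545, "physician billing": 904,
                   "facility billing": 1845, "CCR": 3367},
    "Both": {"users": 1062, "physician billing": 581,
             "facility billing": 904, "CCR": 983},
}

def source_yield(cohort: dict) -> dict:
    """Percentage of all known users that each data source identified."""
    users = cohort["users"]
    return {source: round(100 * count / users, 1)
            for source, count in cohort.items() if source != "users"}

yields = {name: source_yield(cohort) for name, cohort in mastectomy.items()}
```

Because one patient can appear in several sources, the percentages within a cohort are not expected to sum to 100%.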

Figure 2. The use of breast magnetic resonance imaging is shown by year and treating institution: University, Community, and Both.




To study breast cancer care beyond the walls of a single institution, we linked state registry records with data extracted from the EMRs of 2 health care systems, one of which was community-based and one of which was university-affiliated. This 3-way data linkage generated unique insights. We found a 16% patient overlap between nearby health care systems, which enables an estimate of the magnitude of missing treatment information in single-institution studies. We discovered a striking care pattern, with Community patients increasingly likely to be treated at both institutions as their cancer prognosis worsened, and with Both patients receiving the most intensive intervention despite having intermediate cancer prognostic factors. These findings illustrate how efforts to compare outcomes across real-world settings must account for measured and unmeasured risk factors and patient preferences.

Previous studies have integrated complementary databases, supplementing SEER-derived data with treatment details from Medicare claims[27, 28] and health maintenance organizations.[29, 30] This study's novelty lies in linking data from the EMRs of nearby yet independent health care systems, anchored by data from the CCR, a SEER component. We assessed data quality by reviewing several hundred deidentified patient records and evaluating agreement between all sources; rare conflicts were adjudicated by physician review.[21, 22] The 3-way linkage identified the most informative source for each variable, with the CCR being most informative regarding treatment use, and EMRs the only source of diagnostic test data. Missing data were reduced by the 3-way linkage, with Both patients having the most data available.

We encountered limitations in extracting research data from EMRs. We extracted structured data from billing, drug ordering, and administration records, and performed simple natural language processing of diagnostic reports, but many important concepts remain buried in the unstructured paragraphs of the clinicians' notes. These include nuances of decision-making that lack representation elsewhere, notably physician recommendations and patient preferences. EMRs also promise a wealth of clinical detail that cannot be obtained from administrative databases or registries, including the images and reports of radiologic examinations and genomic sequencing tests. Some of this information can be extracted and encoded as discrete data elements (for example, Breast Imaging-Reporting and Data System [BI-RADS] scores for mammogram and breast MRI), whereas identifying the determinants of treatment choices may require advances in natural language processing. The accurate retrieval of such specific patient information from unstructured, free-text EMR notes remains an active area of research.[31, 32] Given the unique potential of EMRs to enhance the understanding of cancer outcomes, studies to optimize the clinical and research uses of EMRs should remain a high priority.[33, 34] Some limitations may be addressed through EMR changes, with structured fields facilitating data extraction; others require new data sources, including patient-reported information.[8, 35] Bridging such gaps should be a priority of emerging data integration initiatives.[36, 37]

Health information technology is developing rapidly, and the decade between 2000 and 2010 witnessed the implementation of EMRs and complementary databases. EMR modules for clinical data exchange between University and Community (Care Everywhere network; Epic Systems) and between patients and physicians (Patient Portal; Epic Systems) were activated in 2012, and should enhance both clinical care and research. In the future, standardized data representation models will facilitate the interoperability of digital health data between institutions.

Patients in the Both category offer an intriguing glimpse across health care systems. This category comprised 16% of patients, disproportionately representing those with top-quintile SES and intermediate cancer prognostic factors. Without information regarding physician referrals and patient preferences, we do not know why patients accessed both systems, but the overrepresentation of sicker Community patients in the Both category suggests tertiary center consultation on challenging cases. The Both patients are remarkable for their significantly greater use of every intervention studied, including mastectomy, chemotherapy, radiotherapy, MRI, PET, and genetic testing. One explanation might be that University-only and Community-only patients actually accessed other health care systems, leading us to underestimate their test use; however, such potential underascertainment cannot explain treatment differences recorded in the CCR, which aggregates statewide cancer data comprehensively because of mandated reporting. Previous studies reported rising mastectomy rates[38-42] despite a lack of survival benefit,[4, 43, 44] and found correlations with an increase in diagnostic testing.[39, 45, 46] The high SES noted among patients in the Both category might explain their greater use of interventions that are usually considered optional, such as MRI and bilateral mastectomy,[25, 47-50] but we lack information regarding other factors that may drive use of these interventions, including family cancer history and clinical trial participation. Assessing the value added by specific interventions[51-53] will require a deeper understanding of the patient, physician, and health care factors that shape the care patterns we observed.

Integrating breast cancer data from 2 EMRs and the state registry proved feasible and informative, and broadened our understanding of care beyond what could be achieved from just one or two data sources. This approach offers insight regarding real-world treatment across health care systems, which can advance comparative effectiveness and outcomes research in oncology.



Supported by the Susan and Richard Levy Gift Fund; Regents of the University of California's California Breast Cancer Research Program (#16OB-0149); the Stanford University Developmental Research Fund; and the National Cancer Institute's Surveillance, Epidemiology, and End Results Program under contract HHSN261201000140C awarded to the Cancer Prevention Institute of California. The collection of cancer incidence data used in the current study was supported by the California Department of Health Services as part of the statewide cancer reporting program mandated by California Health and Safety Code Section 103885; the National Cancer Institute's Surveillance, Epidemiology, and End Results Program under contract HHSN261201000140C awarded to the Cancer Prevention Institute of California, contract HHSN261201000035C awarded to the University of Southern California, and contract HHSN261201000034C awarded to the Public Health Institute; and the Centers for Disease Control and Prevention's National Program of Cancer Registries, under agreement #1U58 DP000807-01 awarded to the Public Health Institute. The ideas and opinions expressed herein are those of the authors and endorsement by the University or State of California, the California Department of Health Services, the National Cancer Institute, or the Centers for Disease Control and Prevention or their contractors and subcontractors is not intended nor should be inferred.



Dr. Belkora has acted as a consultant for the Palo Alto Medical Foundation Research Institute. Dr. Blayney has received a grant from Blue Cross/Blue Shield of Michigan and honoraria and travel expenses from the American Society of Clinical Oncology, Saudi Cancer Foundation, Cancer Centers of Excellence, Physician Resource Management, Oregon Health Sciences University, and United HealthCare. He also owns stock in IBM, Oracle Corporation, and Google Inc. Dr. Luft has received a grant from the Richard and Susan Levy Foundation.


  • 1
    Favourable and unfavourable effects on long-term survival of radiotherapy for early breast cancer: an overview of the randomised trials. Early Breast Cancer Trialists' Collaborative Group. Lancet. 2000;355:17571770.
  • 2
    Early Breast Cancer Trialists' Collaborative Group (EBCTCG). Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomised trials. Lancet. 2005;365:16871717.
  • 3
    Berry DA, Cronin KA, Plevritis SK, et al. Effect of screening and adjuvant therapy on mortality from breast cancer. N Engl J Med. 2005;353:17841792.
  • 4
    Veronesi U, Cascinelli N, Mariani L, et al. Twenty-year follow-up of a randomized study comparing breast-conserving surgery with radical mastectomy for early breast cancer. N Engl J Med. 2002;347:12271232.
  • 5
    Katz SJ, Lantz PM, Janz NK, et al. Patient involvement in surgery treatment decisions for breast cancer. J Clin Oncol. 2005;23:55265533.
  • 6
    Katz SJ, Morrow M. The challenge of individualizing treatments for patients with breast cancer. JAMA. 2012;307:13791380.
  • 7
    Morrow M, Jagsi R, Alderman AK, et al. Surgeon recommendations and receipt of mastectomy for treatment of breast cancer. JAMA. 2009;302:15511556.
  • 8
    Selby JV, Beal AC, Frank L. The Patient-Centered Outcomes Research Institute (PCORI) national priorities for research and initial research agenda. JAMA. 2012;307:15831584.
  • 9
    Sox HC, Greenfield S. Comparative effectiveness research: a report from the Institute of Medicine. Ann Intern Med. 2009;151:203205.
  • 10
    VanLare JM, Conway PH, Sox HC. Five next steps for a new national program for comparative-effectiveness research. N Engl J Med. 2010;362:970973.
  • 11
    Methodology Committee of the Patient-Centered Outcomes Research Institute (PCORI). Methodological standards and patient-centeredness in comparative effectiveness research: the PCORI perspective. JAMA. 2012;307:16361640.
  • 12
    Hershman DL, Wright JD. Comparative effectiveness research in oncology methodology: observational data. J Clin Oncol. 2012;30:42154222.
  • 13
    Miriovsky BJ, Shulman LN, Abernethy AP. Importance of health information technology, electronic health records, and continuously aggregating data to comparative effectiveness research and learning health care. J Clin Oncol. 2012;30:42434248.
  • 14
    Bickell NA, McAlearney AS, Wellner J, Fei K, Franco R. Understanding the challenges of adjuvant treatment measurement and reporting in breast cancer: cancer treatment measuring and reporting. Med Care. 2013;51:e35e40.
  • 15
    Lodrigues W, Dumas J, Rao M, Lilley L, Rao R. Compliance with the commission on cancer quality of breast cancer care measures: self-evaluation advised. Breast J. 2011;17:167171.
  • 16
    California Cancer Registry. Accessed June 7, 2013.
  • 17
    Lowe HJ, Ferris TA, Hernandez PM, Weber SC. STRIDE–an integrated standards-based translational research informatics platform. AMIA Annu Symp Proc. 2009;2009:391395.
  • 18
    Hernandez PN, Podchiyska T, Weber SC, Ferris TA, Lowe, HJ. Automated mapping of pharmacy orders from two electronic health record systems to RxNorm within the STRIDE clinical data warehouse. AMIA Annu Symp Proc. 2009;2009:244248.
  • 19
    National Cancer Institute Enterprise Vocabulary Services. Accessed June 7, 2013.
  • 20
    Yost K, Perkins C, Cohen R, Morris C, Wright W. Socioeconomic status and breast cancer incidence in California for different race/ethnic groups. Cancer Causes Control. 2001;12:703711.
  • 21
    Weber SC, Lowe H, Das A, Ferris T. A simple heuristic for blindfolded record linkage. J Am Med Inform Assoc. 2012;19:e157e161.
  • 22
    Weber SC, Seto T, Olson C, Kenkare P, Kurian AW, Das AK. Oncoshare: lessons learned from building an integrated multi-institutional database for comparative effectiveness research. AMIA Annu Symp Proc. 2012;2012:970978.
  • 23
    Carey LA, Perou CM, Livasy CA, et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA. 2006;295:24922502.
  • 24
    Polychemotherapy for early breast cancer: an overview of the randomised trials. Early Breast Cancer Trialists' Collaborative Group. Lancet. 1998;352:930942.
  • 25
    Carlson RW, Allred DC, Anderson BO, et al;National Comprehensive Cancer Network. Invasive breast cancer. J Natl Compr Canc Netw. 2011;9:136222.
  • 26
    Darby S, McGale P, Correa C, et al. Effect of radiotherapy after breast-conserving surgery on 10-year recurrence and 15-year breast cancer death: meta-analysis of individual patient data for 10,801 women in 17 randomised trials. Lancet. 2011;378:17071716.
  • 27
    Guadagnolo BA, Liao KP, Elting L, Giordano S, Buchholz TA, Shih YC. Use of radiation therapy in the last 30 days of life among a large population-based cohort of elderly patients in the United States. J Clin Oncol. 2013;31:8087.
  • 28
    Snyder CF, Frick KD, Herbert RJ, et al. Quality of care for comorbid conditions during the transition to survivorship: differences between cancer survivors and noncancer controls. J Clin Oncol. 2013;31:1140–1148.
  • 29
    Hershman DL, Kushi LH, Shao T, et al. Early discontinuation and nonadherence to adjuvant hormonal therapy in a cohort of 8,769 early-stage breast cancer patients. J Clin Oncol. 2010;28:4120–4128.
  • 30
    Kurian AW, Lichtensztajn DY, Keegan TH, et al. Patterns and predictors of breast cancer chemotherapy use in Kaiser Permanente Northern California, 2004–2007. Breast Cancer Res Treat. 2013;137:247–260.
  • 31
    Edinger T, Cohen AM, Bedrick S, Ambert K, Hersh W. Barriers to retrieving patient information from electronic health record data: failure analysis from the TREC Medical Records Track. AMIA Annu Symp Proc. 2012;2012:180–188.
  • 32
    Ohno-Machado L. Realizing the full potential of electronic health records: the role of natural language processing. J Am Med Inform Assoc. 2011;18:539.
  • 33
    Hersh WR, Weiner MG, Embi PJ, et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care. 2013;51(8 suppl 3):S30–S37.
  • 34
    Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013;20:144–151.
  • 35
    Basch E, Abernethy AP, Mullins CD, et al. Recommendations for incorporating patient-reported outcomes into clinical comparative effectiveness research in adult oncology. J Clin Oncol. 2012;30:4249–4255.
  • 36
    Stewart AK, McNamara E, Gay EG, Banasiak J, Winchester DP. The Rapid Quality Reporting System–a new quality of care tool for CoC-accredited cancer programs. J Registry Manag. 2011;38:61–63.
  • 37
    Kurian AW, Edge SB. Information technology interventions to improve cancer care quality: a report from the American Society of Clinical Oncology Quality Care Symposium. J Oncol Pract. 2013;9:142–144.
  • 38
    Gomez SL, Lichtensztajn D, Kurian AW, et al. Increasing mastectomy rates for early-stage breast cancer? Population-based trends from California. J Clin Oncol. 2010;28:e155–e157.
  • 39
    Katipamula R, Degnim AC, Hoskin T, et al. Trends in mastectomy rates at the Mayo Clinic Rochester: effect of surgical year and preoperative magnetic resonance imaging. J Clin Oncol. 2009;27:4082–4088.
  • 40
    Tuttle TM, Habermann EB, Grund EH, Morris TJ, Virnig BA. Increasing use of contralateral prophylactic mastectomy for breast cancer patients: a trend toward more aggressive surgical treatment. J Clin Oncol. 2007;25:5203–5209.
  • 41
    Tuttle TM, Jarosek S, Habermann EB, et al. Increasing rates of contralateral prophylactic mastectomy among patients with ductal carcinoma in situ. J Clin Oncol. 2009;27:1362–1367.
  • 42
    Collins ED, Moore CP, Clay KF, et al. Can women with early-stage breast cancer make an informed decision for mastectomy? J Clin Oncol. 2009;27:519–525.
  • 43
    Hwang ES, Lichtensztajn DY, Gomez SL, Fowble B, Clarke CA. Survival after lumpectomy and mastectomy for early stage invasive breast cancer: the effect of age and hormone receptor status. Cancer. 2013;119:1402–1411.
  • 44
    Fisher B, Anderson S, Bryant J, et al. Twenty-year follow-up of a randomized trial comparing total mastectomy, lumpectomy, and lumpectomy plus irradiation for the treatment of invasive breast cancer. N Engl J Med. 2002;347:1233–1241.
  • 45
    Tuttle TM. Magnetic resonance imaging and contralateral prophylactic mastectomy: the “no mas” effect? Ann Surg Oncol. 2009;16:1461–1462.
  • 46
    King TA, Sakr R, Patil S, et al. Clinical management factors contribute to the decision for contralateral prophylactic mastectomy. J Clin Oncol. 2011;29:2158–2164.
  • 47
    Bedrosian I, Hu CY, Chang GJ. Population-based study of contralateral prophylactic mastectomy and survival outcomes of breast cancer patients. J Natl Cancer Inst. 2010;102:401–409.
  • 48
    Daly MB, Axilbund JE, Buys S, et al; National Comprehensive Cancer Network. Genetic/familial high-risk assessment: breast and ovarian. J Natl Compr Canc Netw. 2010;8:562–594.
  • 49
    Mainiero MB, Lourenco A, Mahoney MC, et al. ACR Appropriateness Criteria Breast Cancer Screening. J Am Coll Radiol. 2013;10:11–14.
  • 50
    Saslow D, Boetes C, Burke W, et al; American Cancer Society Breast Cancer Advisory Group. American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography. CA Cancer J Clin. 2007;57:75–89.
  • 51
    Blayney DW, McNiff K, Hanauer D, Miela G, Markstrom D, Neuss M. Implementation of the Quality Oncology Practice Initiative at a university comprehensive cancer center. J Clin Oncol. 2009;27:3802–3807.
  • 52
    Neuss MN, Desch CE, McNiff KK, et al. A process for measuring the quality of cancer care: the Quality Oncology Practice Initiative. J Clin Oncol. 2005;23:6233–6239.
  • 53
    Schnipper LE, Smith TJ, Raghavan D, et al. American Society of Clinical Oncology identifies five key opportunities to improve care and reduce costs: the top five list for oncology. J Clin Oncol. 2012;30:1715–1724.