Open Access

Reliability of collecting colorectal cancer stage information from pathology reports and general practitioners in Queensland


Correspondence to:
Dr Peter Baade, Epidemiology Unit — Viertel Centre for Research in Cancer Control, The Cancer Council Queensland, PO Box 201, Spring Hill, Queensland 4004. Fax: 07 3258 2310; e-mail: Peterbaade@cancerqld.org.au


Objective: To investigate the reliability of collecting colorectal stage information from pathology reports and general practitioners in Queensland, Australia.

Methods: A longitudinal study of colorectal cancer survivors conducted in 2003 and 2004 (n=1,966, response rate 57%) obtained stage information from clinical specialists (n=1,417), general practitioners (GPs) (n=1,332) and by extracting stage from pathology reports (n=1,480). The reliability of stage information was determined by comparing stage from GPs and from pathology reports with that reported by the clinical specialists, using a weighted kappa.

Results: GPs and pathology reports each had a similar level of agreement with clinical specialists, with kappa scores of 0.77 (0.75-0.80) (n=1,042) and 0.78 (0.75-0.81) (n=1,152), respectively. Results were similar when restricting to records staged by all three methods (n=847). GPs had similar levels of agreement with clinical specialists within each stage, although pathology reports tended to under-stage patients in Stage D (kappa=0.38). Collapsing stage into two categories (A or B, C or D) increased the reliability estimate from the pathology reports to 0.91 (0.88-0.93), but there was little change in the GP estimate: 0.79 (0.75-0.83).

Conclusions: Extractions from pathology reports are a valid source of broad stage information for colorectal cancer.

Implications: In the absence of clinical stage data, access to pathology records by population-based cancer registries enables a more accurate assessment of inequalities in colorectal cancer survival.

Interpretation of differences in cancer survival between population subgroups or over time requires accurate information on cancer stage.1–3 However, clinical stage is not routinely collected by population-based cancer registries in Australia or overseas. Some population-based registries, such as the SEER program,4 EUROCARE5 and the New South Wales cancer registry in Australia,6 report a summary measure of stage based on information from pathology reports or medical records; however, error rates of between 12% and 35% have been reported for prostate, lung and breast cancers.7–10

Collecting information on cancer stage using resource-intensive methods such as reviews of medical charts is not practical for large-scale studies or population-based registries. GPs, as the gatekeepers to more specialised cancer services, are a potential source of information about cancer stage, as are pathology reports. A Western Australian study suggested that colorectal cancer is probably the cancer most amenable to obtaining stage from pathology reports, with the limitation that information on metastasis is often lacking.11

A longitudinal study of colorectal cancer survivors12 collected stage information from pathology reports, general practitioners and clinical specialists in Queensland. This paper reports on the reliability of this stage information by comparing stage from pathology reports and general practitioners with that of clinical specialists.


Data were collected as part of the Colorectal Cancer and Quality of Life Study, a population-based, longitudinal study of the predictors of quality of life up to five years after diagnosis. Full details are described elsewhere.12 Briefly, all eligible cases of colorectal cancer were identified through the Queensland Cancer Registry. Study participants had a first, histologically confirmed, primary diagnosis of colorectal cancer between 1 January 2003 and 31 December 2004, and were aged between 20 and 80 years at diagnosis. Of 3,426 eligible participants, 1,966 (57.4%) completed a baseline telephone interview. The University of Queensland's Behavioural and Social Science Ethical Review Committee approved the study's procedures.

Stage information from doctors

During the telephone interview, participants were asked to name their treating doctor(s), including their GP, surgeon and medical oncologist where applicable. Questionnaires were mailed to these doctors approximately 12 months after the participant's diagnosis, requesting a range of clinical information including the stage at diagnosis (based on the Australian Clinico-Pathological Staging (ACPS) system). The ACPS system is essentially an extension of the traditional Dukes staging method, and is based on a small number of key variables: direct spread, lymph node metastases, and known residual tumour.13 Reminder letters were sent at six and 12 weeks. If the surgeon or medical oncologist preferred, study personnel extracted the required information from medical records, including letters of correspondence, outpatient notes, operative reports, chemotherapy and radiation therapy charts, at the treating hospital or the clinician's rooms. In these record reviews, stage was recorded only if it was stated explicitly in the record. No effort was made to interpret stage from the clinical information provided. When stage information differed between surgeons and medical oncologists, the data from surgeons was used.

Stage information from pathology reports

A research officer trained in the interpretation of colorectal cancer pathology reports extracted stage information from pathology reports held by the Queensland Cancer Registry. Documentation pertaining to the time up to two months post diagnosis was reviewed. Tumour information was extracted from the pathology report using the ‘Tumour, Nodes, Metastasis’ (TNM) staging system.14 In the absence of information on metastases (MX), it was assumed that there were no metastases (M0). This method has been used previously in the extraction of stage from pathology reports.11 The TNM stage was then converted to the Australian Clinico-Pathological Staging system. The research officer did not attempt to stage a record if there was any ambiguity in the pathology report regarding the level of invasion of the tumour. The records for 50 participants were subsequently staged by a second reviewer to assess inter-rater reliability.
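The extraction logic described above can be sketched as follows. This is an illustrative approximation only: the exact TNM-to-ACPS correspondence used by the study is not reported here, so the mapping below (apart from the stated M0 assumption for MX) is an assumption based on the general structure of the two systems.

```python
def tnm_to_acps(t, n, m):
    """Approximate mapping from TNM to ACPS stage (illustrative assumption only)."""
    if m == "MX":          # no information on metastases in the report
        m = "M0"           # assume no metastases, as in the extraction method
    if m == "M1":
        return "D"         # distant metastases
    if n in ("N1", "N2"):
        return "C"         # lymph node involvement
    if t in ("T1", "T2"):
        return "A"         # confined to the bowel wall
    if t in ("T3", "T4"):
        return "B"         # direct spread beyond the bowel wall
    return None            # ambiguous level of invasion: do not stage
```

Returning `None` mirrors the rule that no stage was recorded when the level of invasion was ambiguous.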

Statistical Analyses

Other studies have suggested that stage data from clinical records are generally considered to be the best available.10,15 These analyses focused on the reliability of stage information obtained from pathology reports and GPs compared to clinical specialists.

Reliability was determined by comparing reported stage from each source to that reported by the clinical specialist, using a weighted Kappa. The weighted Kappa calculates the chance-corrected agreement between reporters, and takes into account the magnitude of the differences in agreement. It has been suggested that Kappa estimates between 0.41-0.60 represent ‘moderate agreement’, 0.61-0.80 ‘substantial agreement’ and 0.81-1.00 ‘almost perfect agreement’.16 Confidence intervals for the Kappa estimates were calculated using the bootstrap method (1,000 repetitions). Kappa estimates were also stratified by (clinical specialist-defined) stage to determine the consistency of agreement. Per cent agreement was calculated to assist interpretation.
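The weighted Kappa and its bootstrap confidence interval can be sketched as below. This is a minimal sketch, assuming linear disagreement weights (the weighting scheme is not specified in the text) and a simple percentile bootstrap; the function names and example data are hypothetical.

```python
import numpy as np

def weighted_kappa(r1, r2, categories):
    """Linear-weighted Kappa: chance-corrected agreement that penalises
    large disagreements (e.g. A vs D) more than adjacent ones (A vs B)."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    obs = np.zeros((k, k))
    for a, b in zip(r1, r2):
        obs[idx[a], idx[b]] += 1
    obs /= obs.sum()
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # agreement expected by chance
    i, j = np.indices((k, k))
    w = np.abs(i - j) / (k - 1)                        # linear disagreement weights
    return 1.0 - (w * obs).sum() / (w * exp).sum()

def bootstrap_ci(r1, r2, categories, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the weighted Kappa."""
    rng = np.random.default_rng(seed)
    r1, r2 = np.asarray(r1), np.asarray(r2)
    stats = []
    for _ in range(reps):
        pick = rng.integers(0, len(r1), len(r1))       # resample pairs with replacement
        stats.append(weighted_kappa(r1[pick], r2[pick], categories))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

Perfect agreement yields a Kappa of 1.0, and agreement no better than chance yields 0.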

Pairwise comparisons with the stage reported by clinical specialists were carried out separately for stage reported by GPs and stage extracted from the pathology reports. In these comparisons, records with missing stage information were deleted. Since these two comparisons were then based on different (albeit overlapping) samples, we also conducted pairwise comparisons on those records that had non-missing stage information from all three collection methods.

We assessed the impact that collapsing stage into broader categories had on the measured levels of reliability. Five-year survival is approximately 88% for Stage A, 70% for Stage B, 43% for Stage C and 7% for Stage D.17 This supported collapsing stage into ‘Local/Locally advanced’ (Stages A and B) versus ‘Regional/Distant’ (Stages C and D).

The current Australian treatment guidelines for colorectal cancer recommend different treatment based on the stage of the tumour.18 Those with Stage A are typically treated with surgery alone. Patients with Stage C and D are recommended to have chemotherapy. There is some ambiguity regarding treatment for patients with Stage B depending on the prognosis and site of the tumour. This supports the grouping of stage into likely treatment groups of ‘surgery only’ (Localised — Stage A), ‘possible chemotherapy’ (Non-localised — Stage B) and ‘at least chemotherapy’ (Regional and Distant — Stages C and D).


Reporting of stage

Approximately 94% of participants had a report of stage available from at least one of the three data sources. There were 1,950 participants (99.2%) who reported having been treated by a clinical specialist, and of these, stage information was obtained for 1,417 (72.6%). This combined clinical stage information was based on surgical stage for 95.5% of patients, and was obtained from medical oncologists for the remaining 4.5%. When stage information was obtained from both surgeons and medical oncologists (n=427), agreement was very high (92% concordance). Nearly all (97.7%) participants reported having a regular GP, and for 1,332 (69%) of these participants we obtained stage information from the GP.

Since an eligibility requirement for this study was to have histologically confirmed colorectal cancer, all respondents had a pathology report available in the Queensland Cancer Registry. There was sufficient information to extract stage from the pathology record for 1,480 (75.3%) of the participants in the study. Of the 50 pathology records that were coded separately by two researchers, only three were different (94% agreement).

There were 1,152 records (58.5% of the total sample) available for the pairwise comparisons between clinical specialists and extractions from pathology reports, while 1,042 records (53.0%) were used for the pairwise comparisons between clinical specialists and general practitioners. A total of 847 records (43.1% of sample) had stage information from all three sources.

There were no apparent differences in the sex (χ2=0.01, p=0.934) or age (t=−0.48, p=0.631) distribution of respondents who were not included in the pairwise comparisons between clinical specialists and pathology report extractions (n=814) compared to those who were (n=1,152). Similarly, no differences were observed for the comparisons between general practitioners and clinical specialists by sex (χ2=0.34, p=0.561) or age (t=1.15, p=0.251). For both pairwise comparisons respondents were more likely to be included if they lived outside the south-east Queensland corner (pathology reports and clinical specialists: χ2=9.63, p=0.022; general practitioners and clinical specialists: χ2=27.07, p<0.001) and if they lived in more socioeconomically disadvantaged areas (pathology reports and clinical specialists: χ2=11.16, p=0.025; general practitioners and clinical specialists: χ2=18.27, p=0.001). However, neither remoteness (χ2=6.41, p=0.699) nor area of socioeconomic disadvantage (χ2=5.90, p=0.921) was significantly associated with the stage as reported by clinical specialists (n=1417).

Reliability of stage reporting

The level of agreement between the pathology reports and clinical specialists was similar to the agreement between general practitioners and clinical specialists (Table 1). Both comparisons suggested about 82% of records within the pairwise comparisons had identical stage, with weighted Kappa of nearly 0.80. There was no evidence of any bias caused by the different (yet overlapping) samples used for the pairwise comparisons, with levels of agreement very similar when limiting the comparisons to records with non-missing data for all three data sources.

Table 1.  Comparisons between stage as provided by general practitioners and extracted from pathology reports, using the stage from clinical specialists as the comparison group.

Type of comparison            Statistic    A, B, C, D        A, B, C/D         A/B, C/D
Pathology reports
 All comparisons (N=1,152)    W Kappa      0.78 (0.75-0.81)  0.83 (0.80-0.86)  0.90 (0.88-0.93)
                              % Agr        81.8              86.7              95.2
 All three groups (n=847)     W Kappa      0.79 (0.76-0.82)  0.84 (0.81-0.87)  0.91 (0.88-0.94)
                              % Agr        82.2              87.3              95.5
General practitioners
 All comparisons (N=1,042)    W Kappa      0.77 (0.75-0.80)  0.78 (0.75-0.81)  0.79 (0.75-0.83)
                              % Agr        79.3              84.1              89.5
 All three groups (n=847)     W Kappa      0.77 (0.74-0.80)  0.78 (0.74-0.81)  0.78 (0.73-0.82)
                              % Agr        80.4              83.7              89.1

Notes:
(a) W Kappa: weighted Kappa, the chance-corrected agreement between reporters, taking into account the magnitude of the differences in agreement (95% confidence interval in parentheses).
(b) % Agr: proportion of records with exact stage agreement.
(c) All comparisons: stage information available for both groups involved in the comparison (e.g. pathology extraction and clinical specialist).
(d) All three groups: stage information available for all three of pathology extraction, GP and clinical specialist.

The level of agreement varied by stage. Of those cancers staged as ‘A’ by the clinical specialist, over 90% had the same stage extracted from the pathology reports (K=0.73; Table 2). The per cent agreement was slightly lower for cancers staged by the clinician as B (79%) with nearly 20% of these cancers coded instead as Stage A from the pathology reports. However, the kappa estimate was similar (K=0.74) to that for Stage A. Agreement with Stage C cancers was the highest of all stages (92%, K=0.82). Agreement with Stage D cancers was lowest (27%; K=0.38), with the majority (54%) of clinical Stage D cancers being coded instead as Stage C from the pathology reports (Table 2).

Table 2.  Stratification of per cent agreement and Kappa by stage, comparing to stage as determined by clinical specialists.

[Table body not recoverable from this extraction. Row groups: Pathology reports (n=1,152); General practitioner (n=1,042). Columns: Clinical Specialist Stage A, B, C and D.]

Notes:
(a) Kappa: chance-corrected agreement between reporters on each stage.
(b) % Agreement: proportion of exact agreement on each stage.

In contrast, the level of agreement between clinician and GPs was generally consistent across all stages (Table 2), with agreement ranging from 75% to 85% (K= 0.68-0.77) for the four stages.

The level of agreement between GPs and clinical specialists changed only slightly regardless of how the stage categories were collapsed (Table 1). In contrast, the increased agreement between pathology-reported stage and clinician stage for “A, B, C/D” reflects the large proportion of clinician Stage D cancers that were coded as Stage C from pathology reports (Table 2). The even higher agreement for “A/B, C/D” (Table 1) reflects the smaller proportion of clinician Stage B that was coded as Stage A from the pathology reports. The sensitivity and specificity estimates (with specialist stage as the “true stage”) for this dichotomised stage were 91.0% and 98.2% for pathology stage, and 88.3% and 90.5% for general practitioner stage.
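The dichotomised-stage accuracy figures follow the standard 2x2 definitions; a small sketch with hypothetical cell counts (chosen to be consistent with n=1,152 and the reported 91.0% and 98.2% for pathology stage, but not the paper's actual cells):

```python
def sens_spec(tp, fn, fp, tn):
    """Sensitivity and specificity of a source's C/D call against the
    specialist's C/D classification taken as the true stage."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for illustration only.
sensitivity, specificity = sens_spec(tp=455, fn=45, fp=12, tn=640)
```

Here sensitivity is the proportion of specialist C/D cases the source also called C/D, and specificity the proportion of specialist A/B cases it called A/B.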


This study found that the stage information extracted from pathology reports had about 80% agreement with the stage obtained from clinical specialists, and that this agreement increased to 95% when collapsing stage to A/B and C/D. A similar level of agreement between GPs and clinical specialists was observed, however collapsing stage had little impact on the levels of agreement.

When comparing the pathology-based stage with that of clinical specialists, the most obvious difference was the under-reporting of Stage D tumours when relying on pathology reports. This has also been found for colorectal cancer stage in other Australian states (New South Wales6 and Western Australia19). The lower accuracy for more advanced cancers is due in large part to the lack of information about metastases provided on the pathology report.11 Only 14% of histology reports on colorectal cancer tumours in the Western Australian Cancer Registry contained all of the necessary information to be fully staged.19 Similarly, a study conducted in New Zealand found that less than 4% of pathology reports unequivocally reported the presence of metastasis.20

In contrast to the under-reporting of advanced cancers in pathology-based stage, the accuracy of GP-reported stage relative to clinical stage was consistent across all four categories. Any disagreement between GPs and specialist clinicians could reflect GPs having access to less detailed clinical information and an increased likelihood of losing contact with the patient during the 12 months after diagnosis.

Our study suggests that if four-level stage is required for research purposes, then relying on pathology records or GPs would misclassify stage for approximately 20% of cases. The results for pathology extractions are similar to those reported elsewhere for colorectal cancer6 and other cancers.7–9,21 Using simulation analyses, Yu and colleagues6 suggested that this imprecision could make previously significant area-specific variation non-significant.

However, it is possible that a broader measure of stage could be useful when used to adjust for disease spread in population-based studies. The very high agreement between pathology extractions and clinical specialists when the stage categories are collapsed suggests a high degree of accuracy for the collapsed categories of ‘localised’, ‘locally advanced’ and ‘regional/distant’, and for the further collapse into ‘localised/locally advanced’ versus ‘regional/distant’. This increase in agreement is not simply due to having fewer categories; the same process applied to GP-reported stage produced very little change in agreement.

Missing data

There was insufficient information to extract stage from approximately one-quarter (24.7%) of the pathology reports. This proportion of missing data is considerably higher than the 14% reported from the SEER registries22 and 18% in New South Wales.6 In this study, the pathology extracts were obtained by the research officer as soon as possible after diagnosis, to facilitate prompt recruitment for the main study. If additional relevant information arrived in the cancer registry following recruitment of a patient, then this subsequent information was not used in this study. A subsequent review of 50 randomly selected cases that had missing data in the main study noted that 18 (36%) of these were now able to be staged. Assuming this proportion held across the remaining missing records, it could reduce the per cent of missing pathology stage to about 15%, within the range of other similar studies.
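The projected reduction in missing pathology stage follows from a simple proportion calculation, sketched below using the figures from this section:

```python
# Proportions taken from the text above.
missing = 0.247        # pathology reports with insufficient information to stage
recoverable = 0.36     # share of a missing-data subsample staged on later review

# If the subsample rate held across all missing records:
remaining = missing * (1 - recoverable)
print(f"{remaining:.1%}")   # prints 15.8%
```

This supports the estimate of roughly 15% missing pathology stage had later-arriving registry information been used.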

We did not include missing data in the reliability calculations. This has particular relevance when stage information was available from clinical specialists, but missing for the pathology reports or general practitioners. Excluding these records may have over-estimated the reliability estimates.


Estimates of agreement with clinical specialists for GPs and pathology reports were based on pairwise comparisons using different yet overlapping samples. Although this could potentially bias the comparisons between the groups, very similar results were observed when we repeated the analysis using only those records with non-missing data from all three sources.

Only those people who agreed to take part in the initial study were eligible to be staged. The relatively low response rate (57%), and the under-representation of older (70 to 80 years) colorectal cancer survivors, those with rectal cancer and those with more advanced disease in the initial study,12 may have implications for the generalisability of these results. In particular, the lower proportion of patients with more advanced disease may have spuriously inflated our reliability estimates. However, our findings were consistent with those previously reported:11 stage information sourced from pathology reports is least reliable when metastasis is involved, and this limitation needs to be considered when sourcing stage information from pathology reports.


The ability to accurately interpret inequalities in cancer survival between population subgroups or changes over time requires accurate information on cancer stage.1–3 Since clinical stage information is not collected by Australian population-based cancer registries, it remains uncertain whether observed inequalities in colorectal cancer survival23–25 are due to differentials in diagnosis patterns, treatment practices, or a combination of both; this uncertainty limits our capacity to intervene to reduce these inequalities for colorectal cancer patients in this country.

This study, the first of its type in Queensland, has demonstrated that it is both possible and feasible to reliably collect a broad measure of disease spread for colorectal cancer patients from pathology reports routinely submitted to the state cancer registry, and that pathology reports are a better alternative than sourcing stage information from the primary gatekeepers of medical care, general practitioners.

Although this method of assessing disease spread cannot take the place of clinical stage information, the limited progress in developing population-based clinical cancer registries in Australia means that, at least for the immediate future, sourcing stage information for colorectal cancer patients from pathology reports may be the best alternative. The results from this study demonstrate that pathology-sourced stage is a valid measure of disease spread when compared against stage information obtained from clinical specialists.


This method of collecting stage requires fewer resources than other potential methods of collecting cancer stage, such as chart reviews, making it feasible to gather this information for large-scale population-based studies on colorectal cancer and increasing our ability to correctly interpret reasons for observed inequalities in colorectal cancer survival. However, the remaining differences between pathology stage and clinical stage, particularly when metastasis is involved, highlight the importance of integrating clinical stage information into Australia's population-based cancer registries to improve cancer control in this country.


Funding for this study was provided by The Cancer Council Queensland. The authors acknowledge Mrs Lyn McPherson and Mrs Heather Day for their contribution to the data collection.