Do doctors who order more routine medical tests diagnose more cancers? A population‐based study from Ontario, Canada

Abstract

Background: The overuse of medical tests leads to higher costs, wasted resources, and the potential for overdiagnosis of disease. This study was designed to determine whether the patients of family doctors who order more routine medical tests are diagnosed with more cancers.

Method: A retrospective population‐based cross‐sectional study using administrative health care data in Ontario, Canada. We investigated the ordering of 23 routine laboratory and imaging tests in 2008‐2012 by 6849 Ontario family physicians for their 4.9 million rostered adult patients. We compared physicians' test utilization and calculated case‐mix adjusted observed‐to‐expected (O:E) utilization ratios to categorize physicians as Typical, Higher, or Lower testers. Age‐sex standardized rates (cases/10 000 patient years) and rate ratios were determined for cancers of the thyroid, prostate, breast, lymphoma, kidney, melanoma, uterus, ovary, lung, esophagus, and pancreas for each tester group.

Results: There was wide variation in the use of the 23 tests by Ontario physicians. 26% and 24% of physicians were deemed Higher Testers for laboratory and imaging tests, respectively, while 41% and 38% were Typical Testers. The patients of higher test users were diagnosed with more cancers of the thyroid (laboratory [RR 1.61, 95% CI 1.39‐1.87] and imaging [RR 2.08, 95% CI 0.88‐2.30]) and prostate (laboratory [RR 1.10, 95% CI 1.03‐1.18] and imaging [RR 1.05, 95% CI 1.00‐1.10]).

Conclusion: There is wide variation in the ordering of routine and common medical tests among Ontario family doctors. The patients of higher testers were diagnosed with more thyroid and prostate cancers.

factors including the media, the public, and many physicians believe that more tests are better because they might uncover treatable disease. 11,12 In clinical practice, physicians order routine tests or panels of routine tests for many reasons, including defensive medicine. 13 The indiscriminate or inappropriate use of routine tests will uncover subclinical malignant and nonmalignant disease in the general population, but the term OverDiagnosis specifically refers to "disease that ultimately will not cause symptoms or early death." 12 In oncology, this refers to the identification of small, asymptomatic, or undetectable cancers that may never become symptomatic or life-threatening. 14,15 Routine screening with tests such as prostate-specific antigen, mammography, neck ultrasound, and chest computed tomography is known to be associated with the OverDiagnosis of prostate, 16 breast, 17 thyroid, 18,19 and lung 20 cancers, respectively. Other cancers that have been implicated in OverDiagnosis include uterus, 21 kidney, 15 melanoma, 22 and esophagus, 23 all of which can be uncovered by specific tests that can be ordered or performed with minimal indications.
The objective of the study was to determine whether variations in the use of routine laboratory and imaging tests by physicians were associated with variations in the rates of cancer detection in their patients. The universal health insurance program in the province of Ontario, Canada, and the availability of linked health care data on all patients at the Institute for Clinical Evaluative Sciences (ICES) provide the opportunity to answer this question.

| Study population
The patient population included all adults aged 40-75 as of 1/1/2008, excluding 7773 women who gave birth during the study period (as they would have had more tests) and patients without health care coverage (Figure 1). Patients who became ineligible for health care coverage after 2008 (109 271) were included until ineligibility. Approximately 3% of the population is not covered by the provincially funded universal health care plan (OHIP), and we have no access to information on them. These include transients, tourists, and those covered by federal health insurance, including active members of the armed forces, indigenous persons living on reserves, inmates of federal prisons, and some refugees. In Ontario, Canada, most patients register ("sign on") with a single family physician in a group of family physicians, who are then paid by the government through a mix of capitation payments, fee-for-service, and incentive fees. Although rostered patients can and do go to the ER and to "walk-in" clinics, they are usually seen by their registering family doctor or another provider in the same group; they do not attend other family doctors' groups. We identified the Usual Providers of Care (UPCs) caring for the study population in 2008-2012. Physicians without complete information in the ICES Physician Database (age, sex, practice type), those with small practices (<200 rostered patients) (3996), and those who were not primarily involved in primary care (9607) were excluded. Also excluded were 263 478 patients who were not rostered to a study UPC or not rostered at all, as we wanted to look at the tests and cancers of patients who were going to their regular doctor if they had one.

FIGURE 1 Flowchart of study population

| UPC characteristics
The UPC characteristics examined were age, sex, years in practice, and practice type (in 2008).

| Study cancers and patient cancer rates
The outcome of interest was a new diagnosis of cancer. Based on the potential for OverDiagnosis, 11 cancers were selected: thyroid and prostate (definite potential), and breast, Non-Hodgkin's Lymphoma, kidney, melanoma, and uterus (possible potential). Cancers of the ovary, lung, esophagus, and pancreas were included as controls. We assumed that almost all aggressive cancers would progress within our time frame and become clinically apparent, and would therefore be diagnosed at the same rates regardless of the rates of medical testing. We calculated age-sex standardized cancer incidence rates for each cancer (cases/10 000 patient years).
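The text does not say which standardization method produced the cancer rates. The sketch below assumes direct age-sex standardization to the whole study population, mirroring how the subLHIN test rates were standardized. The `(sex, age_band)` stratum keys and all numbers are hypothetical illustrations, not study data.

```python
# Direct age-sex standardization (a sketch, not the authors' code).
# Strata are keyed by hypothetical (sex, age_band) tuples.

def standardized_rate(cases, patient_years, standard_py, per=10_000):
    """Directly standardized incidence rate per `per` patient-years.

    cases, patient_years: stratum -> count / person-time for one tester group
    standard_py: stratum -> patient-years in the standard (whole study) population
    """
    total_standard = sum(standard_py.values())
    rate = 0.0
    for stratum, std_py in standard_py.items():
        py = patient_years.get(stratum, 0)
        if py == 0:
            # Simplifying assumption: a stratum with no person-time adds nothing.
            continue
        stratum_rate = cases.get(stratum, 0) / py
        # Weight each stratum-specific rate by its share of the standard population.
        rate += stratum_rate * (std_py / total_standard)
    return rate * per


standard = {("F", "40-44"): 50_000, ("M", "40-44"): 50_000}
group_cases = {("F", "40-44"): 3, ("M", "40-44"): 1}
group_py = {("F", "40-44"): 10_000, ("M", "40-44"): 5_000}
print(round(standardized_rate(group_cases, group_py, standard), 2))  # 2.5
```

Direct standardization suits comparisons between tester groups because every group's rate is weighted by the same standard age-sex structure.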

| Test selection and utilization
A panel of common laboratory and imaging tests was developed based on meetings with 5 local family physicians (academic and community-based) and a general internist. Tests had to be common, routinely ordered, potentially overused as screening tests, and span a wide variety of clinical indications. Cancer-specific tests with either formal cancer screening programs (fecal occult blood, mammography) or informal cancer screening (prostate-specific antigen, Papanicolaou test) were excluded. A total of 27 tests were selected (see Table 3). Because this study concerned the impact of variation in the rates of test use, tests had to demonstrate variation in use for a relationship between rates of test use and rates of cancer diagnosis to be established. To compare test rates, we first calculated the total number of each test performed on the population of each subLHIN (a subregion of one of Ontario's Local Health Integration Networks) and then age-sex standardized each subLHIN's test rates (number of tests/10 000 person-years) to the entire study population. We assessed the variation in each test's rate across the small geographic subLHINs using the Systematic Component of Variation (SCV). 30,31 According to Appleby et al, 30 the SCV is the appropriate measure of variation for research in this setting: SCVs greater than 3 are likely to be due largely to differences in practice style or medical discretion, SCVs up to 10 are considered high variation, and SCVs >10 very high variation.
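A minimal sketch of the SCV calculation, assuming the formula commonly attributed to McPherson and colleagues and the frequently used ×100 reporting scale (both are assumptions, since the text does not give the formula). `observed` and `expected` are per-area test counts, where expected counts would come from applying study-wide age-sex rates to each subLHIN's person-time. The function names are hypothetical.

```python
# Systematic Component of Variation across k areas (sketch).
# Formula as commonly attributed to McPherson et al:
#   SCV = (1/k) * [sum(((O_i - E_i) / E_i)**2) - sum(1 / E_i)]
# The x100 scaling is an assumed reporting convention.

def scv(observed, expected, scale=100.0):
    """observed/expected: test counts per area (e.g. per subLHIN)."""
    if not observed or len(observed) != len(expected):
        raise ValueError("observed and expected must be non-empty and equal length")
    k = len(observed)
    # Squared relative deviation of each area from its expected count.
    squared_dev = sum(((o - e) / e) ** 2 for o, e in zip(observed, expected))
    # Subtract the variation expected from Poisson (random) noise alone.
    random_part = sum(1.0 / e for e in expected)
    return scale * (squared_dev - random_part) / k


def variation_label(value):
    """Bands as described in the text (thresholds 3 and 10)."""
    if value > 10:
        return "very high"
    if value > 3:
        return "high"
    return "low"


value = scv([150, 50], [100, 100])
print(round(value, 2), variation_label(value))  # 24.0 very high
```

Subtracting the `1 / E_i` term removes the variation that chance alone would produce, so what remains approximates systematic (practice-related) variation.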

| Physician test utilization
We calculated observed-to-expected (O/E) ratios for each UPC for each test using indirect standardization, with the entire study population as the standard population. Indirect standardization was used because we were investigating variations in test utilization across physicians relative to the whole study population; it allowed us to explore these variations while removing the confounding effect of patient age and sex. Observed (actual) counts were the number of tests performed during the study period on the patients in each UPC's practice over their patient years. The expected count was the number of tests a UPC would order given his or her case mix if his or her test utilization were identical to that observed in the entire study population. It was obtained by first calculating, for each test, a rate across Ontario for each of 14 age/sex strata (eg, male 40-44, female 70-75) from the total patient years in each stratum. An expected count was then estimated for each stratum, and the sum over strata was the expected count for each UPC for each test. To create composite O/E ratios for all laboratory tests and all imaging tests for each UPC, the observed counts of the 13 laboratory tests and of the 10 imaging tests for all of their patients were summed. The expected counts for the total laboratory and imaging tests for those patients were then generated using indirect age-sex standardization for each UPC. The composite O/E ratios are the ratio of the combined total observed to the combined total expected test counts (13 laboratory tests or 10 imaging tests).
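The indirect standardization steps above can be sketched as follows. Only 2 of the 14 strata are shown, and every function name, test name (`test_a`, `test_b`), and count is hypothetical; this illustrates the technique rather than reproducing the study's code.

```python
# Indirect age-sex standardization for one UPC (sketch; names are hypothetical).

def reference_rates(pop_tests, pop_py):
    """Study-wide tests per patient-year in each age-sex stratum."""
    return {s: pop_tests[s] / pop_py[s] for s in pop_py}


def expected_count(upc_py, rates):
    """Tests the UPC would order if its patients were tested at study-wide rates."""
    return sum(py * rates[s] for s, py in upc_py.items())


def oe_ratio(observed, expected):
    return observed / expected


def composite_oe(observed_by_test, expected_by_test):
    """Composite O/E: summed observed counts over summed expected counts."""
    return sum(observed_by_test.values()) / sum(expected_by_test.values())


# Two of the 14 strata, with made-up numbers for illustration.
pop_tests = {("M", "40-44"): 2_000, ("F", "40-44"): 3_000}
pop_py = {("M", "40-44"): 10_000, ("F", "40-44"): 10_000}
rates = reference_rates(pop_tests, pop_py)

upc_py = {("M", "40-44"): 100, ("F", "40-44"): 200}
e = expected_count(upc_py, rates)  # 100*0.2 + 200*0.3 = 80
print(round(oe_ratio(120, e), 2))  # 1.5

print(round(composite_oe({"test_a": 120, "test_b": 80}, {"test_a": 80, "test_b": 80}), 2))  # 1.25
```

Summing counts before dividing (rather than averaging per-test O/E ratios) means high-volume tests carry more weight in the composite ratio.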

| Physician testers
To assess overuse and underuse of tests by UPCs, we created 6 levels of physician testers based on the O/E ratios of all UPCs for the combined laboratory tests and for the combined imaging tests. Six levels were the smallest number of groups that could provide estimates for lower and typical testers as well as the opportunity to assess dose-response. Typical testers were defined by an O/E ratio of 0.75-1.25. We selected <0.5 for strong lower testers, 0.5-0.75 for mild lower testers, 1.25-1.5 for mild higher testers, 1.5-2.0 for moderate higher testers, and >2.0 for strong higher testers. The thresholds for all tester groups were defined a priori, based on our assumption that a 25%, and especially a 50%, increase or decrease in the ordering of routine tests would be clinically significant.

| UPC testers vs cancer incidence
The relationship between tester groups and cancer incidence was evaluated by examining age-sex standardized incidence rates and rate ratios.
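For orientation only, the sketch below computes a crude (unadjusted) incidence rate ratio of one tester group against the Typical Tester reference group, with a Wald 95% confidence interval on the log scale. This is a deliberately simpler stand-in: the reported RRs come from Poisson regression models that also adjust for patient and physician factors. All counts are hypothetical.

```python
import math


def rate_ratio(cases_a, py_a, cases_ref, py_ref, z=1.96):
    """Crude rate ratio of group a vs the reference group, with a log-scale Wald CI."""
    if min(cases_a, cases_ref) <= 0 or min(py_a, py_ref) <= 0:
        raise ValueError("need positive case counts and person-time in both groups")
    rr = (cases_a / py_a) / (cases_ref / py_ref)
    # Standard error of log(RR) for two independent Poisson counts.
    se_log = math.sqrt(1 / cases_a + 1 / cases_ref)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


rr, lo, hi = rate_ratio(20, 10_000, 10, 10_000)
print(f"RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # RR 2.00 (95% CI 0.94-4.27)
```

A confidence interval that includes 1.0, as in this toy example, would not count as a statistically significant difference at the 0.05 level.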

| Statistical analysis
All statistical tests were two-sided, with a significance level of 0.05. Poisson regression models were used to control for effects of

| Study population and cancers
The study population included 4 923 765 residents of Ontario and 6849 UPCs (Table 1). There was no difference in patient age or comorbidity across the tester groups. Male patients, rural patients, and patients of lower socioeconomic status (SES) tended to have fewer tests. At least one of the 11 study cancers was diagnosed in 139 248 patients during 2008-2012 (Table 2). Prostate, breast, and lung cancers had the greatest numbers of new cases and the highest incidence rates. Ovary, pancreas, and esophagus cancers were the least common new diagnoses.

| Test utilization
There were large differences in overall rates (tests/100 patient years) and in the variation in rates for the selected tests across the 97 subLHINs (data not included). For imaging tests, chest X-ray and abdominal ultrasound had the highest median rates, and limb CT, neck ultrasound, and spine CT had the highest variation in rates. For laboratory tests, serum cholesterol/triglycerides, electrolytes, and glutamate pyruvate transaminase had the highest median rates, and ferritin, vitamin B12, and alkaline phosphatase the highest variation in rates. We rejected 4 tests with an SCV less than 3 (X-ray of the chest, foot, and knee, and pelvic CT). A total of 22 tests with SCVs ranging from 5.9 to 38.9 were selected for evaluation. Abdominal ultrasound (SCV = 2.4) was also retained because it was included in previous work, 18 giving the 23 study tests.

| UPC tester groups
The distribution of the UPC tester groups (Strong Lower Testers, Mild Lower Testers, Typical Testers, Mild Higher Testers, Moderate Higher Testers, Strong Higher Testers) based on the O/E ratios for both the laboratory and imaging test groups is presented in Figure 2. 26% and 24% of the UPCs were higher testers for laboratory and imaging tests, respectively, and 33% and 38% were lower testers. Overall, UPCs in the Typical Tester group ordered an average of 18,688 laboratory and 669 imaging tests in 2008-2012. The Higher Testers ordered on average 29,800 laboratory tests (59% more than the Typical Testers) and 1,207 imaging tests (80% more). The Lower Testers ordered on average 7,259 laboratory tests (61% fewer) and 57% fewer imaging tests. The distributions of the 3 Higher Tester groups for both laboratory and imaging tests were stable over time when comparing the cohorts of 1/1/08-31/5/10 and 1/6/10-31/12/12 (data not included).
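Reading the figures 29,800, 1,207, and 7,259 as group averages per UPC, rather than as increments over the Typical Testers, reproduces the reported percentages. This interpretation is inferred from the arithmetic and is not stated explicitly in the text. A quick check:

```python
# Reported group averages (tests per UPC, 2008-2012), read from the text.
typical = {"laboratory": 18_688, "imaging": 669}
higher = {"laboratory": 29_800, "imaging": 1_207}
lower_lab = 7_259


def pct_difference(group_mean, reference_mean):
    """Percentage difference of a group average relative to Typical Testers."""
    return 100.0 * (group_mean - reference_mean) / reference_mean


print(round(pct_difference(higher["laboratory"], typical["laboratory"])))  # 59
print(round(pct_difference(higher["imaging"], typical["imaging"])))  # 80
print(round(pct_difference(lower_lab, typical["laboratory"])))  # -61
```

If 29,800 were instead the extra tests ordered, the Higher Testers would have ordered about 159% more laboratory tests, which does not match the reported 59%.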

| Cancer rates by UPC tester groups
The age-sex standardized rates (cases/10 000 patient years) of selected cancers in our study population, by UPC tester group, are presented in Figure 3. Only thyroid cancer had significantly higher incidence rates in all higher tester groups and lower incidence rates in all lower tester groups, for both the laboratory and imaging groups of tests (Figure 3A,B). Among all higher testers, the average rates were 4.2/10 000 patient years for laboratory tests and 4.4/10 000 patient years for imaging tests, vs an overall rate of 3.2/10 000 patient years; among all lower testers, the average rate was 2.7/10 000 patient years for both laboratory and imaging tests. Prostate cancer demonstrated increased incidence rates in some of the higher tester groups for both laboratory and imaging (32.7/10 000 patient years among strong higher testers of laboratory tests and 31.3/10 000 patient years among moderate higher testers of imaging tests vs an overall rate of 29.6/10 000 patient years) and decreased incidence rates in some of the lower tester laboratory and imaging groups (28.5/10 000 patient years among moderate lower testers of laboratory tests and 28.6/10 000 patient years among strong lower testers of imaging tests vs an overall rate of 29.6/10 000 patient years) (Figure 3C,D). Pancreas cancer had a higher incidence in the moderate higher imaging tester group of UPCs (Figure 3F). Of the remaining 8 study cancers (breast, ovary, lung, esophagus, uterus, kidney, melanoma, and Non-Hodgkin's Lymphoma [NHL]), none had statistically significant increases in rates with increasing testing, although breast, uterus, kidney (Figure 3E), and NHL demonstrated nonstatistically significant increases in rates in some higher imaging user groups.

| Rate Ratios (RRs) of a cancer diagnosis by UPC tester groups
The cancer incidence RRs for UPC tester groups, controlling for patient factors (age, sex, and comorbidity) and physician factors (age and sex), are presented for selected cancers in Figure 4. The overall results are reported in Table S1. The reference group is the Typical Testers. Consistent with the rates in Figure 3, thyroid cancer (Figure 4A,B) was the only cancer diagnosed more often by all higher testers and less often by all lower testers, for both laboratory and imaging tests. The only other cancers with significant increases with increasing use of tests were prostate (Figure 4C) and pancreas (Figure 4E) cancers. Of the remaining 8 study cancers (breast, ovary, lung, esophagus, uterus, kidney, melanoma, and NHL), none had a statistically significant RR with increasing testing (ie, all 95% CIs included 1.0), although breast, uterus, and Non-Hodgkin's Lymphoma (Figure 4D) demonstrated nonsignificant increases with more tests. Adjustment for additional physician factors (years in practice) or patient factors (income, rurality, and deprivation), or the use of sex-stratified analyses, changed the levels of significance but not which results were significant.

| DISCUSSION
The objective of this study was to determine whether variations in the overall rates of ordering routine laboratory and imaging tests by physicians were associated with variations in the rates of cancer detection in their patients. Our method included a variety of selected tests, recognizing that patients will have had these tests performed for legitimate reasons, for unrelated but important reasons, and for no reason other than screening. There is no information on what the correct, ideal, or appropriate rates of our selected tests might be across a population, so we based our classification of testers on the test rates of the average, or Typical, Testers. Our results are not meant to identify ideal testing or ideal treating physicians. We found that 26% and 24% of physicians were higher laboratory and imaging testers, respectively, compared with their peers and, as expected, that thyroid and prostate cancers were diagnosed more often by higher laboratory testers and higher imaging testers. These findings are consistent with the literature on screening, OverDiagnosis, and subsequent overtreatment for both thyroid and prostate cancer. We also found that the patients of doctors who were lower testers were diagnosed with fewer thyroid cancers. As expected, we found no relationship between higher testing and cancers of the lung, esophagus, and ovary. Nonstatistically significant trends were seen in other cancers implicated in the OverDiagnosis literature (breast, uterus, kidney), suggesting that the overuse of tests, subclinical disease, and a proportion of indolent cases, along with the potential for overtreatment, might be part of the story for those cancers.

FIGURE 3 The age/sex standardized rates (cases/10 000 patient years) for the UPC laboratory and imaging tester groups.
Unexpectedly, we found that the patients of doctors who were lower testers were diagnosed with more ovary, lung, esophagus, and kidney cancers; reasons for this (perhaps social factors on the part of patients) would be speculative and are outside the scope of this study. Our finding that rates of pancreas cancer are related to rates of routine testing (Figures 3F and 4D) was also unexpected. Pancreas cancer was diagnosed more often by the Moderate Higher imaging tester group and had consistent marginal results throughout all the analyses of imaging tests. The incidence of pancreas cancer is not changing in Canada 32 and mortality is slowly declining. 33 In the United States, however, the incidence is slowly rising while mortality is flat, 34 which is the typical pattern of an overdiagnosed cancer. 15 There appears to be a role for screening the <10% of patients with a family history, and there are recognized premalignant lesions (pancreatic intraepithelial neoplasia and intraductal papillary mucinous neoplasm) that could potentially be picked up by high testers. The other surprise was Non-Hodgkin's Lymphoma. Although its rates never achieved statistical significance, a marginal increase in rates with higher users was a consistent finding throughout our analysis, unlike all the other cancers we tested. The incidence of Non-Hodgkin's Lymphoma is declining and survival is improving in Canada and the United States.

We made a number of assumptions for this study. For example, we assumed that doctors who ordered more routine tests (perhaps some inappropriate or unnecessary) for their patients would also order more disease-specific (or cancer-specific) tests as screening tests. This is a reasonable assumption but cannot be proven, as the data linking family doctors to breast, cervix, and colon screening tests are incomplete. We also assumed that the rates of UPC tests from 2008 to 2012 would reflect practice in the few years before 2008.
The complete linked dataset for a very large study population and their doctors is the strength of this study; however, there are potential limitations. First, we chose an informal test selection process instead of a more rigorous modified Delphi process. We felt this was appropriate for our question, as we were looking for common tests ordered by family physicians against a background of Choosing Wisely and common sense. It is unlikely that different tests would have been selected by a more time-consuming and expensive process. Second, in the absence of any relevant literature on the comparative clinical impact of test overuse and underuse, the authors assumed that cutpoints of 25%, 50%, or 100% more tests had clinical relevance. We explored other statistical options post hoc, such as standard deviations, and there was no difference in the overall results (data not included). Third, we could not analyze cancer rates by stage or extent of disease, as stage data were not completely or reliably available for the 11 sites during the study period. A future study with stage data might compare the overuse of tests and early-stage disease. Fourth, we could not assess the indications for the tests, including patient wishes. Fifth, some patients were excluded, including 263 478 who could not be assigned to a UPC and 7773 women who gave birth during the study period, as they would have had more routine tests. Patients who became ineligible for health care coverage after 2008 (109 271) were included until ineligibility. We excluded over 12,000 family physicians, including those without information in the ICES dataset, UPCs with small practices, and UPCs who were not involved in full-time primary care, although there is no reason to suspect that similar doctors with similar patients would test differently within our health care system.
Sixth, our results are specific to the 23 tests, the 11 common cancers, and the universal health care system in Ontario, and may not be generalizable to other tests, other cancers, or other health care funding systems. Finally, in Ontario, routine laboratory tests done on hospital outpatients are not billed to OHIP and therefore do not appear in our datasets. We may therefore have underestimated the rates of routine laboratory tests, but these represent only 5% of tests 35 and are unlikely to influence our results.

| CONCLUSION
Owing to the wide variation in the ordering of common and routine laboratory and imaging tests, family doctors in Ontario, Canada, could be classified into Typical, Higher, and Lower testers. As predicted by the literature on OverDiagnosis, the patients of physicians who were Higher testers were diagnosed more often with thyroid and prostate cancers. The overuse of medical tests in a health care system leads to OverDiagnosis, with downstream implications of overtreatment and increased costs. Mechanisms to address and correct the overuse of tests through education could reduce morbidity and cost.

ACKNOWLEDGMENTS
This project was approved by the Ethics Review Board of Queen's University (# EPID-465-14). This project was approved by the ICES Cancer Program and by the institutional review board at Sunnybrook Health Sciences Centre, Toronto, Canada. ICES is an independent, nonprofit research organization funded by an annual grant from the Ontario Ministry of Health and Long-Term Care. The opinions, results, and conclusions reported in the paper are those of the authors. No endorsement by ICES or the Ontario Ministry of Health and Long-Term Care is intended or should be inferred. Parts of this work are based on data and information provided by Cancer Care Ontario (CCO). The opinions, results, views, and conclusions reported in the paper are those of the author and do not necessarily reflect those of CCO. No endorsement by CCO is intended or should be inferred. Parts of this work are based on data and information compiled and provided by the Canadian Institute for Health Information (CIHI). The analyses, conclusions, opinions, and statements expressed herein are those of the author and not necessarily those of CIHI. The study coordinator was Ms Susan Rohland. Medical test consultants included Drs Richard Glazier, Richard Birtwhistle, George Southey, David Marcassa, Robyn Brown and Johanna Murphy. Presented at Preventing Overdiagnosis, Barcelona, Spain, Sept 2016.