Reply

To the Editor:

We appreciate Drs. Schmajuk and Yazdany's thoughtful comments. In choosing a gold standard for validation studies, the 2 methods (using the ACR criteria for RA [1] or a diagnosis made by rheumatologists) each have their merits and flaws. Studies like ours and Dr. Gabriel's validation study of RA patients from Olmsted County, Minnesota using the ACR criteria may underestimate the true number of RA cases if the criteria are underdocumented in the medical records (2). The use of a diagnosis of rheumatic disease made by rheumatologists in the clinical charts as the gold standard, as in the studies by Bernatsky et al and Singh et al, may overcome the problem of incomplete chart documentation of criteria for rheumatic diseases, but this method has other potential problems (3, 4). Because billing claims are based on the diagnoses made at clinic visits, diagnoses derived from claims data will have high positive predictive values (PPVs) if rheumatologists accurately file billing claims according to their final diagnoses. In other words, the PPVs do not reflect the accuracy of the clinical diagnoses made by rheumatologists; rather, they indicate the reliability of the billing claims filing process. Another potential problem with using a rheumatologist's diagnosis is the subjective nature of clinical diagnosis, particularly in rheumatic diseases, and this is one of the main reasons standardized diagnostic criteria are necessary for research in rheumatic diseases. There is currently no established guideline regarding what levels of sensitivity, specificity, and PPV are considered acceptable for algorithms used to identify rheumatic diseases in administrative databases, and the acceptable levels may vary with the study purpose. For example, for studies of rare diseases (those affecting <200,000 individuals in the US [5]), an algorithm with high sensitivity will be needed, whereas for a relatively common disease like RA, algorithms with higher PPVs and specificities may be more important.

With regard to the questions posed by Drs. Schmajuk and Yazdany: in our study, we used outpatient and inpatient VA data for patients receiving care at the VA hospital, and not Medicare data. We agree with Drs. Schmajuk and Yazdany that excluding patients who are billed for RA care from non-VA sources would increase the probability of identifying patients receiving care within the VA health care system and create a cleaner cohort. However, we respectfully disagree that the results would be more valid, since this depends on the purpose of the diagnostic algorithms. In the study by Singh et al, only patients seen in their VA rheumatology clinic were included, and their results may not generalize to other VA clinics. In our study, our purpose was to test algorithms that would allow the inclusion of patients who received care at the VA (for example, primary care) but who were not seen by VA rheumatologists. Therefore, it was necessary to include patients receiving care for RA from non-VA sources. We recognize that, without Medicare data, patients who received RA care from non-VA sources and whose VA physicians did not enter an RA diagnosis code at any VA clinic visit were missed by the algorithms used in our study. However, our study's purpose, to test the accuracy of International Classification of Diseases, Ninth Revision (ICD-9) RA diagnosis codes for identifying RA in the various VA health care settings, would not be affected by the failure to detect such patients from the Medicare data.

Sensitivity, specificity, and receiver operating characteristic curves are intrinsic properties of tests, but PPVs and negative predictive values depend on the accuracy of classification and on extrinsic factors such as disease prevalence. The calculations suggested by Drs. Schmajuk and Yazdany using weighted PPVs for the total population are based on the assumption that, for each category, the prevalence of RA is the same in the sample and in the total population. These assumptions may not always hold, and it was for this reason that we did not extrapolate our random-sample PPV results to the total population in Table 2 of our article.
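
As an illustration of this prevalence dependence, a minimal sketch follows; the sensitivity, specificity, and prevalence values used here are hypothetical and are not figures from our study. The PPV is computed directly from Bayes' theorem.

# Illustrative sketch only: how the PPV of a test (or diagnostic algorithm) with
# fixed sensitivity and specificity varies with disease prevalence.
# All numbers below are hypothetical and are not taken from our study.
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value computed from Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

for prevalence in (0.01, 0.05, 0.10):  # hypothetical RA prevalences in different populations
    print(f"prevalence {prevalence:.0%}: PPV = {ppv(0.90, 0.95, prevalence):.2f}")

With these hypothetical inputs, the same test yields a PPV of roughly 0.15 at 1% prevalence but roughly 0.67 at 10% prevalence, which is why extrapolating sample PPVs to a population with a different prevalence can mislead.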

We agree that the use of 4 of the 7 1987 ACR criteria for an RA diagnosis may be problematic if physicians do not document the ACR RA criteria in the clinic charts when diagnosing RA. Although this is a problem for charts documented by nonrheumatologists, it should be less of an issue for rheumatologists, since the study by Singh et al noted that 92% of their patients who met their gold standard had 4 or more ACR criteria (4). The review of nonrheumatologist clinical notes was a meticulous process carried out using word-finding methods with various combinations of “rheum,” “rhuem,” “reum,” “ruem,” and “arthritis,” in addition to our regular screening processes. These words, which helped to identify the areas of the chart where documentation of non-VA rheumatologist care was located, were mostly found in the assessment/plan section of the clinical notes. We were satisfied that our screening processes were accurate when both reviewers independently identified the identical charts of the 64 patients who fulfilled the criterion of an RA diagnosis based on “patient self-report of being managed by a non-VA rheumatologist for RA.” The anti-cyclic citrullinated peptide (anti-CCP) antibody test was added because of its high specificity, but we agree that its ability to detect additional cases of RA was limited by the small number of these tests performed in our random sample of RA patients.
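
For readers interested in what such a word-finding screen might look like, the following is a minimal sketch. The regular expression, helper function, and example note text are our own illustration here, not the actual tooling; the chart review itself was performed by the reviewers using these search terms.

import re

# Illustrative sketch only: flag note text containing the word-finding variants
# ("rheum", "rhuem", "reum", "ruem", "arthritis") so a reviewer can inspect the
# surrounding documentation (e.g., the assessment/plan section of the note).
SEARCH_TERMS = re.compile(r"\b(rheum\w*|rhuem\w*|reum\w*|ruem\w*|arthritis)\b", re.IGNORECASE)

def flag_note(note_text):
    """Return the matching search terms found in a clinical note."""
    return SEARCH_TERMS.findall(note_text)

# Hypothetical note text for illustration
note = "A/P: Patient followed by outside rheumatologist for seropositive arthritis."
print(flag_note(note))  # ['rheumatologist', 'arthritis']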

We agree with the comment that face-to-face physician encounters are important, and this is the reason we used the criterion of 2 RA diagnosis codes entered at least 6 months apart instead of the 6 weeks used by previous studies. We used this criterion because of the known limitations of the VA information system, in which various personnel are able to enter diagnosis codes for encounters without verification by trained staff. In a random sample of 60 patients (10 from each of the 6 categories) taken from the 543 patients, we counted the number of encounters over time windows of 1 to 6 months in the following categories: 1) physician visits, 2) nonphysician visits (such as treatment room visits for infusions), and 3) telephone followup (Table 1). We found that 6 weeks was insufficient because, in a 2-month window, there was a mean of only 1 physician visit, whereas in a 6-month window there were a minimum of 2 and a mean of 3 physician visits. In other words, our criterion of 2 diagnosis codes at least 6 months apart increased the chance that at least 2 physician visits occurred between the 2 RA diagnosis codes. This is important because, as we noted in our article, other nonphysician visits occurring in proximity to physician visits may have used the same diagnosis codes.
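
Before Table 1, we give a minimal sketch of how the 6-months-apart criterion can be checked against a patient's dated RA diagnosis codes; the dates, threshold of 182 days, and helper name are hypothetical and are not the actual programming applied to the VA data.

from datetime import date, timedelta

# Illustrative sketch only (hypothetical data, not the VA extract): a patient meets
# the algorithm criterion if 2 RA diagnosis codes were entered at least ~6 months apart.
SIX_MONTHS = timedelta(days=182)

def meets_criterion(ra_code_dates):
    """True if any 2 RA diagnosis code dates are at least ~6 months apart."""
    if len(ra_code_dates) < 2:
        return False
    dates = sorted(ra_code_dates)
    return dates[-1] - dates[0] >= SIX_MONTHS

print(meets_criterion([date(2006, 1, 10), date(2006, 8, 3)]))  # True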

Table 1. Range and mean number of encounters in various time windows*

          1 month       2 months      3 months      4 months      5 months      6 months
          P  NP  TEL    P  NP  TEL    P  NP  TEL    P  NP  TEL    P  NP  TEL    P  NP  TEL
Mean      1   1   1     1   2   1     2   3   2     2   3   3     2   4   3     3   5   4
Minimum   0   0   0     0   0   0     0   0   0     0   0   0     1   0   0     2   0   0
Maximum   2   9   2     4  17   4     4  27   4     6  33   6     6  34   7     6  35   9

* P = physician visit; NP = nonphysician visit; TEL = telephone followup.

Bernard Ng, MBBS, MMed,* Fawad Aslam, MBBS,† Hong-Jen Yu, MS‡

*Michael E. DeBakey VA Medical Center Health Services Research and Development Center of Excellence, and Baylor College of Medicine, Houston, TX; †Baylor College of Medicine, Houston, TX; ‡Michael E. DeBakey VA Medical Center Health Services Research and Development Center of Excellence, Houston, TX.
