Strategies for improving the diagnostic specificity of the frequency doubling perimeter
Dr N. M. Jansonius
Department of Ophthalmology
University Hospital Groningen
PO Box 30 001
9700 RB Groningen
Tel: + 31 50 361 2510
Fax: + 31 50 361 1709
Purpose: To evaluate various strategies designed to improve the specificity of the interpretation of results obtained with the frequency doubling technology perimeter (FDT) used in the full-threshold mode.
Methods: Three different strategies were compared using data from 452 glaucoma patients and 237 healthy subjects: combining several FDT parameters from a single test, combining the FDT test with a GDx test, and confirming an abnormal FDT test result with a repeat test.
Results: Confirming an abnormal FDT test result with a repeat test yielded a specificity increase of 0.10, from 0.80 to 0.90, at the expense of some loss of sensitivity for early but not for moderate or severe glaucoma. Combining several FDT parameters from a single test and combining FDT with GDx did not yield any noticeable increase in diagnostic performance.
Conclusions: A modest increase in FDT diagnostic performance can be obtained by the confirmation of an abnormal test result with a repeat test.
Glaucoma is a chronic disease that may cause irreversible blindness. Early stages of glaucoma do not result in any symptoms. Treatment of glaucoma may arrest or slow down its progress (Heijl et al. 2002).
Screening for glaucoma may, therefore, be advisable, and various studies have suggested that the frequency doubling technique (FDT; Carl Zeiss Meditec Inc., Dublin, California, USA) may be suitable for this purpose (Johnson & Samuels 1997; Quigley 1998; Cello et al. 2000). The sensitivities and specificities reported in previous studies vary. The FDT parameter with the best diagnostic performance was found to be the number of depressed test-points p < 0.01 in the total deviation probability plot, with a specificity of 0.81 at a sensitivity of 0.90 (cut-off point > 1; Heeg et al. 2005). Because the prevalence of glaucoma in the general population is only about 1% (Wolfs et al. 2000), false-positive test results would greatly outnumber true-positive test results at this specificity.
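The effect of a 1% prevalence can be made concrete with a short calculation. The sketch below assumes a hypothetical screened population of 10,000 people and uses the sensitivity (0.90) and specificity (0.81) quoted above; the function name is illustrative only.

```python
def screening_outcomes(n: int, prevalence: float, sensitivity: float, specificity: float):
    """Expected true positives, false positives and positive predictive value (PPV)."""
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = sensitivity * diseased          # glaucoma cases correctly flagged
    false_pos = (1 - specificity) * healthy    # healthy people incorrectly flagged
    ppv = true_pos / (true_pos + false_pos)    # fraction of positives that are real cases
    return true_pos, false_pos, ppv

tp, fp, ppv = screening_outcomes(10_000, 0.01, 0.90, 0.81)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, PPV: {ppv:.3f}")
# true positives: 90, false positives: 1881, PPV: 0.046
```

Under these assumptions there are roughly 21 false positives for every true positive, so fewer than 5% of positive screening results would indicate glaucoma.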
Several investigators have tried to improve the diagnostic performance of FDT. Khong et al. (2001) investigated whether the specificity of FDT could be improved by confirming an abnormal test result. They found that this increased the specificity from 0.62 to 0.69. Horn et al. (2003) evaluated the diagnostic performance of the combined use of FDT (C20-5 screening mode) and the nerve fibre analyser (GDx; Laser Diagnostic Technologies, San Diego, California, USA) and found that the combined use of both techniques was superior to testing with only one method.
The aim of the present study was to investigate the extent to which the strategies mentioned above, and others, can reduce the number of false-positive FDT test results in a large group of subjects. To this end, we explored three strategies. Firstly, we combined the FDT result (number of depressed test-points p < 0.01 in the total deviation probability plot) with another FDT parameter. Secondly, we added information from a GDx measurement. Thirdly, we added a second FDT measurement.
Material and Methods
Patient data and study protocol
The data and other details have been described previously (Heeg et al. 2005). In summary, 875 regular visitors to the glaucoma service of our outpatient department underwent frequency doubling perimetry and a GDx measurement as baseline measurements for a prospective follow-up. In this cohort, 452 patients were classified as glaucoma patients. A glaucoma patient was defined as a patient with a reproducible visual field defect in at least one eye on conventional perimetry. Defects had to be compatible with glaucoma and without any other explanation. In addition to these 452 glaucoma patients from the glaucoma service, 237 healthy subjects were recruited from outside the hospital. Very few exclusion criteria were applied, so that the samples were as representative as possible of the populations studied. All 452 glaucoma patients underwent at least one FDT and one GDx test, and a second FDT test was performed in 120 of them. All 237 healthy subjects completed at least one FDT test, 108 of them also completed a GDx test, and 129 underwent a second FDT test. Thus, the various strategies were explored in different groups and subgroups.
Frequency doubling technique
Testing was performed with the frequency doubling perimeter (FDT; Version 2.60; Carl Zeiss Meditec Inc., Dublin, California, USA) in the C-20 full-threshold mode. An FDT measurement was considered positive if there was an abnormal test result, or an unfinished test, in at least one eye. An abnormal test result was defined as more than one depressed test-point p < 0.01 in the total deviation probability plot (TD > 1; Heeg et al. 2005).
Three additional parameters were studied: mean deviation in dB (MD), pattern standard deviation in dB (PSD), and the number of depressed test-points p < 0.01 in the pattern deviation probability plot (PD). For details of FDT, see Maddess & Henry (1992), Johnson & Samuels (1997) and Maddess et al. (1999).
Scanning laser polarimetry
Testing was performed with the nerve fibre analyser (GDx, Version 2.0.10; Laser Diagnostic Technologies, San Diego, California, USA). In this study, we confined the analyses to the GDx parameter known as the Number. The Number is a global parameter that indicates the probability of glaucoma; its value ranges from 0 (no glaucoma) to 100 (glaucoma). A GDx measurement was considered positive if there was an abnormal test result, or an unfinished test, in at least one eye. An abnormal test result was defined as the Number > 29 (Heeg et al. 2005).
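The FDT and GDx positivity rules share one structure: a measurement is positive if at least one eye is abnormal or the test in that eye was not finished. A minimal sketch, assuming that each eye is summarised by a single value (TD for FDT, the Number for GDx) and that an unfinished test is encoded as `None`; both encodings and the function names are hypothetical.

```python
from typing import Optional, Sequence

def eye_abnormal(value: Optional[float], device: str) -> bool:
    """Abnormal-eye rule; None marks an unfinished test, which counts as positive."""
    if value is None:
        return True
    if device == "FDT":   # TD: depressed test-points p < 0.01, total deviation plot
        return value > 1
    if device == "GDx":   # the Number, from 0 (no glaucoma) to 100 (glaucoma)
        return value > 29
    raise ValueError(f"unknown device: {device}")

def measurement_positive(eyes: Sequence[Optional[float]], device: str) -> bool:
    """Positive if at least one eye is abnormal or unfinished."""
    return any(eye_abnormal(v, device) for v in eyes)

print(measurement_positive([0, 2], "FDT"))      # True: one eye has TD = 2
print(measurement_positive([1, 1], "FDT"))      # False: TD = 1 is not > 1
print(measurement_positive([25, None], "GDx"))  # True: one eye unfinished
```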
Six images were recorded for each eye. Another six images were acquired if the first series did not contain an image of sufficiently high quality. High image quality required a well-centred optic nerve head, an in-focus image, equal illumination in all quadrants and the absence of motion artefacts. A mean image was created if at least two high-quality images were available. For details of scanning laser polarimetry, see Weinreb et al. (1990) and Dreher & Reiter (1992).
Three different strategies to improve the diagnostic performance of FDT were explored:
1. combining TD with another parameter of FDT: MD, PSD or PD;
2. combining TD with the GDx parameter the Number; and
3. repeating the FDT measurement after a positive test result,

where TD is the number of test-point locations depressed at the p < 0.01 level in the total deviation probability plot, MD the mean deviation in dB, PSD the pattern standard deviation in dB, and PD the number of test-point locations depressed at the p < 0.01 level in the pattern deviation probability plot.
For strategies 1 and 2, each of the parameters MD, PSD, PD and the Number was combined with TD, and the sensitivity and specificity of each combination were calculated. Parameters can be combined in many ways; we used the AND and OR operators. With the OR operator, a combination is abnormal (i.e. positive) if at least one of the two parameters is abnormal; with the AND operator, both parameters must be abnormal.
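The AND/OR combination and the resulting sensitivity and specificity can be sketched as follows. This is an illustration on hypothetical toy data, not the study data; for simplicity each subject is summarised by one value per parameter, and the cut-offs are those used in this study (TD > 1, MD < −1.8 dB).

```python
from typing import Callable, Dict, List, Tuple

Subject = Dict[str, float]
Rule = Callable[[Subject], bool]

def td_abnormal(s: Subject) -> bool:
    return s["TD"] > 1          # cut-off TD > 1

def md_abnormal(s: Subject) -> bool:
    return s["MD"] < -1.8       # cut-off MD < -1.8 dB

def combine(a: Rule, b: Rule, op: str) -> Rule:
    """AND: both parameters abnormal. OR: at least one parameter abnormal."""
    if op == "AND":
        return lambda s: a(s) and b(s)
    if op == "OR":
        return lambda s: a(s) or b(s)
    raise ValueError(f"unknown operator: {op}")

def sens_spec(rule: Rule, glaucoma: List[Subject], healthy: List[Subject]) -> Tuple[float, float]:
    sensitivity = sum(rule(s) for s in glaucoma) / len(glaucoma)     # positives among patients
    specificity = sum(not rule(s) for s in healthy) / len(healthy)   # negatives among healthy
    return sensitivity, specificity

# Hypothetical toy data
glaucoma = [{"TD": 5, "MD": -4.0}, {"TD": 2, "MD": -1.0}, {"TD": 0, "MD": -2.5}]
healthy = [{"TD": 0, "MD": 0.5}, {"TD": 2, "MD": 0.0}, {"TD": 0, "MD": -2.0}]

for op in ("AND", "OR"):
    sens, spec = sens_spec(combine(td_abnormal, md_abnormal, op), glaucoma, healthy)
    print(f"TD {op} MD: sensitivity {sens:.2f}, specificity {spec:.2f}")
# TD AND MD: sensitivity 0.33, specificity 1.00
# TD OR MD: sensitivity 1.00, specificity 0.33
```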
For strategy 3, only the AND operator was applied (see Discussion), so that the overall test result was considered positive if both the first and second tests were abnormal.
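Applied to repeat testing, the AND rule maps naturally onto short-circuit evaluation: the retest is only needed when the first test is abnormal. A minimal sketch in which the retest is a hypothetical callable that records each time it is actually performed:

```python
from typing import Callable, List

def final_result(first_abnormal: bool, retest: Callable[[], bool]) -> bool:
    # Python's `and` short-circuits: retest() runs only after an abnormal first test.
    return first_abnormal and retest()

performed: List[bool] = []

def hypothetical_retest(outcome: bool) -> Callable[[], bool]:
    def run() -> bool:
        performed.append(outcome)   # record that a retest was actually carried out
        return outcome
    return run

# (first test abnormal?, retest abnormal?) for three hypothetical subjects
subjects = [(True, True), (True, False), (False, True)]
results = [final_result(first, hypothetical_retest(second)) for first, second in subjects]
print(results)          # [True, False, False]
print(len(performed))   # 2: the third subject was never retested
```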
In addition to exploring the effects of the strategies on the group as a whole, we also tested the strategies after the exclusion of patients with early glaucoma. Early glaucoma was defined as MD(HFA) ≥ −6 dB in the worse eye, where MD(HFA) is the mean deviation measured with the Humphrey Field Analyzer.
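The early-glaucoma criterion can be written as a one-line rule. This sketch assumes that the worse eye is the eye with the lower (more negative) MD(HFA); the function name is hypothetical.

```python
def is_early_glaucoma(md_right_db: float, md_left_db: float) -> bool:
    """Early glaucoma: MD(HFA) >= -6 dB in the worse eye.

    Assumption: the worse eye is the eye with the lower (more negative) MD.
    """
    worse_eye_md = min(md_right_db, md_left_db)
    return worse_eye_md >= -6.0

print(is_early_glaucoma(-2.0, -5.9))   # True: worse eye at -5.9 dB
print(is_early_glaucoma(-3.2, -7.5))   # False: worse eye at -7.5 dB, kept in the subanalysis
```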
Results
Tables 1A and 1B show the sensitivity and specificity for combinations of TD with the FDT parameters MD, PSD and PD (Table 1A) and with the GDx parameter the Number (Table 1B). The original cut-off points (Heeg et al. 2005) were used for both TD and the second parameter. At its cut-off point (> 1), TD alone had a sensitivity of 0.90 in the 452 glaucoma patients, a specificity of 0.81 in the healthy subjects included in Table 1A, and a specificity of 0.86 in the subgroup of healthy subjects included in Table 1B. As expected, the AND operator increased specificity and decreased sensitivity, whereas the OR operator did the opposite. As Table 1A shows, combining FDT parameters did not substantially improve diagnostic performance. As Table 1B shows, the combination TD AND the Number increased specificity by 0.08 (from 0.86 to 0.94) at the expense of a 0.07 decrease in sensitivity (from 0.90 to 0.83). After the exclusion of patients with early glaucoma, the sensitivity of TD AND the Number was 0.99, compared with 1.00 for TD alone after the same exclusion.
Table 1. Sensitivity and specificity after combining TD (the number of depressed test-points p < 0.01 in the total deviation probability plot) with a second parameter of FDT (1A; based on 452 glaucoma patients and 237 healthy subjects) and with GDx (1B; based on 452 glaucoma patients and 108 healthy subjects). Combinations were performed with both AND and OR operators. The original cut-off points from Heeg et al. (2005) were used for all parameters.
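| Parameter | Cut-off point | Sensitivity | Specificity |
|---|---|---|---|
| 1A: TD combined with FDT parameters | | | |
| TD | > 1 | 0.90 | 0.81 |
| MD | < −1.8 dB | 0.90 | 0.72 |
| PSD | > +4.8 dB | 0.91 | 0.68 |
| TD AND MD | | 0.87 | 0.83 |
| TD OR MD | | 0.92 | 0.70 |
| TD AND PSD | | 0.85 | 0.84 |
| TD OR PSD | | 0.96 | 0.65 |
| TD AND PD | | 0.83 | 0.84 |
| TD OR PD | | 0.97 | 0.56 |
| 1B: TD combined with GDx | | | |
| TD | > 1 | 0.90 | 0.86 |
| TD AND Number | | 0.83 | 0.94 |
| TD OR Number | | 0.96 | 0.69 |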
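The "as expected" pattern in Table 1 follows from logic alone: when both parameters are evaluated on the same subjects, an AND combination cannot be more sensitive than either parameter nor less specific than the better one, and an OR combination behaves the other way round. The sketch below checks the reported Table 1A values against these bounds; TD alone is taken from the text, and PD is omitted because its single-parameter values are not listed.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (sensitivity, specificity)

# Single parameters: TD from the text, MD and PSD from Table 1A
single: Dict[str, Point] = {"TD": (0.90, 0.81), "MD": (0.90, 0.72), "PSD": (0.91, 0.68)}
# Combinations of TD with MD or PSD, from Table 1A
combined: Dict[Tuple[str, str], Point] = {
    ("MD", "AND"): (0.87, 0.83), ("MD", "OR"): (0.92, 0.70),
    ("PSD", "AND"): (0.85, 0.84), ("PSD", "OR"): (0.96, 0.65),
}

def within_bounds(other: str, op: str) -> bool:
    """AND: sensitivity <= min, specificity >= max. OR: sensitivity >= max, specificity <= min."""
    (sens_td, spec_td), (sens_o, spec_o) = single["TD"], single[other]
    sens, spec = combined[(other, op)]
    if op == "AND":
        return sens <= min(sens_td, sens_o) and spec >= max(spec_td, spec_o)
    return sens >= max(sens_td, sens_o) and spec <= min(spec_td, spec_o)

for other, op in combined:
    print(f"TD {op} {other}: within bounds = {within_bounds(other, op)}")
```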
Table 2 shows the results for strategy 3, which involved the repetition of an FDT measurement after a positive test result. The first test yielded an abnormal result in 105 of 120 glaucoma patients and in 26 of 129 healthy subjects. The resulting sensitivity (0.88) and specificity (0.80) in this subgroup of 249 subjects are similar to those of the original 452 glaucoma patients and 237 healthy subjects (0.90 and 0.81, respectively).
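The subgroup values follow directly from the counts given above:

```python
glaucoma_n, glaucoma_abnormal = 120, 105   # first FDT test in the glaucoma subgroup
healthy_n, healthy_abnormal = 129, 26      # first FDT test in the healthy subgroup

sensitivity = glaucoma_abnormal / glaucoma_n                 # 105 / 120
specificity = (healthy_n - healthy_abnormal) / healthy_n     # 103 / 129
print(f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
# sensitivity 0.88, specificity 0.80
```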
Table 2. Sensitivity and specificity after repeating an abnormal first FDT test (FDT1) with a second FDT test (FDT2) in a subgroup of 249 subjects (120 glaucoma patients and 129 healthy subjects). An abnormal FDT test was defined as more than one depressed test-point p < 0.01 in the total deviation probability plot. For "same location", at least two depressed test-points had to be at the same location on retest. In the last strategy, a second test was required only if the worse eye on FDT1 had 2–4 depressed test locations (< 2 in both eyes on FDT1: negative final test result, no retest required; > 4 in any eye on FDT1: positive final test result, no retest required).
| Strategy | Sensitivity | Specificity |
|---|---|---|
| All patients | | |
| FDT1 > 1 | 0.88 | 0.80 |
| FDT1 > 1 AND FDT2 > 1 | 0.83 | 0.88 |
| FDT1 > 1 AND FDT2 > 1, same location | 0.82 | 0.90 |
| FDT1 > 4 OR (FDT1 > 1 AND FDT2 > 1), same location | 0.84 | 0.90 |
| Early glaucoma excluded | | |
| FDT1 > 1 | 0.99 | 0.80 |
| FDT1 > 1 AND FDT2 > 1 | 0.99 | 0.88 |
| FDT1 > 1 AND FDT2 > 1, same location | 0.97 | 0.90 |
| FDT1 > 4 OR (FDT1 > 1 AND FDT2 > 1), same location | 0.99 | 0.90 |
Applying the AND operator yielded a specificity increase of 0.08 (0.80 to 0.88) at the expense of some sensitivity (0.83 instead of 0.88). The requirement that at least two depressed test locations have the same location on retest resulted in a specificity of 0.90 at a sensitivity of 0.82. Repeating a test with five or more depressed test locations in any eye appeared to be superfluous. If these tests had been considered as positive without performing a retest, this would have prevented 87 of 105 retests carried out in the glaucoma patients, and nine of 26 retests carried out in the healthy subjects, without any loss of diagnostic performance. This approach is depicted as the last strategy in Table 2. The sensitivity in this strategy returned to 0.99 after the exclusion of patients with early glaucoma. After the exclusion, 77 glaucoma patients remained.
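The last strategy in Table 2 can be sketched as a triage rule. In this sketch each FDT test is represented as one set of depressed test-point locations per eye, with locations given hypothetical integer labels for the 17 test points; "same location" is interpreted as at least two points depressed in the same eye on both tests (our assumption), and unfinished tests are not modelled.

```python
from typing import Optional, Sequence, Set

# One FDT test: per eye, the set of test-point locations depressed at p < 0.01
# in the total deviation probability plot (hypothetical labels 0-16).
Test = Sequence[Set[int]]

def triage(fdt1: Test, fdt2: Optional[Test] = None) -> str:
    worse = max(len(eye) for eye in fdt1)
    if worse < 2:
        return "negative"          # < 2 in both eyes: no retest needed
    if worse > 4:
        return "positive"          # > 4 in any eye: no retest needed
    if fdt2 is None:
        return "retest required"   # 2-4 depressed points in the worse eye
    # Same location: at least two points depressed in the same eye on both tests
    confirmed = any(len(first & second) >= 2 for first, second in zip(fdt1, fdt2))
    return "positive" if confirmed else "negative"

print(triage([{0, 1, 2, 3, 4}, set()]))             # positive
print(triage([{3, 7}, {2}]))                        # retest required
print(triage([{3, 7}, {2}], [{3, 7, 9}, set()]))    # positive
print(triage([{3, 7}, {2}], [{3, 11}, {2, 5}]))     # negative

# Retests avoided in the subgroup by not retesting when FDT1 > 4 in any eye
saved, performed = 87 + 9, 105 + 26
print(f"{saved} of {performed} retests avoided ({saved / performed:.0%})")  # 96 of 131 (73%)
```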
Of the strategies tested, only the third yielded some gain in diagnostic performance. However, even with that strategy, a substantial increase in specificity appeared to be impossible without a small decrease in sensitivity for early glaucoma. Because a similar trade-off might also be achieved simply by raising the cut-off point of TD, we explored that option as well; Table 3 presents the results. Raising the cut-off point of TD as a single parameter proved less effective than the third strategy: at a cut-off point of > 3, TD reached a specificity of only 0.88 at a sensitivity of 0.82, whereas repeat testing with the same-location requirement reached a specificity of 0.90 at a sensitivity of 0.82 to 0.84 (Table 2).
Table 3. Sensitivity and specificity of TD (number of depressed test-points p < 0.01 in the total deviation probability plot) as a function of its cut-off point based on 452 glaucoma patients (n = 329 after the exclusion of early glaucoma) and 237 healthy subjects.
| Cut-off point | Group | Sensitivity | Specificity |
|---|---|---|---|
| > 1 | All patients | 0.90 | 0.81 |
| > 1 | Early glaucoma excluded | 1.00 | 0.81 |
| > 2 | All patients | 0.86 | 0.85 |
| > 2 | Early glaucoma excluded | 0.99 | 0.85 |
| > 3 | All patients | 0.82 | 0.88 |
| > 3 | Early glaucoma excluded | 0.97 | 0.88 |
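One way to compare the two approaches is the specificity gained per unit of sensitivity lost, each relative to its own baseline (TD > 1 for Table 3, FDT1 > 1 for Table 2). This ratio is our illustration, not a measure used in the study, and because Tables 2 and 3 are based on different samples the comparison is indicative only.

```python
from typing import Dict, Iterator, Tuple

Point = Tuple[float, float]  # (sensitivity, specificity), all patients

# Table 3: raising the TD cut-off (452 glaucoma patients, 237 healthy subjects)
td_cutoffs: Dict[str, Point] = {"TD > 1": (0.90, 0.81), "TD > 2": (0.86, 0.85), "TD > 3": (0.82, 0.88)}
# Table 2: repeat testing (subgroup of 120 glaucoma patients, 129 healthy subjects)
repeat_testing: Dict[str, Point] = {
    "FDT1 > 1": (0.88, 0.80),
    "FDT1 AND FDT2": (0.83, 0.88),
    "FDT1 AND FDT2, same location": (0.82, 0.90),
    "last strategy": (0.84, 0.90),
}

def tradeoff(points: Dict[str, Point], baseline: str) -> Iterator[Tuple[str, float]]:
    """Specificity gained per unit of sensitivity lost, relative to the baseline."""
    base_sens, base_spec = points[baseline]
    for name, (sens, spec) in points.items():
        if name != baseline:
            yield name, (spec - base_spec) / (base_sens - sens)

for name, ratio in [*tradeoff(td_cutoffs, "TD > 1"), *tradeoff(repeat_testing, "FDT1 > 1")]:
    print(f"{name}: {ratio:.2f}")
```

Raising the TD cut-off buys roughly one unit of specificity per unit of sensitivity, whereas the repeat-test strategies buy between about 1.6 and 2.5.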
Discussion
In this study, we evaluated three strategies intended to improve the specificity of FDT. The most successful strategy appears to be confirming an abnormal FDT test result with a repeat test. This strategy yields a substantial increase in specificity at the expense of some loss of sensitivity for early, but not for moderate or severe, glaucoma. Neither combining TD with another FDT parameter nor combining FDT with GDx seems to be a sensible approach. Khong et al. (2001) found a specificity of 0.62 using the FDT C20-5 screening mode, which improved to 0.69 after a second measurement in patients with abnormal first test results. It is difficult to compare their results with ours because they excluded almost 50% of the original 223 subjects, whereas we excluded very few. Nevertheless, the specificity increase they found (0.07) is strikingly similar to ours (0.08 to 0.10). Joson et al. (2002) measured the effect of repeated testing in healthy subjects using the C20-5 screening mode and found a specificity increase from 0.85 to 0.96. Neither Khong et al. (2001) nor Joson et al. (2002) acknowledged that repeated testing might compromise sensitivity. Khong et al. (2001) did report a sensitivity, but their value of 1.00 was based on only two patients; the corresponding 95% confidence interval for n = 2 ranges from 0.22 to 1.00 (Abramson & Gahlinger 2001). Horn et al. (2003) obtained sensitivities of 0.85 and 0.64 for FDT and GDx, respectively, at a predefined specificity of 0.95 in 252 patients. When FDT was combined with GDx, the sensitivity increased by 0.07 to 0.92. This gain is difficult to compare with our results because of several methodological differences between the two studies.
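When all n observations are positive, the upper confidence limit is 1 and the lower limit has a closed form. The quoted lower limit of 0.22 for n = 2 matches a mid-P exact interval, whereas the conventional Clopper-Pearson exact interval would give about 0.16; which method the cited program applied is our assumption, not stated in the source.

```python
def lower_limit_all_positive(n: int, alpha: float = 0.05, mid_p: bool = False) -> float:
    """Lower limit of a two-sided (1 - alpha) exact binomial interval when all n of n
    observations are positive (the upper limit is then 1).

    Clopper-Pearson: the lower limit p solves P(X >= n) = p**n = alpha / 2.
    Mid-P: it solves 0.5 * P(X = n) = 0.5 * p**n = alpha / 2, i.e. p**n = alpha.
    """
    target = alpha if mid_p else alpha / 2
    return target ** (1 / n)

print(f"Clopper-Pearson: {lower_limit_all_positive(2):.2f} to 1.00")           # 0.16 to 1.00
print(f"mid-P: {lower_limit_all_positive(2, mid_p=True):.2f} to 1.00")         # 0.22 to 1.00
```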
As mentioned in Methods, there are many ways to combine parameters. Horn et al. (2003), for example, used a two-dimensional discriminant analysis to combine their FDT score with the GDx Number. In our study, we selected the straightforward AND/OR operators. The AND operator is especially interesting from a logistical point of view: a second test need only be performed if the first test yields an abnormal test result. All other methods for combining parameters require both tests to be performed on all patients.
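The logistical advantage of the AND operator can be quantified with the subgroup counts from the repeat-test analysis: only the 131 subjects (105 glaucoma patients and 26 healthy subjects) with an abnormal first test would need a retest, whereas any rule that requires both results for every subject needs two tests per subject.

```python
glaucoma_n, healthy_n = 120, 129
first_abnormal = 105 + 26                    # abnormal FDT1 results in the subgroup
subjects = glaucoma_n + healthy_n            # 249

tests_and_rule = subjects + first_abnormal   # retest only after an abnormal FDT1
tests_everyone_twice = 2 * subjects          # both tests for every subject
print(tests_and_rule, tests_everyone_twice)  # 380 498
```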
The present study was conducted using the original version of the FDT perimeter with 17 test locations. A 24–2 version (Matrix) has recently been developed with 54 test locations. The original version, however, is still more common in clinical practice, is better documented than the newer instrument, and is probably more attractive as a screening instrument. A new GDx version with a variable corneal compensator (GDx-VCC) was launched last year. The GDx-VCC seems to display a slightly better diagnostic performance when compared to the original GDx (Zhou & Weinreb 2002; Bagga et al. 2003; Reus et al. 2003).
In conclusion, a modest increase in FDT diagnostic performance could be obtained by confirming an abnormal test result with a repeat test. This strategy resulted in a specificity of 0.90, which is still rather low for a screening test. Combining TD with another FDT parameter, or combining FDT with GDx, did not seem to be useful. Combining FDT with GDx also requires the purchase of a second device, whereas confirming an abnormal test result with a repeat test is more in keeping with common clinical practice. The latter strategy, therefore, seems the most sensible approach.
This research was supported by the Dutch Health Care Insurance Council (CVZ) through the Department of Medical Technology Assessment (MTA) of the University Hospital Groningen, the Netherlands.