Performance of automated antimicrobial susceptibility testing for the detection of antimicrobial resistance in gram‐negative bacteria: a NordicAST study

Automated antimicrobial susceptibility testing (AST) is common in clinical microbiology laboratories, but the ability of these systems to detect low‐level resistance has been questioned. This Nordic multicentre study aimed to evaluate the performance of commercially available automated AST systems. A phenotypically well‐characterised collection of gram‐negative bacilli (Escherichia coli (n = 7), Klebsiella pneumoniae (n = 6) and Pseudomonas aeruginosa (n = 7)) with and without resistance mechanisms was examined by Danish (n = 1), Finnish (n = 6), Norwegian (n = 16) and Swedish (n = 5) laboratories. Minimum inhibitory concentrations (MICs) were determined for 12 antimicrobials with automated systems and compared with MICs obtained with gold standard broth microdilution. The automated systems used were VITEK 2 (n = 23), Phoenix (n = 4), MicroScan (n = 1) and ARIS (n = 1). Very major errors were identified for six antimicrobials: cefotaxime (6.9%), meropenem (0.4%), ciprofloxacin (0.7%), ertapenem (4.3%), amikacin (3.4%) and colistin (6.4%). Categorical agreement between the automated systems and broth microdilution ranged from 83% for imipenem to 100% for ampicillin and trimethoprim‐sulfamethoxazole. The analysis revealed several important antimicrobials for which resistance was underestimated, with potentially significant consequences for patient treatment. The results cast doubt on the use of automated AST in the management of patients with serious infections and suggest that more work is needed to define the limitations of these systems.

One consequence of the increase in antimicrobial resistance in recent years is the need for rapid identification and antimicrobial susceptibility testing (AST) of pathogenic bacteria. This is particularly important in patients with serious infections. In bloodstream infections, survival has been reported to decrease by 7.6% for each hour that administration of effective antibiotics is delayed after the onset of infection, and a 5-fold increase in mortality has been reported when patients received inappropriate antimicrobials within 6 h after recognition of septic shock [1,2]. Recent studies have also documented the value of more rapid diagnosis and AST, which allows earlier appropriate, targeted antimicrobial use [3,4]. Rapid AST has been shown to improve patient outcomes, lower mortality, shorten hospital stays, reduce the occurrence of superinfections and adverse drug reactions, and decrease costs [5].
Several commercial automated AST platforms have been developed to enable clinical laboratories to deliver rapid reports with accurate microbial identification and AST results. One advantage of these systems is that results can be transferred automatically to the laboratory information system, lowering the risk of human error.
The use of automated methods for AST has become widespread as laboratories have focused increasingly on turnaround time and on saving manpower. Despite the introduction of such systems, doubts about their accuracy have persisted, particularly for pathogens with low-level resistance. In addition, microbiology laboratories often observe variation in AST results between different methods, such as automated systems, gradient tests and disc diffusion, and in our experience this leaves laboratories uncertain about which method to trust.
The aim of this study was to evaluate automated AST platforms using a well-characterised collection of gram-negative bacteria (Escherichia coli, Klebsiella pneumoniae and Pseudomonas aeruginosa) and compare the results with minimum inhibitory concentration (MIC) values obtained with gold standard broth microdilution.

Study design
The study was organised through the NordicAST (Nordic Committee on Antimicrobial Susceptibility Testing) network (www.nordicast.org). All Danish (n = 11), Finnish (n = 24), Icelandic (n = 1), Norwegian (n = 22) and Swedish (n = 26) clinical microbiology laboratories were invited to participate in the study. All laboratories were asked to perform susceptibility testing with their automated system on a blinded collection of isolates with varying MICs for clinically important antimicrobial agents. Laboratories that had no automated AST system were unable to participate in the study.
Susceptibility to selected antimicrobial agents was to be tested using an automated or semi-automated system (Vitek, Phoenix, MicroScan, ARIS) according to the manufacturers' instructions. The antibiotics tested are presented in Table 1. The study was carried out in the first quarter of 2017.

Microorganisms and antimicrobials
The isolates sent to the laboratories comprised 20 gram-negative bacteria: Escherichia coli (n = 7, including ATCC 25922), Klebsiella pneumoniae (n = 6, including ATCC 700603) and Pseudomonas aeruginosa (n = 7, including ATCC 27853). The panels contained a mixture of susceptible and resistant bacteria, selected because their MICs were close to the clinical breakpoints. For most isolates, the resistance mechanism was not known.
The EUCAST Development Laboratory determined minimal inhibitory concentration (MIC) by performing broth microdilution (BMD) according to ISO standard 20776-1 on freeze-dried Sensititre plates (Thermo Fisher Scientific, Basingstoke, UK) [6]. The MIC test was repeated three times per isolate and a consensus MIC value was calculated for all agents included in the study: for example 2, 2, 2 = 2 mg/L or 2, 2, 1 = 2 mg/L or 2, 1, 1 = 1 mg/L. These consensus MICs are subsequently called EUCAST reference MICs and used as gold standard (Table 2).
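The three consensus examples above behave like taking the middle value of the three sorted replicate MICs. The sketch below assumes that rule (the median of three replicates, which equals the value seen in at least two runs whenever two runs agree); the function name `consensus_mic` is hypothetical and not part of any EUCAST tool.

```python
from statistics import median


def consensus_mic(replicates):
    """Consensus of three replicate broth microdilution MICs (mg/L).

    Assumption: consensus = median of the three values. For an odd number
    of values the median is one of the observed MICs, so the result stays
    on the doubling-dilution scale.
    """
    if len(replicates) != 3:
        raise ValueError("expected exactly three replicate MICs")
    return median(replicates)


print(consensus_mic([2, 2, 2]))  # 2
print(consensus_mic([2, 2, 1]))  # 2
print(consensus_mic([2, 1, 1]))  # 1
```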
Categorical agreement (CA) and error rates were calculated for each strain-antibiotic combination against EUCAST Breakpoint Table v 11.0 [7] using the following definitions:
Very major error: Susceptible (S) by the test method and resistant (R) by the reference method.
Major error: Resistant (R) by the test method and susceptible (S) by the reference method.
Minor error: Susceptible, increased exposure (I), by the test method and resistant (R) by the reference method, or resistant (R) by the test method and susceptible, increased exposure (I), by the reference method.
For species-antibiotic combinations for which the whole wild-type population is categorised as 'Susceptible, increased exposure (I)', this category was regarded as the susceptible category; that is, only major and very major errors could occur. Minor errors were only calculated for combinations having both 'Susceptible, standard dosing regimen (S)' and 'Susceptible, increased exposure (I)' categories.
Results in the Area of Technical Uncertainty (ATU) were included in their interpretive category, that is, S, I or R. For ciprofloxacin with E. coli and K. pneumoniae, an MIC of 0.5 mg/L falls in the ATU and no S/I/R category could be assigned; however, if the reference MIC was 0.5 mg/L and the automated result was discordant, this was counted as a minor error.
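A minimal sketch of the categorisation and error definitions above, assuming EUCAST-style interpretation (S if MIC ≤ S breakpoint, R if MIC > R breakpoint, otherwise I). The names `Cat`, `categorise` and `classify_error` are hypothetical, ATU handling is omitted, and the breakpoints in the usage lines are illustrative only; actual values must be taken from EUCAST Breakpoint Table v 11.0. Because S/I discordances are not listed as errors in the definitions, the sketch returns None for them.

```python
from enum import Enum
from typing import Optional


class Cat(Enum):
    """Interpretive categories used in EUCAST breakpoint tables."""
    S = "S"  # susceptible, standard dosing regimen
    I = "I"  # susceptible, increased exposure
    R = "R"  # resistant


def categorise(mic: float, s_bp: float, r_bp: float) -> Cat:
    """S if MIC <= S breakpoint, R if MIC > R breakpoint, otherwise I."""
    if mic <= s_bp:
        return Cat.S
    if mic > r_bp:
        return Cat.R
    return Cat.I


def classify_error(test: Cat, ref: Cat, wild_type_is_i: bool = False) -> Optional[str]:
    """Return 'VME', 'ME', 'mE', or None when no error as defined above occurs."""
    if wild_type_is_i:
        # Whole wild type is I: treat I as susceptible, so only ME/VME can occur.
        test = Cat.S if test is Cat.I else test
        ref = Cat.S if ref is Cat.I else ref
    if test is Cat.S and ref is Cat.R:
        return "VME"  # resistance missed by the test method
    if test is Cat.R and ref is Cat.S:
        return "ME"  # resistance falsely reported by the test method
    if {test, ref} == {Cat.I, Cat.R}:
        return "mE"  # I versus R discordance
    return None  # agreement, or an S/I discordance not counted as an error here


# Illustrative breakpoints only (S <= 0.25 mg/L, R > 0.5 mg/L).
ref_cat = categorise(1, s_bp=0.25, r_bp=0.5)      # Cat.R
test_cat = categorise(0.25, s_bp=0.25, r_bp=0.5)  # Cat.S
print(classify_error(test_cat, ref_cat))          # VME
```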

RESULTS
Twenty-eight Nordic laboratories agreed to participate and delivered data to the study: one from Denmark, six from Finland, 16 from Norway and five from Sweden. One Swedish laboratory […] The automated systems included were VITEK 2, bioMérieux (23 laboratories; Marcy-l'Étoile, France); Phoenix, Becton-Dickinson (four laboratories; Franklin Lakes, NJ, USA); MicroScan, Beckman Coulter (one laboratory; Brea, CA, USA); and ARIS, Thermo Fisher Scientific (one laboratory; Waltham, MA, USA). Only calculations based on results from laboratories that used VITEK2 are shown in this manuscript, because too few laboratories used automated systems from other manufacturers. The VITEK2 panels included AST-N230/N222 (n = 6, all from Finland), AST-N218 (n = 4, all from Sweden), AST-N204 (n = 1, Sweden) and AST-N209 (n = 12, all from Norway).
The overall results for laboratories using VITEK2 are presented in Table 3, which shows the number of VITEK2 laboratories reporting each MIC compared with the EUCAST reference MIC (broth microdilution). The other automated systems performed similarly to VITEK2, disagreeing with the reference method for the same strain-antibiotic combinations, but categorical agreement could not be calculated for them. For transparency, all results, including those for systems other than VITEK2, are shown in Table S1.
In Table 4, results for the three species are shown separately to illustrate performance in detail. The CA between the automated system and broth microdilution varied between 54.8% (K. pneumoniae and imipenem) and 100% (E. coli and ampicillin, cefotaxime, ceftazidime, ertapenem, trimethoprim-sulfamethoxazole and colistin; K. pneumoniae and ceftazidime, gentamicin, amikacin, trimethoprim-sulfamethoxazole and colistin; P. aeruginosa and piperacillin-tazobactam, imipenem and amikacin). CA could be calculated for 30 species-antibiotic combinations (E. coli 12, K. pneumoniae 11 and P. aeruginosa 7). Ten combinations had less than 90% CA (E. coli 3/12, K. pneumoniae 4/11 and P. aeruginosa 3/7), and 13 combinations had less than 95% CA. CA was low for certain strain-antibiotic combinations, usually when the MICs were close to the breakpoint or in the ATU.
To illustrate this, we mention some of the strain-antibiotic combinations where we found a large […]
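Categorical agreement for a species-antibiotic combination is the percentage of results whose automated category matches the reference category. The sketch below assumes each result has already been categorised as 'S', 'I' or 'R' (with I mapped to S where the whole wild type is I) and flags combinations below the 90% and 95% thresholds used above. The function names and the input data are hypothetical, not results from this study.

```python
from collections import defaultdict


def categorical_agreement(results):
    """Return {(species, antibiotic): CA in %} from (species, antibiotic, test, ref) tuples."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for species, antibiotic, test_cat, ref_cat in results:
        key = (species, antibiotic)
        total[key] += 1
        if test_cat == ref_cat:
            agree[key] += 1
    return {key: 100.0 * agree[key] / total[key] for key in total}


def below_threshold(ca, threshold):
    """Combinations whose CA is strictly below the threshold (e.g. 90 or 95 %)."""
    return sorted(key for key, value in ca.items() if value < threshold)


# Hypothetical results, not data from this study.
hypothetical_results = (
    [("E. coli", "MER", "S", "S")] * 9
    + [("E. coli", "MER", "S", "R")]
    + [("K. pneumoniae", "IMI", "I", "I")] * 3
    + [("K. pneumoniae", "IMI", "R", "I")]
)
ca = categorical_agreement(hypothetical_results)
print(ca)                       # {('E. coli', 'MER'): 90.0, ('K. pneumoniae', 'IMI'): 75.0}
print(below_threshold(ca, 90))  # [('K. pneumoniae', 'IMI')]
print(below_threshold(ca, 95))  # both combinations
```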

DISCUSSION
This study shows that automated AST systems, in particular VITEK2 (other systems were rarely used), had several problems determining the MIC correctly when compared with the reference method (broth microdilution). This was especially true when the MICs were close to the breakpoints or in the ATU. Only VITEK2 was used by enough laboratories to allow quantitative analysis of performance. Even though no calculations could be made for the other automated systems, similar results were observed for all systems.
It is common to calculate essential agreement (MICs within ±1 doubling dilution of the reference MIC) to evaluate performance characteristics. In this study, we could not calculate essential agreement because most of the automated systems report truncated results. A truncated MIC means that the system reports only whether the MIC is less than or equal to, or greater than, a breakpoint concentration, rather than an exact value. This makes it impossible to evaluate how the systems perform compared with the gold standard and is one of the disadvantages of the automated systems.
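The sketch below illustrates why truncated results block an essential agreement calculation. It assumes MICs on a doubling-dilution scale and reports written either as an exact value or as '≤x', '<x', '≥x' or '>x'; the function names are hypothetical. An exact value can always be judged, but a truncated value is often consistent with both agreement and disagreement, in which case the function returns None.

```python
import math
from typing import Optional


def dilution_steps(mic: float, ref: float) -> int:
    """Signed distance in doubling dilutions (rounded to absorb values such as 0.06 for 0.0625)."""
    return round(math.log2(mic / ref))


def essential_agreement(reported: str, ref: float) -> Optional[bool]:
    """True/False if EA (within +/-1 dilution of ref) can be decided, None if undecidable."""
    text = reported.strip().replace("≤", "<=").replace("≥", ">=")
    if text.startswith("<="):
        upper = dilution_steps(float(text[2:]), ref)
        # True MIC may lie arbitrarily far below the bound, so EA is at best possible.
        return None if upper >= -1 else False
    if text.startswith("<"):
        upper = dilution_steps(float(text[1:]), ref) - 1  # "<x" means at most x/2
        return None if upper >= -1 else False
    if text.startswith(">="):
        lower = dilution_steps(float(text[2:]), ref)
        # True MIC may lie arbitrarily far above the bound.
        return None if lower <= 1 else False
    if text.startswith(">"):
        lower = dilution_steps(float(text[1:]), ref) + 1  # ">x" means at least 2x
        return None if lower <= 1 else False
    return abs(dilution_steps(float(text), ref)) <= 1  # exact value: always decidable


print(essential_agreement("2", 1))        # True
print(essential_agreement("<=1", 0.125))  # None: true MIC could be 0.125 or 1
print(essential_agreement(">8", 1))       # False
```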
Different countries used different AST cards. For the VITEK2 system from bioMérieux, a total of five different cards were used in the Nordic countries. N204 was used by only one laboratory in Sweden, whereas all other Swedish laboratories used N218. N209 is used only in Norway and N230 only in Finland, sometimes in combination with the N222 card for P. aeruginosa to test antibiotics absent from the N230 card. Trimethoprim-sulfamethoxazole is not part of the AST-N218 card, and gentamicin was apparently not part of the AST-N230 card. Ertapenem is not on the N209 or N230 cards and was therefore not tested by laboratories in Norway or by Finnish laboratories that did not additionally use the N222 card. VITEK2 users with the N230/N222 cards were the only ones to test colistin. N230 does not include cefotaxime, so laboratories in Finland did not test this agent. Ampicillin is not present on the N230 or N218 cards. We do not know whether these differences influenced the test results, or whether there are other differences between the cards beyond those described here.
Most laboratories delivered susceptibility test results for all mandatory antibiotics, and quite a few for the optional antibiotics, though very few laboratories tested colistin. This may be due to the EUCAST warning that there are problems with several commercially available products for colistin susceptibility testing [8]. The same problem applies to disc diffusion and gradient tests, leaving broth microdilution as the only reliable method for colistin susceptibility testing [9]. We nevertheless decided to include colistin in the panel to illustrate the problem, as manufacturers of commercial automated AST systems continue to include colistin in their panels without appropriate warnings that these results should not be used in patient management.
Many automated systems offer a short turnaround time, which manufacturers use in their marketing. A short time to result allows clinicians to start appropriate antimicrobial therapy earlier, which can lead to more patients surviving serious infections. Some researchers have also inoculated automated systems directly from blood cultures without waiting for subcultures, obtaining intertest agreement rates around or above 90% [10,11]. It can be argued, however, that these studies compared direct AST with standard AST on the same automated systems and thus did not assess automated AST against the reference method. Laboratories relying on disc diffusion or gradient tests have also started working towards more rapid susceptibility testing by applying disc diffusion directly to material from blood cultures, obtaining reliable results [12,13]. This means that EUCAST disc diffusion is becoming competitive with commercial automated systems in terms of time to result. Others have compared automated systems and Etest (MIC gradient tests) by performing direct AST from blood cultures [14]. Their results show discrepancies between the systems, similar to those found in our study.
In addition to rapid AST, many laboratories today rely on direct identification of bacteria from, for example, blood cultures with MALDI-TOF mass spectrometry, which has greatly increased the speed of diagnostics [15-18]. Consequently, the claim made in some publications comparing the identification component of automated systems with traditional methods, that traditional identification takes 24-48 h, is no longer true [3,19,20]. Accurate plate readers are also available today, making it possible to transfer disc diffusion results to the laboratory information system automatically and thereby reducing the risk of human error in the same way as automated systems do.
Major errors (resistant by the test method and susceptible by the reference method) lead to an overestimation of resistance by the erroneous method, and can result in a decision not to use a therapeutic agent, which could have been effective. These can have serious consequences if therapeutic options are very limited, but otherwise may not result in harm to the patient. Very major errors are the most serious errors as the method failed to detect resistance, which may result in the use of an ineffective therapeutic agent for treatment of an infection.
Some studies, particularly older ones, have reported that automated AST methods provide results comparable to the reference methods established by organisations such as the European Committee on Antimicrobial Susceptibility Testing (EUCAST) and the Clinical and Laboratory Standards Institute (CLSI) [21,22]. This has been challenged in recent years by studies conducted by NordicAST and its associated members, which have shown how various AST systems perform and documented specific flaws in automated systems [23,24]. One of these studies showed that the EUCAST disc diffusion method and the CLSI agar screen performed significantly better than the VITEK 2 system in detecting vancomycin resistance in enterococci. The second study evaluated EUCAST disc diffusion and supplementary automated MIC methods for detection of carbapenemase-producing Enterobacterales (CPE) and showed that EUCAST disc diffusion is a robust screening method for CPE, even though isolates with meropenem MICs <1 mg/L pose challenges, whereas semi-automated methods had a high rate of major errors. The results of this study support the conclusions of the previous NordicAST studies.
EUCAST introduced the ATU in 2019 for disc diffusion and broth microdilution. EUCAST has not defined ATUs for other methods, which is often not possible because many manufacturers do not disclose exactly how their MIC values are determined. Manufacturers of automated systems have also not included ATUs in their interpretations, which made it impossible to include them in our analyses.
One shortcoming of this study is the difficulty of assembling a representative collection for all agents. Some isolates are better suited than others to revealing certain resistance mechanisms, and the selection of isolates used to detect differences between the methods can always be debated. It could also be argued that the selected isolates should have been whole genome sequenced (WGS) to elucidate the investigated resistance mechanisms, and that omitting this is a disadvantage. However, WGS does not elucidate all resistance mechanisms, in particular those related to expression of acquired or chromosomal genes, which are the ultimate determinants of the resistance phenotype. Another limitation is that, since the study was carried out, new cards with newer antibiotics that were not included have emerged, for example the VITEK2 AST-XN12 card. We still consider our findings significant for clinical practice.
In conclusion, even though the automated systems function quite well compared with gold standard broth microdilution for some strain-antibiotic combinations, all of them have severe shortcomings for others. The shortcomings apply to antimicrobials of critical importance in the management of severely ill patients. The misclassification of piperacillin-tazobactam for K. pneumoniae and of many relevant antibiotics for P. aeruginosa were among the more serious shortcomings. This raises the question of whether automated systems can be relied upon in the management of patients with serious infections.
NordicAST study group on automated AST.