Accuracy of transcranial magnetic stimulation and a Bayesian latent class model for diagnosis of spinal cord dysfunction in horses

Abstract
Background: Spinal cord dysfunction/compression and ataxia are common in horses. Presumptive diagnosis is most commonly based on neurological examination and cervical radiography, but interest in the diagnostic value of transcranial magnetic stimulation (TMS) with recording of magnetic motor evoked potentials has increased. The problem in evaluating diagnostic tests for spinal cord dysfunction is the absence of a gold standard in the living animal.
Objectives: To compare the diagnostic accuracy of TMS, cervical radiography, and neurological examination.
Animals: One hundred seventy-four horses admitted to the clinic for neurological examination.
Methods: Retrospective comparison of neurological examination, cervical radiography, and different TMS criteria, using Bayesian latent class modeling to account for the absence of a gold standard.
Results: The Bayesian estimate of the prevalence (95% CI) of spinal cord dysfunction was 58.1% (48.3%-68.3%). Sensitivity and specificity of neurological examination were 97.6% (91.4%-99.9%) and 74.7% (61.0%-96.3%); for radiography they were 43.0% (32.3%-54.6%) and 77.3% (67.1%-86.1%), respectively. Transcranial magnetic stimulation reached a sensitivity and specificity of 87.5% (68.2%-99.2%) and 97.4% (90.4%-99.9%). For TMS, the highest accuracy was obtained using the minimum latency time for the pelvic limbs (Youden's index = 0.85). In all evaluated models, cervical radiography performed poorest.
Clinical Relevance: Transcranial magnetic stimulation-magnetic motor evoked potential (TMS-MMEP) was the best test to diagnose spinal cord disease and the neurological examination was the second best, whereas the accuracy of cervical radiography was low. Selecting animals based on neurological examination (highest sensitivity) and confirming disease by TMS-MMEP (highest specificity) would currently be the optimal diagnostic strategy.

Given the important consequences of a definitive diagnosis, the absence of a true gold standard test for cervical vertebral compressive myelopathy (CVCM) or equine degenerative myeloencephalopathy/neuroaxonal dystrophy (EDM/NAD) in living horses is problematic. Horses affected by EDM often have low serum vitamin E concentrations, 3 but histopathology is required for definitive diagnosis. CVCM can be detected by myelography, computed tomography (CT), CT myelography, and cervical radiography, but all these techniques still have limitations. CT scanners large enough to visualize C7 are rarely available and do not allow flexion and extension of the neck to evaluate dynamic compression of the spinal cord. With myelography, dynamic spinal cord compression can be visualized, but general anesthesia is required and the sensitivity appears rather low, 4 especially for the cranial parts of the neck. Cervical radiography might indicate narrowing of the vertebral canal, but its sensitivity (50%) is too low for definitive diagnosis. [5][6][7][8] A presumptive diagnosis is therefore often based on the history of the horse and the clinical neurological examination. However, particularly in subtle cases, the agreement between observers is poor and differentiation from orthopedic causes might be challenging. 9,10

Transcranial magnetic stimulation (TMS) with recording of magnetic motor evoked potentials (MMEP) is a promising additional test for diagnosis of spinal cord dysfunction in horses. [11][12][13][14][15][16] A 70 mm magnetic coil is placed on the head of the horse, at the level of the brain, to perform a magnetic stimulation. This induces descending volleys through the spinal cord, evoking a muscle contraction that is recorded as the MMEP on the electromyography (EMG) machine. On each MMEP, the latency time (the time between stimulation and onset of muscle contraction) can be measured, and this is the most reliable variable. 12,13 In horses, the mean latency time of 4 MMEP is used, instead of the minimal latency time that is used in humans. In normal horses, the mean latency time is short and has a low SD, whereas in horses with spinal cord disease, latency time is more variable and clearly prolonged. [14][15][16] A recent study that compared TMS with histopathology showed that for diagnosis of spinal cord dysfunction, the optimal cutoff values for latency time were 22 ms in the thoracic and 43 ms in the pelvic limbs. 20 However, these values have not been validated, and neurological examination and cervical radiography have not been evaluated while accounting for the absence of a true gold standard test. Therefore, the objectives of the present study were to compare the diagnostic accuracy of TMS, cervical radiography, and clinical examination using Bayesian latent class modeling to account for the absence of a gold standard, and to determine the optimal diagnostic criterion for spinal cord dysfunction diagnosis by TMS.

| Study protocol and horses
A retrospective diagnostic test accuracy study was performed. The study population consisted of 174 horses (99 male castrated, 28 intact male, and 47 female), presented between 2008 and 2018 at Ghent University clinic for confirmation or exclusion of a neurological gait abnormality. All horses were evaluated by a neurological examination, TMS, and cervical radiography. On 75 horses, an orthopedic examination was also performed as orthopedic disease was suspected, but the results were not included in the study.

| Neurological examination
Each horse's neurological function was examined by at least 1 of 5 veterinarians of the clinical staff. All examiners had at least 3 years of experience in performing neurological examinations. Neurological examination was conducted using a previously published protocol. 17,18 The outcome of the neurological examination was summarized as a grade of ataxia. 18 Briefly, grade 0 represented a normal horse. Grade 1 denoted animals with subtle deficits that were visible only under special circumstances and not always consistent. Grade 2 corresponded to animals with mild deficits that were nevertheless visible at all gaits and tests, including walking in a straight line. Grade 3 denoted horses with moderate deficits visible to any untrained eye from a distance. Grade 4 corresponded to severe deficits with a risk of falling easily, even when just standing. Recumbent horses, unable to stand, were classified as grade 5 ataxia. For analytic purposes, a binary outcome variable was created grouping horses with grade 0 and 1 as normal (negative test outcome), and horses with grade 2 to 5 as ataxic (positive test outcome).
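The dichotomization of the ataxia grade described above can be sketched in a few lines of Python. This is only an illustration of the coding rule, not the authors' code; the function name `neuro_exam_positive` is hypothetical.

```python
def neuro_exam_positive(ataxia_grade: int) -> bool:
    """Return True (positive test outcome) for ataxia grades 2 to 5.

    Grades 0 and 1 are coded as normal (negative test outcome),
    following the binary grouping described in the text.
    """
    if ataxia_grade not in range(6):
        raise ValueError("ataxia grade must be an integer from 0 to 5")
    return ataxia_grade >= 2


# Code every possible grade as 0 (negative) or 1 (positive).
coded = {grade: int(neuro_exam_positive(grade)) for grade in range(6)}
```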

| Cervical radiography
For all horses, lateral radiographs of the cervical vertebrae were made from the occiput to the first thoracic vertebra with a ceiling-mounted Philips X-ray tube (80 kW). Output parameters varied from 70 kV/25 mAs for the cranial cervical vertebrae to 90 kV/90 mAs for C7-T1. A CR system (Agfa DXM) was used with a grid. All radiographs were anonymized and evaluated for any abnormalities by a blinded, board-certified radiologist. Additionally, the intra- and intervertebral sagittal diameter ratios of the vertebral canal were measured at each cervical vertebra as described. 19 For both ratios, a cutoff value of 0.485 was used to distinguish between a normal and a narrowed vertebral canal indicative of spinal cord compression. 19
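As a minimal sketch, the 0.485 cutoff could be applied to a set of measured sagittal ratios as shown below. Two points are assumptions that the text does not state: a ratio strictly below the cutoff is treated as narrowed, and the site labels and ratio values in the example are hypothetical. The function name `narrowed_sites` is also hypothetical.

```python
from typing import Dict, List

SAGITTAL_RATIO_CUTOFF = 0.485  # cutoff reported in the text for both ratios


def narrowed_sites(ratios: Dict[str, float],
                   cutoff: float = SAGITTAL_RATIO_CUTOFF) -> List[str]:
    """Return the sites whose sagittal ratio falls below the cutoff.

    Assumption: 'below' is strict (ratio < cutoff); the source does not
    say how a ratio exactly equal to the cutoff is handled.
    """
    return [site for site, ratio in ratios.items() if ratio < cutoff]


# Hypothetical measurements for one horse, for illustration only.
example_ratios = {"C3": 0.52, "C4": 0.47, "C5": 0.485}
flagged = narrowed_sites(example_ratios)
```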

| Model development
To assess the accuracy of the 3 tests in detecting spinal cord dysfunction, we considered a latent class model (1 population, 3 tests 21) allowing for conditional dependence between 2 tests, namely TMS and radiography. We opted for this model because TMS measures the conductivity of the spinal cord, which is disturbed when the cord is compressed by bony structures, as assessed radiologically. We modeled conditional dependence as previously described. 21 Prior information is a way to narrow parameter uncertainty when previous scientific information is available. For prevalence and for the sensitivity (Se) and specificity (Sp) of tests, the priors are modeled using beta distributions, which are naturally bounded between 0 and 1. 21 Priors are informative if some values are less probable than others (eg, a probability assumed to be higher than a specific value) or uninformative if every value is equally probable (eg, the sensitivity can be anywhere between 0 and 1).
A literature search delivered acceptable prior information for sensitivity and specificity of cervical radiography. For prevalence estimation of spinal cord dysfunction, ataxia, and TMS, information was limited to best guesses by the authors. For TMS, the only available information on diagnostic accuracy was the data set we previously used to identify optimum cutoff values. Hence, we opted to use prior information only on prevalence and on sensitivity/specificity of cervical radiography. In all models, we used noninformative priors for $Se_{TMS}$, $Sp_{TMS}$, $Se_{NeurEx}$, and $Sp_{NeurEx}$. A noninformative prior gives an equal probability to any possible value from 0 to 1 and is parametrized as a uniform density from 0 to 1 or, equivalently, a beta(1, 1) distribution. We tested different TMS parameters in this Bayesian framework, comparing them with ataxia and radiography.
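To make the structure of such a model concrete, the following Python sketch computes the probability of each of the 8 possible result patterns for 3 tests in 1 population, with a covariance term linking radiography (RX) and TMS within the diseased and non-diseased classes. It illustrates the likelihood only, under the usual covariance parametrization for 2 conditionally dependent tests; it is not the authors' code and does not perform the Bayesian estimation. The function name, the dictionary layout, and the two covariance values are hypothetical; the prevalence, Se, and Sp inputs are the estimates quoted in the abstract, used purely as example inputs.

```python
from itertools import product


def cell_probabilities(prev, se, sp, cov_dp=0.0, cov_dn=0.0):
    """Probability of each (NeurEx, RX, TMS) pattern; 1 = positive, 0 = negative.

    cov_dp / cov_dn: covariance between RX and TMS among diseased /
    non-diseased animals. NeurEx is conditionally independent of both.
    """
    probs = {}
    for neur, rx, tms in product((1, 0), repeat=3):
        # Covariance is added for concordant RX/TMS results
        # and subtracted for discordant ones.
        sign = 1 if rx == tms else -1

        # Diseased class: a positive result has probability Se.
        p_neur_d = se["NeurEx"] if neur else 1 - se["NeurEx"]
        p_rx_d = se["RX"] if rx else 1 - se["RX"]
        p_tms_d = se["TMS"] if tms else 1 - se["TMS"]
        joint_d = p_neur_d * (p_rx_d * p_tms_d + sign * cov_dp)

        # Non-diseased class: a negative result has probability Sp.
        p_neur_n = 1 - sp["NeurEx"] if neur else sp["NeurEx"]
        p_rx_n = 1 - sp["RX"] if rx else sp["RX"]
        p_tms_n = 1 - sp["TMS"] if tms else sp["TMS"]
        joint_n = p_neur_n * (p_rx_n * p_tms_n + sign * cov_dn)

        probs[(neur, rx, tms)] = prev * joint_d + (1 - prev) * joint_n
    return probs


se = {"NeurEx": 0.976, "RX": 0.430, "TMS": 0.875}
sp = {"NeurEx": 0.747, "RX": 0.773, "TMS": 0.974}
# Hypothetical small covariances, chosen only for illustration.
probs = cell_probabilities(prev=0.581, se=se, sp=sp, cov_dp=0.01, cov_dn=0.005)
total = sum(probs.values())
```

In a full analysis these cell probabilities would feed a multinomial likelihood for the observed counts of each result pattern, combined with the priors described above.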
Because the elicitation of informative priors can influence the posterior densities, especially for small data sets, it is recommended to run alternative models with different prior specifications alongside the main model. This process is called "sensitivity analysis" and shows whether the posterior estimates of the alternative models fall within the 95% credibility intervals of the main model. 23 Assessment of model sensitivity to priors was therefore done by evaluating 3 models. The first model used noninformative priors on prevalence and the 3 tests. In model 2, prior information on prevalence of spinal cord dysfunction was added, and in model 3, prior information on prevalence and on sensitivity and specificity of radiography was added.

| Prior distribution determination process
Prior information was derived from available literature and expert opinion. Because the present study population included many horses suspected of neurological disease, the prevalence of spinal cord disease was estimated at 60%, with 95% certainty that it would be less than 90% (beta(1.4, 3.1)). The range in which the researchers were 95% confident that the true value of the prevalence was above (or below) was [...]
To identify the TMS criterion with the highest combined sensitivity and specificity, Youden's index (sensitivity + specificity − 1) was used.
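Youden's index is simple enough to check by hand, and the short Python sketch below does so for the 3 tests. The inputs are the sensitivity and specificity estimates quoted in the abstract, entered as proportions; the function name `youden_index` is hypothetical and the snippet is an illustration, not the authors' code.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's index J = sensitivity + specificity - 1 (inputs as proportions)."""
    return sensitivity + specificity - 1


# Point estimates quoted in the abstract (Se, Sp).
estimates = {
    "NeurEx": (0.976, 0.747),
    "RX": (0.430, 0.773),
    "TMS": (0.875, 0.974),
}

indices = {test: youden_index(se, sp) for test, (se, sp) in estimates.items()}

# Rank the tests from highest to lowest combined accuracy.
ranking = sorted(indices, key=indices.get, reverse=True)
```

With these inputs the TMS value rounds to 0.85, matching the Youden's index reported in the abstract, and the ranking places TMS first, neurological examination second, and radiography last.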

| RESULTS
The age of the horses ranged from 1 to 21 (median 5.5) years and their weight from 230 to 750 (median 555) kg. Most horses (146) were European Warmbloods, 9 were coldblooded types, 4 were Quarter Horses, 3 were Standardbreds, and 1 was a Thoroughbred. Eleven horses were presented for prepurchase examination, 58 were suspected to be ataxic, 34 showed signs of weakness, 52 presented with an atypical lameness, and 19 performed poorly or were reluctant to work.
All latent class models converged. A conditional dependence scenario was used, because the study was underpowered to reject conditional dependence. All parameters were relatively stable across the different models with less than 5% variation compared to the posterior medians.
Estimated prevalence of spinal cord dysfunction varied for the different TMS decision criteria between 43.1% (29.3%-58.3%) and 60.5% (49.5%-70.8%). For every decision criterion, the variation between the different models was limited to at most 5%. In Table 1 [...]
Note: The prior densities were either noninformative (beta(1, 1)), indicating that all probabilities from 0 to 1 were equally probable, or informative. The covariance between the TMS and RX tests was parametrized following Dendukuri and Joseph. 21 The prior distribution of covDp was modeled as a uniform (U) probability bounded between 0 and $a = \min(Se_{RX}, Se_{TMS}) - Se_{RX} \times Se_{TMS}$, indicating that all values between these 2 bounds were equally probable. Similarly, covDn was modeled as a uniform value between 0 and $b = \min(Sp_{RX}, Sp_{TMS}) - Sp_{RX} \times Sp_{TMS}$.
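The upper bounds of the covariance priors in the note can be evaluated numerically, as in the Python sketch below. It assumes that both bounds take the same form, a minimum of the two accuracies minus their product, which is how the bound for covDp is written in the note. The inputs are the RX and TMS estimates from the abstract, used only as example values; the function name `covariance_upper_bounds` is hypothetical.

```python
def covariance_upper_bounds(se_rx, se_tms, sp_rx, sp_tms):
    """Upper limits (a, b) of the uniform priors on covDp and covDn.

    a = min(Se_RX, Se_TMS) - Se_RX * Se_TMS
    b = min(Sp_RX, Sp_TMS) - Sp_RX * Sp_TMS  (same form assumed for b)
    """
    a = min(se_rx, se_tms) - se_rx * se_tms
    b = min(sp_rx, sp_tms) - sp_rx * sp_tms
    return a, b


# Example inputs: RX and TMS estimates quoted in the abstract.
a, b = covariance_upper_bounds(se_rx=0.430, se_tms=0.875,
                               sp_rx=0.773, sp_tms=0.974)
```

Because both bounds depend on Se and Sp, which are themselves unknown parameters, the bounds would be recomputed at every iteration of the sampler rather than fixed in advance.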

| DISCUSSION
This study brought novelty to equine neurology in 2 ways. Not only was it the first study to evaluate TMS in a large population, it was also the first evaluation of available diagnostic tests for spinal cord dysfunction that took into account the absence of a gold standard. In the present study population, with a high prevalence of neurological dysfunction, mainly associated with spinal cord compression, TMS-MMEP was the best test to detect spinal cord dysfunction and had the highest specificity. The neurological examination was second best and had the highest sensitivity. The accuracy of cervical radiography, especially the sensitivity (40%-50%), was poor.
In this study, we used a Bayesian latent class approach, which accounts for the imperfect accuracy of the reference standard test. This methodology is currently the most useful reported strategy in these situations because it is at lower risk of bias than other techniques. 24 A composite reference standard test is commonly used in retrospective studies after reviewing the whole medical file of the patients. However, this approach has a higher risk of bias than the latent class approach. 25 Interestingly, the models converged well to their posterior densities and were not sensitive to prior specification. The posterior medians of the alternative models were all included within the 95% credibility intervals of the main model. These observations are characteristic of a reliable, solid model. Also, although we anticipated a conditional dependence between TMS and RX, neither covariance parameter differed from 0, as their 95% credibility intervals included 0. However, we chose to keep these covariances in our model because the study was not designed to reject a conditional dependence and might lack power to detect small covariances.
The low accuracy of cervical radiography is known [5][6][7],26 [...] these cutoff values, the sensitivity will increase, but the rate of false positives will also increase. For example, for C4, 8 out of 137 horses were considered positive: 3 of them were ataxic, but 5 were normal.
TABLE 3 Posterior means and 95% credibility intervals of Bayesian latent class modeling for prevalence (Prev.), sensitivity (Se), and specificity (Sp) of neurological examination (NeurEx), cervical radiographs (RX), and TMS-MMEP (MMEP) to diagnose spinal cord disease in horses, using the mean latency times of the pelvic limbs
Concerning the neurological examination, a limitation was that horses with grade 1 were also considered normal in the present study.
This decision was based on the fact that, particularly in mild cases, interobserver agreement about the presence of neurological abnormalities might be poor. 9,10 Therefore, caution is needed when making decisions based on the clinical examination, especially when signs are subtle 10 or when orthopedic disease is present. As the study population also included horses suspected of having orthopedic disease, and a positive diagnosis of neurological disease might have a serious impact, the authors chose to give horses with grade 1 ataxia the benefit of the doubt. If horses with grade 1 were considered abnormal, the sensitivity of the neurological examination to detect spinal cord dysfunction would increase, but the specificity would decrease.
In conclusion, this study showed that TMS-MMEP, using the minimal or in second place the mean latency time of the pelvic limbs, is