Assessment of agreement using the equine glandular gastric disease grading system in 84 cases

Introduction Equine glandular gastric disease (EGGD) is a common condition causing signs of gastric pain although lesions are highly variable in their appearance. The only definitive method to diagnose EGGD ante‐mortem is gastroscopy. The current recommended method for describing these lesions is the European College of Equine Internal Medicine (ECEIM) guidelines; however, repeatability between users is variable. This study aimed to validate the reliability of lesion descriptions using ECEIM consensus guidelines, using four blinded equine internal medicine diplomates.
Methods Ninety‐two horses with EGGD with pre‐ and post‐treatment gastroscopy images were identified using the electronic record at a UK equine hospital between 2012 and 2019. Eight horses were excluded due to non‐diagnostic images. Four blinded observers used the recommended grading system to describe images and outcomes. Intraclass correlation coefficients and Krippendorff's alpha were used to determine reliability and agreement, respectively.
Results Intraclass correlation coefficient for severity was 0.782 (95% confidence interval [CI] 0.722–0.832), for distribution was 0.671 (95% CI 0.540–0.763), for the descriptor raised was 0.635 (95% CI 0.479–0.741), fibrinosuppurative was 0.745 (95% CI 0.651–0.812), haemorrhagic was 0.648 (95% CI 0.513–0.744), hyperaemic was 0.389 (95% CI 0.232–0.522) and for outcome was 0.677 (95% CI 0.559–0.770). Krippendorff's alpha for severity was 0.466 (95% CI 0.466–0.418), for distribution was 0.304 (95% CI 0.234–0.374), for the descriptor raised was 0.268 (95% CI 0.207–0.329), fibrinosuppurative was 0.406 (95% CI 0.347–0.463), haemorrhagic was 0.287 (95% CI 0.229–0.344), hyperaemic was 0.112 (95% CI 0.034–0.188) and for outcome was 0.315 (95% CI 0.218–0.408). There was moderate reliability determined between observers using intra‐class correlation coefficients and unacceptable agreement determined between observers using Krippendorff's alpha.
Discussion These results suggest that the current grading system is not comparable between observers, indicating the need to review the grading system or define more robust criteria.



KEYWORDS
ECEIM guidelines, equine, equine glandular gastric disease (EGGD), grading system

INTRODUCTION
The equine stomach has two regions separated by the margo plicatus.
Dorsally, squamous epithelium is present, and disease is described as equine squamous gastric disease (ESGD), which is typically ulcerative.
Ventrally, glandular mucosa is present, and disease here is described as equine glandular gastric disease (EGGD) (Banse & Andrews, 2019) and is typically non-ulcerative (Hallowell, 2018). These are distinct diseases with different pathophysiology, risk factors and clinical signs.
EGGD prevalence is high (25%-65%), particularly in sports and leisure horses (Begg & O'Sullivan, 2003; Hepburn, 2014; Luthersson et al., 2009; Pedersen et al., 2018; Sykes et al., 2018), and over the past decade, recognition has increased dramatically (Rendle et al., 2018). Its pathogenesis is unknown, although it is suggested to be a failure of gastric defence mechanisms, including altered mucosal blood flow affecting mucus and bicarbonate production, or an extension of inflammatory bowel disease (Hallowell, 2018; Sykes et al., 2018). EGGD lesions are inflammatory with intact lamina propria rather than ulcers or erosions (Hallowell, 2018). Although important in human peptic ulcer disease, Helicobacter pylori has not been proven as a cause of disease in equine patients (Husted et al., 2010; Martineau et al., 2009).
Clinical signs of EGGD include poor performance, mild recurrent signs of abdominal pain, unexplained weight loss, altered appetite, increased nervousness and aggression, and changes in rideability, including reluctance to be tacked up and to go forward (Bowen, 2018; Varley et al., 2019). Clinical signs do not correlate with clinical lesions and cannot be used for diagnosis (Rendle et al., 2018). The only definitive method to diagnose EGGD ante-mortem is gastroscopy. This allows assessment of the presence, location and severity of lesions in addition to treatment response. Scoring systems have been replaced by non-linear descriptors of lesions (Table 1; Rendle et al., 2018). However, mucosal appearance is not a good indicator of underlying pathological changes. Crumpton et al. (2015) demonstrated that lesion appearance correlates poorly with severity on histopathology.
The aim of the present study was to document the reliability of EGGD lesion descriptors using the European College of Equine Internal Medicine (ECEIM) consensus guidelines using four blinded observers who work together. Previous work has demonstrated no agreement when using all four descriptors combined and only weak agreement for each individual descriptor (Tallon & Hewetson, 2020). Our hypothesis was that there would be good agreement for all lesions as the four blinded observers all work together daily including sharing clinical cases.

METHODS
Horses that underwent gastroscopy with retrievable images between 2012 and 2019 were identified from the electronic patient record at a UK equine hospital. Those where EGGD was diagnosed by the attending veterinary surgeon were evaluated further. Horses where digital images could not be retrieved or where no follow-up examination occurred were excluded. Where horses underwent multiple therapies, each treatment period was assessed. Images of the pylorus and antrum before and after treatment were reviewed by four blinded observers, all of whom are diplomates in equine internal medicine and regularly treat horses with gastric disease. These observers were chosen as they all worked in the same institution. All observers were asked to respond to a questionnaire relating to their experience with gastric disease and their opinion on the current recommended descriptive grading system (Sykes, Hewetson, et al., 2015).

STATISTICAL METHODS
Lesion descriptions from observers were analysed using the intra-class correlation coefficient (ICC) with the following criteria: two-way mixed effects, absolute agreement and multiple raters. This value and the corresponding 95% confidence intervals for each category are reported. A value of <0.5 was considered poor reliability, 0.5-0.75 moderate reliability, 0.75-0.9 good reliability and >0.9 excellent reliability (Koo & Li, 2016). Calculations were undertaken using IBM SPSS Statistics for Windows, version 28.0.
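
The ICC model described above can be illustrated with a short sketch. The Python below is not the SPSS procedure used in the study; it is a minimal illustration, assuming complete data (every case scored numerically by every observer), of the two-way absolute-agreement ICC computed from ANOVA mean squares, followed by the Koo and Li (2016) interpretation bands quoted above. The function names, the reading of "multiple raters" as the average-measures form, and the assignment of the boundary values 0.5, 0.75 and 0.9 are assumptions made for illustration. Confidence intervals are not computed.

```python
def icc_two_way_absolute(ratings):
    """Return (single-measure, average-measure) absolute-agreement ICC.

    ratings: list of rows, one row per case and one column per observer.
    Assumes numeric scores and no missing values (illustrative only).
    """
    n = len(ratings)       # number of cases
    k = len(ratings[0])    # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: cases (rows), observers (columns), error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


def interpret_icc(value):
    """Map an ICC to the reliability bands of Koo and Li (2016).

    Boundary handling is an assumption: 0.5 counts as moderate,
    0.75 and 0.9 count as good.
    """
    if value < 0.5:
        return "poor"
    if value < 0.75:
        return "moderate"
    if value <= 0.9:
        return "good"
    return "excellent"
```

Applied to the values reported in the abstract, `interpret_icc(0.782)` places severity in the good band and `interpret_icc(0.389)` places the hyperaemic descriptor in the poor band.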
Agreement was also assessed using Krippendorff's alpha, and this value and the 95% confidence intervals for each category are reported. Calculations were undertaken using IBM SPSS Statistics for Windows, version 28.0.
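
For readers unfamiliar with Krippendorff's alpha, the sketch below shows how the statistic can be computed from a coincidence matrix. It is an illustration only and not the SPSS calculation used in the study. It assumes the nominal level of measurement (the level used in the study is not stated here), and the function name and input layout are hypothetical. Confidence intervals, which are usually obtained by bootstrapping, are not computed.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the labels that the
    observers assigned to one case, with None marking a missing rating.
    Returns NaN when only one category is ever used.
    """
    coincidences = Counter()
    for labels in units:
        values = [v for v in labels if v is not None]
        m = len(values)
        if m < 2:
            continue  # a case rated by fewer than two observers is not pairable
        # Every ordered pair of ratings from different observers
        # contributes 1 / (m - 1) to the coincidence matrix.
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")
    return 1.0 - (n - 1) * observed / expected
```

Alpha equals 1 when observers agree perfectly and approaches 0 when agreement is no better than expected by chance.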

RESULTS

Horses
A total of 104 horses had two or more gastroscopies and relevant treatment. Of these, 12 were excluded as the glandular mucosa was normal at first presentation, leaving 92 horses with EGGD, and a further eight were excluded due to non-diagnostic gastroscopy images. Five horses presented twice during the study period and were included as separate cases. Of those included, 56 were geldings and 28 were mares, with a mean age of 9.8 years (4-21 years).

Observers
Three of the four observers have been performing gastroscopy for >20 years, and the fourth has been performing gastroscopy for 5-10 years.
All hold at least one Equine Internal Medicine Diploma and two are RCVS recognised specialists in Equine Medicine. Two of the observers contributed to the UK EGGD consensus statement (Rendle et al., 2018), and all have read the UK and ECEIM consensus statements (Rendle et al., 2018; Sykes, Hewetson, et al., 2015). All observers have contributed to research into equine gastric disease, and one observer has contributed to >15 publications in this area. All currently use the ECEIM recommended descriptive grading system in clinical practice.
All observers consider the descriptive grading system fit for purpose for clinical cases to describe lesions and transfer cases between clinicians. However, none find it useful for determining prognosis or likely response to treatment as they are of the opinion that it does not correlate with disease.

Presentation
Glandular mucosa was adequately visualised in all 84 cases. At initial presentation, 53% had both ESGD and EGGD and 47% presented with EGGD only. The most common lesion combination was moderate, multi-focal, raised and hyperaemic, which was seen in 33% of cases. For each category, the most common descriptors reported were as follows: moderate severity, multi-focal distribution, raised lesions, non-fibrinosuppurative and non-haemorrhagic. Hyperaemia was more often present than absent. No lesions were described as depressed, and diffuse lesions were very rarely observed.

Grading consensus
Consensus for lesion description and outcome among all observers occurred in 3.6% of cases, and only when the lesions had healed. Consensus for outcome alone occurred in 32% of cases. Majority agreement, where three observers agreed on all descriptors and outcome, occurred in 42% of cases.
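
The consensus proportions above are simple counts over cases. As a hedged illustration, the sketch below tallies full consensus and majority agreement from per-case observer descriptions. The function name, the input layout and the reading of "majority" as at least three matching observers are assumptions; the study may have counted exactly three.

```python
from collections import Counter


def consensus_rates(cases, majority=3):
    """Return (full-consensus proportion, majority proportion).

    cases: list of per-case sequences, one hashable description per
    observer, e.g. a tuple of (severity, distribution, ...) per observer.
    A case counts towards the majority if at least `majority` observers
    gave an identical description (assumption; includes full consensus).
    """
    n = len(cases)
    full = sum(1 for obs in cases if len(set(obs)) == 1)
    maj = sum(1 for obs in cases
              if Counter(obs).most_common(1)[0][1] >= majority)
    return full / n, maj / n
```

Because descriptions are compared as whole items, two observers who differ on any single descriptor are treated as disagreeing on that case.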

Grading reliability and agreement
ICC values and their 95% confidence intervals are reported in

DISCUSSION
There are at present no validated grading systems for EGGD. Current recommendations are based purely on ECEIM consensus that description is the best method of recording these lesions (Sykes, Hewetson, et al., 2015). The limited agreement seen in this study when grading EGGD with the ECEIM descriptors highlights the difficulty in grading these lesions. EGGD lesions are highly variable. As the clinical importance of lesions is unknown, a numerical hierarchical grading system is not advocated (Sykes, Hewetson, et al., 2015). Although recommended by multiple consensus statements (Sykes, Hewetson, et al., 2015; Rendle et al., 2018), these results suggest that the current descriptive grading system is not repeatable between observers. This is supported by Tallon and Hewetson (2020), who demonstrated no agreement when using all four descriptors (severity, shape, appearance and distribution) combined and only weak agreement for each individual descriptor. The limited agreement seen in the present study was unexpected: the four observers work together, are all experienced equine medicine clinicians and perform gastroscopy daily, and should therefore theoretically be more likely to agree. Tallon and Hewetson (2020) found that agreement did improve with experience; however, agreement was still considered to be poor among the diplomate group. It must be considered that the current system may be obsolete due to its limited repeatability, and other methods of grading lesions should therefore be investigated.
Other grading systems have been suggested, although these are based on a linear scale (Andrews et al., 1999; MacAllister et al., 1997). A grading system with two separate scores for number of lesions and severity was assessed by five independent investigators, which showed that variability between observers was not significant, although it did not demonstrate agreement (MacAllister et al., 1997). The Equine Gastric Ulcer Council (EGUC) developed an ordinal 0-4 scale (Andrews et al., 1999), which has previously been shown to have high levels of agreement for squamous disease (Bell et al., 2007); however, that study only assessed the pylorus in seven horses and did not consider glandular disease as a separate entity. A more recent study by Wise et al. (2021) demonstrated good inter- and intra-observer reliability with the EGUC grading scale for both squamous and glandular lesions. However, numerical grading systems are not currently recommended for the assessment of glandular disease; therefore, this scale was not assessed in this study. Due to the subjective nature of grading EGGD and the wide spectrum of lesion appearance, it was thought a visual analogue scale (VAS) may perform better.
However, upon evaluation, a VAS had poor inter-observer reliability for grading glandular lesions (Wise et al., 2021). Currently no grading system has been compared to histopathological diagnosis (Banse & Andrews, 2019). Crumpton et al. (2015) demonstrated that lesion appearance poorly correlates with histopathological severity and in fact demonstrated that severe gastritis was associated with milder EGGD lesions. This brings into question the usefulness of a mucosal descriptive grading system in determining prognosis and treatment outcome. Equally, none of the ordinal grading systems have yet been investigated to determine whether they can predict prognosis and treatment response. Therefore, although grading systems are useful to monitor lesion progression, they cannot be used to predict outcomes.
It may be that the true issue is larger than the absence of a reliable grading system. In a study assessing gastric disease in weanling foals, two experienced clinicians only agreed on whether or not glandular disease was present in 52% of cases, suggesting that clinicians have different opinions of what is considered normal for the glandular mucosa. Previous studies have reported that the significance of hyperaemia without any other lesion or clinical signs is unknown but presumed insignificant, although a marked clinical response upon treating these lesions has been reported (Rendle et al., 2018; Sykes et al., 2017). Many clinical trials have defined lesion healing as grade 0 or 1 on a four-point grading system (Hepburn and Proudman, 2014), suggesting the authors consider grade 1 disease clinically insignificant. In this study, a third of cases were classified as healed, and treatment cessation occurred when mild hyperaemia was still present. It is unknown if clinical signs resolved in these cases as no owner data were collected; however, it is presumed the horse was clinically normal. Further work to determine the significance of hyperaemia and its possible association with clinical signs is required. However, as the correlation between lesion grade (and therefore potentially hyperaemia) and clinical signs is reported to be inconsistent, it may be of value to first assess whether descriptors can predict clinical signs.
The most common presenting lesion in this study was similar to previous data where common descriptors were moderate, multi-focal, raised and haemorrhagic along with severe, multifocal, raised and bleeding (Varley et al., 2019). Multi-focal, raised lesions were consistent across both studies, suggesting they may be most common. However, variability in scoring systems makes comparing studies difficult. Sykes et al. (2017) used a different grading system and found that moderately severe lesions were most common and that focal and multifocal lesions were described with equal frequency.
The majority of horses in this study were sports horses, which is representative of the hospital case load. Due to the period over which data were collected, multiple gastroscope systems were used, some of which were of better quality than others. It has been suggested that assessment of the appearance and colour of the gastric mucosa can be affected by different light settings and endoscopy systems (Rendle et al., 2018); this needs to be considered and may have affected lesion descriptions. The eight horses that were excluded due to non-diagnostic images had poor image quality as the images were too dark. Additionally, only still images were used in this study, whereas video is typically used in clinical practice.

CONCLUSIONS
The poor agreement demonstrated for the current non-linear scoring system proposed in the ECEIM consensus statement to classify EGGD in the horse suggests that it has limited value. Further investigations into grading systems along with the clinical significance of hyperaemia in a larger multi-centre clinical study are warranted.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

FUNDING INFORMATION
None.

This project was approved by the University of Nottingham Ethics Committee.

DATA AVAILABILITY STATEMENT
Raw data were generated at Oakham Veterinary Hospital. Derived data supporting the findings of this study are available from the corresponding author on request.