Validation of the AASLD recommendations for classification of oesophageal varices in clinical practice

The American Association for the Study of Liver Diseases recommends the use of a 2‐grade classification system (small and large) to describe the size of oesophageal varices (OV). Data on observer agreement (OA) on this system are currently lacking. We aimed to evaluate this classification and compare it to the widely used 3‐grade classification (grade 1 ‘small’, grade 2 ‘medium’, grade 3 ‘large’) among operators of variable experience.


| INTRODUCTION
Oesophageal varices (OV) are a common finding in patients with liver disease. They occur in approximately 40% and 70% of patients with compensated and decompensated cirrhosis respectively. 1 Acute variceal bleeding is a life-threatening complication of OV with a 6-week mortality ranging between 16% and 26%. 2,3 Guidelines recommend endoscopic surveillance of patients with known cirrhosis or portal hypertension. 4 Index endoscopic assessments are frequently performed by endoscopists with varying levels of experience in liver disease and portal hypertension. During the procedure, the operator ascertains the location, size and appearance of the varices according to standard criteria.
The objectives of endoscopic assessment for variceal screening are two-fold. The first objective is to assess whether varices are present or absent. The second objective, if varices are present, is to determine whether or not they require treatment with non-selective beta blockers or endoscopic band ligation (EBL). 4 The latter decision is primarily based on variceal size and/or the presence of high-risk features. 4 The timing of repeat procedures is also determined by the presence and size of varices. Therefore, the accuracy and consistency with which endoscopists classify varices have a direct effect on subsequent management.
The Japanese Research Society for Portal Hypertension originally described the 3-grade classification system, which is still widely used. 5 It involves scoring varices as grade 1 (small), straight small calibre varices; grade 2 (medium), moderately enlarged, beady varices covering less than one-third of the lumen; and grade 3 (large), markedly enlarged, nodular or tumour-shaped varices occupying more than one-third of the lumen. 6,7 The American Association for the Study of Liver Diseases (AASLD) proposed the 2-grade classification system. This system was originally created by the North Italian Endoscopy Club, who found it to be predictive of variceal bleeding. 8 The system was endorsed by a consensus meeting (Baveno I, 1992). 4 It involves classifying variceal size as either small or large. The classification can be quantitative, with a cut-off diameter of 5 mm as measured by an open biopsy forceps, or semiquantitative, using grade 1 above as small and grade 2/grade 3 as large. The quantitative approach is not widely used in clinical practice because of its challenging nature and doubtful accuracy. The technical difficulty is created by the variable degree of air insufflation, breathing pattern and peristalsis. Endoscopic examination should be performed at both minimal and maximal insufflation in order to avoid misclassification (Figure 1). Clinically, grades 2 and 3 are both regarded as varices needing treatment and are treated the same way.
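The relationship between the two systems described above can be illustrated with a minimal Python sketch. The function names, the use of 0 to denote absent varices, and the choice to treat a diameter of exactly 5 mm as small are assumptions made for illustration only; they are not part of either classification as published.

```python
def to_two_grade(three_grade: int) -> str:
    """Map a 3-grade score onto the semiquantitative 2-grade system.

    Assumption: 0 encodes 'no varices'; grades 1-3 follow the text above.
    """
    mapping = {0: "none", 1: "small", 2: "large", 3: "large"}
    if three_grade not in mapping:
        raise ValueError("three_grade must be 0, 1, 2 or 3")
    return mapping[three_grade]


def two_grade_from_diameter(diameter_mm: float, cutoff_mm: float = 5.0) -> str:
    """Quantitative 2-grade classification using a 5 mm cut-off.

    Assumption: a diameter exactly at the cut-off is classed as small.
    """
    if diameter_mm <= 0:
        raise ValueError("diameter_mm must be positive when varices are present")
    return "large" if diameter_mm > cutoff_mm else "small"
```

Because grades 2 and 3 collapse into a single 'large' category, any disagreement between observers that is confined to grade 2 vs grade 3 disappears under the 2-grade system.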
Data on the interobserver agreement and therefore reproducibility of the 2-grade classification system are lacking. Moreover, this system has not been compared to the more widely used 3-grade approach in adults. The primary aim of this study was to compare observer agreement (OA) on variceal classification using the 2-grade vs the 3-grade system. The secondary aim was to assess the impact of observer experience on the level of agreement.

| Study design
This was a prospective repeatability and reproducibility study in a tertiary referral centre (Nottingham University Hospitals NHS Trust, Nottingham, UK). All the participants provided written informed consent and this study received approval from the East Midlands Nottingham 1 research ethics committee. Endoscopy procedures were performed between 31 July 2012 and 25 February 2014 using a high definition system and videos were digitally recorded. Nine independent observers assessed OV on the video recordings. The same nine observers as well as the reference observer re-assessed the same video recordings after an interval of at least 1 year to assess for intra-observer agreement.

| Participants and interventions
We recruited consecutive patients with a clinical diagnosis of cirrhosis who were scheduled for a diagnostic gastroscopy as part of screening or surveillance for OV during their routine clinical care.
All procedures were performed by a single experienced endoscopist (who had performed more than 1000 procedures and had regularly performed EBL for at least 3 years). A 9.8-mm diameter high definition endoscope (GIF-H260; Olympus Key-Med) was used. A standardised recording protocol was used after analgesia and/or sedation (Table S1). Prior to unsedated procedures, topical pharyngeal anaesthesia was applied to the posterior pharynx (5-10 sprays, Lidocaine 10 mg/dose, Xylocaine; AstraZeneca). In case of patient preference for sedation, Midazolam (Hameln Pharmaceuticals Ltd) with or without pethidine was used.

Key points
• Increased blood pressure in the abdomen can happen as a result of scarring of the liver. Blood vessels around the food pipe are fragile and often do not tolerate an increase in pressure. This puts them at risk of rupturing into the food pipe, leading to vomiting of blood, which can be life-threatening.
• A camera test is advisable to check the state of the food pipe blood vessels. During the camera test, a careful evaluation of these blood vessels is important. Depending on the findings, treatment may be indicated in the form of medications to decrease the pressure or direct application of elastic bands on to the blood vessels.
• In this study we compared observer agreement using two different grading systems for evaluation of the food pipe blood vessels. We also tested the impact of observer experience on the consistency of evaluation.
• We found that there was no significant difference between the two grading systems. However, there was significantly higher consistency amongst experienced observers in identifying the presence and stage of such risky blood vessels. These results may have implications for training and service redesign.

| Rating and data collection
One-hundred anonymised video recordings from 100 patients were digitally stored (evaluation set). Nine blinded endoscopists (observers), excluding the endoscopist who recorded the procedures, evaluated all the videos independently of each other and in a random order. with confidence impossible, 10 = excellent views allowing for diagnosis with utmost confidence). Semiquantitative morphological assessment of variceal size was used by observers for both classification systems.

| Outcome measures
The primary outcomes were inter- and intra-observer agreement among the nine assessors using each of the two classification systems. Secondary outcomes were inter- and intra-observer agreement among the nine assessors stratified by level of experience (hepatologists vs luminal gastroenterologists vs trainee gastroenterologists).

| Statistical analysis and sample size calculation
Outcomes were measured using either the intraclass correlation coefficient (ICC) or the kappa (κ) statistic as appropriate, both of which summarise agreement within or between observers in comparison to the probability of agreement by chance. Test statistics were generated according to published methodologies as follows. 9 For categorical data (ie varices present vs varices absent), Cohen's kappa was used in case of two observations (ie intra-observer agreement) and Fleiss' kappa in case of more than two observations (ie interobserver agreement). For ordinal data (ie 2-grade staging system and 3-grade staging system), absolute agreement ICC was used; analysis of variance was performed using a two-way random effects model
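To make the two kappa statistics concrete, the following is a minimal Python sketch of their standard textbook definitions. It is not the analysis code used in this study; the function names and the choice to return NaN when chance agreement equals 1 are assumptions made for illustration.

```python
import math
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two sets of categorical ratings of the same items."""
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed proportion of items on which the two ratings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return math.nan  # kappa is undefined when only one category is used
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of rater scores per item."""
    if not ratings:
        raise ValueError("need at least one rated item")
    n_items, n_raters = len(ratings), len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        pairs = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += pairs / (n_raters * (n_raters - 1))
    mean_agreement = agreement_sum / n_items
    # Chance agreement from the overall category proportions.
    expected = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if expected == 1.0:
        return math.nan  # kappa is undefined when only one category is used
    return (mean_agreement - expected) / (1.0 - expected)
```

In the setting described above, `cohen_kappa` would take one observer's initial and interval present/absent ratings, whereas `fleiss_kappa` would take, for each video, the list of ratings from all observers.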

| Sample size calculation and statistical analysis
The total number of videos required for analysis was calculated using the method suggested by Zou. 11 Assuming a true test-value of 0.59 (based on previous literature), 12 our nine observers were required to rate a minimum of 93 videos to yield 80% power with a confidence interval ± 0.12 for agreement. 12 A total of 19 assessments were performed for each video: nine initial assessments by the observers, nine interval assessments by the observers, and one interval assessment by the reference endoscopist.
R statistical computing software (R version 3.4.1, Vienna, Austria) was used for analysis. The R library 'irr' was used to calculate interobserver reliability. The R libraries 'reshape2' and 'ggplot2' were used for data visualisation and 'ICC.sample.size' was used for sample size calculation. 11
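For readers working outside R, the absolute-agreement ICC from a two-way random effects model described under statistical analysis can be sketched in Python as follows. This is an illustrative implementation of the standard single-measures formula, often written ICC(2,1), and not the analysis code used in this study; the text does not state whether single or average measures were reported, so the single-measures form is an assumption, as is the function name.

```python
def icc_absolute_agreement(scores):
    """Single-measures, absolute-agreement ICC (two-way random effects).

    `scores` is a list of rows, one per rated item (eg video), each holding
    one numeric score per observer. Assumes no missing scores.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 items, each scored by the same >= 2 raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Mean squares from the two-way analysis of variance.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_err) / denominator
```

Unlike a consistency ICC, the absolute-agreement form includes the between-observer mean square in the denominator, so an observer who systematically grades higher than the others lowers the coefficient.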

| Baseline characteristics
One-hundred patients were recruited to the evaluation set to allow for low quality videos or missing data. A summary of the descriptive statistics can be found in Table 1. Table 2 outlines the prevalence of scoring grades according to the various evaluations performed. The prevalence of absent varices was almost identical using both grading systems. The prevalence of grade 1 using the 3-grade system was consistently lower than that of grade 'small' using the 2-grade system; this difference was not statistically significant (chi-square test; P = .3).

TABLE 1 Baseline characteristics of patients and videos (n = 100)
Figure S1 provides a summary of scores provided during all assessments performed.
Interobserver agreement amongst subgroups of observers is outlined in Table 3 and Figure 2.

| Impact of experience on observer evaluation
Hepatologists had significantly higher intra-observer agree-

| DISCUSSION
Careful endoscopic evaluation is one of the cornerstones in the management of OV. The AASLD recommends the use of a 2-grade classification system which, unlike the 3-grade system, has previously been validated as a predictor of variceal haemorrhage. Our study shows that there is no difference in either inter- or intra-observer agreement between the two systems among observers of variable experience. Hepatologists had significantly higher intra-observer agreement than the other two groups (Figure 3). Therefore, they may be better suited for assessing OV as they appear to be more consistent in evaluating varices over time.
The interobserver agreement of the 2-grade system was compared to the 3-grade system in a previous study. In this study, pre- To our knowledge, this is the only classification that has been directly compared to the 2-grade system. 13 These data have significant shortcomings: firstly, they were published as a letter to the editor so no critical appraisal of the methodology is possible; secondly, they were based on video tape recordings made with 1990s endoscopic technology, with image resolutions far lower than those currently used in clinical practice; thirdly, recordings were reviewed by experts. Therefore, this is unlikely to reflect current practice. Other studies investigating interobserver agreement have also been published. 12,14-19 None of these studies directly compared the commonly used 3-grade system with the recommended 2-grade system in adults. We found a significant improvement in intra-observer agreement with experience of the endoscopist (Figure 3). None of the previous studies evaluated intra-observer agreement, which is an important factor in assessing the validity and reproducibility of a classification system and can help rationalise performer allocation to endoscopy lists. week. All videos were prospectively recorded using the same standardised protocol set a priori (Table S1) in order to further reduce operator bias. The use of video rather than still images enables more realistic and unbiased views for observers, representative of real-life practice. Finally, we evaluated interval intra-observer agreement to assess which observers are likely to be more consistent in their own decision-making over time and may therefore be more suited to undertaking these procedures in clinical practice.
This study also has some limitations. The observers did not have  Table   S1. Which is never the case in real-life.
These data support the use of both the 3-grade and 2-grade classification systems; the latter has been validated as a predictor of variceal haemorrhage and is recommended by the AASLD clinical practice Our study suggests that hepatologists, who all perform dedicated varices lists at our centre, were more consistent, as evidenced by the significantly higher intra-observer agreement on both the presence vs absence and the grade of OV (Figure 3). This is unlikely to be the case for hepatologists who do not perform regular variceal screening.
Whether or not there is a 'glass ceiling' with subjective human classification systems, ie possibly no classification system will im- and platelet-spleen ratio score 24 have been validated as good negative markers to predict the absence of OV. Such non-invasive tests may help minimise the subjective nature of human classification systems and reduce the overall work-load of variceal screening endoscopy. This will enable the evaluation of selected cases on dedicated lists for the assessment of portal hypertension.
This prospective investigation of the inter- and intra-observer agreement among nine observers with 100 videos revealed substantial agreement using both the 2-grade and 3-grade classification systems. This supports the validity of the 2-grade system, which has been validated as a predictor of variceal haemorrhage and is recommended by the AASLD. Hepatologists had significantly higher levels of consistency in identifying both the presence and stage of OV. This may have implications for creating alternative training models for residents and fellows in the recognition and grading of OV.

DISCLOSURE
The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

CONFLICT OF INTEREST
There are no known personal or financial conflicts of interest to report.