Is Ultrasound a Valid and Reliable Imaging Modality for Airway Evaluation?: An Observational Computed Tomographic Validation Study Using Submandibular Scanning of the Mouth and Oropharynx

Authors


  • This work was supported by departmental funding. Drs Abdallah and Chin are also supported by the Merit Award Program, Department of Anesthesia, University of Toronto. Dr Vincent Chan received equipment support from BK Medical Systems (Wilmington, MA), Philips Medical Systems (Bothell, WA), SonoSite (Bothell, WA), and Ultrasonix (Richmond, BC, Canada). None of the other authors report any conflicts of interest. Portions of this report were presented as an abstract at the 38th annual meeting of the International Anesthesia Research Society, May 17–20, 2014, Montreal, Quebec, Canada. The abstract won the “Best of Category” award in the Airway Management category. Drs. Faraj W. Abdallah and Eugene Yu contributed equally to this work. The authors thank Mr. Cyrus Tse, Research Assistant, University Health Network, Toronto, Canada.

Abstract

Objectives

Ultrasound (US) imaging of the airway may be useful in predicting difficulty of airway management (DAM), but its use is limited by a lack of evidence of its validity and reliability. We sought to validate US imaging of the airway by comparison with computed tomography (CT) and to assess its inter- and intra-observer reliability. Using submandibular sonographic imaging of the mouth and oropharynx, we examined how well the ratio of tongue thickness to oral cavity height correlates with the ratio of tongue volume to oral cavity volume, an established tomographic measure of DAM.

Methods

Forty-two patients scheduled for CT scanning were recruited, of whom 34 were included in the analysis. Standardized study assessments included the CT-measured ratios of tongue volume to oropharyngeal cavity volume and of tongue thickness to oral cavity height, and the US-measured ratio of tongue thickness to oral cavity height. Two sonographers independently performed US imaging of the airway before and after the CT scan.

Results

Our findings indicate that the US-measured ratio of tongue thickness to oral cavity height highly correlates with the CT-measured ratio of tongue volume to oral cavity volume. US measurements also demonstrated strong inter- and intra-observer reliability.

Conclusions

This study suggests that US is a valid and reliable tool for imaging the oral and oropharyngeal parts of the airway, as well as for measuring the volumetric relationship between the tongue and oral cavity, and may therefore be a useful predictor of DAM.

Abbreviations
CT, computed tomography
DAM, difficulty of airway management
STROBE, Strengthening the Reporting of Observational Studies in Epidemiology
US, ultrasound

Difficulty of airway management (DAM) is the most common cause of anaesthesia-related morbidity and mortality,[1] accounting for 39% of all morbidity and mortality occurring under anaesthesia.[2] While the etiology of DAM is multifactorial, several predictors have been identified.[3] Airway assessment relies on several morphometric measurements, including visual assessment of the Mallampati score, an estimate of tongue volume relative to oral cavity volume.[4] This score is the most commonly used predictor of DAM,[4] but its sensitivity is limited, not exceeding 57%.[5] It is also largely observer-dependent, with limited inter-observer reliability.[6] To enhance its diagnostic value, the score may be combined with other morphometric measures, such as neck circumference,[7] mouth opening, upper-lip-bite,[8] occipitoatlantoaxial extension,[9] and thyromental distance.[10] Nevertheless, evidence suggests that the Mallampati score, alone or in combination with additional measures, is not highly reliable in predicting DAM.[11] Consequently, a difficult airway may not be discovered until laryngoscopy is attempted, a potentially hazardous situation with a reported incidence of 1.5% to 8.5%.[12]

Additional bedside tests that improve the ability to predict DAM would enhance patient safety. Ultrasound (US) imaging is a simple, non-invasive tool that may permit a more objective assessment of tongue volume relative to oral cavity volume than the Mallampati score. However, the airway US literature has focused primarily on describing airway anatomy,[13-15] and little has been done to explore its role in DAM.[16-19] This is likely the result of a lack of i) systematic validation of US for airway imaging, and ii) knowledge of the sonographic parameters that can predict DAM.

We aimed to validate US as an airway imaging modality by comparing it to the gold standard, ie, computed tomography (CT),[20] using proven predictors of DAM. Evidence from cone beam CT and acoustic reflectometry studies of the airway[20-23] suggests that i) the ratio of tongue volume to oral cavity volume is predictive of DAM, and that ii) unidimensional measurements of tongue and oral cavity size can be used to accurately estimate the three-dimensional volumetric measures of these structures.

Using the submandibular window[14, 15] to image the tongue, oral cavity, and oropharynx, this observational study aimed to demonstrate that the US-measured ratio of tongue thickness to oral cavity height is an accurate and reliable estimate of the ratio of tongue volume to oral cavity volume as estimated by CT. To that end, we tested the joint hypothesis that i) the CT-measured unidimensional ratio of tongue thickness to oral cavity height is a valid estimate of the CT-measured ratio of tongue volume to oropharyngeal cavity volume; and ii) the US-measured ratio of tongue thickness to oral cavity height is, in turn, a valid estimate of the same ratio as measured by CT. We also sought to assess the inter- and intra-observer reliability of the submandibular US scan of the mouth and oropharynx.

Materials and Methods

This observational study was approved by the University Health Network Research Ethics Board (Toronto, ON, Canada) and completed at the Princess Margaret Hospital in Toronto, Canada, a teaching hospital affiliated with the University of Toronto. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines[24] were followed in the preparation of this manuscript.

Study Participants

A cohort of 42 adult outpatients aged 18–85 years who were scheduled for a high-resolution CT scan of the head and neck as part of the management of lymphoma, melanoma, or breast, bladder, or cervical cancer was recruited to participate in this prospective observational study between October 10, 2012 and June 4, 2014. Because this is a validation study of US imaging, we did not specifically seek patients with known DAM risk factors. Using the imaging schedule, which was available in advance, potential participants were approached at the time of their planned CT scan. The purpose of the study was outlined, and a printed information package was provided to all eligible patients during an interview preceding the scan. Exclusion criteria included inability to provide informed consent; a history of cognitive or psychiatric disorders; a language barrier that might interfere with assessment; multiple amalgam dental fillings interfering with CT imaging; a history of radiation or prior major surgery of the mouth, head, or neck; known tongue, oropharyngeal, laryngeal, or head and neck cancers, tumors, or anatomy-altering conditions; any notable swelling, scarring, or cysts; and ongoing infection of the mouth, head, or neck.

Pre-imaging Assessment

After informed consent was obtained and study procedures were explained, two sonographers (anaesthesia fellows P.C. and S.A.) experienced in sonographic imaging of the oral cavity independently performed the clinical airway evaluation and US measurements in all study participants. Clinical airway assessment was performed in the sitting position with the head in a neutral position to determine the Mallampati score, thyromental distance, mouth opening, and upper-lip-bite. The submandibular anatomical midline was identified, and a sagittal line was marked on the skin from the midpoint of the chin anteriorly to the superior thyroid notch of the thyroid cartilage posteriorly and inferiorly.

Positioning for CT and US Scan

The study subjects were asked to rest their heads on a CT-scan pillow so that the same extended head position was reproduced during both US and CT examinations: a sniffing position requiring extension at the C1–C2 level and flexion of the lower cervical spine.[25] The subjects were also trained to hold 5 mL of water in the mouth while lying supine with the mouth wide open during imaging. The open mouth, head extension, and supine position together simulated clinical intubation conditions, while the small volume of water in the mouth helped improve visualization of the palate by reducing air attenuation in the oral cavity.[26] The use of water to facilitate static and dynamic sonographic visualization of the tongue has been described previously.[27-29] Patients who were consistently unable to perform this manoeuvre during pre-scanning preparation were excluded from the study. Finally, to reduce respiratory variability in airway dimensions,[23, 30] all CT and US measurements were taken while the study participants held their breath in expiration.

CT Scan

In each participant, the clinical CT protocol (Toshiba Aquilion 64 helical scanner, Toshiba of Canada Ltd; 120-kVp scan with Sure exposure SD6 settings) aimed to obtain 2-mm-thick axial and sagittal slices of the head and neck with contrast injection. The clinical protocol was completed in the supine position, with the patient's head fully extended on the CT-scan pillow.

For the study-related CT imaging, a subsequent scan was performed with the head in the same extended position, but with the mouth open and holding 5 mL of water, as described above. The CT scan sought to measure i) the maximal tongue thickness, ie, the measured distance between the superior surface of the geniohyoid muscle and the superior surface of the tongue at the point of maximal thickness; and ii) the height of the oral cavity, ie, the measured distance between the superior surface of the geniohyoid muscle and the hard palate. These two measurements allowed calculation of the CT-measured ratio of tongue thickness relative to oral cavity height. Furthermore, a three-dimensional, digital image reconstruction software program (Vitrea 2 Workstation, version 4.1.14.0; Vital Images Inc, Minnetonka, MN) was used to calculate the CT-measured ratio of tongue volume relative to oropharyngeal cavity volume.

While there is no ambiguity over what constitutes the tongue,[26] the term “oropharyngeal cavity” in our study refers to the combined oral cavity and oropharynx.[23] We chose to measure the total oropharyngeal volume rather than limit the measurement to the oral cavity volume for several reasons: i) in supine patients, particularly those with loss of tone, obesity, or obstructive sleep apnea, the posterior tongue frequently extends into the oropharynx;[31] ii) from a volumetric perspective, the oropharynx is considered an integral part of the upper airway;[20, 32] and iii) earlier imaging studies of tongue and oral cavity volumes incorporated the oropharyngeal volume into the oral cavity volume.[23, 33]

For the purpose of this study, the oropharynx itself was defined as the area bounded superiorly by the palatal plane extended to the posterior pharyngeal wall, and inferiorly by a horizontal line through the uppermost point of the epiglottis.[34] Figure 1 depicts the boundaries of the oropharyngeal cavity used in estimating its volume.

Figure 1.

Dimensions of the tongue (yellow line) and oropharyngeal cavity (red line) used in estimating their respective volumes

US Scan

The US scan of the airway was performed twice, before and after the CT-scan. In each US imaging session, both sonographers (P.C. and S.A.) independently performed the US measurements of the tongue thickness and the oral cavity height. Both sonographers were blind to each other's measured values.

Patients were positioned as for the CT scan, resting supine with the mouth open and holding 5 mL of water, as described above. Using a SonoSite Turbo US system (SonoSite Inc, Bothell, WA), a 5–2 MHz, 60-mm broadband curved array transducer was placed in the sagittal plane along the pre-marked line to perform a submandibular scan, obtaining the best possible view of the tongue at its maximal thickness and of the oral cavity at its maximal height. The US scan sought to identify i) the geniohyoid muscle, extending between the mandible and the hyoid bone; ii) the tongue (anterior, posterior, and superior surfaces); and iii) the palate, as shown in Figure 2. The distances from the superior surface of the geniohyoid muscle to the superior surface of the tongue and to the palate were measured using the US caliper function. To standardize the measurement, a vertical cursor was used as a reference, and measurements were recorded only when the calipers were aligned parallel to the cursor. Figure 2 illustrates the measurements made. All US measurements were performed in real time and saved; the image files were later digitally recorded (DVD recorder), and the ratio of tongue thickness to oral cavity height was calculated.
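The ratio calculation described above reduces to dividing two caliper distances that share the geniohyoid reference. The minimal Python sketch below shows that arithmetic; the `SubmandibularMeasurement` container, its field names, and the millimetre values are hypothetical illustrations rather than part of the study protocol, and the plausibility check (tongue thickness cannot exceed oral cavity height) is an assumption added for illustration.

```python
from dataclasses import dataclass


@dataclass
class SubmandibularMeasurement:
    """Hypothetical container for one pair of caliper distances, in millimetres."""
    tongue_thickness_mm: float    # superior geniohyoid surface to superior tongue surface
    oral_cavity_height_mm: float  # superior geniohyoid surface to palate

    def ratio(self) -> float:
        """Return tongue thickness divided by oral cavity height."""
        if self.oral_cavity_height_mm <= 0:
            raise ValueError("oral cavity height must be positive")
        if self.tongue_thickness_mm > self.oral_cavity_height_mm:
            # Both distances start at the geniohyoid muscle, so a tongue surface above
            # the palate suggests a caliper error (assumed plausibility check).
            raise ValueError("tongue thickness exceeds oral cavity height")
        return self.tongue_thickness_mm / self.oral_cavity_height_mm


m = SubmandibularMeasurement(tongue_thickness_mm=52.0, oral_cavity_height_mm=68.0)
print(round(m.ratio(), 2))  # prints 0.76
```

The same division applies to the corresponding CT-measured distances; the volumetric ratio instead uses volumes from the three-dimensional reconstruction.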

Figure 2.

Submandibular ultrasound scan and measurement technique of tongue and oral cavity. White line is a vertical reference cursor; Green line depicts maximal tongue thickness; Yellow line depicts oral cavity height.

Abbreviations: AT, anterior tongue; GH, geniohyoid muscle; H, shadow of the hyoid bone; M, shadow of the mandible; P, palate; PT, posterior tongue; ST, superior tongue.

Outcome Measures

The CT-measured ratio of tongue thickness relative to oral cavity height was determined by the CT study. Subsequent digital reconstruction permitted calculating the CT-measured ratio of tongue volume relative to oropharyngeal cavity volume. These two ratios were compared.

To assess the validity of US imaging of the mouth and oropharynx, the sonographically determined ratio of tongue thickness to oral cavity height was compared sequentially with the same ratio as measured by CT and with the CT-measured ratio of tongue volume to oropharyngeal cavity volume. To assess the reliability of US imaging, the average of the two US measurements obtained independently by each sonographer was compared between the two sonographers to evaluate inter-observer reliability, whereas the two measurements completed by each sonographer were compared with each other to evaluate intra-observer reliability.
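One way to organize the repeated measurements for these two comparisons is sketched below. It assumes, as one reading of the text, that each sonographer's pre- and post-CT ratios are averaged for the inter-observer comparison and kept separate for the intra-observer comparison; the nested-dictionary layout, the sonographer keys "A" and "B", the function names, and the values are all hypothetical.

```python
from statistics import mean

# Hypothetical layout: ratios[sonographer][session], where session is "pre" or
# "post" (US scan before or after CT) for a single patient.
Ratios = dict[str, dict[str, float]]


def inter_observer_pair(ratios: Ratios) -> tuple[float, float]:
    """Average each sonographer's two sessions and return (A, B) for comparison."""
    return mean(list(ratios["A"].values())), mean(list(ratios["B"].values()))


def intra_observer_pair(ratios: Ratios, sonographer: str) -> tuple[float, float]:
    """Return one sonographer's (pre, post) ratios for the repeatability comparison."""
    return ratios[sonographer]["pre"], ratios[sonographer]["post"]


patient = {"A": {"pre": 0.76, "post": 0.78}, "B": {"pre": 0.77, "post": 0.79}}
print(inter_observer_pair(patient))
print(intra_observer_pair(patient, "A"))
```

Collected across patients, these pairs are the inputs to the agreement statistic described under Statistical Analysis.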

Sample Size Calculation

We sought to test the joint hypothesis that i) the CT-measured ratio of tongue thickness to oral cavity height correlates highly with the CT-measured ratio of tongue volume to oropharyngeal cavity volume, and ii) the US-measured ratio of tongue thickness to oral cavity height correlates highly with the same ratio as measured by CT. We tested the two parts sequentially as a serial gatekeeping procedure; specifically, we proceeded to test the second part of the hypothesis only after demonstrating the first.[35]
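A serial gatekeeping (fixed-sequence) procedure tests pre-ordered hypotheses one at a time, each at the full alpha level, and stops at the first hypothesis that is not rejected; the fixed order is what controls the overall type I error without splitting alpha. A minimal Python sketch follows, assuming the two parts of the joint hypothesis are supplied as an ordered list of P values and using the P ≤ .05 threshold stated under Statistical Analysis; the function name and the P values are hypothetical.

```python
from collections.abc import Sequence


def serial_gatekeeping(p_values: Sequence[float], alpha: float = 0.05) -> list[bool]:
    """Fixed-sequence testing: each hypothesis is tested at full alpha, in order.

    Returns one flag per hypothesis (True = rejected). Once a hypothesis fails to
    be rejected, the gate closes and all later hypotheses are reported as False
    (not tested).
    """
    results: list[bool] = []
    gate_open = True
    for p in p_values:
        reject = gate_open and p <= alpha
        results.append(reject)
        gate_open = reject  # later hypotheses are tested only if this one was rejected
    return results


# Hypothetical P values for parts i) and ii) of the joint hypothesis.
print(serial_gatekeeping([0.0001, 0.0008]))  # [True, True]
print(serial_gatekeeping([0.20, 0.0008]))    # [False, False]
```

In the second call, part ii) is never evaluated even though its P value is small, because part i) closed the gate.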

The sample size for this observational study was planned to permit independent sequential testing of each element of the joint hypothesis. To calculate the sample size needed to assess the agreement between the CT-measured ratio of tongue thickness to oral cavity height and the CT-measured ratio of tongue volume to oropharyngeal cavity volume, we used Pearson's correlation coefficient (R). Pilot data from eight patients suggested an R value of .9 for these two measurements. We estimated that 17 patients per group (imaging modality) would be needed for a two-tailed Pearson's test to detect such a correlation with a type I error (alpha) of .05 and a power (1 − beta) of 90%. We inflated the sample size by 20% to account for attrition due to potential technical difficulties with the imaging technique. Therefore, we planned to recruit 21 patients per group, or 42 patients in total.

This sample size also permits 80% power to examine the correlation between the measurements of the ratio of tongue thickness relative to oral cavity height as obtained from two different imaging techniques, namely US and CT scan.
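A common approximation for correlation sample sizes uses the Fisher z-transformation, under which atanh(r) is roughly normal with standard error 1/√(n − 3), giving n = ((z₁₋α/₂ + z₁₋β) / |atanh(r₁) − atanh(r₀)|)² + 3. The text does not state which null correlation r₀ or software was used, so this Python sketch leaves r₀ as a parameter; with r₀ = 0 it returns 8 rather than the 17 per group reported, so it should not be read as a reproduction of the authors' calculation. The attrition step does reproduce the stated arithmetic (17 × 1.2 = 20.4, rounded up to 21 per group, 42 in total). Function names are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_sample_size(r_alt: float, r_null: float = 0.0,
                            alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate n for a two-sided test of H0: rho = r_null against rho = r_alt.

    Fisher z-approximation: atanh(r) is roughly normal with SD 1/sqrt(n - 3).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    effect = abs(math.atanh(r_alt) - math.atanh(r_null))
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


def inflate_for_attrition(n: int, attrition: float) -> int:
    """Inflate n by a fractional attrition allowance and round up."""
    return math.ceil(n * (1 + attrition))


print(correlation_sample_size(0.9))     # 8 when the null correlation is 0
print(inflate_for_attrition(17, 0.20))  # 21 per group, 42 for two groups
```

Raising `r_null` (testing whether the correlation exceeds some non-zero value) increases the required n, which is one way a larger figure such as 17 could arise.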

Statistical Analysis

To evaluate the validity of the US scan of the mouth and oropharynx, the CT-measured ratio of tongue thickness to oral cavity height and the CT-measured ratio of tongue volume to oropharyngeal cavity volume were first compared using linear regression analysis. Next, the CT- and US-measured ratios of tongue thickness to oral cavity height were compared using linear regression analysis. Our assumptions were then tested by directly comparing the US-measured ratio of tongue thickness/oral cavity height with the CT-measured ratio of tongue volume/oropharyngeal cavity volume, again using linear regression analysis. Finally, agreement between the two imaging techniques was evaluated using the Bland-Altman (Tukey mean-difference) technique, by plotting the mean differences and the limits of agreement between the CT and US measurements of the volume and thickness ratios, respectively.[36, 37]

To evaluate intra- and inter-observer agreement, we selected Cohen's kappa (κ) statistic because it accounts for observer bias and chance agreement and captures actual concordance rather than trends.[38] κ values range from −1 (complete disagreement) through 0 (chance agreement) to +1 (perfect agreement). Observer agreement was interpreted according to the measured κ value, with ranges of [.00, .40], [.41, .60], [.61, .80], and [.81, 1.00] designated as poor-to-fair, moderate, substantial, and almost perfect agreement, respectively.[39] Based on Cantor's approach[40] and using Gwet's variance calculation,[41] under the assumptions of i) a 0% probability of chance agreement, ii) an 80% probability of actual agreement, and iii) a 20% margin of relative error, a total sample size of 39 patients was needed to assess the inter-observer reliability of the US scan of the mouth and oropharynx by the two anaesthesiologists.
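Cohen's kappa compares the observed agreement p_o with the agreement expected by chance p_e, computed from each rater's marginal frequencies: κ = (p_o − p_e)/(1 − p_e). Because kappa is defined for categorical ratings while the measured ratios are continuous, some categorization step is implied that the text does not describe; this Python sketch therefore uses hypothetical category labels, and the function names are illustrative. The interpretation bands follow the ranges quoted above, and the label for negative values is an assumption.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over categories of the product of marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa: float) -> str:
    """Map kappa onto the agreement bands cited in the text."""
    if kappa < 0:
        return "worse than chance"  # negative values are not banded in the text (assumption)
    if kappa <= 0.40:
        return "poor-to-fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical category labels from two sonographers for eight patients.
a = ["low", "mid", "high", "mid", "low", "high", "mid", "mid"]
b = ["low", "mid", "high", "mid", "mid", "high", "mid", "low"]
k = cohens_kappa(a, b)
print(round(k, 2), interpret_kappa(k))  # 0.6 moderate
```

Here the raters agree on 6 of 8 patients (p_o = .75), chance agreement from the marginals is p_e = .375, and κ = .375/.625 = .6.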

Statistical analysis was performed using the SPSS statistical package for Windows (version 22; SPSS Inc, Chicago, IL). Continuous data are expressed as mean (standard deviation) or mean (95% confidence interval). A two-tailed P-value of .05 or less was designated as the threshold of statistical significance.

Results

A total of 42 patients were enrolled in this study; eight were excluded because they were unable to hold the water in their mouth and keep the mouth open while lying supine for the duration of the imaging session. Data from the remaining 34 patients were included in the analysis. Table 1 summarizes the demographic characteristics of the study participants, and Table 2 describes the time required for study-related training and imaging.

Table 1. Characteristics of Study Participants
Variable | Value
Age (y) | 56.9 ± 16.2
Gender (M/F) | 24/10
BMI, kg/m² | 27.4 ± 3.9
ASA classification (II/III) | 29/5
Mallampati score (I/II/III) | 12/21/1
Thyromental distance, cm | 6.8 ± .9
Mouth opening, cm | 5.5 ± .7
Upper-lip-bite (I/II) | 26/8
Reason for CT scan |
  Lymphoma | 21
  Melanoma | 3
  Cervical cancer | 4
  Breast cancer | 3
  Bladder cancer | 3
Values are expressed as mean ± standard deviation or absolute numbers.
Abbreviations: ASA, American Society of Anesthesiologists; BMI, body mass index; CT, computed tomography; F, female; M, male.

Table 2. Procedural Time
Study Procedure | Mean Time Requirement
Pre-imaging assessment | 3 min
Training on positioning and holding the water in the mouth | 2 min
Holding water in the mouth during imaging | 2 min (US study); 1 min (CT study)
Holding breath in expiration during imaging | 30 sec (US study); 15 sec (CT study)
Total study-related imaging time | 6 min (US study); 3 min (CT study)
Abbreviations: CT, computed tomography; min, minute; sec, second; US, ultrasound.

Ratios of Airway Dimensions

The mean (95% confidence interval) ratios of airway dimensions were .72 [.70, .75] for the CT-measured ratio of tongue volume/oropharyngeal cavity volume, .74 [.72, .77] for the CT-measured ratio of tongue thickness/oral cavity height, and .78 [.76, .80] for the US-measured ratio of tongue thickness/oral cavity height.

Validity

Linear regression analysis was used to compare the measured ratios of airway dimensions.

Comparison of the CT-measured ratio of tongue thickness/oral cavity height with the CT-measured ratio of tongue volume/oropharyngeal cavity volume showed that the first ratio correlates strongly with the second (R = .94, P < .0001; Figure 3, Table 3). Comparison of the US-measured ratio of tongue thickness/oral cavity height with the same ratio measured by CT also showed a strong correlation (R = .87, P < .001; Figure 4, Table 3).

Figure 3.

Linear regression correlation between CT-measured ratio of tongue thickness/oral cavity height and CT-measured ratio of tongue volume/oropharyngeal cavity volume (R = .94, P < .0001).

Abbreviations: CT, computed tomography imaging; P, P-value; R, correlation coefficient; US, ultrasound imaging.

Figure 4.

Linear regression correlation between US-measured ratio of tongue thickness/oral cavity height and CT-measured ratio of tongue thickness/oral cavity height (R = .87, P < .001).

Abbreviations: CT, computed tomography imaging; P, P-value; R, correlation coefficient; US, ultrasound imaging.

Table 3. Results
Imaging Modality | Measurement | Statistic | Interpretation | Significance
CT vs. CT | Ratio of tongue thickness to oral cavity height vs. ratio of tongue volume to oropharyngeal cavity volume | R = .94 | CT-measured thickness (or height) accurately estimates CT-measured volume | Unidimensional CT measurement accurately estimates three-dimensional CT measurement
CT vs. US | Ratio of tongue thickness to oral cavity height | R = .87 | US-measured thickness (or height) accurately estimates corresponding CT-measured values | Unidimensional US measurement accurately estimates unidimensional CT measurement
CT vs. US | CT ratio of tongue volume to oropharyngeal cavity volume vs. US ratio of tongue thickness to oral cavity height | R = .85 | US-measured thickness (or height) accurately estimates CT-measured volume | Unidimensional US measurement accurately estimates three-dimensional CT measurement
US, same operator | Ratio of tongue thickness to oral cavity height | κ = .84 | Repeated measurements by the same operator are consistent | Good intra-operator reliability
US, different operators | Ratio of tongue thickness to oral cavity height | κ = .81 | Measurements by different operators are consistent | Good inter-operator reliability
R, Pearson's correlation coefficient; κ, Cohen's kappa.
Abbreviations: CT, computed tomography; US, ultrasound.

Moreover, direct comparison of the US-measured ratio of tongue thickness/oral cavity height with the CT-measured ratio of tongue volume/oropharyngeal cavity volume confirmed our assumption that the first ratio correlates strongly with the second (R = .85, P = .005; Figure 5, Table 3).

Figure 5.

Linear regression correlation between US-measured ratio of tongue thickness/oral cavity height and CT-measured ratio of tongue volume/oropharyngeal cavity volume (R = .85, P = .005).

Abbreviations: CT, computed tomography imaging; P, P-value; R, correlation coefficient; US, ultrasound imaging.

Finally, Bland-Altman plot regression showed very good agreement between the two imaging modalities (R = .81, P = .03). As seen in Figure 6, CT-measured ratios were greater than US-measured ratios in all but three patients, suggesting that US consistently underestimated the CT-measured values.

Figure 6.

Bland-Altman plot of the agreement of the ratios of volumes as measured/estimated by the two imaging techniques. Dotted lines depict the 95% confidence interval of the limit of agreement.

Abbreviations: CT, computed tomography imaging; US, ultrasound imaging.

Reliability

The values of Cohen's Kappa statistic for inter- and intra-operator comparisons were .84 and .81, respectively, suggesting strong inter- and intra-operator reliability (Table 3).

Discussion

Our findings confirm that the US-measured ratio of tongue thickness to oral cavity height correlates highly, and agrees, with the CT-measured ratio of tongue volume to oral cavity volume. Importantly, these findings indicate that US is a valid and reliable tool for airway imaging and for measuring dimensions potentially predictive of airway difficulty. Notably, our results contradict earlier conclusions that linear measures of airway dimensions are weak correlates of volumetric measurements.[42] Further studies are needed to determine the sonographic parameters that can predict DAM and to explore the utility of airway US in assessing DAM in the clinical setting. The implications of our findings are not limited to DAM; they may also prove useful in screening for obstructive sleep apnea, a condition in which the volumes of the tongue and the oropharyngeal cavity have diagnostic value.[43, 44]

The clinical utility of US in predicting DAM has so far been limited by a lack of evidence of its validity and reliability in measuring the dimensions of the relevant airway structures. Computed tomography and magnetic resonance imaging have been the gold standards of airway imaging,[23, 45] but their clinical utility is limited by cost and availability. By validating US imaging of airway structures, this observational study sets the stage for future sonographic studies to identify additional airway variables, including volumetric parameters, that may be more predictive of DAM than the currently used morphometric variables. Furthermore, the known morphometric predictors of DAM may now be re-evaluated with airway US. In our opinion, US imaging of the airway is a simple, affordable, and practical tool with the potential to become useful in both clinical practice and research for evaluating the airway.

Our work has several limitations. First, our aim was to validate the use of US in airway imaging by using an accepted tomographic predictor of DAM, rather than to demonstrate the utility of US itself in screening for DAM. We did not seek to identify or substantiate novel airway parameters that may be predictive of DAM. Accordingly, the study procedures described should not be considered readily transferable into clinical practice in the DAM population; further research is needed to demonstrate the utility and value of the described US imaging technique in this population. Second, DAM is recognized to be a multifactorial outcome,[3] and the ratio of tongue volume to oral cavity volume is only one of many contributing factors.[9] In fact, numerous airway parameters have been proposed as predictive of DAM, and the choice of tests that offer the best predictive value remains debatable.[9] Such variables include, but are not limited to, neck circumference,[7] thyromental and sternomental distances, hyomental ratio, and upper-lip-bite.[46, 47] The role of US in evaluating these additional risk factors has yet to be explored. Third, this study was conducted in nonsurgical patients, precluding an actual assessment of the difficulty of intubation, correlation of the findings with the Cormack-Lehane classification of laryngoscopic difficulty, and comparison with indirect video-scope imaging systems.[48] Also, most patients were not obese and had Mallampati scores of I or II, making them less representative of the DAM population. Fourth, the generalizability of the US imaging technique we describe may have practical limitations, including the ability of subjects to perform the associated study manoeuvres. Fifth, our focus on difficulty of intubation overlooks the role of difficult mask ventilation as another potential cause of DAM; however, although difficult mask ventilation shares most of its risk factors with difficult intubation,[49] it is less likely to result in morbidity and mortality.[50] Finally, although our imaging attempted to emulate the actual setting of airway management, our observations do not account for some important variables that may influence DAM, such as airway plasticity and tongue compressibility,[23] and potential changes in airway dimensions associated with general anesthesia.[51]

In summary, this observational study establishes the validity and reliability of US in imaging of the oral and oropharyngeal parts of the airway, as well as in measuring the volumetric relationship between the tongue and oral cavity. Future studies may now use sonographic imaging of these airway parts to identify predictors of DAM.
