Magnetic resonance imaging of the vocal fold oscillations with sub‐millisecond temporal resolution

The temporal resolution of an MRI acquisition is intrinsically limited by the duration of the spatial encoding, which is typically on the order of milliseconds. Faster motion, such as the vibration of the vocal folds during phonation, cannot be imaged with conventional MRI because this would require sampling frequencies in the kilohertz range. Here, a faster MRI acquisition strategy is presented that encodes a 1D periodic motion at a temporal resolution that is an order of magnitude higher than that of conventional MRI.


| INTRODUCTION
Dynamic MR imaging has been the mainstay in many clinical examinations and provides functional parameters such as blood flow, 1 perfusion, [2][3][4] and motion. 5 However, many physiologic processes such as the motion of the heart valves or the oscillation of the vocal folds require sub-millisecond temporal resolution for their delineation and are therefore too fast for current MR imaging techniques. These high requirements for temporal resolution are not easily met by conventional MRI methods, because the spatial encoding process is inherently slow. In MRI, image information is imprinted into the MRI signal using linearly increasing magnetic field gradients, which are switched on and off during the image acquisition. Because of the finite gradient switching times, the acquisition time for a Fourier line is on the order of milliseconds; therefore, faster dynamic processes cannot be imaged with current MRI techniques.
Historically, many different methods have been presented to accelerate the MR image acquisition: gradient echo 6 and turbo spin echo 7 MRI reduced the minute-long spin echo acquisitions to a few seconds, and echo planar imaging 8 further shortened the acquisition time to 50-100 ms. An additional acceleration was achieved with parallel imaging, [9][10][11] which uses the intrinsic spatial encoding of multiple receiver antennas, and with partial Fourier methods and compressed sensing, 12 which exploit Fourier space symmetries and prior knowledge about the imaged object. With a combination of these technologies, a temporal resolution of 20 ms has been realized, 13 but the number of Fourier lines required to unambiguously reconstruct the MR image remains a limiting factor.
For a periodic motion, such as the heartbeat, even higher temporal resolutions are possible if the data acquisition is synchronized with the electrical cardiac activity (ECG). Here, each Fourier line is measured repetitively over the cardiac cycle together with the ECG, and multiple images are reconstructed from the data by re-sorting with respect to the ECG activity. Even though this synchronization allows for measuring dynamic processes with a time resolution on the order of the acquisition time for a single Fourier line, oscillation frequencies above 100 Hz cannot be visualized because the motion would interfere with the gradient encoding process. During speaking and singing (phonation), the vocal folds in the human larynx (Figure 1) perform an oscillatory motion with frequencies between 60 and 1500 Hz; therefore, MRI is currently unable to visualize vocal fold motion. The vocal tract configuration during speech has been studied with stroboscopic MRI, 14 and temporal resolution of continuous acquisition has been improved in recent years. 15,16 However, for dynamic MR imaging of the vocal folds during phonation, the insufficient temporal resolution precluded the visualization of the vocal fold oscillations, 17 and only an averaged image of the oscillating vocal folds could be acquired. 18 A similar approach has been used to image unilateral palsy of the vocal folds. 19
FIGURE 1 During phonation (e.g., singing, voiced consonants), the posterior points of attachment (arytenoid cartilages) adduct the vocal folds and the glottis is closed. The vocal fold oscillation is driven by the subglottal pressure, which periodically pushes the vocal folds to the side. Shown at the bottom are 2 phases of the vocal fold oscillatory motion: on the left, when the vocal folds touch and the glottis is closed; on the right, when the glottal area is maximized
The clinical standard method to study vocal fold motion under phonation is laryngeal stroboscopy, 20,21 during which a camera and a strobe light are introduced into the pharynx above the vocal folds, and a series of images is recorded during phonation of a sustained tone. The strobe light frequency is slightly detuned from the phonation fundamental frequency, so that images are acquired at consecutive motion phases over the oscillatory cycle. In the stroboscopy image sequence, the predominantly lateral motion of the vocal folds can be studied (Figure 2A), and functional parameters such as glottal opening and closing, which are important for the detection of phonation diseases, can be described qualitatively. Laryngeal stroboscopy is invasive and rather uncomfortable for the patient, so the analysis of vocal fold motion is strongly influenced by the measurement procedure. Furthermore, it provides only a surface view from above the vocal folds, and internal or deep-lying structures partaking in the phonation process remain hidden. MRI, on the other hand, is non-invasive and could provide a cross-sectional view of the anatomical structures involved in phonation.
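The detuning principle of stroboscopy can be illustrated with a few lines of Python. This is a minimal sketch and not part of the original work: the function names and the strobe frequency of 145.8 Hz are illustrative assumptions, and only the 146.8 Hz tone corresponds to the note sung in the MR measurements.

```python
def strobe_phases(f0_hz, f_strobe_hz, n_flashes):
    """Motion phase (fraction of a cycle, 0..1) illuminated by each flash."""
    return [(f0_hz * i / f_strobe_hz) % 1.0 for i in range(n_flashes)]


def apparent_frequency_hz(f0_hz, f_strobe_hz):
    """Frequency of the slowed-down motion seen in the image series.

    Valid for small detuning, i.e. |f0 - f_strobe| much smaller than f_strobe.
    """
    return abs(f0_hz - f_strobe_hz)


# Assumed example: sung tone at 146.8 Hz, strobe detuned by 1 Hz (hypothetical).
phases = strobe_phases(146.8, 145.8, 5)
slow_motion_hz = apparent_frequency_hz(146.8, 145.8)
```

With these assumed numbers, each flash advances the observed phase by 1/145.8 of a cycle, so one full oscillation cycle is swept in about one second.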
In this work, very short phase encoding gradients were used to measure the 1D periodic motion of the vocal folds with sub-millisecond temporal resolution. To achieve such high temporal resolution, for every Fourier line, phase encoding is performed along the direction of motion, and the gradient pulses are designed to yield the shortest possible duration, limited only by hardware constraints and peripheral nerve stimulation. In a simulation, we compared conventional frequency encoding with the fast phase encoding for a 1D oscillation with increasing frequency to determine the point spread function (PSF) as a function of oscillation frequency. Based on the simulation results, we acquired MRI data in volunteers simultaneously with electroglottography 22 (EGG), which was used to assign MR data to different motion phases during image reconstruction.

| Frequency and phase encoding in MRI
In conventional MRI, data are acquired with Cartesian encoding, and 2 gradients are applied in the 2 orthogonal in-plane directions of the cross-sectional slice: a frequency encoding gradient with simultaneous data acquisition and a preceding, varying phase encoding gradient. The frequency encoding gradient G_RO introduces linearly increasing Larmor precession frequencies, Δf = 2π × γ × G_RO × Δx. Therefore, the spatial position Δx of the magnetization can be calculated by a Fourier transform of the signal that is received during gradient activity. The other in-plane position Δy is encoded by the phase encoding gradient, G_PE, which leads to an additional signal phase before the data acquisition. As a single phase value alone is not sufficient to differentiate the positions of many signals along the y direction, the encoding procedure is repeated with different amplitudes of G_PE such that the position information is encoded into a phase change. Frequency and phase encoding gradients are typically applied for several milliseconds, where the duration depends on the available maximum gradient strength and slew rate, as well as the safety limits to prevent painful peripheral nerve stimulation. Although a TR below 1 ms is theoretically possible, for example with 3D radial sequences at maximum gradient amplitude and slew rate and with a non-selective RF excitation, the associated high readout bandwidths would result in a significant loss of SNR such that dynamic imaging of the small vocal folds would not be feasible. Longer frequency encoding times are preferable to increase SNR, especially when the MR signals are very weak.
FIGURE 2 Proposed encoding scheme for imaging of vocal fold oscillations. (A) Schematic representation of the oscillatory movement of the vocal folds and the directions for frequency and phase encoding gradients. (B) A fast periodic motion along the x-axis is encoded by very short phase encoding gradient lobes G_PE. For every k-space line, the shortest possible gradient shape is applied, resulting in increased temporal resolution for lines closer to k-space center. The frequency encoding gradient G_RO is applied along the y-direction, where no motion occurs and the signal is independent of motion in the x-direction. To encode a different motion phase in each cycle, TR must be different from the oscillation period T. Conventional slice-selection and rewinding are not depicted

| Encoding and reconstruction of 1D oscillatory motion
Any periodic motion in the imaging plane can be separated into an x- and a y-component, which are independently encoded by frequency and phase encoding. To investigate the influence of both encoding schemes on an oscillating point-like magnetization, separate simulations of the signal evolution were performed. When observed from the pharynx, the vocal folds located in the larynx below perform an oscillatory motion that occurs mainly in the left-right direction (Figure 2A). Therefore, the oscillation can be considered as 1D in the transverse plane.
For an oscillation frequency of f_osc = 100 Hz that is sampled at N_imag = 10 images per cycle, a temporal resolution of τ = (f_osc × N_imag)^−1 = 1 ms is required to visualize the oscillation; an even shorter τ is required when imaging motion with higher f_osc. With frequency encoding, data are acquired in the presence of a frequency encoding gradient that lasts several milliseconds. If a spin changes its frequency because of its movement in the applied gradient, the Fourier transform of the signal yields a continuous frequency distribution (i.e., blurring in image space). In the case of 1 full oscillation during readout, the signal is expected to be blurred over the motion range. To mitigate this artefact, the gradient duration can be shortened at the cost of SNR, but the achievable duration is ultimately limited by the gradient system. With the oscillation frequencies of the vocal folds at 100 Hz and higher, faster gradient encoding schemes are necessary to allow for spatial encoding with sub-millisecond temporal resolution.
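The timing argument above can be checked numerically. The following is a small sketch under stated assumptions: the function names are hypothetical, and the readout duration is approximated as the inverse of the 240 Hz/pixel bandwidth used in the in vivo protocol, ignoring the asymmetric echo.

```python
def required_temporal_resolution_s(f_osc_hz, n_imag):
    """tau = 1 / (f_osc * N_imag): frame duration needed for N_imag frames per cycle."""
    return 1.0 / (f_osc_hz * n_imag)


def cycles_during_readout(readout_s, f_osc_hz):
    """Number of oscillation cycles completed while the readout gradient is on."""
    return readout_s * f_osc_hz


tau_100 = required_temporal_resolution_s(100.0, 10)   # 1 ms, as in the text
# Assumed readout duration: 1 / (240 Hz per pixel), asymmetric echo ignored.
cycles = cycles_during_readout(1.0 / 240.0, 145.0)
```

Under these assumptions, a vocal fold oscillating at 145 Hz completes roughly 0.6 cycles during a single frequency-encoded readout, which is why frequency encoding along the motion direction is expected to blur the image.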
The main idea of this work is to apply the phase encoding along the direction of motion, such that the temporal resolution of the sequence is not given by TR, but by the duration of the phase encoding gradient τ_PE. Contrary to conventional phase encoding, the gradient lobes are not chosen to have the same duration for each k-space line k_n but are time-optimized to realize the necessary gradient moment in the shortest possible time. This leads to triangular gradient shapes close to k-space center, where the slew rate is the limiting factor, and trapezoidal gradients for the outer k-space lines, where the gradient amplitude is limiting (Figure 2B). Frequency encoding is applied in the perpendicular direction; it is insensitive to motion perpendicular to the encoding direction, provided the coil sensitivity is homogeneous along the direction of motion. With this approach, the motion can be fully encoded during τ_PE, which approaches zero for k_n values close to the center of k-space, without compromising SNR.
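The duration of such a time-optimized lobe follows from elementary geometry. The sketch below is an illustration, not the authors' sequence code: it assumes a proton gyromagnetic ratio of 42.577 MHz/T, a largest k-space value of 1/(2Δx), and the gradient limits (30 mT/m, 100 mT/m/ms) and geometry (70 mm FOV, 80 pixels) reported for the in vivo protocol. The function name is hypothetical.

```python
import math

GAMMA_BAR_HZ_PER_T = 42.577e6  # proton gyromagnetic ratio / (2*pi), assumed value


def lobe_duration_s(k_per_m, g_max_t_per_m, slew_t_per_m_s):
    """Shortest duration of a gradient lobe that reaches k-space position k."""
    moment = abs(k_per_m) / GAMMA_BAR_HZ_PER_T   # required gradient area in T*s/m
    if moment == 0.0:
        return 0.0
    ramp = g_max_t_per_m / slew_t_per_m_s        # time needed to ramp to G_max
    if moment <= g_max_t_per_m * ramp:
        # Slew-limited triangle: area = slew * (T / 2)**2
        return 2.0 * math.sqrt(moment / slew_t_per_m_s)
    # Amplitude-limited trapezoid: area = G_max * (flat + ramp), T = flat + 2 * ramp
    return moment / g_max_t_per_m + ramp


dx = 0.070 / 80                                   # pixel size in m (70 mm FOV, 80 pixels)
t_first = lobe_duration_s(1.0 / 0.070, 0.030, 100.0)      # first line next to k = 0
t_last = lobe_duration_s(1.0 / (2.0 * dx), 0.030, 100.0)  # assumed largest k_n
```

Under these assumptions the outermost line needs about 747 µs, close to the 750 µs quoted for the largest k_n, while the line next to the k-space center needs only about 116 µs.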
To reconstruct N_imag artefact-free images of the periodic motion, an EGG signal was acquired together with the MRI data. We used a modified, MR-safe EGG unit (EGG-D400, Laryngograph Ltd., London, UK) that records the alternating current (at a carrier frequency of 3 MHz) between 2 electrodes placed on either side of the larynx. 23,24 When the vocal folds are closed, the contact area between the vocal folds lowers the impedance between the EGG electrodes and a higher current is measured. To avoid coupling between EGG and MR data, harmonic suppression low-pass filters were built into the EGG leads. In contrast to an audio recording with a microphone within the bore, from which the gradient noise would need to be removed, 25 the EGG allows for a direct measurement of the vocal fold oscillation phase. Synchronization between the EGG and MRI acquisitions was realized with an optical trigger generated by the MRI system at the time of each RF pulse, which was then converted into an audio signal and recorded by an audio channel of the EGG device. After the MR measurement, the data of each acquired k_n are attributed to the correct motion phase ϕ_i by identifying the trigger signal in the EGG data and fitting a sine wave to the EGG curve to extrapolate the oscillatory motion at each phase encoding gradient event. Then, the k_n are sorted into N_imag = 10 phases, and after applying a Hann filter window and 3× zero-filling, a Fourier transform is performed for image reconstruction (Figure 3).
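The retrospective sorting step can be sketched as follows. This is a simplified illustration and not the authors' reconstruction code: it assumes the oscillation frequency is already known, fits only the phase of a single sine wave to the EGG samples, and uses hypothetical function names and data layouts.

```python
import math


def fit_sine_phase(times_s, samples, f_hz):
    """Least-squares fit of R*cos(2*pi*f*t + phi0) at a known frequency f.

    Returns phi0 in radians. The mean is removed first as a crude offset
    correction (adequate when the record spans many cycles).
    """
    w = 2.0 * math.pi * f_hz
    mean = sum(samples) / len(samples)
    scc = scs = sss = syc = sys_ = 0.0
    for t, y in zip(times_s, samples):
        c, s = math.cos(w * t), math.sin(w * t)
        scc += c * c
        scs += c * s
        sss += s * s
        syc += (y - mean) * c
        sys_ += (y - mean) * s
    det = scc * sss - scs * scs
    a = (syc * sss - sys_ * scs) / det   # coefficient of cos(w*t)
    b = (sys_ * scc - syc * scs) / det   # coefficient of sin(w*t)
    return math.atan2(-b, a)             # a*cos + b*sin = R*cos(w*t + phi0)


def motion_phase(t_s, f_hz, phi0):
    """Oscillation phase as a fraction of a cycle in [0, 1)."""
    return (f_hz * t_s + phi0 / (2.0 * math.pi)) % 1.0


def sort_lines(events, f_hz, phi0, n_imag=10):
    """Assign each (k_index, t_center) event to one of n_imag phase bins."""
    bins = [dict() for _ in range(n_imag)]
    for k_index, t_center in events:
        i = int(motion_phase(t_center, f_hz, phi0) * n_imag) % n_imag
        bins[i].setdefault(k_index, []).append(t_center)
    return bins
```

Here `events` would hold, for every acquired line, its k-space index and the time of the phase encoding lobe relative to the EGG record, as derived from the recorded trigger signal. Each bin then collects the lines that contribute to one of the N_imag images.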

| Simulation of frequency encoding in motion direction
A discrete magnetization performing a 1D harmonic oscillation with an amplitude of 15 mm and a varying frequency was used to compare the 2 encoding types. Because the system is 1D, only 1 type of encoding is necessary for the reconstruction of the movement. Simulations of the PSF for frequency and phase encoding were performed separately using the same imaging parameters as in the in vivo experiment, namely FOV, resolution, and readout bandwidth, as well as the performance of the gradient system. For both encoding schemes, the evolution of the phase of the magnetization depending on its current position and the current gradient amplitude was calculated on a 1 μs grid according to
$$\varphi(t) = 2\pi\gamma \int_0^{t} G(t')\,x(t')\,\mathrm{d}t' \qquad (1)$$
where G(t') is the gradient amplitude and x(t') the position of the magnetization at time t'. In the case of frequency encoding, a Fourier transform of the complex phase evolution during the readout event yields the PSF. For phase encoding, all phase encoding lobes need to be simulated separately, and the acquired complex phase values are then Fourier-transformed to obtain the PSF.
Multiple simulations were performed for an increasing number of oscillation cycles of the moving magnetization during the readout event, from 0 (stationary magnetization) to 5 full cycles during the readout event.
For each frequency, 100 phases of the motion were simulated, allowing for the reconstruction of a continuous movie of the encoded motion. In simulations with frequency encoding, for reconstructions of the same motion phase, the point-like magnetization is always in the same phase of its oscillation cycle at the echo condition. In the case of phase encoding, this is true for the center of the phase encoding gradient lobes.
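A stripped-down version of the phase encoding simulation is sketched below. It is an illustration under explicit assumptions and not the authors' simulation code: the phase is accumulated as the time integral of gradient amplitude times position on a 1 µs grid, the k-space lines are taken as k_n = n/FOV for a full 80-line matrix (partial Fourier, relaxation, and noise are ignored), the gyromagnetic ratio is set to 42.577 MHz/T, and all names are hypothetical.

```python
import cmath
import math

GAMMA_BAR = 42.577e6   # proton gyromagnetic ratio / (2*pi) in Hz/T (assumed)
G_MAX = 0.030          # T/m   (30 mT/m)
SLEW = 100.0           # T/m/s (100 mT/m/ms)
FOV = 0.070            # m
N_LINES = 80           # full matrix; partial Fourier is ignored in this sketch
DT = 1e-6              # s, 1 microsecond grid


def lobe_samples(k):
    """Time-optimal gradient lobe for k (1/m), sampled symmetrically about t = 0."""
    moment = abs(k) / GAMMA_BAR                 # required gradient area, T*s/m
    ramp = G_MAX / SLEW
    if moment <= G_MAX * ramp:                  # slew-limited triangle
        half = math.sqrt(moment / SLEW)
        peak = SLEW * half
    else:                                       # amplitude-limited trapezoid
        half = 0.5 * (moment / G_MAX + ramp)
        peak = G_MAX
    n = int(round(2.0 * half / DT))
    sign = 1.0 if k >= 0 else -1.0
    samples = []
    for i in range(n):
        t = (i - (n - 1) / 2.0) * DT
        g = max(0.0, min(peak, SLEW * (half - abs(t))))
        samples.append((t, sign * g))
    return samples


def encoded_phase(k, position_m):
    """Phase (rad) accumulated during the lobe: 2*pi*gamma_bar * sum(G * x) * dt."""
    area = sum(g * position_m(t) for t, g in lobe_samples(k))
    return 2.0 * math.pi * GAMMA_BAR * area * DT


def psf(position_m, grid_m):
    """Magnitude PSF on grid_m from one phase-encoded sample per k-space line."""
    ks = [(n - N_LINES // 2) / FOV for n in range(N_LINES)]
    signal = [cmath.exp(-1j * encoded_phase(k, position_m)) for k in ks]
    return [
        abs(sum(s * cmath.exp(2j * math.pi * k * x) for s, k in zip(signal, ks))) / N_LINES
        for x in grid_m
    ]


grid = [i * 0.0005 for i in range(-70, 71)]            # -35 mm .. 35 mm
stationary = psf(lambda t: 0.010, grid)                # point at rest at +10 mm
# 15 mm amplitude, 145 Hz, passing the central position at the lobe centre
moving = psf(lambda t: 0.015 * math.sin(2.0 * math.pi * 145.0 * t), grid)
```

In this sketch the stationary point is reconstructed at its true position. For the point passing its central position at the center of each lobe, the accumulated phase cancels because the lobe is symmetric and the displacement is antisymmetric in time, so the PSF peak stays at the center.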

| MR-measurements
For dynamic MRI of the vocal fold oscillation with sub-millisecond time resolution, we implemented a spoiled gradient echo sequence with time-optimized phase encoding gradients on a clinical 1.5T MRI system (Tim Symphony; SIEMENS, Erlangen, Germany). MR data were acquired with a custom-built Tx/Rx coil that was fixed to the neck of the volunteer at the position of the larynx between the EGG electrodes (Figure 4). During the MRI measurement, the volunteer was asked to sing a D3 (frequency: 146.8 Hz) in a single exhalation. In this way, a singing duration of 25 s was achieved, which is the limiting factor for the total acquisition time. All methods were carried out in accordance with relevant guidelines and regulations; healthy volunteer scanning was approved by the institutional review board (Ethikkommission) of the University Medical Center Freiburg (No. 160/2000), and informed written consent was obtained before imaging.
A target resolution below 1 mm was chosen to image the vocal fold oscillation. Considering also peripheral nerve stimulation, this resulted in a FOV of 70 × 70 mm² and an image matrix of 80 × 80. The bandwidth of 240 Hz/pixel was selected as a compromise between acquisition time and SNR. Images of the vocal folds were acquired in a transverse slice orientation using the following parameters: TE = 3.47 ms, TR = 7.4 ms, slice thickness = 8.5 mm, RF-pulse duration = 500 µs. The Ernst flip angle α_Ernst = 7° was used to maximize the signal of the vocal folds, which have a T_1 of ~1000 ms. Partial Fourier of 6/8 and asymmetric echo of 40/50 were used to further reduce acquisition time per image. RF and gradient spoiling were used. With the MR system's gradient system (G_max = 30 mT/m; s_max = 100 mT/m/ms), a phase encoding duration of Δτ_PE = 750 µs could be achieved for the largest k_n. In total, every k_n was acquired 57 times to ensure that it was sampled at least once for every phase ϕ_i of the reconstructed oscillatory motion and to make full use of the available acquisition time of 25 s.
FIGURE 3 Synchronization of EGG and MR data. The trigger signal (purple) is used to synchronize EGG data (blue) with MR data (red). The varying duration of the phase encoding gradient (green) for the different k-space lines is also shown as an overlay on the corresponding EGG signal. Depending on the phase of the oscillatory motion ϕ_i, the data are sorted into the corresponding k-space. The optical trigger, phase encoding gradients, and data acquisition are depicted from the MR pulse sequence
FIGURE 4 Positioning of Tx/Rx coil and EGG electrodes. (A) Custom-built Tx/Rx coil used for the measurements. (B) Fixation of the coil and EGG electrodes at the neck of the volunteer with a Velcro neck-strap. The implemented signal filters are located in the shielded cases
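Two of the protocol values can be cross-checked with standard relations. The snippet is a minimal sketch using the textbook Ernst angle formula for a spoiled gradient echo sequence, cos(α) = exp(−TR/T_1); the function names are hypothetical.

```python
import math


def ernst_angle_deg(tr_s, t1_s):
    """Flip angle maximizing the spoiled gradient echo signal: cos(alpha) = exp(-TR/T1)."""
    return math.degrees(math.acos(math.exp(-tr_s / t1_s)))


def pixel_size_mm(fov_mm, matrix):
    """Nominal in-plane pixel size before zero-filling."""
    return fov_mm / matrix


alpha = ernst_angle_deg(7.4e-3, 1.0)   # TR = 7.4 ms, T1 of about 1000 ms
dx = pixel_size_mm(70.0, 80)           # 70 mm FOV, 80 pixels
```

With TR = 7.4 ms and T_1 ≈ 1000 ms the formula gives approximately 7°, and the nominal pixel size of 0.875 mm meets the target resolution below 1 mm.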

| Voice analysis
Undesired short-term fluctuations of the fundamental frequency during phonation are called jitter and stem from a variety of physiological sources. 26 To measure this frequency instability, a microphone recording of the volunteer's phonation outside the MR system was analyzed with the EGG software package SPEAD (Version 4.2.2, Laryngograph Ltd., Wallington, UK). It uses the multidimensional voice program (MDVP) and calculates the jitter using
$$\mathrm{Jitter} = \frac{\frac{1}{n-1}\sum_{i=1}^{n-1}\left|T_i - T_{i+1}\right|}{\frac{1}{n}\sum_{i=1}^{n} T_i} \times 100\%$$
where T_i is the period length of the i-th oscillation cycle and n the total number of cycles. 27 A jitter below 1% is considered normal. 28

| Simulation of a 1D motion
Figure 5 shows the simulated 1D PSF of the oscillating magnetization for both encoding schemes at 2 distinct positions during its oscillation, its equilibrium point and its turning point, and for increasing oscillation frequencies. For a stationary magnetization, the simulation shows the expected point-like distribution of signal intensity in all 4 cases. A video of all simulated motion phases can be found in the supplementary material (Supporting Information Video S2).
Increasing oscillation frequencies are accompanied by a broadening of the PSF, especially in the case of frequency encoding. Here, the PSF shows significant distortions of the moving magnetization even when only small fractions of an oscillation cycle are performed during data acquisition. When data acquisition is centered on the turning point of the oscillation and the readout duration equals the period of the oscillation, the PSF covers the whole motion range. When centered on the passage through the equilibrium point, this is already the case for an oscillation period that is twice as long as the data acquisition. For 1 or more oscillation cycles during readout, the PSF always covers the whole motion range. The shorter phase encoding gradients are more robust to the motion, and the PSF shows only little deviation from the ground truth for up to 1 oscillation cycle during data acquisition. When imaging the magnetization at its central position, the true signal is preserved for all oscillation frequencies, because the phase integral in Equation 1 vanishes over a symmetric interval.
Figure 6 shows 1 cycle of the vocal fold oscillation at a fundamental frequency of 145 Hz (jitter: 0.24%). With the data reconstructed into 10 contiguous frames, each frame corresponds to an acquisition time of Δt = 690 µs, achieving the desired sub-millisecond temporal resolution. A video showing the oscillation 100× slower can be found in Supporting Information Video S1. During the oscillation, the vocal folds never seem to fully close the glottal opening, whereas in laryngeal stroboscopy, a complete closing of the glottal opening under phonation is observed. Opening widths between 2.7 and 4.7 mm were measured in the MR images (Figure 7).

| DISCUSSION AND CONCLUSIONS
We present an encoding method that allows imaging of fast, periodic motion with sub-millisecond temporal resolution. This enables the investigation of fast physiologic motion with the multitude of image contrasts available with MRI. In comparison with laryngeal stroboscopy, where only the outside surface can be imaged, the proposed method also allows study of the motion inside the tissues involved. Finally, this technique removes the necessity for a camera in the pharynx to image the oscillation and therefore provides a more natural singing condition.
The discrepancy in vocal fold closure between the presented MR images and images obtained from laryngeal stroboscopy is attributed to partial volume effects from tissue below the surface of the vocal folds (e.g., musculus vocalis) or susceptibility artefacts, rather than a shortcoming of the encoding scheme. Imaging of the vocal folds during the closed phase of their oscillation should pose little problem to the proposed encoding method as the vocal folds here move at their slowest velocity.
FIGURE 5 Simulation of a moving, point-like magnetization. The PSF of a moving point-like magnetization is simulated using regular frequency encoding (left column) and the proposed phase encoding gradient (right column), for increasing oscillation cycles during readout. The top row shows the frames that correspond to the magnetization being at its central position after half the encoding gradient; the center row shows the frames that correspond to the magnetization being at its upper turning point after half the encoding gradient. At the bottom, the gradient shapes used in the simulation are shown. In all graphs, the dotted yellow line represents the oscillation frequency of the vocal folds in the in vivo experiments (f_osc = 145 Hz). The signal readout is shown as gray blocks
FIGURE 6 Reconstructed images of vocal fold oscillation at 145 Hz. Left: transversal view of the larynx at the height of the vocal folds. The image intensity was corrected for the coil sensitivity and then resized using zero padding. The red border shows the position of the images on the right; the green line shows the position of the line plots in Figure 7. Right: oscillation phases of the vocal folds at 145 Hz shown in 10 phases, resulting in a temporal resolution of ~690 µs. Minor blurring is observed as a consequence of the Hann filter used during reconstruction
Because of the cost of an MRI measurement, the method presented in this work is unlikely to replace laryngeal stroboscopy in clinical examinations of the vocal folds, but rather provides an additional tool for researchers to assess functional parameters of the vocal fold oscillation by using the amenities of MRI, in particular the free choice of the imaging plane, the non-invasiveness, and the choice of different image contrasts. MRI could, for example, be applied when the use of endoscopic cameras is impossible because of anatomic feature size. 29 Imaging the dynamic water distribution within the vocal folds may provide insights into the formation of vocal fold nodules. 30,31
Independent of the moving object, this technique requires the motion to be monitored continuously and with a temporal resolution similar to or better than the desired resolution of the MR images. This excludes motion for which the available monitoring means do not provide such high sampling rates or are not MR compatible. Encoding is currently limited to 1D motion. This limitation can be circumvented by applying phase encoding in both in-plane directions at the cost of longer total acquisition times, which might enable use of this technique in other slice orientations (e.g., orthogonal to the vocal fold plane to study the dynamics of the mucosal wave in the coronal plane). 32,33 Temporal resolution and SNR can be further improved by using stronger gradient systems and higher field strengths, respectively; however, peripheral nerve stimulation may become a limiting factor for temporal resolution, and higher fields may worsen susceptibility artefacts at the boundary between tissue and air.
Other repetitive motion patterns such as the opening and closing of heart valves can potentially be imaged with this technique using an ECG signal for synchronization. In this case, the total acquisition time is not limited by the patient's ability to sing a tone during 1 exhalation.
In conclusion, for the first time, MR images with sub-millisecond time resolution were acquired using retrospective EGG-gating of the vocal fold oscillation so that the fast oscillatory motion can be studied. This method opens new diagnostic options for the analysis of diseases associated with vocal fold abnormalities.