Learning Enabled Continuous Transmission of Spatially Distributed Information through Multimode Fibers

Multimode fibers (MMF) are high‐capacity channels and are promising to transmit spatially distributed information, such as an image. However, continuous transmission of randomly distributed information at a high‐spatial density is still a challenge. Here, a high‐spatial‐density information transmission framework employing deep learning for MMFs is proposed. A proof‐of‐concept experimental system is presented to demonstrate up to 400‐channel simultaneous data transmission with accuracy close to 100% over MMFs of different types, diameters, and lengths. A scalable semi‐supervised learning model is proposed to adapt the convolutional neural network to the time‐varying MMF information channels in real‐time to overcome the instabilities in the lab environment. The preliminary results suggest that deep learning has the potential to maximize the use of the spatial dimension of MMFs for data transmission.


Introduction
DOI: 10.1002/lpor.202000348

The remarkable transmission capacity of multimode fibers (MMF) has attracted much renewed research interest. [1][2][3][4][5] In image transmission, compared with a single-mode fiber, the large number of spatial modes within an MMF allows the transmission of a high-resolution image through a small cross section with a diameter down to tens of micrometers. [6][7][8][9] Early attempts to turn a single MMF into an image transmission device can be traced back to the 1960s, using phase conjugation [10,11] or image encoding and decoding. [12,13] Recent advancements in light-shaping hardware such as spatial light modulators (SLM) and digital micromirror devices (DMD) enabled the spatially complex modulation of the phase or the intensity of a light beam. [14,15] This subsequently allowed the transmission matrix (TM) measurement of turbid media, [16][17][18] including MMFs, [2,3,8,9,19,20] and focusing and signal transmission via digital phase conjugation. [21,22] These methods were then utilized to acquire an image at the fiber distal end, [20,21][23][24][25] and potential endoscopic applications were proposed. [8,9,26,27] A big challenge facing single MMF imaging is the highly variable MMF transmission channel, which is subject to changes in the fiber geometric shape and the environment. [28][29][30] For example, in the widely used TM approach, pre-calibration of the TM is required each time before image acquisition, [7,8] which is impractical for endoscopic imaging. To overcome this challenge, attempts were made to improve the robustness of light transport to bending deformations in MMFs. [31][32][33][34][35] Recently, it has been shown that MMF TMs can be characterized using metamaterial surface reflector stacks for lensless imaging without distal access. [36]

Deep learning techniques have been successfully applied to fields including computer vision, natural language processing, bioinformatics, drug design, medical image analysis, and material inspection. [37][38][39] Recently, deep learning approaches were proposed for image transmission and wavefront shaping through MMFs [40][41][42] and other scattering media. [43,44] It was demonstrated that MNIST digits, letters, or other simple geometries could be transmitted through a single MMF via deep learning. [40][41][42][45] Subsequently, transmission of natural scenes through an MMF of up to 10 m was reported. [46] It was also suggested that neural networks learned and generalized different transmission states when the MMF was bent or subject to continuous variations in its transmission characteristics. [40,41]

Accurately transmitting randomly distributed and spatially multiplexed information using deep learning through a standard MMF over a longer period of time remains unexplored territory. This is mainly due to the following two challenges. First, although previous studies demonstrated the transmission of digits, letters, or images at considerable accuracy using neural networks, data transmission requires high transmission accuracy for randomly distributed data at the individual channel level (i.e., one pixel in image transmission). Second, MMFs are well known to be highly variable and random information channels, and a trained and fixed neural network for the MMF channel can only be valid for a limited period of time. In particular, in a longer fiber, gradual environmental changes can accumulate and have a considerable impact on the MMF transmission characteristics. [40] In this paper, we aim to address the two challenges mentioned above and to use deep learning to transmit randomly distributed and spatially multiplexed information over MMFs.
We use a simple proof-of-concept experimental system previously employed in image transmission experiments, [41,47] consisting of a single-wavelength laser, a DMD as the data modulator, and a detection camera. To overcome the time-varying nature of the MMF information channels, we propose a scalable neural network model based on the semi-supervised learning (SSL) approach, toward sustainable long-period data transmission over MMFs. The remainder of this paper is organized as follows. First, we evaluate the variability of MMF information channels of different types, lengths, and diameters. Second, we investigate random data transmission accuracies using a fixed neural network for different numbers of multiplexed channels in various MMFs. Third, we develop a scalable SSL model and evaluate its performance on sustainable data transmission over time-varying MMF channels. Finally, we study the individual data channel performances and the influence of the number of detectors on data transmission.

Experimental Setup
The experimental setup used to demonstrate the deep-learning-based high-spatial-density data transmission framework through an MMF is shown in Figure 1a. The laser beam (532 nm, 50 mW, linewidth <1 MHz, Cobolt Samba) is expanded, collimated, and projected onto a DMD (ViALUX V-7001, ≈22 kHz). The DMD can display any arbitrary binary pattern by switching the micromirrors "ON" or "OFF", corresponding to input binary values "1" or "0", respectively. The wavefront of the incident laser beam is modulated by the pattern on the DMD and is then coupled into the input of a single solid-core MMF using a tube lens and a microscope objective. Different types of MMFs are used in our experiment, as listed in Table 1. At the distal end of the fiber, a further microscope objective and tube lens are used to image the speckle pattern onto a CMOS camera (QImaging optiMOS). In the experiments, the effective area of the DMD (the area coupled into the MMF as the input) is varied according to the total number of transmission channels. Each transmission channel occupies 4 × 4 DMD micromirror pixels. For example, 20 × 20, 15 × 15, 10 × 10, and 5 × 5 transmission channels are obtained by setting the total micromirror pixel numbers to 80 × 80, 60 × 60, 40 × 40, and 20 × 20, respectively. The data frames displayed on the DMD are randomly generated (using the random.randint function in the Python NumPy library) in an N × N square configuration, [48] with 50% "ON" and 50% "OFF" pixels, which resembles the data distribution during real data transmission over N × N independent channels. MMF output speckles are recorded on the CMOS camera over a cropped region of interest measuring 320 × 320 pixels for step-index MMFs and 260 × 260 pixels for graded-index MMFs.
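The mapping from N × N random channel values to a DMD frame described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `random_frame` is hypothetical, NumPy's `Generator.integers` is used in place of the `random.randint` call mentioned in the text, and each channel is drawn "ON" with probability 0.5, so the 50% split holds on average rather than exactly in every frame.

```python
import numpy as np

def random_frame(n, block=4, rng=None):
    """Return (bits, pattern) for one data frame.

    bits    : n x n array of random channel values (0 = "OFF", 1 = "ON")
    pattern : DMD pattern in which each channel fills block x block micromirrors
    """
    rng = np.random.default_rng() if rng is None else rng
    bits = rng.integers(0, 2, size=(n, n), dtype=np.uint8)
    # Repeat every channel value over a block x block group of micromirrors.
    pattern = np.kron(bits, np.ones((block, block), dtype=np.uint8))
    return bits, pattern

bits, pattern = random_frame(20, rng=np.random.default_rng(0))
print(bits.shape, pattern.shape)  # (20, 20) (80, 80)
```

With `n = 20` and `block = 4` this reproduces the 80 × 80 micromirror area quoted for 400 channels.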
The transmission experiments are repeated over a range of fibers of different types, diameters, and lengths, as detailed in Table 1. For simplicity, the datasets are indexed as follows: "Refractive index"-"Core diameter"-"Length"-"The number of transmitted channels (denoted by N)". For example, SI-100-100-400 refers to a step-index (SI) MMF with a core diameter of 100 µm, a length of 100 m, and 400 multiplexed channels, i.e., 20 × 20 input DMD pixels. GI denotes graded index. Figure 1b shows the schematic of the proposed CNN architecture, [41] and a standard CNN with three convolutional layers is used as the underlying classifier component (the CNN implementation is detailed in the Supporting Information). The system operates at 200 frames per second (i.e., 200 input-output pairs per second), and 40 000 data pairs are collected during a 200 s period. The images are then down-sampled to 96 × 96 pixels, and the intensity values are subsequently normalized to the range [0, 255], in order to reduce the number of deep CNN parameters and computer memory usage, thereby increasing the training speed while maintaining the prediction accuracy. The down-sampling is implemented using the resize function with INTER_AREA interpolation in the Python cv2 library.
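The two preprocessing steps above (area-style down-sampling followed by intensity normalization) can be illustrated with NumPy alone. This sketch is a simplified stand-in and not the cv2 call used in the paper: `block_mean` only handles integer reduction factors, so it cannot produce the 96 × 96 output from a 320 × 320 frame, and both function names are hypothetical.

```python
import numpy as np

def block_mean(img, factor):
    """Down-sample a 2-D image by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # crop so both sides divide evenly
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def to_uint8_range(img):
    """Linearly rescale intensities so the minimum maps to 0 and the maximum to 255."""
    img = img.astype(np.float64)
    span = img.max() - img.min()
    if span == 0:  # flat image: avoid division by zero
        return np.zeros_like(img)
    return (img - img.min()) / span * 255.0

speckle = np.random.default_rng(0).random((320, 320))  # stand-in for a camera frame
small = to_uint8_range(block_mean(speckle, 4))
print(small.shape)  # (80, 80)
```

Averaging over blocks, rather than picking every fourth pixel, keeps the contribution of all camera pixels, which is the motivation for an area-based interpolation when shrinking speckle images.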

Data Preprocessing
Laser Photonics Rev. 2021, 15, 2000348

In neural network testing (see Section 3.2 and Figure 3), 10% of the data (i.e., 4000 test input-output data pairs) are pseudorandomly selected using the train_test_split function in the Python Scikit-learn library with the random state set to a fixed value, which enables us to relocate the test data according to their original order in the 40 000 measurements. The performance of the models is quantified numerically from two perspectives: classification evaluation and regression evaluation. [41] For classification evaluation, the accuracy and the F1_score are calculated for the predicted binary pattern by comparing it to the ground truth. The prediction accuracy is defined as the percentage of correctly predicted pixels within one N × N-pixel input DMD pattern. The F1_score is defined as the harmonic mean of the model's precision and recall. [49] For regression evaluation, the mean squared error (MSE), the Pearson correlation coefficient (PCC, calculated using the np.corrcoef function in Python), and the structural similarity index (SSIM) [50] are also calculated for the predicted pattern against the ground truth.
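The classification and regression metrics defined above can be written compactly with NumPy. The sketch below is illustrative only: the function names are hypothetical, the F1 score is computed directly from its definition rather than with a library routine, and SSIM is omitted because it needs a windowed implementation.

```python
import numpy as np

def classification_metrics(pred, truth):
    """Pixel-wise accuracy and F1 score for binary patterns (1 = "ON")."""
    pred = np.asarray(pred).astype(bool).ravel()
    truth = np.asarray(truth).astype(bool).ravel()
    accuracy = float(np.mean(pred == truth))
    tp = int(np.sum(pred & truth))    # "ON" predicted and "ON" sent
    fp = int(np.sum(pred & ~truth))   # "ON" predicted but "OFF" sent
    fn = int(np.sum(~pred & truth))   # "OFF" predicted but "ON" sent
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

def regression_metrics(pred, truth):
    """Mean squared error and Pearson correlation coefficient."""
    pred = np.asarray(pred, dtype=float).ravel()
    truth = np.asarray(truth, dtype=float).ravel()
    mse = float(np.mean((pred - truth) ** 2))
    pcc = float(np.corrcoef(pred, truth)[0, 1])
    return mse, pcc

truth = [[1, 0], [1, 1]]
pred = [[1, 0], [0, 1]]   # one "ON" channel decoded as "OFF"
acc, f1 = classification_metrics(pred, truth)
print(round(acc, 2), round(f1, 2))  # 0.75 0.8
```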

Semi-Supervised Learning Model
The idea behind our SSL model is to use the predicted data as new training data to adapt the neural network to the gradually varying MMF transmission channel. Our SSL algorithm eliminates the need for calibration and does not require access to data at the transmitter end. The new training input-output pairs are formed by combining the CNN-predicted input data with the camera-captured speckles. The detailed SSL model is illustrated in Figure S1 and Algorithm S1 in the Supporting Information. The main idea of our SSL approach is as follows: i) At the receiver end, we measure the real-time speckle images, which contain the gradual changes of the fiber transmission channel; ii) Given the ≈100% accuracy in predicting the input within a short time interval, we can use the real-time speckle image to accurately predict the input (i.e., the gradual change to the speckle is not significant enough to influence the prediction results over a short time interval); iii) We can then use the predicted input and the measured output speckle at the receiver as a new input-output pair to update the neural network. Note that these new training data represent the true real-time data because of the ≈100% prediction accuracy. We also release the source code of our confidence-based SSL model, which can be found in [51].
For the application of the SSL model in data transmission (e.g., the results presented in Figure 4), the streaming output speckles (40 000) are received in chronological order, starting with only 5% (2000) initial labeled instances, with the remaining 38 000 unlabeled output speckles divided into 76 batches of 500 output speckles. A batch size of 500 is chosen because it was found to be the optimal batch size for our model. Each use of an output speckle batch to update the CNN is called a Step. In this case, there are 76 Steps corresponding to the 76 batches. In the initialization Step 0, the 2000 initial labeled instances are used to train the CNN. In each of the following Steps, the CNN trained in the previous Step is used to predict the input data corresponding to the 500 output speckles in the current batch, which forms 500 new data pairs (i.e., 500 output speckles labeled with their corresponding CNN-predicted pseudo input data). These 500 new data pairs are then subject to a confidence-based filtering process (see the Supporting Information) to remove data pairs with relatively large prediction errors. After filtering, the 2000 most recent data pairs, comprising the newly formed data pairs that pass the filter in the current Step and data pairs from previous Steps, are used to update the CNN. This process is repeated 76 times until all 40 000 speckle images are processed. Note that this process can be continued for data transmission through an evolving MMF channel.
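The Step-by-Step procedure above can be sketched as a generic pseudo-labeling loop over a sliding window. This is a schematic under stated assumptions, not the released implementation in [51]: `train`, `predict`, and `confidence` are hypothetical callables standing in for the CNN and the confidence-based filter (whose actual criterion is given in the Supporting Information), and the fixed `threshold` is an illustrative choice.

```python
from collections import deque

def run_ssl(speckles, initial_labels, train, predict, confidence,
            batch_size=500, window=2000, threshold=0.9):
    """Decode a chronological speckle stream while retraining on pseudo labels.

    train(pairs) -> model, predict(model, speckle) -> label and
    confidence(model, speckle) -> float are hypothetical user-supplied callables.
    Returns the decoded labels and the number of update Steps performed.
    """
    n_init = len(initial_labels)
    # Sliding window holding the most recent (speckle, label) training pairs.
    recent = deque(zip(speckles[:n_init], initial_labels), maxlen=window)
    model = train(list(recent))  # Step 0: initial labeled instances only
    decoded, steps = [], 0
    for start in range(n_init, len(speckles), batch_size):
        batch = speckles[start:start + batch_size]
        labels = [predict(model, s) for s in batch]  # pseudo labels
        decoded.extend(labels)
        for s, y in zip(batch, labels):
            if confidence(model, s) >= threshold:  # assumed filtering rule
                recent.append((s, y))  # the oldest pair drops out automatically
        model = train(list(recent))  # update on the most recent pairs
        steps += 1
    return decoded, steps

# Toy run with stand-in callables; only the bookkeeping is meaningful here.
speckles = list(range(40_000))
initial = [s % 2 for s in speckles[:2000]]
decoded, steps = run_ssl(speckles, initial,
                         train=lambda pairs: len(pairs),
                         predict=lambda model, s: s % 2,
                         confidence=lambda model, s: 1.0)
print(steps, len(decoded))  # 76 38000
```

The bounded `deque` mirrors the "2000 most recent data pairs" rule: pairs rejected by the filter are simply never appended, so older pairs remain in the window for longer.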
The performance of the SSL model is assessed by comparing its predictions with both the ground truth and the predictions of a static CNN model trained only on the first 2000 initial labeled instances.

MMF Channel Stability Test
We first test the stability of the MMF information channel. The MMFs under test are bare fibers without protective jackets, wound on a fiber reel as provided by the manufacturers. The stability over time is measured by the PCC with respect to an initial speckle pattern recorded with the same input modulated by the DMD. All MMFs under test exhibit an unstable, time-varying nature in the laboratory environment, as shown in Figure 2a. A periodic fluctuation in the stability of the system is present. The factor leading to this periodic change is unclear and is possibly a result of a low-frequency variation present in the lab environment. The effects of fluctuations in ambient temperature and the lab environment, and the resulting deterioration of stability, are clearly exacerbated by an increase in fiber length. In addition to the periodic fluctuation, a clear downward trend of stability over time is present for all MMFs. For example, the correlation drops to ≈21% through the step-index Ø100 µm 1 km MMF after 100 s. Despite the significant fluctuations and instabilities over time in all MMF channels, Figure 2b shows a high correlation between two adjacent speckle images sampled by the camera along the time domain. This is an important observation, revealing that the system only changes gradually, which forms the basis of our SSL model.
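The two stability measures discussed above, correlation with the initial speckle and correlation between adjacent frames, can be computed as in the following sketch. The function names are hypothetical, and the drifting frames are synthetic (a random walk added to a random image) purely to make the example runnable; they are not a model of the measured fibers.

```python
import numpy as np

def pcc(a, b):
    """Pearson correlation coefficient between two images."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])

def stability_traces(frames):
    """frames: sequence of T speckle images recorded for the same DMD input.

    Returns (to_first, to_previous): the PCC of every frame with frame 0,
    and the PCC of every frame with the one recorded just before it.
    """
    frames = np.asarray(frames, dtype=float)
    to_first = np.array([pcc(frames[0], f) for f in frames])
    to_previous = np.array([pcc(frames[t - 1], frames[t])
                            for t in range(1, len(frames))])
    return to_first, to_previous

# Synthetic slowly drifting "speckles" (assumption for illustration only).
rng = np.random.default_rng(0)
frames = [rng.random((16, 16))]
for _ in range(199):
    frames.append(frames[-1] + 0.05 * rng.standard_normal((16, 16)))

to_first, to_previous = stability_traces(frames)
print(to_first[-1] < to_previous.min())  # True
```

In this toy drift the correlation with the first frame decays while adjacent frames stay highly correlated, which is the qualitative behavior the SSL model relies on.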

Random Data Transmission
We conduct the experiment in an offline mode first, where all training data are collected together, and the evolving transmission states (i.e., instability) of the MMF are learned jointly by the CNN in a single training process. For each dataset described in Table 1, 90% (36 000) of the speckle images and their corresponding DMD patterns are randomly selected for training, with the remaining 4000 data pairs used for testing the final CNNs. In this offline setting, the CNNs can generalize the instability very well across all of these drifting transmission states. Examples of the data transmission accuracies of our proposed framework are shown in Figure S4 in the Supporting Information. Using CNNs, we are able to effectively multiplex the inputs of up to 400 data channels by detecting the corresponding speckle patterns, with high prediction accuracies for the datasets listed in Table 1. Despite the high instability of the 1 km fibers, the prediction accuracy remains good (e.g., over 99% for the multiplexing of 225 channels). This suggests that the CNN generalizes the instability in its learning and is hence able to achieve prediction accuracies similar to those achieved with short fibers. As shown in Table 2, for the multiplexing of 100 channels, an increase in step-index Ø100 µm fiber length from 100 m to 1 km yields only a ≈1.2% drop in prediction accuracy, and doubling the step-index 100 m fiber diameter results in a further 100% prediction accuracy. A drop in prediction accuracy for most datasets is clearly visible toward the start and the end of the collection. This is due to the reduced number of input-output data pairs in the training datasets at the start and the end of the collected datasets. Specifically, at the starting point of data collection, no immediately correlated training data were collected before that point, and similarly, at the endpoint, no immediately correlated training data were collected afterwards.
Therefore, the relatively fewer training data for the MMF transmission states at the start and the end of the data collection period result in less accurate predictions, which do not reflect the real system performance and can be disregarded. Further experimental results (Figure S5a, Supporting Information) support this observation.

Data Transmission Using SSL Model
The offline experiment detailed above demonstrates the high transmission accuracy of random data over MMFs when using one pre-trained CNN to generalize the instability in the MMF. However, the offline setting is not suitable for real-world data or image transmission over a long period of time. Firstly, it is impossible to acquire all the training data in advance for the time-varying MMF information channel. Secondly, the receiver has no direct access to the data at the transmitter, and therefore the CNN training data cannot be obtained directly in real time. To address this without sending known channel calibration data, we note that the offline results presented in Table 2 suggest ≈100% accuracy for all the datasets, and that the MMF information channel only varies gradually with time (Figure 2b). Therefore, we conclude that it is possible to update the CNN in real time by using the CNN-predicted channel inputs and the corresponding camera-collected speckle images.
The SSL model is therefore proposed (see Materials and Methods section, and the Supporting Information for details) to overcome the significantly time-varying nature of MMF transmission channels. In this SSL model, we only use a small batch of initial labeled data, in line with extreme verification latency (EVL) [52,53] conditions. Under EVL conditions, the actual labels of processed data are never made available for further verification. We include the real-time transmitted data as pseudo labels in the training process to overcome the predictive errors caused by the time-varying MMF channels. This is therefore a receiver-end framework that requires no feedback from the transmitter after the initialization. The experiments are conducted using six datasets collected with different fibers: SI-40-100-25, SI-40-100-100, SI-100-100-100, GI-50-100-25, GI-50-100-100, and GI-50-1k-25. Each of the datasets consists of 40 000 input and output pairs. In Figure 4, the performance of our confidence-based SSL model is compared with the performance of a static model, where the CNN is only trained with the 2000 initial labeled instances without being further updated over time.
In Figure 4a, with 25 channels transmitted in a 40 µm-core 100 m step-index MMF, significant fluctuations can be seen in the results of the static model, compared to the ≈100% accuracy achieved by our SSL model. Figure 4b,c show the results of multiplexing 100 channels over 40 and 100 µm core 100 m step-index MMFs, respectively, where our SSL model achieves 100% accuracy consistently over the experiment duration. In comparison, without our SSL model, the accuracy starts to decrease before Step 10 and drops quickly (a Step is the process of using a new batch of input-output pairs to update the CNN; see Materials and Methods section for the definition). This implies that the degradation of performance is caused by accumulated predictive errors. Figure 4d,e illustrate that the proposed SSL algorithm achieves near 100% accuracy over time for the multiplexing of both 25 and 100 channels over a 50 µm-core 100 m graded-index MMF. Without the SSL model, the transmission accuracy decreases drastically over time. Figure 4f displays the SSL model results of multiplexing 25 channels over a graded-index MMF of 1 km length. Again, our SSL model exhibits 100% transmission accuracy over time.

Channel Performance Evaluation
To evaluate the performance of individual data channels for the results presented in Table 2, we select four datasets with relatively low data transmission accuracy, namely SI-100-100-400, SI-100-1k-100, SI-100-1k-225, and SI-100-1k-400. Over 4000 predicted inputs, the transmission accuracy of each individual channel is calculated as the total number of accurately transmitted channel data divided by 4000. Figure 5 visualizes the individual channel transmission accuracy for these four datasets. It is interesting to see from Figure 5 that the individual channel performance is not evenly distributed. Certain channels perform much worse than others. Surprisingly, these less accurate transmission channels remain at similar spatial locations of the DMD modulated pattern for experiments on different MMFs. For example, Figure 5a,d show the channel performance results for the multiplexing of 400 channels over 100 m and 1 km SI MMFs, respectively, and the Pearson correlation coefficient between them is as high as 0.898. Note that each individual channel occupies the same number of DMD pixels at the input regardless of the number of channels multiplexed (see Materials and Methods section). Therefore, we consider that the zoomed-in areas in the red-lined squares in Figure 5c,d cover a DMD area similar to that of Figure 5b. The calculated Pearson correlation coefficients between Figure 5b and the red-lined square areas are reasonably high too, 0.759 and 0.763, respectively. This suggests that different experiments suffer from identical system noise, which is likely caused by the experimental system rather than the MMF. Hence, the transmission accuracy can be further enhanced by improving the experimental system or by excluding those less-accurate channels.
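The per-channel accuracy map and the correlation between two such maps can be computed as sketched below. The function names are hypothetical and the data are synthetic: two simulated "experiments" share the same made-up per-channel error rates, only to show how a common system-level cause produces correlated accuracy maps.

```python
import numpy as np

def channel_accuracy(pred, truth):
    """pred, truth: arrays of shape (num_frames, N, N) holding 0/1 values.

    Returns an N x N map with the fraction of frames in which each
    channel was decoded correctly.
    """
    pred, truth = np.asarray(pred), np.asarray(truth)
    return (pred == truth).mean(axis=0)

def map_correlation(map_a, map_b):
    """Pearson correlation coefficient between two accuracy maps."""
    return float(np.corrcoef(np.ravel(map_a), np.ravel(map_b))[0, 1])

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=(500, 10, 10))
error_rate = rng.random((10, 10)) * 0.2       # hypothetical per-channel error rates
flips_a = rng.random(truth.shape) < error_rate  # errors in "experiment A"
flips_b = rng.random(truth.shape) < error_rate  # errors in "experiment B"
acc_a = channel_accuracy(truth ^ flips_a, truth)
acc_b = channel_accuracy(truth ^ flips_b, truth)
print(acc_a.shape)  # (10, 10)
print(map_correlation(acc_a, acc_b) > 0.5)
```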

Influence of the Detector Number at the Receiver
Finally, we study the influence of the total number of pixels at the receiver camera end on the transmission accuracy, as shown in Figure 6. Firstly, we down-sample the original speckle images to ones with reduced numbers of pixels (see Materials and Methods section), as illustrated in Figure 6a. It can be seen in Figure 6b that a down-sampled speckle image of 30 × 30 pixels is sufficient to recover the full-transmission-channel information accurately for SI-40-100-100 and SI-100-100-100. As the number of multiplexed channels and the MMF length increase, more camera pixels appear to be required to maintain a high detection accuracy. For example, ≈70 × 70 and 80 × 80 detector pixels are needed for accurate data transmission of datasets SI-100-100-400 and SI-100-1k-100, respectively. For SI-40-100-25, only 10 × 10 detector pixels are needed to achieve a transmission accuracy of more than 98%. These results suggest that the total pixel number can be optimized at the receiver end to further enhance the detection and transmission speed. We also studied other partial speckle images and their influence on the accuracy, and present the results in Figures S6 to S8 in the Supporting Information.

Discussion and Conclusion
The scope of this work is to provide early insights into the deep-learning-based continuous transmission of spatially distributed random information over standard MMFs. Our proof-of-concept experiment is focused on overcoming the high instability of MMF channels and demonstrating sustainable, high-accuracy, high-spatial-density random data transmission over MMFs.
There are a few limiting factors in our proof-of-concept experiment. Firstly, the modulation speed of the DMD (≈22 kHz) and the frame rate of the camera (200 frames per second) used in this work are relatively low. In fact, the transmission speed of our experiment is limited by the camera frame rate. Given the speed limit of our proof-of-concept system, the intermodal dispersion (IMD) in the MMF and how it may affect the data transmission accuracy at a high speed is not studied. The data transmission speed can be improved by using an ultra-high-speed camera or a photodetector array. As shown in our results, a down-sampled speckle image is sufficient to recover the complete channel information. At the transmitter end, multiple modulators and light sources can be used to replace the DMD for a much faster speed. Nevertheless, even at a low speed at which the IMD can be neglected, the multiplexing of multiple channels in an MMF will considerably enhance the data transmission capacity compared to single-channel transmission. Secondly, the transmission accuracy in the proof-of-concept system can be enhanced by involving the error correction techniques widely used in optical communication, such as forward error correction (FEC) technologies, [54] to ensure that error-free data are applied in the SSL model. In addition, as suggested by the results presented in Figure 5, channel performances are affected by the experimental system. Hence, transmission accuracy can be further improved with an optimized system or by excluding those less-accurate data channels. Thirdly, we are aware that the online training and updating of the CNN in real time may be time-consuming and challenging, and a practical training and updating scheme needs to be carefully designed and developed. With the rapid development of artificial intelligence and high-performance computing technologies, [55,56] a negligible time frame for neural network training and prediction may be within reach in the future.
Finally, we have not applied the SSL algorithm for longer than was necessary to demonstrate MMF channel instability compensation. However, given the fast-varying nature of the MMF channel, we show successful MMF instability compensation within the time frame during which the channel varies significantly (Figure 2a). This suggests that the SSL algorithm may be applied for a long period of time to compensate for the MMF instability. It is worth noting that errors are likely to occur during the reconstruction process and can hence accumulate, which will eventually lead to the failure of the data reconstruction. Therefore, it is essential to minimize errors in the process. To some extent, the confidence-based SSL model reduces the accumulated error. Figure 4e shows a reconstruction accuracy of less than 100% due to errors that occurred at the early stage of the transmission, and the accumulation of error does not accelerate within the experiment duration. Figure 4a shows that the reconstruction accuracy occasionally drops below 100% and then returns to 100% within the experiment duration. Over a period longer than our experiment duration, the accumulated errors may become too large to be overcome by the SSL model. In such cases, error correction techniques such as FEC can be used, and pre-known data can be sent from the transmitter for calibration. Apart from these, the polarization variation in the MMF under test is not specifically studied in this work, [57] and we believe any polarization-induced instability falls under the MMF instability category and will be effectively overcome by the scalable SSL model. Moreover, although not experimentally verified in this work, the combination of multiple wavelengths in our system for data transmission is considered feasible by introducing additional lasers, modulators, filters, and cameras for the separate modulation and detection of different wavelengths.
In summary, we presented a continuous data transmission framework for spatially distributed random information over MMFs using deep learning. Our proof-of-concept experiments demonstrate that deep learning using a CNN enables high-spatial-density channel multiplexing for accurate data transmission of up to 400 channels over a single MMF. The highly time-varying nature of MMF transmission is overcome by an SSL algorithm, which updates the CNN continuously according to the changes of the MMF information channels to maintain accurate transmission over time. The results presented in this work provide useful insights into future MMF-based data or image transmission systems.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.