Convolutional neural network for quality of transmission prediction of unestablished lightpaths

With the advancement of software-defined networking and elastic optical networks, the number of design parameters is growing dramatically, making lightpath (LP) deployment more complex. Typically, worst-case assumptions are used to calculate the quality of transmission (QoT), which results in high provisioning margins. Precise estimation of the QoT of an LP in advance is therefore essential for reducing this provisioning margin. In this investigation, we present a convolutional neural network (CNN) based architecture to accurately estimate the QoT before the actual deployment of an LP in an unseen network. The proposed model is trained on data acquired from already established LPs of a completely different network. The metric used to evaluate the QoT of an LP is the generalized signal-to-noise ratio (GSNR). The synthetic dataset is generated using the well-established GNPy simulation tool. Promising results are achieved, showing that the proposed CNN model considerably reduces the GSNR uncertainty and, consequently, the provisioning margin.


1 | INTRODUCTION
In recent decades, the design and management of optical networks have evolved continuously to handle rapidly increasing global internet traffic demands. Internet traffic has been growing continuously [1] with the advancement of new technologies and bandwidth-intensive applications, such as video-on-demand, full high definition (FHD) or 4K video, and the internet of things (IoT). This continuous increase in global internet traffic requires full exploitation of the capacity of the already deployed network infrastructure. In this context, the key enabling technologies are coherent optical transmission and dense wavelength division multiplexing (DWDM), which exploits the spectrum available for fiber propagation. In addition to these technologies, elastic optical networks (EON) and software-defined networking (SDN) pave the way for open and disaggregated optical networks. The distinctive characteristics of EON and SDN provide flexible and dynamic resource provisioning in optical networks for both the control and data planes [2,3]. In the data plane, EON introduces flexibility in spectral assignment and raises the network's capacity while minimizing network cost. This flexibility makes lightpath (LP) provisioning much more intricate than in typical fixed-grid wavelength division multiplexing (WDM) networks. In the control plane, the SDN controller manages the operating points of distinct network elements independently, enabling customized network management. Today's optical networks have begun to evolve toward partial disaggregation, with an eventual target of full disaggregation. The key step toward network disaggregation is examining the optical line systems (OLSs) that connect the network elements. At present, the ability of OLS controllers to hold the optimal working point determines the deterioration in the QoT [4,5]. Setting this working point precisely leads to a lower margin and higher deployable traffic rates; it is therefore necessary to employ quality of transmission estimation (QoT-E) to estimate LP performance precisely during path computation, before the LP is actually established. To this end, QoT is effectively assessed by the generalized signal-to-noise ratio (GSNR), which incorporates the combined effect of nonlinear interference (NLI) and amplified spontaneous emission (ASE) noise [6]. Given the characteristics of the transceiver, the GSNR indicates the feasibility of the path and the deployable rate. In practice, the working points of the network nodes are subject to variation (amplifier ripples, insertion losses, noise figure, gain, etc.). This introduces an uncertainty in the QoT-E that requires a system margin to avoid network outages.
In the current study, we adopt a domain-adaptation (DA) approach. The DA approach uses the data available from a source domain "S" (e.g., a well-deployed in-service network), where the network operator has exact knowledge of the working points of the network elements, and exploits this information to estimate the QoT of LPs in a target domain "T" (e.g., a recently deployed or unseen network), that is, a network where the system administrator does not have enough information about the operating points of the network elements. The goal of this investigation is to reduce the margin on the GSNR estimate in the target domain. This reduction in GSNR uncertainty permits the network controller in the target domain to set up LPs reliably with a low margin. When the controller has an exact description of the system specifications, that is, the network status, the QoT-E can use analytical methods that compute the GSNR with very good accuracy, as indicated in Reference 7. However, an analytical method is not applicable without exact knowledge of the system specifications, and this knowledge is not available in the present DA scenario. Recent work on DA indicates that analytical methods are not recommended for predicting the QoT of an LP before its setup in such an agnostic scenario [8-11]. To overcome this challenge, we apply a data-driven approach as an alternative, which has already proved very effective for managing optical networks [12-14]. A thorough analysis of various applications of machine learning (ML) in optical networks is given in Reference 15. The authors of Reference 16 employed a neural network (NN) to distinguish integrated circuits for their complete and precise softwarization. 
With specific regard to the focus of this work, that is, estimating the QoT of an LP before its actual setup, several effective ML-based techniques have been reported. A methodology based on cognitive case-based reasoning (CBR) is described in Reference 17. In Reference 18, a data-driven ML-based technique is demonstrated to handle OLSs in an open environment. Different ML-based techniques for QoT prediction of LPs are investigated in References 8, 9, and 19. The authors of Reference 20 employed convolutional neural networks (CNN) for performance monitoring of optical transport networks. In Reference 21, a one-dimensional CNN model is proposed to estimate multistep performance in operational optical networks using bit-error-rate (BER) data. In Reference 10, the authors assessed the performance of two DA-based mechanisms for ML-assisted QoT-E of an optical LP. The authors of Reference 22 presented an ML-based technique for QoT-E along with a statistical closed-form method for setting the QoT margin. The authors of Reference 23 presented a transfer-learning-based deep neural network architecture for optical signal-to-noise ratio (OSNR) monitoring. Finally, the authors of Reference 11 studied the QoT-E accuracy achieved by several active-learning (AL) and DA approaches on two distinct network topologies.
The distinguishing feature of the present study is that we employ a CNN to minimize the system margin of the T network, using synthetic data on the GSNR response to specific traffic configurations of the LPs of the S network in an open environment. There are two main motivations for employing a CNN for this work. First, most of the ML-based approaches discussed in the literature require manual feature extraction prior to the learning process, whereas a CNN is capable of learning domain-specific features automatically. Second, conventional fully connected NNs result in complex networks and do not exploit spatial dependencies, whereas a CNN has sparse connectivity with a reduced number of trainable parameters, which leads to lower computational complexity and memory requirements. The dataset is generated synthetically with the GNPy simulation tool for two networks with different topologies that employ identical fiber types and communication devices but differ in the fine-grained parameters of the amplifiers and in the fiber losses. Our simulation results show that the CNN performs very well, with a mean absolute error (MAE) of 0.18 dB for the S network and, on average, 0.2 dB for the T network.
The rest of the article is structured as follows. Section 2 reports the simulation performed to imitate an open OLS and the data generation. Section 3 briefly describes why precise QoT-E in terms of GSNR plays a key role in reducing the system margin. Section 4 describes the proposed CNN architecture, which is used in the context of the DA-based approach. Section 5 presents detailed results. Finally, the conclusion is given in Section 6.

2 | SYSTEM MODEL AND DATA GENERATION
The proposed work simulates an open OLS, which incorporates cascaded amplifiers and fibers. For the simulation setup, a grid size of 50 GHz is assumed with 76 channels in the C-band. Only 76 channels over a total bandwidth of approximately 4 THz are examined because of limited computational resources. The transmitter generates signals at 32 GBaud, shaped with a root-raised-cosine filter. The signal launch power is set to 0 dBm and is kept constant by erbium-doped fiber amplifiers (EDFAs) operating in constant-output-power mode at 0 dBm per channel. The noise figure of each EDFA varies uniformly in the range 4.5-6 dB, and the gain ripple varies uniformly within 1 dB. Standard single-mode fiber (SSMF) with a total distance of approximately 80 km is assumed for all links. In addition, fiber parameters such as attenuation (α) = 0.2 dB/km and dispersion (D) = 16 ps/nm/km are also considered. To make the simulation model realistic, the insertion losses are drawn from an exponential distribution with λ = 4, as described in References [24-26]. The paths are computed using the Dijkstra algorithm with shortest distance as the metric. For the computation of GSNR, the ASE noise is modeled as additive white Gaussian noise (AWGN) with bilateral power spectral density (PSD), including both polarizations. The nonlinear impairments are modeled by an analytical perturbation model, namely the generalized Gaussian noise (GGN) model [26,27]. The dataset is generated synthetically with the GNPy simulator for two different networks, mimicking the received signal power, the NLI generated during signal propagation, and the ASE noise accumulation. GNPy is an open-source optimization library based on the Gaussian noise (GN) model [7,27]. It provides an end-to-end simulation environment for developing the network model at the physical layer. 
This library supports route planning in mesh optical networks and can include customized network elements in the network. The synthetic dataset is generated for two different network topologies: the European (EU) network and the USA network, shown in Figure 1A and 1B, respectively. The EU network is considered well deployed and represents the S network, while the USA network represents the T network. The two networks are the same in terms of fiber and optical network elements (ONEs). However, they differ in the fine-grained parameters of the amplifiers (noise figure and gain ripple) and in the fiber insertion losses. The dataset used in this work consists of five source-to-destination (s → d) pairs of the EU network and two s → d pairs of the USA network, listed in Table 1. The spectral load realizations for each simulated link of the dataset are a subset of the $2^{76}$ possible combinations, where 76 is the total number of channels. We considered 3000 realizations of random traffic, varying between 34% and 100% of the overall operational bandwidth, for every s → d pair. Thus, 15 000 realizations are generated for the EU network topology and 6000 realizations for the USA network topology. The dataset is then normalized using Z-score normalization, $z = (X - \mu)/\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of each feature. The Z-score normalization is applied to both the training and the test data.
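The load sampling and the normalization described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch under stated assumptions: the helper names (`random_spectral_load`, `fit_zscore`, `apply_zscore`) are hypothetical, the uniform draw of the utilisation is an assumption, and fitting μ and σ on the training rows and reusing them for the test rows is a common convention that the text does not explicitly confirm.

```python
import random
import statistics

N_CHANNELS = 76                    # channels on the 50 GHz grid (from the text)
MIN_LOAD, MAX_LOAD = 0.34, 1.00    # traffic range as a fraction of the band (from the text)


def random_spectral_load(rng, n_channels=N_CHANNELS):
    """Return a 0/1 channel mask whose utilisation lies between 34% and 100%."""
    utilisation = rng.uniform(MIN_LOAD, MAX_LOAD)   # assumption: uniform utilisation
    n_on = round(utilisation * n_channels)
    on_channels = set(rng.sample(range(n_channels), n_on))
    return [1 if ch in on_channels else 0 for ch in range(n_channels)]


def fit_zscore(rows):
    """Per-feature mean and population standard deviation of the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has sigma = 0; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Apply z = (x - mu) / sigma to every feature of every row."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


rng = random.Random(0)                                      # fixed seed, for repeatability only
loads = [random_spectral_load(rng) for _ in range(3000)]    # realizations for one s -> d pair

# Toy feature rows (hypothetical numbers), just to exercise the normalization.
train_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test_rows = [[2.5, 15.0]]
means, stds = fit_zscore(train_rows)
train_z = apply_zscore(train_rows, means, stds)
test_z = apply_zscore(test_rows, means, stds)
```

Each mask is one element of the $2^{76}$ possible channel combinations; in the actual workflow the masks would be handed to the physical-layer simulator, which is outside the scope of this sketch.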

3 | GSNR AS QOT-ESTIMATION METRIC
In general, an optical network is composed of optical network elements (ONEs) connected via bidirectional fiber links, where traffic is routed and added/dropped, as illustrated in Figure 1C. The amplifiers are placed at a certain span distance and use EDFAs, Raman amplification, or a hybrid of both. In advanced optical networks, the ONEs coupled via fibers are typically handled by a dedicated controller, and an OLS with particular specifications sets the operating point of every amplifier along the link. Moreover, the transport-layer services are implemented using reconfigurable optical add/drop multiplexers (ROADMs). As stated in the recommendations of the International Telecommunication Union (ITU-T) [28], DWDM technology can use either a fixed or a variable spectral grid, which defines the spectral slots for both grid architectures [29]. With either grid architecture, LPs can be established, where each LP represents the logical abstraction of suitable links from node to node according to traffic demands. Furthermore, polarization-division multiplexing (PDM) is used over each established LP to propagate the signal from a specified source to a destination. During transmission, an LP suffers various impairments, for instance ASE noise, fiber propagation effects, and filtering penalties introduced by ROADMs. It has also been widely reported in the literature that, during fiber propagation, the QoT of the LP is affected by amplitude and phase noise [4,30-32]. The phase noise is effectively compensated by the receiver's DSP module by applying a carrier-phase estimator. This noise is examined in particular for very short distances together with a high-symbol-rate transmission model [32]. In contrast, the amplitude noise, commonly known as NLI, consistently degrades the performance. 
Finally, the filtering penalty of the ROADMs lowers the QoT; it is commonly accounted for as an additional loss.
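The QoT-E metric for a particular LP routed through given OLSs from source to destination is the well-established GSNR, which combines the aggregated effects of NLI disturbance and ASE noise. Generally, the GSNR is determined by Equation (1):

$$\mathrm{GSNR} = \frac{P_{\mathrm{Rx}}}{P_{\mathrm{ASE}} + P_{\mathrm{NLI}}} = \left(\mathrm{OSNR}^{-1} + \mathrm{SNR}_{\mathrm{NL}}^{-1}\right)^{-1} \qquad (1)$$

where $\mathrm{OSNR} = P_{\mathrm{Rx}}/P_{\mathrm{ASE}}$, $\mathrm{SNR}_{\mathrm{NL}} = P_{\mathrm{Rx}}/P_{\mathrm{NLI}}$, $P_{\mathrm{Rx}}$ is the signal power of a given channel at the receiver, $P_{\mathrm{ASE}}$ denotes the ASE noise power, and $P_{\mathrm{NLI}}$ denotes the NLI power.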
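As a worked illustration of these definitions, the following sketch computes the GSNR in dB both from the three powers and from the OSNR and SNR_NL, using the standard relation in which the ASE and NLI powers add in linear units. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def db_to_linear(value_db):
    """Convert a dB (or dBm) value to linear units."""
    return 10 ** (value_db / 10)


def linear_to_db(value):
    """Convert a linear ratio to dB."""
    return 10 * math.log10(value)


def gsnr_db_from_powers(p_rx_dbm, p_ase_dbm, p_nli_dbm):
    """GSNR = P_Rx / (P_ASE + P_NLI), with all powers converted from dBm to mW."""
    p_rx, p_ase, p_nli = (db_to_linear(p) for p in (p_rx_dbm, p_ase_dbm, p_nli_dbm))
    return linear_to_db(p_rx / (p_ase + p_nli))


def gsnr_db_from_snrs(osnr_db, snr_nl_db):
    """Equivalent form: 1 / GSNR = 1 / OSNR + 1 / SNR_NL (linear units)."""
    inverse = 1 / db_to_linear(osnr_db) + 1 / db_to_linear(snr_nl_db)
    return linear_to_db(1 / inverse)


# Hypothetical channel: 0 dBm received power, -20 dBm ASE noise, -23 dBm NLI.
gsnr_a = gsnr_db_from_powers(0.0, -20.0, -23.0)
gsnr_b = gsnr_db_from_snrs(osnr_db=20.0, snr_nl_db=23.0)   # same channel, expressed as ratios
```

Both calls describe the same channel and therefore agree; the GSNR is always below the smaller of the OSNR and the SNR_NL, and when the two are equal it sits about 3 dB below them.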
Given the specifications of the transceiver, the GSNR accurately determines the BER; the BER is a common metric stated by vendors when characterizing commercial products [6]. The NLI power $P_{\mathrm{NLI}}$ generated during fiber propagation depends on the spectral load and the power of each channel [4]. Under these circumstances, there is an optimal spectral load for each OLS that maximizes the GSNR [5]. Examining the LP propagation effects for a specific source-destination pair, we take an abstract view of the operation as the combined impact of every ONE that adds QoT impairments. An LP between a specific source and destination accumulates the impairments of all previously traversed OLSs along with the ROADM effects. Each crossed OLS adds a specific amount of NLI and ASE noise. In this abstraction, the edges E represent the OLSs, with $\mathrm{GSNR}_i(f)$ as the weight on each edge, as shown in Figure 1D. Specifically, for a given LP from the source node I to the destination node F that passes through intermediary nodes B, the QoT is defined by Equation (2). With this network-level abstraction, LPs can be deployed between a given source node and destination node with a reduced margin, which depends on the GSNR of the specific source-to-destination path.
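A common way to express the accumulation described above, under the assumption that the noise contributions of successive OLSs add incoherently, is $\mathrm{GSNR}_{IF}^{-1}(f) = \sum_i \mathrm{GSNR}_i^{-1}(f)$, where the sum runs over the OLSs crossed between I and F. The sketch below applies this relation to a toy weighted graph; the node labels follow the text, while the edge weights and the helper name `path_gsnr_db` are hypothetical, and the exact form of Equation (2) should be taken from the original article.

```python
import math


def path_gsnr_db(ols_gsnr_db):
    """Combine per-OLS GSNR values (dB) by summing their inverses in linear units."""
    inverse_total = sum(10 ** (-g / 10) for g in ols_gsnr_db)
    return -10 * math.log10(inverse_total)


# Hypothetical abstraction: nodes are ROADMs, each edge is an OLS weighted by its GSNR (dB).
edge_gsnr_db = {("I", "B"): 25.0, ("B", "F"): 22.0}
path = ["I", "B", "F"]

hops = [edge_gsnr_db[(a, b)] for a, b in zip(path, path[1:])]
total_gsnr_db = path_gsnr_db(hops)
```

The end-to-end GSNR is always lower than that of the worst OLS on the path, which is why the margin of a source-to-destination LP depends on every traversed OLS.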

4 | CNN ARCHITECTURE FOR GSNR ESTIMATION
The ML paradigm, and in particular the CNN, a class of deep-learning models, can capture characteristics that cannot be precisely described by analytical models.
Typically, ML models acquire cognitive ability by applying sets of learning rules to extract the information inherent in the training data. The trained model then uses this learned abstraction to make decisions during the testing phase. CNNs are best known for their performance on image data. The proposed CNN-based model works with numerical data, and we explore its effectiveness for GSNR estimation of unestablished LPs.
In the present work, the dimensionality of the problem makes it difficult to apply fully connected NN frameworks. We consider five distinct features in total; with $N_{\mathcal{F}}$ features for each of the $N_{\mathcal{CH}}$ channels, the number of input features is $N_{\mathcal{F}} \times N_{\mathcal{CH}}$ plus the span (i.e., $[N_{\mathcal{F}} = 4] \times [N_{\mathcal{CH}} = 76] + [\mathrm{span} = 1] = 305$). The mapping between such a large number of input features and the corresponding GSNR requires a model with many trainable parameters, which increases the training time and makes the model susceptible to problems such as over-fitting and local minima. Moreover, a conventional fully connected network would add unnecessary complexity, as it cannot benefit from the hidden correlations in the input data. For these two reasons, we consider CNNs more suitable, as they are designed to process data that come in the form of multiple arrays, such as images. Moreover, they can effectively capture the spatial and temporal dependencies in two-dimensional data by applying relevant filters [33] and weight sharing. For the CNN-based model studied in this investigation, we use the set of features and the number of samples as a two-dimensional input to the network, with the aim of estimating the GSNR of an unestablished LP, as illustrated in Figure 2. The proposed framework comprises two network stages for end-to-end training, that is, a feature-extraction network and a regression network. First, the input data are pre-processed using Z-score normalization. In this work, we use a two-dimensional dataset consisting of 12 000 rows (number of samples) and 305 columns (set of features). To obtain compatible three-dimensional data, we reshape the data from two dimensions (12 000 × 305) to three dimensions (12 000 × 305 × 1). 
After that, the three-dimensional normalized data are passed to the feature-extraction network, which consists of three layers in total: an input layer and two convolution layers (conv-layers). The conv-layers act as feature detectors that extract significant features from the input data for better prediction. Each conv-layer uses the ReLU activation function to accelerate the training process. Moreover, average-pooling layers are placed between successive conv-layers to carry out spatial pooling. The main purpose of the pooling layers is to reduce the spatial size of the input feature maps, which reduces the number of parameters and the computational complexity. The reduction in spatial size is determined by the kernel size of the pooling layer, which is 2 × 2 in our case and reduces the spatial size of the input features by a factor of 2. Thus, each layer produces a compact and informative description of the input features. The output of the feature-extraction network is a three-dimensional representation of the input data. A flattening layer then converts this three-dimensional representation into a one-dimensional feature vector, which is passed to the regression network. The purpose of this network is to map the extracted features to the GSNR of the LP. It includes one fully connected layer and an output layer with one neuron that outputs the estimated GSNR of the LP. The detailed parameters of the proposed CNN architecture are given in Table 2(a).
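To make the data flow through the two stages concrete, the following pure-Python sketch runs one 305 × 1 sample through an untrained pipeline of convolution, ReLU, average pooling, convolution, flattening, a fully connected layer, and a single output neuron. It is not the authors' implementation: a one-dimensional convolution is assumed, and the kernel size, the number of filters, and the width of the fully connected layer are hypothetical placeholders for the values given in Table 2(a).

```python
import random

N_FEATURES = 305      # input length stated in the text (305 x 1 per sample)
POOL_SIZE = 2         # pooling reduces the spatial size by a factor of 2 (from the text)
KERNEL_SIZE = 3       # hypothetical; the real value is listed in Table 2(a)
N_FILTERS = 4         # hypothetical, kept small so that pure Python stays fast
N_DENSE = 8           # hypothetical width of the fully connected layer


def conv1d_relu(x, kernels, biases):
    """'Valid' 1-D convolution + ReLU. x is [length][in_ch]; kernels is [out_ch][k][in_ch]."""
    k = len(kernels[0])
    out = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        row = []
        for kernel, bias in zip(kernels, biases):
            acc = bias + sum(w * v
                             for taps, pos in zip(kernel, window)
                             for w, v in zip(taps, pos))
            row.append(max(0.0, acc))
        out.append(row)
    return out


def avg_pool1d(x, size=POOL_SIZE):
    """Average non-overlapping windows along the length axis."""
    n_ch = len(x[0])
    return [[sum(x[i * size + j][c] for j in range(size)) / size for c in range(n_ch)]
            for i in range(len(x) // size)]


def dense(vec, weights, biases, relu):
    """Fully connected layer; weights is [out][in]."""
    out = [b + sum(w * v for w, v in zip(row, vec)) for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def init(rng, *shape):
    """Nested lists of small random weights with the requested shape."""
    if len(shape) == 1:
        return [rng.uniform(-0.1, 0.1) for _ in range(shape[0])]
    return [init(rng, *shape[1:]) for _ in range(shape[0])]


rng = random.Random(0)
sample = [[rng.gauss(0.0, 1.0)] for _ in range(N_FEATURES)]   # stand-in for one z-scored row

# Feature-extraction network: conv -> average pooling -> conv
conv1 = conv1d_relu(sample, init(rng, N_FILTERS, KERNEL_SIZE, 1), [0.0] * N_FILTERS)
pooled = avg_pool1d(conv1)
conv2 = conv1d_relu(pooled, init(rng, N_FILTERS, KERNEL_SIZE, N_FILTERS), [0.0] * N_FILTERS)

# Regression network: flatten -> fully connected -> single output neuron
flat = [v for pos in conv2 for v in pos]
hidden = dense(flat, init(rng, N_DENSE, len(flat)), [0.0] * N_DENSE, relu=True)
gsnr_estimate = dense(hidden, init(rng, 1, N_DENSE), [0.0], relu=False)[0]
```

With these placeholder sizes the length axis shrinks from 305 to 303 after the first convolution, to 151 after pooling, and to 149 after the second convolution, so the flattened vector has 149 × 4 = 596 entries. The weights are random, so `gsnr_estimate` is meaningless until the model is trained.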

5 | RESULTS AND DISCUSSION
In this section, we first assess the performance of the proposed CNN model in the same-domain (SD) scenario and explore the effect of different CNN layer configurations on the model performance. We then evaluate the performance of the proposed model in the DA scenario. Initially, for the SD scenario, we train the CNN model on the EU network (the "S" network) with two hidden layers and 32 neurons in each layer. We used 12 000 samples of the EU network for training and 3000 samples of the EU network for testing. The features used as input to the CNN model include power, ASE noise, span length, NLI, and the total distance of the path. The performance of the proposed CNN model is assessed using the MAE metric, which quantifies the GSNR predictions of the CNN model as the mean absolute difference between the estimated values and the actual values. The initial configuration of the proposed architecture, illustrated in Figure 2, provides an MAE of 0.32 dB. To obtain a more accurate model, various configurations of the proposed CNN layout are investigated. To this end, Figure 3A shows the MAE of the predicted GSNR for different CNN architectures with an increasing number of neurons in each layer. As Figure 3A shows, the two-layer architecture produces better results in terms of MAE than the deeper architectures. The addition of extra hidden layers leads to a larger and more complex network. Figure 3A also reveals that the model's performance improves as the number of neurons in each layer increases from 32 to 64. When the number of neurons is increased further, the performance of the CNN model does not improve any more with the given 12 000 training samples. Therefore, we analyze the performance of the CNN architecture with two hidden layers and 64 neurons in both the SD and DA scenarios.
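For reference, the MAE used throughout this section can be written as a short function. This is a minimal sketch; the function name and the GSNR values in the example are hypothetical.

```python
def mean_absolute_error(actual_db, predicted_db):
    """Mean of |predicted - actual| over all samples, in the same unit as the inputs (dB)."""
    if not actual_db or len(actual_db) != len(predicted_db):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for a, p in zip(actual_db, predicted_db)) / len(actual_db)


# Hypothetical GSNR values in dB; the absolute errors are 0.3, 0.4, and 0.2 dB.
actual = [20.0, 21.5, 19.0]
predicted = [20.3, 21.1, 19.2]
mae_db = mean_absolute_error(actual, predicted)
```

Because the MAE is computed on GSNR values expressed in dB, it can be read directly as the average size of the prediction error in dB.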
The proposed model is developed using the Python-based high-level application programming interfaces (APIs) of the open-source TensorFlow platform, in particular the Keras library. Furthermore, we consider four activation functions in this investigation to assess their impact on model performance (see Table 2(b)). The comparison in Table 2(b) shows that the model with ReLU as the activation function performs best, with the lowest MAE among the considered models [34]. Moreover, we used the adaptive moment estimation (Adam) optimizer with default settings for this study. To avoid over-fitting the model, we use the number of epochs as the stopping criterion: training runs for several epochs until there is no further improvement in the model's performance. The performance during training is depicted in Figure 3B, which shows that the MAE is very high at the start of the training procedure (i.e., 0.62 dB) and gradually decreases as the number of epochs increases up to a certain limit (350 epochs in this case); further epochs do not improve the performance. We therefore found that the proposed model converges at 350 epochs in the present simulation scenario. The CNN model is run on a workstation with an Intel Core i7-8550U 1.80 GHz CPU and 8 GB of RAM. The training time of the CNN model for different numbers of layers is shown in Figure 3C. The proposed CNN model takes approximately 7 h to train with two hidden layers and 64 neurons for 350 epochs.
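The stopping rule described above (train until additional epochs bring no further improvement) can be sketched as a patience-based check on the per-epoch MAE. The text does not state how the authors detected the plateau, so the `patience` and `min_delta` parameters, the function name, and the MAE curve below are all hypothetical.

```python
def stopping_epoch(mae_per_epoch, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch, best_mae), with epochs counted from 1.

    Training stops once `patience` consecutive epochs fail to improve the
    best MAE seen so far by more than `min_delta`.
    """
    best_mae = float("inf")
    best_epoch = 0
    for epoch, mae in enumerate(mae_per_epoch, start=1):
        if mae < best_mae - min_delta:
            best_mae, best_epoch = mae, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch, best_mae
    return len(mae_per_epoch), best_epoch, best_mae


# Hypothetical MAE curve (dB): starts high and then flattens out.
history = [0.62, 0.45, 0.33, 0.27, 0.24, 0.22, 0.21, 0.21, 0.21, 0.21, 0.21, 0.21]
stop, best, best_mae = stopping_epoch(history, patience=3)
```

In Keras, the `tf.keras.callbacks.EarlyStopping` callback implements the same idea through its `monitor` and `patience` arguments; whether it was used in this study is not stated.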
The performance of the proposed CNN model is evaluated using the MAE metric for the SD approach, where the CNN model is trained on some paths of the EU network and tested on another path of the EU network. The first four paths of the EU network listed in Table 1 are used to train the CNN model, and the last path is used for testing. We used all the given features of the 76 channels to train the CNN model to estimate the GSNR of a single channel under test (channel 1 in this case). The result for the test path, from Paris to Rome, is depicted in Figure 3D, which shows the predicted GSNR against the actual GSNR together with the mean (μ) and standard deviation (σ). The statistics μ and σ indicate that the CNN model performs very well in terms of GSNR prediction. The CNN exploits its capacity for dimensionality reduction to learn automatically the different correlations among a large set of features. In a CNN, dimensionality reduction is achieved by the combined use of convolution and pooling operations. This reduction in the dimension of a massive set of input features substantially lowers the risk of over-fitting and the computational complexity.
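The path-wise split used in the SD scenario can be sketched as follows. Only the Paris to Rome test path is named in the text, so the other four path labels are placeholders for the entries of Table 1, and the records carry no real features.

```python
SAMPLES_PER_PATH = 3000   # realizations per s -> d pair, as stated in the text

# Placeholder labels: the first four stand in for the training paths of Table 1.
EU_PATHS = ["EU-path-1", "EU-path-2", "EU-path-3", "EU-path-4", "Paris-Rome"]


def make_placeholder_dataset(paths, n_per_path):
    """Build dummy records tagged with the path they were simulated on."""
    return [{"path": p, "index": i} for p in paths for i in range(n_per_path)]


def split_by_path(samples, test_path):
    """Hold out every sample of one path for testing; train on all other paths."""
    train = [s for s in samples if s["path"] != test_path]
    test = [s for s in samples if s["path"] == test_path]
    return train, test


dataset = make_placeholder_dataset(EU_PATHS, SAMPLES_PER_PATH)
train_set, test_set = split_by_path(dataset, test_path="Paris-Rome")
```

Splitting by path rather than at random reproduces the 12 000 training and 3000 test samples quoted earlier and ensures that the test path is never seen during training.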
We further evaluate the performance of the CNN model in the DA scenario, where the model is trained on four paths of the EU network and then tested on the two paths of the USA network given in Table 1. The results of the DA approach are illustrated in Figure 4A,B. These results show the prediction performance of the proposed CNN model for the two paths of the USA network, that is, Houston to Jacksonville and Orlando to Philadelphia. The result statistics show that the GSNR values predicted by the CNN model appear to follow the same distribution as the actual GSNR values. The error in predicting the GSNR (ΔGSNR) is defined as $\Delta\mathrm{GSNR} = \mathrm{GSNR}_{\mathrm{Predicted}} - \mathrm{GSNR}_{\mathrm{Actual}}$. For the worst-predicted path of the USA network (Houston to Jacksonville), the μ and σ of ΔGSNR are -0.1375 dB and 0.124 dB, respectively. The maximum error in predicting the GSNR with the proposed CNN is $\Delta\mathrm{GSNR}_{\max} = 0.372$ dB, estimated as 3σ of the worst-predicted path of the USA network. From these results, we conclude that the CNN model also performs very well in the DA scenario owing to its capability to learn complex hidden patterns, which results in better generalization.
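The error statistics quoted above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the helper name is hypothetical, the toy GSNR values are invented, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def delta_gsnr_stats(predicted_db, actual_db):
    """Return (mu, sigma, 3 * sigma) of Delta GSNR = predicted - actual, in dB."""
    deltas = [p - a for p, a in zip(predicted_db, actual_db)]
    mu = statistics.fmean(deltas)
    sigma = statistics.pstdev(deltas)   # assumption: population standard deviation
    return mu, sigma, 3 * sigma


# Toy values (hypothetical), only to exercise the function.
predicted = [20.1, 19.8, 20.0, 19.9]
actual = [20.0, 20.0, 20.0, 20.0]
mu, sigma, margin = delta_gsnr_stats(predicted, actual)

# Check of the figure reported for the worst path: sigma = 0.124 dB gives 3 * sigma = 0.372 dB.
reported_margin_db = 3 * 0.124
```

The 3σ rule is what turns the spread of the prediction error into the 0.372 dB figure reported for the Houston to Jacksonville path.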

6 | CONCLUSION
Predicting the QoT of an LP before its actual deployment is of techno-economic importance for network operators during the design and operation phases of optical networks. In this context, a CNN-based framework is proposed for precise QoT estimation of an LP before its actual deployment in the SD and DA scenarios. The proposed CNN architecture consists of two networks: (1) a feature-extraction network with an input layer and two conv-layers that extract useful features and (2) a regression network that estimates the GSNR of an LP before its actual provisioning in a network. Our simulation results show that the proposed framework performs very well in predicting the GSNR in both the SD and DA scenarios.

ACKNOWLEDGMENTS
This publication has been produced with co-funding of the European Union for the Asi@Connect Project under Grant contract ACA 2016-376-562. Open Access Funding provided by Politecnico di Torino within the CRUI-CARE Agreement.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.