Assessment of adaptive optics‐corrected optical links statistics from integrated turbulence parameters through a Gaussian process metamodel

With the development of free-space optical (FSO) links for space-ground communications comes the need to mitigate the effects of atmospheric turbulence to guarantee a lossless connection. By having a network of addressable ground stations, we want to guarantee that we can always target a point where the link is available. Assuming atmospheric transmission is managed thanks to site diversity, we focus only on the influence of atmospheric turbulence on the signal injected into a single-mode fiber on the downlink. The use of adaptive optics (AO) is assumed to avoid turbulence-induced signal disruptions and to enable a sufficiently high level of received signal for data transmission. Until now, adequate assessment of AO performance has required the knowledge of high-resolution Cn2 and wind profiles. With the advent of instruments measuring integrated atmospheric parameters, we investigate here the possibility of estimating AO-corrected performance from a limited number of integrated parameters. In this paper, we propose to use a Gaussian process metamodel to assess the statistics of the received optical power after AO correction. Taking as input only four integrated parameters of the turbulence profile and associated wind profile, which can be measured with simple instruments, the estimation error on the value of the 1% quantile of the received optical power is below 0.7 dB. We also demonstrate the possibility to estimate the half correlation time of the received optical power using the same integrated parameters.

between a GEO satellite and a fixed Earth ground station for missions such as Internet delivery and/or data repatriation from point to point on Earth. However, the development of these atmospheric optical links remains conditioned by their availability, which is highly dependent on the atmospheric channel: absorption, scattering, and turbulence. Cloud masking issues can be managed thanks to site diversity, 3 and we focus here on the influence of atmospheric turbulence.
With the wish for very high bandwidth comes the necessity to inject the signal into a single-mode optical fiber (SMF) to be amplified and/or to enable coherent detection. Deep fluctuations of the injected signal occur due to atmospheric turbulence, which causes amplitude variations, that is, scintillation, and wavefront distortions. A significant telescope aperture enables averaging of the scintillation effects, as the size of the speckles becomes small compared with the receiving aperture. Phase-related fadings in the received signal can be reduced with adaptive optics (AO) systems that correct the distorted wavefront and maximize the coupling efficiency into the SMF. 4,5 Because of technological and cost limitations, AO and pupil averaging do not perfectly correct for turbulent channel errors. To maximize the retrieved information, digital mitigation techniques such as forward error correction on an interleaved signal 6 can be used in addition to the physical mitigation techniques. Forward error coding will operate efficiently only if the error probability (hence the fading probability) remains reasonable on the interleaved signal. The size of the interleaving window depends on temporal characteristics of the received optical power (ROP), such as its typical coherence time. 7
Those techniques will have a significant impact on the bandwidth and on the latency of the whole system and must therefore be adequately scaled according to the ROP. Knowing the ROP distribution and ROP coherence time at any site and at any moment would give precious information on how to adapt the system in real time. However, these statistics depend strongly on the knowledge of turbulence conditions along the line of sight and on the AO system. Considering the significant variability of turbulence conditions with the location of ground stations, several initiatives have recently been taken to gather local measurements of the most relevant atmospheric parameters with respect to AO-corrected optical links. Some of these initiatives deploy measurements with high vertical spatial resolution, 8 whereas others focus on integrated turbulence parameters. 9 Knowledge of the turbulence and wind profiles at kilometer vertical resolution along the line of sight guarantees a precise assessment of fading statistics, 10 hence an accurate optical link performance evaluation. High-resolution atmospheric turbulence characterization can be performed with rather complex instrumentation, such as the Moon Limb Profiler by nighttime 11 or the Sun Limb Profiler. 11 Such instruments are being deployed to demonstrate high-resolution turbulence characterization capacity on several sites over Europe. 12 They will provide high-resolution Cn² profile characterization along the line of sight between the ground and the direction of the target used, which might differ from the optical link direction; the impact of this difference of line of sight on optical link availability is hardly documented. Initiatives to systematically document integrated turbulence parameters, sometimes with really simple instrumentation, are also emerging. 13
Considering the importance of integrated turbulence parameters in the assessment of AO-corrected error budgets, indications exist in favor of a link performance that would depend only on a few integrated parameters, but such a relation between corrected optical link performance and integrated turbulence parameters has never been clearly established. The exact expression of the correction residuals involves a complex combination of moments of the Cn² and wind profiles, whose weightings depend on the number of corrected modes, and the tractability of analytical expressions raises real challenges for a clear-cut demonstration that integrated parameters are sufficient to characterize optical link availability. Machine learning methods associated with physical performance models might provide crucial indications to answer this question.
This is the major prospect of this paper: investigating the possibility of assessing AO-corrected optical link availability from integrated turbulence parameters by machine learning, and identifying, thanks to a sensitivity analysis, the compulsory parameters to be monitored.
Over the past 10 years, several studies have taken advantage of machine learning for atmospheric turbulence estimation or temporal prediction. Most focus on assessing Cn² near the land surface, such as Wang and Basu, 14,15 who propose to use a multilayer perceptron (MLP) trained, respectively, on seven measured meteorological input variables (wind speed, temperature and temperature gradient, soil temperature, relative humidity, net radiation, and soil water content) or on only five input variables (wind speed, relative humidity, pressure, wind shear, and potential temperature gradient). In Su et al., 16 only four meteorological variables are used: surface temperature, air temperature, wind speed, and relative humidity measured at 0.5 and 2 m. Prediction results are overall accurate but tied to a particular scenario. In addition to these MLP metamodels, Jellen et al. 17 compared three other metamodels (polynomial regression, random forest, and boosted regression trees) with six input variables: air temperature, air-water temperature difference, pressure, relative humidity, wind speed, and solar radiation. Best results were obtained with random forest, but the predictions were not always accurate. Deep neural networks have also been used more recently: in Lamprecht et al., 18 a ResNet residual network is proposed to retrieve the refractive index structure parameter from the height above sea level and the corresponding wind speed, instead of relying on analytical formulae. The performances are promising, but delivering accurate results would require collecting training data from many different places on Earth. A recent PhD work aimed at forecasting future daytime Cn² conditions from prior meteorological data (wind speed, pressure, temperature, relative humidity, and solar irradiance) and Cn² measurements. 19
Neural networks (MLP and recurrent neural network) are used to create a low-altitude model capable of forecasting Cn² up to 4 h ahead using 16 h of prior measurements. The forecasting quality is not always sufficient: best in the middle of the day, moderate in the morning, and generally worst in the evening.
Finally, some recent approaches 20 use deep neural networks to infer the atmospheric refractive index structure parameter Cn² from short-exposure images of turbulence-induced laser beam intensity scintillation.
Other studies focus instead on the temporal prediction of integrated turbulence parameters. Among these, we can cite Milli et al., 21 where turbulence nowcasting, that is, the ability to forecast the turbulence conditions over the next 2 h, is investigated at Paranal Observatory. MLP performed best among the three metamodels tested (random forest, MLP, and long short-term memory (LSTM) deep network), trained on 1 or 2 h of history of meteorological and integrated turbulence parameters such as seeing, coherence time, temperature, pressure, wind speed, and direction.
In Giordano et al., 22 a random forest metamodel is trained to predict the seeing over the next 2 h on a large atmospheric database measured by the Calern Atmospheric Turbulence Station, including ground meteorological conditions, vertical profiles of Cn², and integrated parameters characterizing the optical turbulence: seeing, isoplanatic angle, and coherence time.
As far as free-space optical (FSO) links are concerned, machine and deep learning methodologies mostly focus on compensating the effects of atmospheric turbulence on the performance of the whole single-input single-output (SISO) or multiple-input multiple-output (MIMO) FSO system 23 or on predicting parameters of the FSO channel, 24 such as the optical signal-to-noise ratio. Closer to the methodology we propose in this paper, two publications, Toth et al. 25 and Lionis et al., 26 aimed at predicting the received signal strength indicator (RSSI) of the FSO link. In Toth et al., 25 pressure, air temperature, particle concentration, visibility, relative humidity, and wind speed at different past time horizons are used as input variables for the metamodels. Best results were obtained with random forest and enabled the retrieval of some atmospheric behavioral patterns influencing the RSSI.
Here, we focus on the assessment of injected power statistics, and we consider input variables associated with the turbulence and wind profiles, in the perspective of identifying a minimal set of compulsory parameters. Using machine learning on a database of turbulence profiles over Tenerife and a physical AO modeling tool, we propose a metamodel to estimate the probability density function (PDF) of the power injected into the single-mode fiber of the receiver of the optical ground station. We further develop this metamodel to assess the autocorrelation function of the injected power. For our method to be suitable for a massive deployment of ground stations around the world, we wish to use exclusively data provided by simple and easily deployable instruments. Section 2 describes our atmospheric channel model and the profile database, Section 3 presents the methodology, including metamodel construction and sensitivity analysis, and Section 4 discusses the numerical results.

| ATMOSPHERIC CHANNEL MODELING
The metric of free-space optical communication availability in the case of a coded channel is given by Shannon's noisy-channel coding theorem, 27 which states that, for successful decoding with arbitrarily small error probability, the capacity of the communication channel must be greater than the rate of the code used (the proportion of the datastream, in bits, that is useful, i.e., nonredundant). Unfortunately, due to slow fading or other random factors, the channel capacity can fall below the code rate, compromising error-free decoding. In other words, the probability of interruption of the turbulent channel is P_outage(R0) = P(C(SNR) < R0), where C is the channel capacity defined by Shannon 27 as the maximal data rate that can be achieved with the given channel, SNR is the electrical signal-to-noise ratio, and R0 is the coding rate of the forward error correction code.
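As a sketch, the outage probability above can be estimated empirically from samples of the instantaneous SNR. The gamma-distributed SNR, the code rate R0, and the sample size below are illustrative assumptions, not values from this study; only the outage definition P(C(SNR) < R0) comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SNR samples (linear scale) for a slow-fading channel;
# in practice these would come from measured or simulated ROP series.
snr = rng.gamma(shape=4.0, scale=2.0, size=100_000)

# Shannon capacity of an AWGN channel, in bits per channel use.
capacity = np.log2(1.0 + snr)

# Outage probability: fraction of time the capacity falls below the code rate R0.
R0 = 2.0
p_outage = np.mean(capacity < R0)
print(f"P_outage(R0={R0}) = {p_outage:.4f}")
```

Sweeping R0 upward monotonically increases the estimated outage probability, which is how the code rate of the forward error correction would be scaled against a target availability.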
The capacity and, more broadly, the SNR are strongly dependent on the reception system. Accounting for all the intricate phenomena arising from different types of receiving chains is beyond the scope of this article. We present here a proof-of-concept study in a straightforward scenario, and we focus our attention on the statistics of the power received into a SMF (the ROP). Looking at the ROP statistics allows us to take into account the interleaving and the coding while avoiding strong hypotheses on the noise statistics, which allows a more exhaustive approach. In the very simple case of a noise that would be independent of the signal, we can show that there exists a bijection between the channel capacity fluctuations and the fluctuations of the ROP. 28 In order to train our metamodel to estimate the ROP statistics, we first need access to a database of ROP and associated turbulence measurements. All those data are obtained thanks to simulations that we want to be as representative as possible of the most significant phenomena that affect AO performance, within the limits of tractability of the model, in the prospect of building a methodology and identifying the most adapted machine learning tools to extract relevant information from the data. An adaptation of the metamodel inputs will be needed when switching to experimental data. In this section, we describe the AO correction modeling, the database, and the system hypotheses.

| Modeling of the ROP after AO correction
In this work, we study separately the impact of scintillation on the ROP and the impact of the distorted phase. We call ρ_ϕ the coupling efficiency neglecting the impact of scintillation and ρ_I the scintillation term. Correlated time series of ρ_ϕ and ρ_I are obtained by modeling the fluctuation of the coupled flux in a SMF with a pseudo-analytic AO modeling tool called SAOST (simplified adaptive optics simulation tool). 10,29,30 This model neglects the influence of noncommon path aberrations between the wavefront sensor and the injection path and assumes a perfect wavefront sensor (not sensitive to scintillation), a Zernike description of the correction phase, and an infinitely fast deformable mirror (the delay in the loop mostly comes from the integration time of the wavefront sensor and the calculation of the control voltages sent to the mirror). A comparison between SAOST and end-to-end models can be found in Canuet. 31 Experimental validation has been conducted in relevant conditions for a GEO feeder link and can be found in Bonnefois et al. 32 In the case of moderate turbulence strength, interaction terms between the scintillation, mostly caused by distant turbulence, and the phase effects, related to close-to-ground phenomena, can be neglected; thus, SAOST works under the assumption of independence between ρ_ϕ and ρ_I. Let f_SMF = ρ_ϕ · ρ_I be the coupling efficiency in a SMF after propagation through the atmosphere and AO correction. Analytical expressions of f_SMF and its distribution under the hypothesis of independence are described in more detail in Canuet et al. 10
Scintillation influence is simulated assuming the small perturbation approximation in the Rytov regime. In practice, this constrains the validity of the approach to limited Rytov variance (typically a point-source log-amplitude variance σ²_χ < 0.3), which corresponds to the experimental limit of validity of the small perturbation approximation for horizontal propagation, as first identified in Gracheva and Gurvich. 33 This strong limitation in the case of horizontal propagation is, to our understanding, not restrictive when dealing with vertical propagation, where σ²_χ takes low values. In the database presented in the following part, containing 37,059 profiles, less than 0.05% have σ²_χ ≥ 0.3. We adopt in this paper the same approach as in SAOST, that is, to consider independently the effects due to the phase error and those due to the scintillation; the benefit of this dissociation is to be able to interpret our results more easily.
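The independence hypothesis behind f_SMF = ρ_ϕ · ρ_I can be sketched numerically. The distributions below are illustrative stand-ins for the two factors, not SAOST outputs; the point is only that, for independent factors, the mean of the product factorizes into the product of the means.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40_000  # 10 s at 4 kHz, matching the simulation settings used later

# Illustrative stand-ins for the two independent factors:
# rho_phi: AO-residual coupling efficiency, rho_I: scintillation term.
rho_phi = np.clip(rng.beta(8.0, 2.0, size=n), 1e-6, 1.0)
rho_I = np.exp(2.0 * rng.normal(loc=-0.01, scale=0.1, size=n))

# Under the independence hypothesis, the coupled flux is the product of the two.
f_smf = rho_phi * rho_I

# Independence implies E[f_SMF] = E[rho_phi] * E[rho_I] (up to sampling noise).
print(f_smf.mean(), rho_phi.mean() * rho_I.mean())
```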

| Residual phase error model
The residual phase error is computed in SAOST using a Monte Carlo approach. Following the algorithm described in Roddier, 34 random occurrences of Zernike coefficients are sampled to describe the corrected phase. The temporal correlation of the ROP time series is obtained by filtering the raw Zernike coefficients with a temporal power spectral model 35 in the Fourier domain. Some hypotheses have to be made on the turbulence conditions, among which von Kármán statistics for the refractive index fluctuation spectrum, assuming a fixed outer scale.
More details and comparisons with an end-to-end simulation can be found in Canuet. 31

| Scintillation impact
The temporal impact of scintillation on the coupling efficiency can be approximated, in the weak fluctuation regime, by a product involving the log-amplitude χ(r, t), 10 where σ²_χ is the variance of the point-source log-amplitude and χ_AP(t) is the log-amplitude averaged over the receiver aperture (Equation (1)).

| AO and simulation parameters
In the following, all generations of power attenuation time series are done with the same AO parameters. We assume an AO system that corrects the first 10 radial orders with a loop frequency of 2 kHz. The simulation is done with a time sampling of 4 kHz for a duration of 10 s, which gives 40,000 points per time series. Our telescope has a pupil diameter of 60 cm. These choices correspond to relatively simple and cost-effective hardware for this type of application.

| Atmospheric conditions database
As we want to describe the ROP statistics using a set of integrated parameters from the C 2 n and wind profile, the best approach is to build a large database of such profiles.
It should be noted that knowledge of the Cn² and wind profiles at kilometer vertical resolution guarantees an accurate optical link performance evaluation using simulation tools such as end-to-end modeling of the atmospheric propagation and of the AO correction. 39,40 In addition, these profiles present the risk of being affected by measurement noise (hence influencing the metamodel). To our knowledge, there is no experimental database available that would be representative of the various atmospheric conditions, which leads us to use a database from an atmospheric reanalysis model. Indeed, data provided by numerical models present the advantage of precisely controlled underlying hypotheses and input parameters, at the expense of a more questionable relevance. As the first goal of this work is to demonstrate the possibility of relying on a metamodel for performance assessment in relevant conditions, the possibility to cover a large scope of atmospheric conditions justifies in itself the exploitation of data obtained from a numerical model.
Models based on empirical measurements exist, some nonparametric, such as Greenwood's, 41 H-V Night, 42 and AFGL AMOS, 43 and some parametric, such as the famous Hufnagel-Valley model, 44 its enhanced version the Hufnagel-Andrews-Phillips model, 45 or the Sadot-Kopeika model. 46 All these models are suitable to describe the average turbulence over a given site and are useful for site selection, but they will not give an instantaneous and accurate turbulence profile description. Overall, these models have been developed for specific sites at given times of the year, and it is not clear to what extent they would be suitable for other sites or meteorological conditions.
As our work aims at characterizing the statistics of the instantaneous ROP and, especially, the distribution's tail to describe the probability of interruption, we need a theoretical model that describes any small variation of the turbulence induced by different meteorological conditions and works for any location and any hour of the day. Such a description is achieved using Gladstone's formula and Tatarskii's theory 36 with a model for the outer scale such as Dewan's 47 or HMNSP99. 48 In this approach, the Cn² profile is calculated from precise vertical profiles of meteorological parameters (temperature, pressure, relative humidity, wind speed, and wind direction).
Our database of profiles was provided by Durham University. Wind and Cn² profiles were obtained through a global turbulence model capable of converting meteorological data from a general circulation model into three-dimensional optical turbulence maps. This model, based on Tatarskii's theory, is developed in Osborn and Sarazin 49 and was successfully confronted with on-site measurements at Paranal. It was improved to include a separate boundary layer and enable stronger turbulence strength near the ground to be modeled. 50 The general circulation model used is ERA5 51 from the European Centre for Medium-range Weather Forecasts (ECMWF). This model, from which the turbulence is calculated, has a spatial resolution of 0.3° along latitude and longitude and provides a forecast for every hour. We chose a grid of 11 by 11 points around the island of Tenerife (with a spatial resolution of ≈ 30 km), which is a site of interest for a potential future ground station, and focused on the first 19 days of March 2018. This leaves us with 121 simultaneous measurements for each hour, with some missing values.
Over the 19 days considered, we thus have 37,059 profiles on 113 pressure levels each. Historical data are freely available on the ECMWF website. These 37,059 Cn² and wind profiles cover a large set of conditions. In the following, we consider that these profiles are representative of field data and could be obtained with instruments directly measuring turbulence along the line of sight.

| MACHINE LEARNING METHODOLOGY
As described in Section 2.1, we consider the residual phase error and the scintillation as two independent phenomena and the quantity of interest, the coupled flux in a single-mode fiber, as the product of both.
In Section 2.1.2, we presented our scintillation simulation model, which is based on a parametric expression of the distribution that depends only on the two parameters σ²_χ and σ²_χAP (Equation (1)). However, the latter is not easily measurable, as it would require an instrument with the same pupil size as our telescope. Our first goal is therefore to assess σ²_χAP with machine learning using a small number of easily measurable parameters. Our second goal is to describe the PDF of ρ_ϕ with the minimum number of variables and to determine those variables from the same small number of measurable parameters. Figure 1 summarizes the full process: Cn² and wind profiles coupled with a physical model of light propagation through the atmosphere and wavefront correction with an AO loop. What we aim at is represented by the dashed box and arrows, as we want to shortcut the heavy process of profile measurement and simulation thanks to machine learning on a small number of easily measurable integrated parameters.

| Residual phase error distribution description
A usual and effective way to predict a distribution using machine learning is to parameterize this distribution and then use a metamodel to predict each parameter.
Here, we study the loss in power induced by the phase fluctuations, in dB, L_ϕ(t) = 10 log10(ρ_ϕ(t)) (Equation (3)), as it gives a larger weight to the low values of ρ_ϕ(t), which are the critical values for our application.
Studying the distribution of ρ_ϕ(t) on our 37,059 profiles, we found that it has an exponential decay, which is consistent with the closed form of the distribution already proposed by Canuet. 31 The logarithm of an exponentially distributed variable follows a Gumbel (minimum) distribution, and we verified experimentally that the distribution best fitting our loss L_ϕ(t), with respect to the Bayesian information criterion and the sum of squared errors, is a Gumbel distribution of the form f(x) = (1/β) exp(z − e^z), where z = (x − μ)/β. This result is particularly interesting: with the Gumbel distribution being a good enough approximation of the probability density of the power attenuation, we can describe the PDF with only two parameters, μ and β, that contain all the information on the power attenuation statistics.
The quality of the fit can be assessed by the relative error between the quantiles of the experimental distribution and those of the fitted distribution. Table 1 shows the statistics of the relative error on some relevant quantiles, given for the whole database. For example, the relative error on the 0.01 quantile is, on average, 0.57% and is below 1.92% in 99% of the cases. Such small relative errors emphasize how appropriate our fit is. It should be noticed that, the number of data points in our simulations being finite, the error on the smallest quantiles may be due to a lack of data to precisely estimate the latter, as much as to an inability of the Gumbel distribution to describe the smallest quantiles precisely. Further work in this regard was not conducted, as the quality of the approximation is good enough for our application.
Figure 1: Schematic description of the method proposed in this paper.
Figure 2 aims at visually demonstrating the goodness of the fit across the diversity of conditions encountered. We selected three sets of Cn² and wind profiles from the whole database: one corresponding to the average value of mean power attenuation over the 37,059 profiles (orange), one to the worst value of mean power attenuation (blue), and the last one to the best value of mean power attenuation (green).
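A minimal sketch of the two-parameter Gumbel fit and of the quantile-error check reported in Table 1, using SciPy's minimum-Gumbel family (`gumbel_l`). The loss series is synthetic, drawn from the Gumbel family itself with illustrative μ and β, so the fit is exact up to sampling noise; database values are not reproduced here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Synthetic power-loss series L_phi in dB (stand-in for a SAOST time series).
mu_true, beta_true = -1.5, 0.4
L_phi = stats.gumbel_l.rvs(loc=mu_true, scale=beta_true, size=40_000, random_state=rng)

# Two-parameter maximum-likelihood fit of the Gumbel (minimum) distribution.
mu_hat, beta_hat = stats.gumbel_l.fit(L_phi)

# Quality check in the spirit of Table 1: relative error on a low quantile.
q = 0.01
q_emp = np.quantile(L_phi, q)
q_fit = stats.gumbel_l.ppf(q, loc=mu_hat, scale=beta_hat)
rel_err = abs(q_fit - q_emp) / abs(q_emp)
print(mu_hat, beta_hat, rel_err)
```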

| Gaussian process (GP) metamodel
Different machine learning techniques have been tested, among which gradient boosting, MLP, support vector machine regression, and GPs. 52 Best results were obtained with a GP, and we only describe this metamodel here.
A GP writes the output of interest as the sum of a regression part, a constant term β₀ in this study, and a centered stochastic process Z: Y(x) = β₀ + Z(x). The stochastic part Z(x) is a centered Gaussian process fully characterized by its covariance function Cov(Z(x), Z(u)) = σ² R(x, u), with σ² the variance of Z and R the correlation function, or kernel, that accounts for spatial correlation effects.
In this study, we focus on a stationary process Z, which means that, for new points, the prediction consists of a linear combination of the observed values, with weights that depend on the distance between the new input point and the training data. The assumption is that the closer the inputs are, the more correlated the outputs are. The kernel we chose is a Matérn 5/2 kernel, an extension of the radial basis function kernel and one of the most commonly used kernels. The Matérn 5/2 kernel computes the similarity of two given points x and x′ in dimension d as follows:
R(x, x′) = ∏_{j=1..d} (1 + √5 |x_j − x′_j|/θ_j + 5 (x_j − x′_j)²/(3 θ_j²)) exp(−√5 |x_j − x′_j|/θ_j),
where the θ_j are the hyperparameters, to be optimized in addition to β₀.
Table 1: Relative errors on quantiles with the Gumbel fit over the 37,059 power attenuation series.
Figure 2: Gumbel fit on the distribution of L_ϕ in three cases: the first one, in blue, corresponds to the minimum coupling efficiency; the second one, in orange, to an average coupling efficiency; and the last one, in green, is obtained for the profile with the highest coupling efficiency.
One of the advantages of GPs is that an estimate of the uncertainty associated with the prediction is available.
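Such a GP metamodel, with a constant mean and an anisotropic Matérn 5/2 kernel, can be sketched with scikit-learn. The four-input toy mapping below is an arbitrary stand-in for the actual regression from integrated parameters to distribution parameters; it is chosen only to exercise the metamodel and its uncertainty output.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern

rng = np.random.default_rng(3)

# Toy regression problem standing in for (r0, h, v, sigma_chi) -> output;
# the true mapping is arbitrary, used only to illustrate the method.
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2] * X[:, 3]

# Constant mean (regression part) plus a stationary Matern 5/2 kernel,
# with one length-scale hyperparameter per input dimension.
kernel = ConstantKernel(1.0) * Matern(length_scale=np.ones(4), nu=2.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# GPs return a pointwise uncertainty alongside the prediction.
X_new = rng.uniform(0.0, 1.0, size=(5, 4))
y_pred, y_std = gp.predict(X_new, return_std=True)
print(y_pred, y_std)
```

The hyperparameters θ_j are optimized automatically during `fit` by maximizing the marginal likelihood, which is the usual scikit-learn behavior.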
In order to evaluate the predictive ability of the metamodel, we rely on the predictivity coefficient Q², which stands for the percentage of the output variance explained by the metamodel. It is the same as the determination coefficient R² but computed on n test data Y_i instead of training ones:
Q² = 1 − Σ_{i=1..n} (Y_i − Ŷ_i)² / Σ_{i=1..n} (Y_i − Ȳ)²,
where Ȳ is the mean of the test data and Ŷ_i stands for the output of the metamodel for the same input values as Y_i. Q² lies between 0 and 1 and should be close to 1 for an accurate metamodel prediction.
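The predictivity coefficient is straightforward to compute; a small helper with two sanity checks (a perfect prediction gives 1, and predicting the test mean gives 0):

```python
import numpy as np

def q2(y_test, y_pred):
    """Predictivity coefficient: share of test-output variance explained."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_test - y_pred) ** 2)
    ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(q2(y, y))                          # perfect metamodel -> 1.0
print(q2(y, np.full_like(y, y.mean())))  # mean-only prediction -> 0.0
```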

| Choice of relevant inputs
We based our choice of integrated parameters on those regularly used to describe the error budget of an AO system. The first chosen parameter is Fried's parameter r₀, an essential quantity when one is interested in the effects of turbulence, defined as the typical diameter of a telescope whose resolution would be limited by atmospheric turbulence. In the case of a plane wave, considering a Kolmogorov spectrum, we have along the line of sight:
r₀ = [0.423 (2π/λ)² ∫ Cn²(h) dh]^(−3/5),
where λ is the wavelength (1.55 μm). Experimentally, Fried's parameter can be estimated either from the amplitude jitter of a star at the focal plane of an imager or, more robustly, thanks to differential imaging such as performed with a DIMM. 53 The second parameter, denoted h̄, 54 is a measure of the height dispersion of atmospheric layers, homogeneous to an altitude:
h̄ = [∫ Cn²(h) h^(5/3) dh / ∫ Cn²(h) dh]^(3/5).
It provides an assessment of the physical origin of the angular decorrelation of the phase perturbations (the influence of distant turbulence layers) while being independent of the turbulence strength. It is useful to characterize and compare different profiles. h̄ is related to Fried's parameter and the isoplanatic angle θ₀ by 55 :
θ₀ = 0.314 r₀/h̄.
The estimation of h̄ can therefore be performed thanks to a measurement of θ₀, which is derived, for instance, from limited-aperture averaged scintillation by nighttime 56 or thanks to a Shabar measurement by daytime. 57 The last parameter, denoted v̄, describes an average wind speed over the turbulent layers 59 and, similarly to h̄, is given as follows:
v̄ = [∫ Cn²(h) v(h)^(5/3) dh / ∫ Cn²(h) dh]^(3/5),
where v is the modulus of the transverse wind velocity. It is related to the turbulence coherence time τ₀ according to:
τ₀ = 0.314 r₀/v̄.
The turbulence coherence time can be extracted, for instance, from the temporal analysis of the jitter of a bright enough point source image.
This evaluation can then be exploited to provide an estimation of v̄. r₀, h̄, and v̄ are quantities that characterize the phase of the wave, while σ²_χ characterizes the intensity fluctuations. The link between phase and intensity perturbations involves diffraction, which is not accounted for in the calculation of the integrated parameters, hence the need to measure σ²_χ as well. The Rytov approximation discussed in Section 2.1 assumes that the refractive index fluctuations are small compared with the mean refractive index, allowing for linearization of the wave propagation equation. This approximation is valid for weak turbulence, where the fluctuations are small and the wavefronts do not experience significant bending. Under this regime, σ²_χ can be approximated for a plane wave by:
σ²_χ = 0.563 (2π/λ)^(7/6) ∫ Cn²(h) h^(5/6) dh.
According to previous studies, 33,58 this value is accurate when it is less than 0.3, which is in accordance with the studied database. Based on these considerations, the point-source log-amplitude variance σ²_χ is supposed to be measured from the scintillation of a bright point source with a small-diameter instrument.
Thus, the PDF of the scintillation only depends on the unknown parameter σ 2 χ AP .
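The four integrated parameters can be computed from discretized Cn² and wind profiles by direct quadrature. The profiles below are purely illustrative (not ERA5 data); the formulas are the standard plane-wave expressions recalled above.

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal quadrature (kept explicit to avoid NumPy version differences)."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

lam = 1.55e-6              # wavelength [m], as in the paper
k = 2.0 * np.pi / lam      # wavenumber [m^-1]

# Hypothetical discretized profiles (illustrative values only):
h = np.linspace(0.0, 20e3, 2000)                        # altitude [m]
cn2 = 1e-14 * np.exp(-h / 1500.0) + 1e-17               # Cn^2 [m^-2/3]
wind = 5.0 + 25.0 * np.exp(-(((h - 10e3) / 4e3) ** 2))  # transverse wind [m/s]

J = trapz(cn2, h)  # integrated turbulence strength

# Fried parameter (plane wave, Kolmogorov spectrum).
r0 = (0.423 * k**2 * J) ** (-3.0 / 5.0)

# Effective turbulence altitude and effective wind speed (5/3-moments).
h_bar = (trapz(cn2 * h ** (5.0 / 3.0), h) / J) ** (3.0 / 5.0)
v_bar = (trapz(cn2 * wind ** (5.0 / 3.0), h) / J) ** (3.0 / 5.0)

# Plane-wave log-amplitude variance under the Rytov approximation.
sigma2_chi = 0.563 * k ** (7.0 / 6.0) * trapz(cn2 * h ** (5.0 / 6.0), h)

print(r0, h_bar, v_bar, sigma2_chi)
```

For this illustrative profile, r₀ comes out at a few centimeters and σ²_χ stays below the 0.3 validity bound discussed in Section 2.1.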

| Sensitivity analysis
In this study, we rely on Sobol' indices, 60 also known as variance-based sensitivity analysis. The variance of the output of interest Y is decomposed into fractions that can be attributed to each of the moments used here as input variables for our metamodel. The values of Sobol' indices enable the ranking of input variables according to their importance in the uncertainty of the output. The first-order Sobol' index S_i characterizes the contribution of a given input X_i to the output variance, and the total Sobol' index S_Ti measures the contribution to the output variance of the studied input X_i, including all variance caused by its interactions, of any order, with all other input variables. They are estimated thanks to a Monte Carlo method and can be written as follows:
S_i = Var(E[Y | X_i]) / Var(Y) and S_Ti = 1 − Var(E[Y | X_∼i]) / Var(Y),
where E stands for the expectation of the random variables and X_∼i denotes all inputs except X_i. First-order indices vary between 0 and 1, and the difference between 1 and their sum characterizes the global influence of interaction effects. If the total index associated with an input variable is close to zero, this input has a negligible impact on the output variability and can be set to a constant value. On the contrary, Sobol' indices close to one indicate that the input variable is influential.
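First-order Sobol' indices can be estimated with the classical pick-freeze Monte Carlo scheme. The toy model below is not the metamodel of this study; it is a linear function with known indices (S1 = 0.2, S2 = 0.8), which makes the estimator easy to check.

```python
import numpy as np

rng = np.random.default_rng(4)

def model(x):
    # Toy model: Var(Y) splits between X1 and X2 as 1:4, so S1 = 0.2, S2 = 0.8.
    return x[:, 0] + 2.0 * x[:, 1]

n, d = 100_000, 2
A = rng.uniform(0.0, 1.0, size=(n, d))
B = rng.uniform(0.0, 1.0, size=(n, d))
yA, yB = model(A), model(B)
var_y = np.var(yA)

# Pick-freeze estimator of first-order indices: freeze X_i, resample the rest.
S = np.empty(d)
for i in range(d):
    C = B.copy()
    C[:, i] = A[:, i]  # C shares only column i with A
    yC = model(C)
    S[i] = (np.mean(yA * yC) - np.mean(yA) * np.mean(yB)) / var_y

print(S)  # close to [0.2, 0.8]
```

The same two-matrix construction extends to total indices, but note (as discussed next) that this estimator assumes independent inputs.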
Sobol' indices are very often used to determine the sensitivity of a simulation code to a specific input, but they work under the assumption of independence between input variables; their interpretation becomes hazardous in the case of correlated inputs. In our case, r0, h, and v are strongly correlated, being all moments of the same profiles. To deal with correlated inputs, methods have been developed around Shapley values, which come from the field of cooperative game theory.61,62 The associated Shapley indices provide a simple and easy-to-interpret decomposition of the total output variance that accounts for both interactions and dependences between the inputs involved.
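The cooperative-game idea behind Shapley values can be sketched on a toy game where the "value" of a coalition of inputs is the output variance it explains. The coefficients below are hypothetical; for an additive value function each input's Shapley value is exactly its own contribution:

```python
from itertools import permutations
from math import factorial

import numpy as np

# Toy game: value(coalition) = variance explained by that set of inputs.
# For an additive model with independent inputs this is the sum of squared
# coefficients of the coalition members (hypothetical values below).
coef2 = {0: 1.0, 1: 4.0, 2: 9.0}

def value(coalition):
    return sum(coef2[i] for i in coalition)

d = 3
shapley = np.zeros(d)
for perm in permutations(range(d)):
    seen = set()
    for i in perm:
        shapley[i] += value(seen | {i}) - value(seen)   # marginal contribution
        seen.add(i)
shapley /= factorial(d)

# Shapley effects are usually reported normalized by the total variance.
print(shapley / shapley.sum())   # → [1/14, 4/14, 9/14]
```

With correlated inputs the value function involves conditional variances and is harder to estimate, which is precisely why dedicated estimators were developed in the cited references.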

| Metamodel construction
Our database is split into two datasets: a training set containing 15% of the profiles, randomly selected, and a test set containing the remaining 85%. This arbitrary choice corresponds to 5558 randomly selected training profiles. Fewer profiles would be enough to learn the few parameters of this GP as long as they cover the variability of the encountered cases.
The complexity of GP algorithms is O(n³) due to the need to invert an n × n matrix; a 15% training set is a good trade-off between prediction accuracy and learning time, with 20 min of single-core time on a modern processor for the training process.
Figure 3 shows the estimated value against the one obtained with our metamodel. The coefficient of determination Q² is greater than 0.99 in every case and shows how well observed outcomes are predicted by the model. We find 0.9994 for the prediction of μ, 0.9992 for β, and 0.9969 for σχ,AP. As we are able to predict μ, β, and σχ,AP from the moments, the next step is to reconstruct the probability density of the ROP using the parametric descriptions of Lϕ(t) (Equation (3)) and ρI (Equation (1)). An example is given in Figure 4 where, for one randomly selected profile, we can see, from left to right, the statistical reconstruction of Lϕ(t), ρI, and the ROP.
In order to characterize the relevance of our estimation of the ROP's statistics, we can compute the absolute error made on the mean and standard deviation of the reconstructed PDF of the ROP. We also look at the absolute error on the 1% quantile of the ROP (see red dashed line in Figure 4), as we want a faithful reproduction of the tail of the distribution. Statistics on the absolute error associated with our 37k profiles can be seen in Figure 5.
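These three error metrics are straightforward to compute from time series; a minimal sketch with synthetic stand-ins for the reference (SAOST) and metamodel-reconstructed ROP samples in dB (the distributions below are illustrative, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for two ROP sample sets in dB: reference simulation vs. samples
# drawn from the metamodel-reconstructed PDF (illustrative Gaussians).
rop_ref = rng.normal(loc=-3.0, scale=0.8, size=200_000)
rop_model = rng.normal(loc=-3.05, scale=0.82, size=200_000)

err_mean = abs(rop_model.mean() - rop_ref.mean())
err_std = abs(rop_model.std() - rop_ref.std())
# 1% quantile: the lower tail is what matters for link-budget margins.
err_q1 = abs(np.quantile(rop_model, 0.01) - np.quantile(rop_ref, 0.01))

print(err_mean, err_std, err_q1)
```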
On all profiles, the prediction error on the value of the 1% quantile is less than 0.7 dB, which is compatible with current assumptions made in commonly used link budgets (margins are typically 3 dB).
Once again, this value has to be put into perspective with the fact that the temporal series of ROP generated with SAOST are finite; thus, part of the error is due to the imperfect convergence of the random variable. The weight of this convergence error in the overall error has yet to be determined.

| Sensitivity analysis
After finding satisfactory results with our metamodel, we want to describe the impact of each input variable on each output. To do so, we use sensitivity indices: first-order and total Sobol indices as well as Shapley indices, which were presented in Section 3.4. Sensitivity indices were estimated using "sensitivity: Global Sensitivity Analysis of Model Outputs",63 an open-source, GPL-2-licensed R64 library developed for the treatment of uncertainties.
Figure 3: Predicted values with a Gaussian process using r0, h, and v versus real values of μ, β, and σχ,AP on the test data.
In Figure 6, we notice that the variability of h has little impact on μ and β. We can thus wonder whether we would be able to predict the values taken by μ and β with the same accuracy if we were to fix the value of h. Doing so, and using the same GP structure as the one described in Section 3.2, we obtain a prediction score of 0.9988 for μ and 0.9984 for β. The slight decrease in Q² is due to a larger error on the small values of μ and the large values of β, that is, the cases where the residual phase error after AO correction is the highest. Even if h is not essential to adequately describe the residual phase error, the additional information it provides on the structure of the profile allows better treatment of cases of strong turbulence, where the AO correction is the worst.
The same analysis can be conducted for the prediction of σχ,AP where, according to Figure 6, the impact of v seems very low: Its total Sobol index is almost equal to zero, and its Shapley index is larger but also accounts for its correlation with the two other input variables. Removing v from the input parameters of our metamodel leads to a prediction score of 0.9955, very close to the 0.9969 obtained with v. This result was to be expected, as the expressions of the variance of the log-amplitude averaged over a pupil that can be found in the literature are independent of the wind speed profile.65

| Autocorrelation assessment
The next step of our study is to estimate the autocorrelation time of the ROP using the same moments. Knowledge of the temporal behavior of the ROP is fundamental, as the duration of the fades in the received signal will dictate the use of numerical mitigation techniques and the latency in the telecommunication protocol.
Once again, it should be noted that analytical expressions exist to describe the auto-covariance of Lϕ10 and of ρI.66,67 The ultimate goal is to estimate those auto-covariance functions using a small number of instruments but, in order to simplify the problem, we first look at the half-correlation time of the Lϕ and ρI time series independently. A simplistic way to predict the auto-covariance could then be to fit an exponential decay law matching the estimated half-correlation time.
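The exponential decay law mentioned above can be pinned to the half-correlation time directly: writing ρ(τ) = exp(−τ ln 2 / τ_half) guarantees ρ(τ_half) = 0.5 by construction. A minimal sketch (the τ_half value is hypothetical):

```python
import numpy as np

def exp_decay_autocorr(tau, tau_half):
    """Exponential autocorrelation model matched to the half-correlation time:
    rho(tau_half) = 0.5 by construction."""
    return np.exp(-np.log(2.0) * tau / tau_half)

tau = np.linspace(0.0, 0.1, 1001)   # lag axis [s]
tau_half = 0.012                    # hypothetical predicted half-correlation time [s]
rho = exp_decay_autocorr(tau, tau_half)
print(rho[0], exp_decay_autocorr(tau_half, tau_half))   # → 1.0 0.5
```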

| Metamodel construction
The approach is exactly the same as for the prediction of the PDF: Half-correlation times were computed using SAOST, and we use r0, h, and v as inputs of our metamodels. Once more, the best results were obtained using GP regression with a Matérn 5/2 kernel.
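A GP regression of this kind can be sketched with scikit-learn's Matérn kernel (nu=2.5 gives the Matérn 5/2 kernel). The inputs and target below are synthetic stand-ins for standardized (r0, h, v) triplets and a scalar output such as a half-correlation time, not the paper's SAOST data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(2)

# Synthetic standardized inputs and an arbitrary smooth scalar target.
X = rng.uniform(size=(300, 3))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] * X[:, 2]

kernel = Matern(length_scale=1.0, nu=2.5)        # Matérn 5/2 kernel
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, alpha=1e-8)
gp.fit(X[:200], y[:200])                         # O(n^3) training cost in n samples

q2 = gp.score(X[200:], y[200:])   # coefficient of determination on held-out data
print(f"Q2 = {q2:.4f}")
```

The held-out score here plays the role of the Q² values reported in the paper; the O(n³) cost of fitting is the reason a 15% training subset is used in the study.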
Figure 7 shows that the prediction using r0, h, and v is able to recreate the trend but lacks precision, with a high variance in the distribution of the error. The chosen input moments do not account well enough for the temporal aspect of the turbulence, and it is thus necessary to consider an additional measurement.
One way to recover the missing temporal information on the atmospheric layers is through a measurement of the power spectral density (PSD) of the scintillation with a small-pupil instrument. To simulate the measurement of such an instrument, we used the expression given in Shen et al.67 and calculated the PSD of the scintillation for a 5 cm pupil at a wavelength of 1500 nm. The simulation was done on 100 points spaced evenly on a log scale from 10⁻⁴ Hz up to 1000 Hz.
In order to add information associated with the scintillation spectrum to our model, we wish to reduce the data dimension while keeping as much information as possible. Indeed, there is a lot of redundancy in the 100 points used to simulate the power spectral density, but these points have to be decimated in a nonlinear way if we want to keep the relevant information no matter the profile.
Trials were done using the cutoff frequency, slope, and magnitude of the PSD, but these added inputs did not result in significant improvements of the prediction score. This is due to the fact that it is sometimes hard to extract and define these parameters because of the multiple regimes in the PSD of the scintillation.
To extract relevant features of the PSD without analyzing each case manually, we used a convolutional autoencoder, a kind of neural network extensively used for data reduction. The underlying concept is simple: the architecture consists of two parts, an encoder and a decoder.68 The encoder learns an encoding of the data and is validated and refined by attempting to regenerate the inputs from the latent space with the decoder. In our case, the input data are of dimension 100 and the encoded data of dimension 5, as shown in Figure 8. A value of 5 was shown to be the minimal value that enables accurate reconstruction of the PSD whatever the profile might be; increasing the dimension of the latent space above 5 did not result in significant reconstruction improvement. In Figure 9, we can see an example of reconstruction on two randomly picked PSDs from our database. The real PSD computed from Shen et al.67 is represented in blue, while the output of the decoder applied to the encoding of the PSD in the 5-dimension latent space is in orange. We can see that most of the information is conserved in the latent space.
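The encoder/decoder idea can be sketched with a minimal dense autoencoder trained by plain gradient descent (the paper's architecture is convolutional; this dense 100 → 5 → 100 version, with synthetic knee-shaped log-PSD curves as stand-ins for the real spectra, only illustrates the compression principle):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-ins for log-PSD curves on 100 frequencies: a knee whose
# position and slope vary from profile to profile (illustrative shapes only).
f = np.logspace(-4, 3, 100)
knee = 10 ** rng.uniform(-1, 2, size=(500, 1))
slope = rng.uniform(2.0, 4.0, size=(500, 1))
X = -slope * np.log10(1.0 + f / knee)            # (500, 100)
X = (X - X.mean()) / X.std()

# Dense autoencoder: encoder 100 -> 5 (tanh), decoder 5 -> 100 (linear).
n_in, n_lat = 100, 5
W1 = rng.normal(scale=0.1, size=(n_in, n_lat)); b1 = np.zeros(n_lat)
W2 = rng.normal(scale=0.1, size=(n_lat, n_in)); b2 = np.zeros(n_in)

def forward(X):
    z = np.tanh(X @ W1 + b1)     # encoder -> 5-dimensional latent space
    return z, z @ W2 + b2        # decoder -> reconstructed curve

lr = 0.02
loss0 = np.mean((forward(X)[1] - X) ** 2)
for _ in range(3000):            # plain gradient descent on the MSE
    z, Xr = forward(X)
    g = 2.0 * (Xr - X) / X.size
    gz = (g @ W2.T) * (1.0 - z ** 2)          # backprop through tanh
    W2 -= lr * z.T @ g;  b2 -= lr * g.sum(axis=0)
    W1 -= lr * X.T @ gz; b1 -= lr * gz.sum(axis=0)
loss1 = np.mean((forward(X)[1] - X) ** 2)
print(loss0, "->", loss1)
```

The 5-dimensional latent vector produced by the encoder is the analogue of the D1 to D5 values used as additional metamodel inputs in the next section.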
We built a new GP, this time with eight inputs: the three previous moments and the five encoded values of the PSD.
As anticipated, with the added temporal information, we obtained much more satisfactory results than the ones described in Figure 7 (see Q² in Figure 10: 0.99 and 0.98, against Q² in Figure 7: 0.88 and 0.84).

| Sensitivity analysis

Using the same tools as the ones described in Section 4.1.2, we conducted a sensitivity analysis and calculated first-order and total Sobol indices as well as Shapley indices for each of the models described in Section 4.2.1. Results for the metamodel taking only r0, h, and v as inputs can be seen in Figure 11, while sensitivity analysis results computed on the model with the PSD added to the inputs are shown in Figure 12. In both cases, we can clearly see that the most influential variable is v, which is not surprising given that the correlation time depends on the displacement speed of the turbulent layers.
When the five encoded values of the PSD are added (D1 to D5 in Figure 12), we obtain some interesting values. It can be noted that while D1 and D5 have a lot of influence on the metamodel outputs, D2 seems to have very little. It would be interesting to look further into the weights of the autoencoder to better understand which features of the PSD each of the encoded values represents most, and thus which part of the PSD is important in the description of the half-correlation time of the ROP. This study is not trivial and will not be conducted in the context of this article.

Figure 10: Prediction of the half-correlation time using a Gaussian process on inputs r0, h, and v and information on the power spectral density (PSD) of scintillation for a 5 cm pupil. Other parameters are the same as in Figure 7.

| Conclusion

We studied here the possibility to assess downlink GEO-to-ground optical link ROP statistics by applying a machine learning methodology. We demonstrated that, assuming several simplifying hypotheses on the AO system performance model, a very small set of integrated turbulence parameters appears sufficient to precisely assess ROP statistics. Moreover, exploiting the temporal power spectrum of the scintillation recorded by a small-diameter receiver eases the evaluation of the ROP correlation time. An experimental validation of such a methodology could be conducted on real data, first by restricting the cases of application to those where the assumptions made to build the metamodel are satisfied.
Further investigations are being conducted to extend the method to the uplink case and to evaluate the impact, in terms of performance assessment, of a deviation of the instrument's aiming angle from the link line of sight.

Figure 1 sums up the methodology; the upper boxes with full lines represent the current state of modeling the statistics of ROP. It requires the use of Cn² and wind profiles coupled with a physical model of light propagation through the atmosphere and wavefront correction with an AO loop. What we aim at is represented by the dashed box and arrows, as we want to shortcut the heavy process of profile measurement and simulation.

Figure 4: Probability density function (PDF) computed from the simplified adaptive optics simulation tool (SAOST) time series and PDF computed from our metamodel with, from left to right: Lϕ(t), ρI, and the received optical power.

Figure 5: Histogram and CDF of the absolute error on the prediction of the mean, the standard deviation, and the 1% quantile.

Figure 6: Sensitivity analysis on μ, β, and σχ. Red: first-order Sobol indices; blue: total Sobol indices; green: Shapley indices.


Figure 7: Prediction of the half-correlation time using a Gaussian process (GP) on inputs r0, h, and v; the x-axis shows the real value and the y-axis the predicted one. On the left are the half-correlation times neglecting scintillation effects, and on the right, those neglecting phase effects.

Figure 8: Architecture of the autoencoder used to encode the power spectral density (PSD).

Figure 9: Comparison of real and reconstructed power spectral density (PSD).