Machine-learning based noise characterization and correction on neutral atoms NISQ devices

Neutral atom devices represent a promising technology that uses optical tweezers to geometrically arrange atoms and modulated laser pulses to control their quantum states. Pasqal has developed a neutral atom Noisy Intermediate Scale Quantum (NISQ) device based on rubidium atoms that will allow working with up to 100 qubits. All NISQ devices are affected by noise that impacts the results of computations. It is therefore important to better understand and characterize the noise sources and, where possible, to correct them. Here, two approaches are proposed to characterize and correct noise parameters on neutral atom NISQ devices. The focus is on Pasqal devices, and Machine Learning (ML) techniques are adopted to pursue these objectives. To characterize the noise parameters, several ML models are trained, using as input only the measurements of the final quantum state of the atoms, to predict the laser intensity fluctuation and waist, the temperature, and the false positive and false negative measurement rates. Moreover, an analysis of the scaling with the number of atoms in the system and with the number of measurements used as input is provided. We also compare, on real data, the values predicted with ML against the a priori estimated parameters. Finally, a Reinforcement Learning (RL) framework is employed to design a pulse that corrects the effect of the noise in the measurements. We expect the analysis performed in this work to be useful for a better understanding of the quantum dynamics in neutral atom devices and for the widespread adoption of this class of NISQ devices.


I. INTRODUCTION
In the last few years we have been witnessing a revolution in the field of quantum computing. The so-called Noisy Intermediate Scale Quantum (NISQ) devices [1] represent the state of the art in this field. The intermediate scale of such devices refers to the fact that, with the best of our current technology, we are still capable of dealing with at most a few hundred qubits. Several error correction codes have been developed to deal with such noise [2][3][4], but they require the adoption of auxiliary qubits, further decreasing the resources available for the computation. Pasqal [5] has developed a NISQ device called Fresnel, based on a neutral atom quantum processor capable of using up to 100 qubits [6], and provides a Python library called Pulser [7] that can be used to prepare a setting and either run it on the real machines or simulate it with a built-in simulator.
Machine Learning (ML) is a field in the context of Artificial Intelligence (AI) that deals with the study and realization of models that learn to make predictions after being trained with data [8,9]. Artificial Neural Networks (ANNs) are ML methods organized in layers of artificial neurons that perform calculations with weighted summations of the inputs followed by non-linear activation functions. ML methods have already been developed in the context of quantum noise characterization [10][11][12][13] and have already been adopted in the context of error estimation. In [14] the authors train a recurrent neural network to detect whether certain errors happened in a quantum circuit and use the model to enhance a surface error correction code. Surface error correction codes allow a high error tolerance; however, to be implemented they need a high number of physical qubits [15]. By contrast, in our proposed approach for noise mitigation, no additional qubits are needed for error detection. In fact, our purpose is to learn how to modify the pulses in such a way as to minimize the effect of noise without implementing error correction codes. Moreover, we estimate the noise in devices with the analog interface and not with the digital one. In fact, with neutral atom devices it is possible to take advantage of both analog and digital modes. With the former, laser pulses can be used to directly manipulate the Hamiltonian of the system; with the digital mode, on the other hand, it is possible to evolve the state of the system through quantum gates, thus creating quantum circuits. In [16] the authors consider the noise to have the form of a Pauli channel and make the assumption that the error rate is modeled with a Gibbs random field (GRF). Those assumptions allow the authors to effectively learn the parameters of the GRF to characterize the noise of a real IBM NISQ device. As discussed below, in our work we use a different noise formalization; in fact, we rely on how the
noise is implemented in the Pasqal simulator that we use to generate the data to train the deep learning model. RL is a ML methodology that requires the presence of a simulator of an environment where an agent operates [17]. The agent is usually implemented as a neural network that is trained to implement the policy governing the actions of the agent. Initially, for each episode (the elementary phase of each RL algorithm, repeated over time and constituted of a series of actions of the agent and reactions of the environment), the agent and the environment are initialized in some initial state. Then, the agent perceives some information about the environment and, based on that, the policy provides a probability distribution over the possible next actions that the agent can perform to change the state of the environment or the state of the agent within the environment. The episode continues with the choice of the best action according to the policy, and with new steps until a predefined number of steps or some episode-ending condition is reached. RL has already been used in the context of state preparation and circuit optimization [18][19][20]. In the context of noise correction, RL has been adopted to correct the noise that degrades a state over time [21] or to optimize existing quantum correction codes [22]. In our work we instead focus on the task of correcting the effects of the noise of a defined quantum dynamics without modifying the base pulse.

II. NOISE BENCHMARKING PROTOCOL
A setting consists of the topological arrangement of the atoms and the description of the laser pulses that interact with them. The computation on Quantum Processing Units (QPUs) is then structured in cycles of three phases: (i) the preparation of the register, (ii) the quantum processing and (iii) the register readout. In particular, on neutral atom devices, the preparation of the register is obtained using arrays of optical tweezers [23]. The register is initialized with atoms in random positions, and afterwards the single atoms are moved to the desired positions. The quantum computation is performed analogically using laser pulses that interact with the register atoms and can excite them. The laser pulses are characterized by the values and shapes of the Rabi frequency Ω(t) and detuning δ(t). Finally, the register readout is performed by taking a fluorescence image to capture the energy levels of the atoms. In Pasqal NISQ devices, it is possible to prepare registers of at most 100 atoms, with a minimum distance of 4 µm between them, arranged in bidimensional structures in an area of maximum radius 50 µm.
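As an illustration, the geometric constraints just mentioned (minimum pairwise distance of 4 µm, maximum radius of 50 µm) can be checked with a short script. This is a sketch, not part of Pulser; `is_valid_register` is a hypothetical helper name.

```python
import math

MIN_DIST_UM = 4.0     # minimum distance between atoms (µm)
MAX_RADIUS_UM = 50.0  # maximum radius of the register area (µm)

def is_valid_register(coords):
    """Check a 2D register layout against the device constraints."""
    # every atom must lie within the maximum radius from the center
    if any(math.hypot(x, y) > MAX_RADIUS_UM for x, y in coords):
        return False
    # every pair of atoms must respect the minimum distance
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if math.hypot(dx, dy) < MIN_DIST_UM:
                return False
    return True

square = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0), (8.0, 8.0)]
too_close = [(0.0, 0.0), (2.0, 0.0)]
print(is_valid_register(square))     # True
print(is_valid_register(too_close))  # False
```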
NISQ devices, as the name suggests, are affected by several noise effects that limit their applicability and the operations that can be reliably executed on them. The devices used in the realization of a quantum computer are not ideal, as they are affected by noise: for example, lasers are not exactly monochromatic, and atoms are cooled by lasers to very low, but still non-zero, temperatures. These imperfections introduce errors during the preparation of the system, its evolution over time, and the measurement. The effect is that the measured occupation probabilities are different from those we would have obtained in an ideal environment. In general, there are different parameters that can be used to indicate different sources of noise in the device [24]. In the present work we focus on five parameters that are considered predominant for their effects: the laser intensity fluctuation σ_R indicates the standard deviation of the fluctuation of the desired Rabi frequency of the laser pulse; the laser waist w is the diameter of the Gaussian laser beam; the temperature T of the atoms determines their residual thermal motion; the false positive measurement rate ε represents the probability of wrongly measuring as excited an atom that was in the ground state; the false negative measurement rate ε′ is the probability of measuring an excited atom in the ground state. Table I shows those sources of noise and their estimated values, provided informally by Pasqal. The objective of our work is the implementation of ML models to: (i) provide a quantitative estimate of the noise; (ii) mitigate the effects of the noise. We decided to formulate a supervised regression task to quantitatively estimate the noise [16] and to use a Reinforcement Learning (RL) framework [17] to mitigate the noise effect. Regarding the noise characterization, our aim is to show that it is possible to estimate the noise parameters in the form of mean values and error intervals. As depicted in fig.
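As an illustrative sketch (not the simulator's actual implementation), the readout errors ε and ε′ can be modeled by flipping each measured bit with the corresponding probability:

```python
import random

def apply_readout_errors(bits, eps, eps_prime, rng):
    """Flip measured bits: ground -> excited with probability eps (false
    positive), excited -> ground with probability eps_prime (false negative)."""
    noisy = []
    for b in bits:
        if b == 0 and rng.random() < eps:
            noisy.append(1)  # false positive
        elif b == 1 and rng.random() < eps_prime:
            noisy.append(0)  # false negative
        else:
            noisy.append(b)
    return noisy

rng = random.Random(0)
shot = [0, 1, 1, 0, 1, 0]
print(apply_readout_errors(shot, eps=0.0, eps_prime=0.0, rng=rng))  # unchanged
```

Repeating this over many shots and histogramming the outcomes yields occupation probabilities perturbed exactly by the two readout rates.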
1, the workflow begins with the simulation of various executions, with different noise parameters, of a quantum dynamics where a global pulse irradiates all the n atoms of a register. Afterwards, the atom occupation probabilities, which we call P = (P_1, ..., P_{2^n}), are collected and used to train ANN models to predict the noise parameters that were used to perturb the dynamics: temperature, laser waist, false positive measurement rate ε, false negative measurement rate ε′ and intensity fluctuation σ_R. At the end, the trained models are used for prediction on the real data, obtaining an estimation of the noise parameters. For the simulations used in the generation of the data and for the training of the models, we use our servers with Nvidia TITAN RTX and GeForce RTX 3090 GPUs. Moreover, we could also make use of the CINECA Marconi100 supercomputer.
The rest of the paper is structured as follows. First, in section III A we consider the simpler problem of characterizing only a single noise parameter; then, in section III B we show the results of the characterization of all the aforementioned parameters. In section IV we illustrate the RL error correction protocol that we adopt.
FIG. 1. Scheme of the noise estimation pipeline. A global pulse is defined by the shapes of the Rabi frequency Ω and detuning δ (a). A register is prepared with the positions of a set of n atoms (6 in the specific case) that are irradiated by the laser pulse (b). When the pulse ends, the excitation states of the atoms are measured and the process is repeated to gather statistics on the occupation probabilities P = (P_1, ..., P_{2^n}) (c). The probabilities are used as input to an Artificial Neural Network (ANN) that predicts the noise parameters (d). The ANN is trained on a simulated dataset of probabilities labelled with the corresponding values of noise. The depicted setting is for the more general multiple-parameter estimation. The difference for the single-parameter estimation is that the neural network has only one output, for σ_R, and the adopted pulses and atom registers are different.

III. NOISE CHARACTERIZATION A. Single parameter scenario
In this section we consider the estimation of a single noise parameter. After a preliminary analysis, we decided to focus on the noise effects that come from the laser intensity fluctuations σ_R.
Before describing the methods used, let us introduce the notation. We denote by s_i the system composed of i qubits. Globally, we consider systems with a number of qubits from 2 to 5, and in the case of 4-qubit systems we denote 6 different topologies with an extra alphanumeric index from a to f: specifically, s_4a, s_4b, ..., s_4f. Overall, we collected the measurements of nine different runs on the real Pasqal NISQ devices (6 different topologies with 4 atoms and single topologies with 2, 3 and 5 atoms), characterized by a pulse with constant Rabi frequency 2π rad/µs of duration 660 ns and null detuning, but with different numbers and positions of the atoms.
In order to train the ML models to predict the values of σ_R, we simulate the data for the computation on the nine registers with different amounts of simulated noise effects. In detail, we preliminarily generate a sequence of 10 000 σ_R values extracted from a uniform distribution U(0, 0.15). These values are used to add noise in an equal number of simulations, whose results are occupation probability vectors. Therefore, in the end, 10 000 samples are obtained. This procedure is repeated for each of the 9 quantum systems mentioned above. The occupation probabilities, associated with the corresponding values of σ_R for the 9 systems, are used to evaluate two different scalings: (i) in the quantum register size, comparing increasingly larger systems of 2, 3, 4 and 5 qubits, and (ii) in the number of measurements of multiple systems with 4 qubits, where the occupation probabilities of all the systems simulated with the same values of σ_R contribute to gathering information on the noise effects during the training of the ML models. In detail, we decided to use as input to the ML models the concatenation of the probabilities of the systems and, for two systems s_A and s_B, we indicate the latter with the notation s_A ⊕ s_B = (P_{1,A}, ..., P_{2^n,A}, P_{1,B}, ..., P_{2^n,B}). In both scalings, the procedure is always the same: 20 models are trained on each dataset through a 20-fold cross validation. From the 20 predicted parameter values, the average value and the standard deviation can be obtained to include the variability of the models' predictions. Both analyses are performed with linear regression as a baseline model and with ANNs. The ANNs are trained for 150 epochs with the Adam optimizer and with hyperparameter optimization. For a more in-depth discussion of the technical details related to model design and hyperparameter optimization, refer to section VI A.
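A minimal sketch of the 20-fold procedure with the linear-regression baseline, using a toy dataset in place of the simulated occupation probabilities (all names and data are illustrative, not the authors' code):

```python
import numpy as np

def kfold_predictions(X, y, x_new, k=20, seed=0):
    """Train k linear-regression models on k-fold splits and return the
    mean and standard deviation of their predictions on a new sample."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    preds = []
    for f in folds:
        train = np.setdiff1d(idx, f)              # leave one fold out
        A = np.c_[X[train], np.ones(len(train))]  # add a bias column
        w, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        preds.append(np.r_[x_new, 1.0] @ w)
    preds = np.array(preds)
    return preds.mean(), preds.std()

# toy data: the target depends linearly on the "probabilities" X
rng = np.random.default_rng(1)
X = rng.uniform(size=(400, 4))
y = X @ np.array([0.3, -0.1, 0.2, 0.05]) + 0.01
mean, std = kfold_predictions(X, y, x_new=np.array([0.5, 0.5, 0.5, 0.5]))
print(round(mean, 3), std < 1e-6)
```

On exactly linear toy data all 20 folds recover the same weights, so the spread of the predictions collapses; on real simulated data the spread quantifies model variability, as in the paper.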
In the following, the ML models are trained and validated on the simulated data, and subsequently they are also tested on the real measurements. Using the simulated validation data, it is possible to monitor how well the model generalizes to unseen measurements. In this regard, we report in fig. 2 (the scaling (i) in fig. 2(a) and (ii) in fig. 2(b)) the Mean Absolute Error (MAE), averaged over all the samples of the validation set, between the predicted values of σ_R and the ground truth, which we recall is the value of σ_R used to perform the simulation. Again, having 20 estimates (one for each model), we calculate the mean value and standard deviation of the MAE to provide more robust results with associated uncertainty. Regarding the estimation on the real data, we show the results in fig. 3. In fig. 3(b) we highlight, in green for the linear regression and in red for the ANN, a specific case: the concatenation of the measurements on two peculiar settings with four atoms, s_4a and s_4b, that have not only the same number of atoms but also exactly the same topology. Therefore, the latter can be seen as a special case of the scaling (ii) where multiple measurements of the same system are performed. Moreover, for the real measurements we consider both orderings s_4a ⊕ s_4b and s_4b ⊕ s_4a, whose prediction results are reported with two pairs of green and red points in fig. 3(b) (not clearly visible in the plot because they are almost overlapping).
As expected, the prediction error decreases with the number of atoms in the system, because we get more information on the dynamics and thus on the noise influencing it. In fig. 2 we can also observe that ANNs are in general more powerful than linear regression models (at the cost of more resource-intensive computations). In fact, the errors of the ANN models are always lower than the errors of the linear regression models, and the difference is more pronounced when increasing the number of atoms and measurements. This can be explained by a better capacity of ANNs to model complex dynamics.
Overall, comparing fig. 2(a) with 5 atoms and fig. 2(b) with a number of measurements equal to 2, it seems more convenient to consider more measurements than to increase the number of atoms in the setting. Also, comparing the green and red points with the black and blue ones for the same number of measurements in fig. 2(b), we can observe that it can be slightly better to consider multiple measurements of the same setting with the same topology than to collect measurements of a different setting with the same number of atoms.
We observe in fig. 3 that the values of σ_R predicted for the measurements of the settings with 2 and 5 atoms are close to the estimated value of 3%; however, the prediction for the setting with 3 atoms is lower, and the predictions for all the settings with 4 atoms, and concatenations of them, are around 7%. An explanation for this mismatch can be that the real data used for the experiments were collected when the device was still under development. Moreover, the predictions consider only σ_R as a variable source of noise, so variations of the other noise parameters in the real machine influence the predictions of σ_R. Nevertheless, it is remarkable that the trained models have low standard deviations in their predictions, which, even if this does not exclude a high bias error, still suggests a low variance error for the models. We can also observe that the order of the measurements for the settings s_4a and s_4b does not influence the predicted values; in fact, the two green circles and the two red circles in fig. 3 are almost overlapping.
To summarize, noise estimation based on supervised learning is possible. The protocol we presented seems to suggest merging data from multiple similar registers instead of directly using larger registers. This may be useful because of the difficulty of simulating larger systems. In addition, the estimates obtained are derived by averaging the estimates of 20 models, and the associated standard deviation is small relative to the predicted value, so all 20 models converge to very similar values. Finally, we stress that, having neglected several noise sources, the parameter values found could be effective values.

B. Multiple parameters characterization
In this section we train a deep learning model in a multi-output regression setting to estimate the values of all the noise parameters in table I. We simulated a dataset of 54 000 labelled samples for the 6-qubit system whose topology can be observed in fig. 1(b). The pulse sequence used to define the dynamics is shown in fig. 1(a). Analogously to the scaling experiments in the previous section, the measurement for each simulation is obtained by sampling 500 runs. The values used in the simulations for each parameter are: σ_R = U(0, 0.15), w(µm) = U(0, 200), T(µK) = U(0, 100), ε = U(0, 0.15) and ε′ = U(0, 0.15).
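The label-sampling step of the dataset generation can be sketched as follows; the dictionary keys and function name are illustrative, and the simulation that maps each parameter vector to occupation probabilities is omitted:

```python
import numpy as np

# uniform sampling ranges for the five noise parameters (from the text)
RANGES = {
    "sigma_R": (0.0, 0.15),    # laser intensity fluctuation
    "waist_um": (0.0, 200.0),  # laser waist (µm)
    "T_uK": (0.0, 100.0),      # temperature (µK)
    "eps": (0.0, 0.15),        # false positive measurement rate
    "eps_prime": (0.0, 0.15),  # false negative measurement rate
}

def sample_noise_labels(n_samples, seed=0):
    """Draw n_samples parameter vectors, one per simulated execution."""
    rng = np.random.default_rng(seed)
    return {k: rng.uniform(lo, hi, n_samples) for k, (lo, hi) in RANGES.items()}

labels = sample_noise_labels(54_000)
print(len(labels["sigma_R"]), float(labels["T_uK"].max()) <= 100.0)
```

Each simulated run is then labelled with its five sampled values, giving the supervised targets for the multi-output regression.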
After finding the best set of hyperparameters, 20 models are trained using the cross validation procedure to exploit the entire dataset and to obtain the standard deviations of the predictions. Each of the 20 models is trained with early stopping for a maximum of 150 epochs. Further technical details related to ANN design and hyperparameter optimization can be found in section VI B.
In table II we show the resulting estimation of the main noise factors. Each reported value is the average over the 20 models trained on different splits, with the corresponding standard deviation. We observe that the predicted values do not match those estimated by Pasqal, although all 20 models always converge to very similar prediction values. In this regard, the same considerations expressed at the end of section III A are also valid for the multi-parameter estimation: i.e., the parameter predictions obtained could be effective values that incorporate other neglected effects (noise sources, influence of other neighboring atoms, etc.). Another possible factor could be that the measurements came from a prototype NISQ device, just as in the case of those used in section III A. Therefore, we can expect more agreement in the future as a result of technical improvements. Moreover, it is worth noting that, even if for the experiments in this section the setting and the pulse are different from the ones used in section III A, the predicted value for σ_R is comparable to the ones obtained for the estimation of the same parameter in the settings with four atoms previously illustrated.

FIG. 3. Predictions on real data of the value of σ_R for the models trained for the scaling in the number of atoms (a) and in the number of measurements (b) reported in fig. 2. We report the average values and standard deviations for the 20 linear regression (in black and green) and the 20 ANN (in blue and red) models in the predictions of σ_R using a set of real measurements of the settings described in table III run on the Pasqal NISQ devices. The models in (a) use as input the measurements of s_2, s_3, s_4a and s_5. The models in (b) use as input one or more concatenated measurements of runs of the settings with four atoms (the fourth pair of points in (a) is equal to the first pair in (b)). We report in (b) in black and blue the incremental concatenation of s_4a, s_4c, s_4d, s_4e and s_4f. In green and red we report the concatenation of s_4a and s_4b. The order of the real measurements for the latter concatenation is irrelevant, thus we report two green and two red points (almost overlapping and not clearly discernible) to consider the two possible concatenations. The horizontal red line indicates the value of 3% for σ_R estimated by Pasqal.

IV. ERROR CORRECTION
Many techniques have been developed in the theory of classical error-correcting codes [25,26]. The key idea on which they are based is mainly redundancy. Nonetheless, the addition of redundancy is not immediate in NISQ devices because of the no-cloning theorem [27]. However, some sort of redundancy can be achieved in quantum devices by expanding the system to more qubits [28]. In fact, all the most used quantum error correction techniques require the use of more qubits than the ones strictly necessary for the computation [29], which is not feasible with NISQ devices. Therefore, we propose to verify that it is possible to mitigate the effects of quantum noise without extra qubits through the use of RL techniques. RL is a ML area where an agent learns which actions to perform in order to maximize a reward [17].
Schematically, we can say that this is a closed-loop problem, because the actions of the learning system influence its subsequent inputs. In addition, the learner does not know a priori which action to perform and has to find out by itself, through trial and error, which actions lead to larger rewards. Actions can influence not only the immediate reward but also future rewards. RL, unlike Supervised Learning, does not require labelled input-output pairs, but focuses on finding a balance between exploration of the action space in an environment and exploitation of the acquired knowledge. The agent must exploit what it already knows in order to obtain reward, but it must also explore in order to make better action selections in the future. The trade-off is that neither exploration nor exploitation can be pursued exclusively without failing at the task. The agent must try a variety of actions and progressively favour those that seem to be the best. Any problem of learning goal-oriented behaviour can be reduced to three signals that are exchanged between an agent and its environment: a signal to represent the choices made by the agent (the actions), a signal to represent the basis on which the choices are made (the states) and a signal to define the agent's goal (the rewards). In detail, for each action of the agent at time t, its effects on the environment are quantified by a reward r_t. The objective of the training is then to maximize the discounted cumulative reward R_{t_0} = Σ_{t=t_0}^∞ γ^{t−t_0} r_t, where the discount γ ∈ (0, 1) is a hyperparameter that controls the importance of rewards far in the future relative to the ones immediately after t_0. This objective is implemented with the idea that, if we had a function Q*: State × Action → R that, given a state and an action performed in that state, returns the cumulative discounted reward, then the policy could be implemented as π*(s) = argmax_a Q*(s, a). In general, Q* is unknown and is approximated by a neural network. For a given policy π, the Q function obeys the Bellman equation Q^π(s, a) = r + γ Q^π(s′, π(s′)), where r and s′ are respectively the reward and the next state obtained after performing the action a in the state s. The neural network that defines Q, and hence the agent, is trained by minimizing over a batch of transitions the Huber loss L(δ) of the temporal difference error δ = Q(s, a) − (r + γ max_{a′} Q(s′, a′)).
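A minimal numerical sketch of the temporal-difference error and Huber loss described above, using a tiny tabular stand-in for the Q-network (all values are illustrative):

```python
import numpy as np

def td_error(Q, s, a, r, s_next, gamma=0.99):
    """delta = Q(s, a) - (r + gamma * max_a' Q(s', a'))."""
    target = r + gamma * np.max(Q[s_next])
    return Q[s, a] - target

def huber(delta, k=1.0):
    """Huber loss: quadratic for |delta| <= k, linear beyond."""
    d = abs(delta)
    return 0.5 * d**2 if d <= k else k * (d - 0.5 * k)

# tiny tabular Q-function: 2 states x 2 actions
Q = np.array([[0.5, 0.2],
              [0.1, 0.8]])
delta = td_error(Q, s=0, a=0, r=1.0, s_next=1)
print(round(delta, 3), round(huber(delta), 4))  # -1.292 0.792
```

In a Deep Q-Network the table is replaced by the ANN and the loss is averaged over a batch of stored transitions; the Huber loss keeps training stable when occasional transitions yield large temporal-difference errors.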
We choose to correct the standard pulse P depicted in fig. 4(a), applied to a single qubit. P has a Gaussian profile in the Rabi frequency Ω of duration T = 500 ns and area π/2, and a ramp profile in the detuning δ of duration T = 500 ns with δ_0 = −20 rad/µs and δ_T = 20 rad/µs. The chosen approach to correct the noise is to apply the correction pulse P′ of fig. 4(b), placed after the pulse to be corrected and having the same characteristics and length of T = 500 ns. In detail, we choose a Gaussian profile in the Rabi frequency with variable area a and a ramp profile in the detuning δ with variable initial value δ_i and final value δ_f. In this way, the final atom occupation probabilities after the application of the corrected pulse, P^noisy_{P+P′}, and after the ideal pulse, P^ideal_P, are closer than P^noisy_P and P^ideal_P. By the notation P^i_j we denote the measurement P obtained after running a simulation with the pulse j, with or without noise (respectively, i = noisy or i = ideal). The training allows finding the three optimal parameters a, δ_i and δ_f for the correction pulse P′.
In our RL framework, the state is represented by the occupation probabilities that are estimated from the average of 10 independent noisy simulations, whose probabilities are extracted from the amplitudes of 25 quantum states uniformly sampled along the simulated dynamics. At the beginning of each episode we choose a = π/20 and δ_i = δ_f = 0; the parameters can take values in the ranges a ∈ [0, π/2] and δ_i, δ_f ∈ [−20, 20]. The agent, implemented with an ANN that has an input layer of 50 units (2 basis amplitudes for each of the 25 intermediate states), two ReLU hidden layers of 128 neurons and an output layer of 6 neurons, selects one among six possible actions: a_t = a_{t−1} ± ∆a, δ_i^t = δ_i^{t−1} ± ∆δ_i, δ_f^t = δ_f^{t−1} ± ∆δ_f. We choose fixed values ∆a = π/200 and ∆δ_i = ∆δ_f = 0.2. Each episode is constituted of a series of steps at increasing values of t. For each step, the chosen action is applied, a correction pulse P′_t characterized by a_t, δ_i^t and δ_f^t is generated and used in a new simulation, obtaining a new probability vector P^noisy_{P+P′_t} for the final quantum state of the corrected noisy simulation and the reward r(t), before proceeding with the next step. The episode ends when the action causes a, δ_i or δ_f to go out of bounds, or after 100 steps. The reward is defined in terms of the Kullback-Leibler (KL) divergence between the corrected noisy probabilities and the ideal ones, averaged over all the steps t within each episode. The evolution of the averaged KL divergence for the 1 000 training episodes is reported in fig. 5, where we can observe that it effectively decreases below the reference value of D_KL(P^noisy_P, P^ideal_P) = 0.0011, reported with the red line and calculated as the average over 100 noisy simulations without the correction pulse.
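The KL divergence used as the figure of merit can be sketched as a direct implementation of the standard formula (with clipping added for numerical stability; this is not the authors' exact code):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), clipped for stability."""
    return sum(pi * math.log(max(pi, eps) / max(qi, eps))
               for pi, qi in zip(p, q))

# illustrative occupation-probability vectors
p_noisy = [0.60, 0.15, 0.12, 0.13]
p_ideal = [0.70, 0.10, 0.10, 0.10]
print(kl_divergence(p_noisy, p_ideal) > 0.0)  # True: KL is non-negative
print(kl_divergence(p_ideal, p_ideal))        # 0.0 for identical distributions
```

Because the divergence is zero only when the two distributions coincide, rewarding its decrease pushes the corrected noisy probabilities toward the ideal ones.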

V. CONCLUSIONS AND OUTLOOKS
We presented two applications of ML in the context of quantum noise characterization and correction. To characterize the noise, we collected a dataset of multiple simulated noisy measurements of different settings on Pasqal quantum machines to train ML models, and we tested them on real data. For the noise correction, we trained a RL model to find a correction pulse that counteracts the effects of the noise affecting a simulated test setting. Regarding the noise characterization, we compared ANNs with linear regression models in predicting the value of the laser intensity fluctuation σ_R, scaling the number of qubits in the register and the number of measurements of the system. We found that ANNs perform better than linear regression and that the model accuracies increase both with the number of qubits and with the number of measurements. Moreover, we have insights that, in order to better characterize the noise parameters, it is more effective to increase the number of measurements than the number of qubits. When we tried to predict the noise parameters on real NISQ devices, we found that, for every set of measurements, 40 different models (ANN and linear regression, trained independently in a 20-fold cross validation setting) agree on the predictions and therefore the variance error is low. Finally, we trained 20 ANN models in a multi-regression setting to predict five different noise parameter values, and also in this case the models agree among themselves when tested on real data. Regarding the noise correction, the proposed approach successfully learns to correct a simulated noisy pulse and to make the measured probabilities closer to the ideal ones.

FIG. 4. Standard pulse P (a) to be corrected with a correction pulse P′ (b) to be added after P to counteract the effects of the noise. The Rabi frequency Ω is depicted in green and the detuning δ in purple. P is a pulse of duration T = 500 ns, with a Gaussian Rabi profile of area equal to π/2 and a detuning in the form of a ramp from δ_0 = −20 rad/µs to δ_T = 20 rad/µs. P′ is a pulse with the same duration and characteristics of P but with variable Rabi area a, initial detuning δ_i and final detuning δ_f.

FIG. 5. Evolution of the KL divergence between the corrected noisy simulation and the ideal one, averaged for each episode. The red line is the reference value of 0.0011 for the KL divergence between the uncorrected noisy simulation and the ideal one, averaged over 100 simulations.
We believe that the results presented in this work can be used to better quantify the effects of the noise affecting Pasqal, and in general neutral atom, NISQ devices and to counteract those effects. The presented techniques depend on the atom topology and the pulse shape. Thus, the ML models can be trained to characterize and correct the noise of the single quantum gates that compose more complex Hamiltonians.
The accuracy of the predicted noise parameters depends on the accuracy of the simulation and in particular on the accuracy of the simulator noise model.
In previous works [11,13,30,31], and in preliminary experiments using the Pasqal simulator, there is evidence of an improvement of the noise characterization when more temporal statistics are collected. We adopted this strategy in this paper for the noise correction, where the occupation probabilities are obtained from the amplitudes of the intermediate quantum states sampled at regular steps within the simulated dynamics. However, in real NISQ devices, intermediate measurements of the dynamics are less straightforward because of the impossibility of observing a system without changing it. We can obtain the same effect by independently measuring incremental sub-dynamics from t = 0 to subsequent time steps of the full dynamics. To implement this approach on Pasqal machines, we can design a full pulse that is subsequently split into sub-pulses at times [t_0, t_1], [t_1, t_2], ..., [t_{n−1}, t_n]. The measurement at time t_k, for k = 1, ..., n, can be obtained by always initialising the register to the same initial setting and performing the computation considering the effects of all the sub-pulses spanning the times [t_0, t_k], from the first to the one ending at t_k. The ML models can then process all the measurements obtained at times t_1, ..., t_n, and in that way we expect to obtain better results for the characterization of the noise. Moreover, we can also use ANNs more suitable for data organized in temporal sequences, i.e. Recurrent Neural Networks (RNNs).
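The incremental measurement scheme described above can be expressed as follows (function names are illustrative; the actual pulse execution and readout are omitted):

```python
def split_times(t0, tn, n):
    """Uniform split points t_0 < t_1 < ... < t_n of the full pulse window."""
    step = (tn - t0) / n
    return [t0 + k * step for k in range(n + 1)]

def incremental_windows(times):
    """For each t_k, the window [t_0, t_k] covered by the first k sub-pulses:
    the register is re-initialised and measured once per window."""
    return [(times[0], tk) for tk in times[1:]]

times = split_times(0, 500, 5)  # e.g. a 500 ns pulse split into 5 sub-pulses
print(incremental_windows(times))
# [(0.0, 100.0), (0.0, 200.0), (0.0, 300.0), (0.0, 400.0), (0.0, 500.0)]
```

Each window corresponds to one independent run from the same initial register, so the sequence of readouts approximates the temporal statistics of the full dynamics without intermediate measurements.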
Finally, in the context of Quantum Machine Learning (QML) [32,33], our work is framed as a classical ML approach to process quantum data. Future research lines may include the design of QML models for the noise characterization and correction implemented directly within the quantum dynamic of neutral atoms devices or of other NISQ devices. For instance, pattern-matching QML techniques [34] could be adapted to the identification of noise patterns [13] characteristic of the neutral atoms dynamics.
In this section we describe in more detail the topologies of the analyzed quantum systems and the ANN models used in section III A.

TABLE III. Quantum systems used for the single parameter estimation of σ_R. By the notation s_i we denote the system formed by i atoms. In the case of 4 atoms, having used 6 different systems for the spatial arrangement of the atoms, we use an additional subscript, s_4j with j ∈ {a, b, c, d, e, f}. Also, when the quantum register of a system s_k is entirely contained in the quantum register of a larger system s_k', with k' > k, we use the notation s_k ⊂ s_k'.

The registers summarized in table III have an incremental number of atoms, from 2 to 5, and some of them are chosen in such a way that the positions of the atoms of every register are included in the subsequent ones as far as possible.
To be precise, with the notation s_k ⊂ s_k' and k < k', we indicate that the quantum register of s_k' contains all the atoms of s_k, at the same coordinates, plus additional atoms. In detail, the setting with five atoms (s_5) has atoms in the same positions as those of the settings of dimensionality four (s_4a) and three (s_3), plus extra atoms in other positions. Moreover, s_4a contains both atoms of the setting s_2, but s_3 includes only one of the two atoms of s_2, and s_4a only two of the three atoms of s_3. We denote the former properties with the notation s_2 ⊂ s_4a ⊂ s_5 and s_3 ⊂ s_5. In addition, we also collected further measurements of settings with 4 atoms. In detail, we ran a second setting s_4b with the atoms in the same positions as s_4a, and four other settings with different positions for the atoms: s_4c, s_4d, s_4e and s_4f. We chose those specific settings because we want to evaluate two different scalings: (i) in the number of atoms, and (ii) in the number of measurements of different settings with the same number of atoms. Specifically, we consider s_2, s_3, s_4a and s_5 for (i), and s_4a, s_4b, s_4c, s_4d, s_4e and s_4f for (ii).
The trained ANNs are composed of a single hidden layer of 100 neurons with the ReLU activation function and an output layer with a single neuron with the sigmoid activation function. The targets are normalized between 0 and 1 before the training, and the inverse transformation is applied to calculate the prediction error. The models are developed in PyTorch [35,36] and trained with mini-batch gradient descent to minimize the L1 loss, using the Adam optimizer [37] with learning rate 0.001 and batch size 512. All models are trained with early stopping for a maximum of 150 epochs.
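A minimal PyTorch sketch of the architecture and training step just described (the input size 2^4 = 16 is an illustrative assumption, matching the occupation statistics of a hypothetical 4-atom register; the synthetic data is a placeholder for the simulated measurements):

```python
import torch
from torch import nn

n_features = 2 ** 4  # assumed input size: statistics of a 4-atom register

model = nn.Sequential(
    nn.Linear(n_features, 100),  # single hidden layer of 100 neurons
    nn.ReLU(),
    nn.Linear(100, 1),           # one output: the normalised noise parameter
    nn.Sigmoid(),                # targets are scaled to [0, 1]
)

loss_fn = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One mini-batch gradient-descent step on placeholder data (batch size 512).
x = torch.rand(512, n_features)
y = torch.rand(512, 1)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

The sigmoid output matches the normalisation of the targets to [0, 1], so the inverse transformation can be applied after prediction to recover the physical scale of the parameter.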
To perform the scaling (i) we trained four different models, using as inputs the 2^2, 2^3, 2^4 and 2^5 measurements of the settings s_2, s_3, s_4a and s_5, respectively. To perform the scaling (ii) we consider the measurements coming from the following systems:
• s_4a ⊕ s_4b ⊕ s_4c ⊕ s_4d ⊕ s_4e (2^4 + 64 = 80 measurements)
In all cases, the datasets are split into 20 equal parts to perform a 20-fold cross validation, and we report the resulting average mean absolute error and its standard deviation over the 20 models.
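The 20-fold split can be sketched as follows (a plain numpy illustration, assuming hypothetical sample indices; the paper's models would then be trained once per fold and their mean absolute errors averaged):

```python
import numpy as np

def k_fold_indices(n_samples: int, k: int = 20, seed: int = 0):
    """Split the sample indices into k (nearly) equal parts; each part
    is used once as the held-out set while the rest is used for training."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# One model per fold; the reported metric is the average MAE over the
# 20 held-out sets, together with its standard deviation.
splits = list(k_fold_indices(100, k=20))
```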
As a remark, the Pulser simulator allows specifying the number of samples per run to speed up the computation. In that case, for each run the final quantum state is calculated first, and then the specified number of measurements is obtained from that state. Even if this expedient is useful to spare resources, we found in preliminary experiments that it is counter-productive in the context of noise estimation. In fact, for all the samples of one run, the Hamiltonian defining the evolution is always the same, and so is the noise that influences it. For this reason, in our work we keep the number of samples per run equal to 1, forcing the resampling of the noise at each single measurement.
Moreover, in this context it is better to consider more measurements than to increase the number of atoms of the setting. In detail, considering both subfigures, the number of data points for the measurement of the setting with 5 atoms in fig. 2(a), i.e. 2^5 = 32, is equal to that for two concatenated measurements of settings with 4 atoms in fig. 2(b), i.e. 2^4 + 2^4 = 32, but the error in the latter case is lower than in the former.

B. Multiple parameters characterization
Before training the models, the noise parameters are normalised between 0 and 1 to avoid uneven prediction errors during the loss calculation. The models are implemented in PyTorch [35,36] and are trained with mini-batch gradient descent to minimise the L1 loss using Adam [37]. Regarding the architecture of the models, the ANN is a Multi Layer Perceptron (MLP) with the ReLU activation function for all the hidden layers and the sigmoid activation function for the last layer. The best combination of number of hidden layers, number of neurons in each layer, batch size and learning rate is chosen with a hyper-parameter optimization procedure. The latter is implemented using the Python library Ray Tune [38] with the ASHA scheduler [39] and the HyperOpt search algorithm [40]. The ASHA scheduler allows multiple models to be trained in parallel, iteratively interrupting the training of the least promising ones and thus reducing the duration of the hyper-parameter optimization. In our case, at each epoch it halves the models by discarding those with the highest loss calculated on the validation set. The HyperOpt search algorithm, on the other hand, chooses the most probable best combinations of hyper-parameters based on the previously trained and/or stopped models. By this procedure, the model with the most promising set of hyper-parameters is chosen from 1000 models trained with the Adam optimizer. The hyper-parameters are sampled in the following ranges: number of hidden layers from 1 to 100, number of neurons in each layer from 5 to 200, batch size in {2, 4, 8, 16, 32} and learning rate from log-uniform(10^-4, 10^-1). At the end, the best hyper-parameter combination is: 1 hidden layer of 117 neurons, batch size 16, initial learning rate ≈ 0.069, dropout probability ≈ 0.044 and L2 regularization ≈ 0.0002.
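The halving behaviour of the scheduler can be illustrated with a self-contained toy sketch (not the Ray Tune API; the one-dimensional search space and the noisy "validation loss" are invented for illustration). After every epoch, the half of the candidates with the highest validation loss is discarded:

```python
import random

def successive_halving(configs, validation_loss, n_epochs: int = 10):
    """ASHA-flavoured sketch: after each epoch, drop the half of the
    surviving candidates with the highest validation loss."""
    survivors = list(configs)
    for epoch in range(n_epochs):
        if len(survivors) <= 1:
            break
        losses = {c: validation_loss(c, epoch) for c in survivors}
        survivors.sort(key=lambda c: losses[c])
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]

# Toy search space: the best hyper-parameter value is 0.3, and the
# "validation loss" of a candidate c is |c - 0.3| plus measurement noise.
random.seed(0)
candidates = [i / 100 for i in range(100)]
best = successive_halving(
    candidates, lambda c, epoch: abs(c - 0.3) + random.gauss(0, 0.01)
)
```

In the actual procedure, the search over candidates is additionally guided by HyperOpt, which proposes new hyper-parameter combinations based on the results of previously trained or stopped models.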
After finding the best set of hyper-parameters, 20 models are trained using the cross validation procedure, in order to exploit the entire dataset and to obtain the standard deviations of the predictions. In detail, for the cross validation the dataset is divided into 20 equal parts: 18 are used for training, one for validation and one for testing. The advantage of the cross validation procedure is that a different block is used as the test set of each model, and in this way all the samples of the dataset are exploited for the training. Each of the 20 models is trained with early stopping for a maximum of 150 epochs.
We report in fig. 3 (fig. 3(a) for the scaling (i) and fig. 3(b) for the scaling (ii)) the mean values and standard deviations, over the 20 models, of the predicted values of σ_R. In both fig. 2 and fig. 3, the results of the training of the linear regression models are depicted in black and the results of the ANNs in blue. Additionally, in fig. 2(b) and fig.

FIG. 2. Scaling of single measurements for systems with an increasing number of atoms (a) and scaling in the number of measurements for systems with four atoms (b). We report the average absolute errors and standard deviations for 20 linear regression (in black and green) and 20 ANN (in blue and red) models in the predictions of σ_R on the synthetic validation set. The models in (a) use as input the measurements of s_2, s_3, s_4a and s_5. The models in (b) use as input one or more concatenated measurements of runs of the settings with four atoms (the fourth pair of points in (a) is equal to the first pair in (b)). Indicating with • ⊕ • the concatenation of the measurements of the settings, we report in (b), in black and blue, s_4a, s_4a ⊕ s_4c, s_4a ⊕ s_4c ⊕ s_4d, s_4a ⊕ s_4c ⊕ s_4d ⊕ s_4e, s_4a ⊕ s_4c ⊕ s_4d ⊕ s_4e ⊕ s_4f and, in green and red, s_4a ⊕ s_4b.

TABLE I. Summary of the main noise parameters with their respective values. We considered the parameters that are expected to have a predominant effect.

TABLE II. Predicted values on real data, expressed as average and standard deviation of the 20 models trained in cross validation. For convenience, the last column reports the same estimated values of table I.
where ‖·‖_1 is the ℓ1 norm. Specifically, the reward is 1 if the last action at step t makes the corrected noisy simulation closer to the ideal one than at the previous step t − 1, and 0 otherwise. During the training we monitor the Kullback-Leibler (KL) divergence between P_noisy
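The reward and the monitored divergence can be sketched as follows (a numpy illustration; the two-outcome distributions are placeholders for the measured occupation probabilities, and the function names are ours):

```python
import numpy as np

def reward(p_ideal, p_corrected_t, p_corrected_prev) -> float:
    """Binary reward: 1 if the last action brought the corrected noisy
    distribution closer (in l1 norm) to the ideal one than it was at
    the previous step, 0 otherwise."""
    d_t = np.abs(np.asarray(p_ideal) - np.asarray(p_corrected_t)).sum()
    d_prev = np.abs(np.asarray(p_ideal) - np.asarray(p_corrected_prev)).sum()
    return 1.0 if d_t < d_prev else 0.0

def kl_divergence(p, q, eps: float = 1e-12) -> float:
    """Kullback-Leibler divergence D(p || q), monitored during training;
    eps guards against zero probabilities."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    return float(np.sum(p * np.log(p / q)))

# Placeholder distributions: the action moved the corrected simulation
# from (0.5, 0.5) to (0.65, 0.35), closer to the ideal (0.7, 0.3).
p_ideal = np.array([0.7, 0.3])
r = reward(p_ideal, [0.65, 0.35], [0.5, 0.5])
```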