Recursive Nearest Neighbor Co-Kriging Models for Big Multi-fidelity Spatial Data Sets

,


Introduction
Global geophysical information is measured daily by numerous satellite sensors.Due to aging and exposure to the harsh environment of space the satellite sensors degrade over time, resulting in decreased performance reliability.Decreased performance may affect data measurement accuracy (Goldberg, 2011).In addition, newer satellites with technologically more advanced sensors provide information of higher fidelity than older sensors.These discrepancies in sensor performance have created the need to develop efficient methods to analyse daily global remote sensing measurements with varying fidelity.Here, our work is motivated by a data set produced from the high-resolution infrared radiation sounder (HIRS), which provides hundred of thousands of measurements from multiple satellite platforms daily.
Multiple methods in remote sensing have been developed to assess satellite sensor performance and consistency (Chander et al., 2013;Xiong et al., 2010;National Research Council, 2004).These methods do not account for spatial correlation and oversimplify the relationship between sensors.Furthermore, statistical methods to analyse these data sets which account for spatial correlation pose challenges due to the multifideltiy presence as well as the size and computationally intensive procedures.Nguyen et al. (2012Nguyen et al. ( , 2017) ) have proposed data fusion techniques to model multivariate spatial data at potentially different spatial resolutions based on fixed ranked kriging (Cressie and Johannesson, 2008).The accuracy of this approach relies on the number of basis functions and can only capture large scale variation of the covariance function.When the data sets are dense, strongly correlated, and the noise effect is sufficiently small, the low rank kriging techniques have difficulty accounting for small scale variation (Stein, 2014).
Autoregressive co-kriging models (Kennedy and O'Hagan, 2000;Qian et al., 2005;Le Gratiet, 2013), originally built for computer simulation problems, can be used for the analysis of multiple fidelity remote sensing observations with spatially nested structure and no random error.Nested design in the multifidelity setting means the design points at the higher fidelity levels are subsets of the lower fidelity ones.Konomi and Karagiannis (2021) and Ma et al. (2022) relaxed the nested design requirements by properly introducing an imputation mechanism.However, the aforesaid methods rely on Gaussian process models and are computationally impossible for big data problems.For cases when the observed space can be expressed as a tensor product Konomi et al. (2023) uses a separable covariance function within the co-kriging model to improve the computational efficiency.For large data sets which are irregularly positioned over space and contaminated with random error, Cheng et al. (2021) proposed the nearest neighbour co-kriging Gaussian process (NNCGP) to embed nearest neighbor Gaussian process (NNGP; Datta et al., 2016) into an autoregressive co-kriging model to make computations possible.NNCGP achieves this by using imputation ideas into the latent variables to construct a nested reference set of multiple NNGP levels.
Although NNCGP makes the analysis of big multi-fidelity data sets computationally possible, its computational speed depends on an expensive iterative MCMC procedure which makes it impractical for analysing daily large data sets.
To overcome the iterative MCMC procedure, we propose a recursive formulation based on the latent variable of the NNCGP model following similar ideas with Le Gratiet and Garnier (2014), who proposed the recursive formulation directly into the observations.Based on this new formulation, which we call recursive nearest neighbors co-kriging (RNNC), we are able to build a nearest neighbors co-kriging model with T levels by building T conditionally independent NNGPs.This enables the development of two alternative inferential procedures which aim to reduce high-dimensional parametric space, improve convergence, and reduce computational time in comparison to the NNCGP.Both proposed procedures are able to address applications for large non-nested and irregular spatial data sets from different platforms and with varying quality.The first proposed procedure, called Collapsed RNNC, reduces the MCMC posterior sampling space by integrating out the spatial latent variables.
Based on the collapsed RNNC, we propose an MCMC free procedure to speed up Bayesian inference.We build an algorithm which sequentially decomposes the parametric space into conditionally independent parts for each fidelity level where we can apply a K-fold optimization method.Each sequential step can be viewed as collapsed NNGP (Finley et al., 2019) where the bases functions of the Gaussian process mean are determined at the previous step.
We name this second inferential procedure conjugate RNNC.We note that the MCMC free procedure proposed in (Finley et al., 2019) cannot be applied directly in the NNCGP model because the computational complexity of the K-fold cross-validation method depends on the dimension of the parametric space.Our simulation study and our analysis of the HERS data set shows that the proposed conjugate RNNC procedure reduces the computational time notably without significantly sacrificing prediction accuracy over the existing NNCGP approach.
The layout of the paper is as follows.In Section 2, we introduce the high-resolution infrared radiation sounder data studied in this work.In Section 3, we review the NNCGP model.In Section 4, we introduce the proposed RNNC model.In Section 4.1, we integrate out the latent variables from the model and design an MCMC algorithm for this model.In Section 4.2, we design an MCMC free approach tailored to the proposed RNNC model that facilitates parametric and predictive inference.In Section 5, we investigate the performance of the proposed procedure.Specifically in Section 5.1 we introduce two simulation studies and in Section 5.2 we implement the proposed method for the analysis of data sets from two satellites, NOAA-14 and NOAA-15.Finally,in Section 6 we give a summary and conclusion.
2 High-resolution Infrared Radiation Sounder Data Satellite soundings have been providing measurements of the Earth's atmosphere, oceans, land, and ice since the 1970s to support the study of global climate system dynamics.Long term observations from past and current environmental satellites are widely used in developing climate data records (CDR) (National Research Council, 2004).HIRS mission objectives include observations of atmospheric temperature, water vapor, specific humidity, sea surface temperature, cloud cover, and total column ozone.The HIRS instrument is comprised of twenty channels, including twelve longwave channels, seven shortwave channels, and one visible channel.The dataset being considered in this study is limb-corrected HIRS swath data as brightness temperatures (Jackson et al., 2003).The data is stored as daily files, where each daily file records approximately 120,000 geolocated observations.The current archive includes data from NOAA-5 through NOAA-17 along with Metop-02, covering the time period of 1978-2017.In all, this data archive is more than 2 TB, with an average daily file size of about 82 MB.The HIRS CRD faces some common challenges regarding the consistency and accuracy over time, due to degradation of sensors and intersatellite discrepancies.Furthermore, there is missing information caused by atmospheric conditions such as thick cloud cover.
We examine HIRS Channel 5 observations from a single day, March 1, 2001, as illus-  trated in Figure 1.On this day, we may exploit a period of temporal overlap in the NOAA POES series where two satellites captured measurements: NOAA-14 and NOAA-15.The HIRS sensors on these two satellites have similar technical designs which allow us to ignore the spectral and spatial footprint differences.NOAA-14 became operational in December 1994 while NOAA-15 became operational in October 1998.The spatial resolution footprint for both satellites is approximately 10 km at nadir.Given the sensor age difference, it is reasonable to consider that the instruments on-board NOAA-15 are in better condition than those of NOAA-14 and hence provide more accurate data.Therefore, we treat observations from NOAA-14 as a low fidelity dataset, and those from NOAA-15 as a high fidelity dataset.

Nearest Neighbor Co-kriging Gaussian Process
Let y t (s) denote the output function at the spatial location s at fidelity level t = 1, ..., T in a system with fidelity T levels.The fidelity level index t runs from the least accurate to the most accurate one.Let z t (s) denote the observed output at location s.We specify the co-kriging model as: where z t (s) is contaminated by additive random noise ϵ t ∼ N (0, τ 2 t ) for t = 2, . . ., T , and , where θ t = {σ 2 t , ϕ t }.This indicates that discrepancy term δ t (s) is a Gaussian process.Finally, the unknown scale discrepancy function ζ t−1 (s) is modeled as a basis expansion ζ t−1 (s|γ t−1 ) = g t−1 (s) T γ t−1 (usually low degree), where g t (s) is a vector of polynomial basis functions and {γ t−1 } is a vector of random coefficients, for t = 2, . . ., T .
Let us assume the system is observed at n t locations at fidelity level t.Let S t = {s t,1 , . . ., s t,nt } be the set of n t observed locations, let w t = w t (S t ) = {w t (s t,1 ), . . ., w t (s t,nt )} the latent spatial random effect vector at fidelity level t, and let Z t = z t (S t ) = {z t (s t,1 ), . . ., z t (s t,nt )} represent the observed output at fidelity level t.If data {Z t } are observed in non-nested locations across the fidelity levels, the calculation of the likelihood requires O(( T t=1 n t ) 3 ) flops to invert the covariance matrix of the observations (denoted by Λ) and additional O(( T t=1 n t ) 2 ) memory to store it as explained in (Konomi and Karagiannis, 2021).To reduce the computational complexity, Cheng et al. (2021) proposed NNCGP which assigns conditionally independent NNGP models within a nested reference set.For Bayesian inference, Cheng et al. (2021) proposed a Gibbs sampler taking advantage of the nearest neighbour structure at each fidelity level.However, this sampler is based on updating a conditionally independent high dimensional latent variable which could cause slow convergence and high autocorrelation (Liu et al., 1994).The slow convergence can significantly increase the number of the Gibbs sampler iterations I.The overall computational cost of NNCGP for m neighbours and non-nested spatial locations is O(I × ( T t=1 n t )m 3 ) floating point operations (flops).Also, for a fixed computational budget, the produced Monte Carlo estimates may be sensitive to the initial values of the MCMC sampler.Despite reducing the computational complexity to linear for every MCMC iteration, the number of the iterations can significantly increase computational cost.This simple observation makes the existing NNCGP too expensive for the vast majority of real remote sensing applications.

Recursive Nearest Neighbor Co-kriging Model
Improvement in the convergence of MCMC can be achieved by integrating out the latent variable w = (w 1 , . . ., w S ) from the Bayesian hierarchical NNCGP model which allows dimension reduction in the sampling space and the involved posterior distributions.However, integrating out the latent variables w in the NNCGP model is not feasible under non-nested designs.This is because the posterior distribution of latent variable of the lower fidelity is affected by the likelihood of higher fidelity.To make possible the integration of latent variables w, we propose a recursive formulation for the co-kriging model by using ideas similar to (Le Gratiet and Garnier, 2014).Precisely, our proposed recursive nearest neighbors co-kriging (RNNC) has the following hierarchical structure: where δ t (s) is a Gaussian process as before and ŷt−1 (s) is a Gaussian process with distribution . Essentially, we express y t (s) (the Gaussian process response at level t) as a function of the Gaussian process y t−1 (s) conditioned by the values Z (t−1) = (Z 1 , . . ., Z t−1 ).For computational efficiency, we assume NNGP independent priors for w t (s), t = 1, . . ., T .Based on the NNGP priors, the conditional distribution can be computed for all types of reference sets.So, based on the recursive representation we can relax the nested condition on the NNCGP nested reference set.Specifically, . The • represents the Hadamard product between two matrices.
Using the Markovian property of the co-kriging model (O'Hagan, 1998), the joint likelihood of the proposed model in (4.1) can be factorized as a product of likelihoods at different fidelity levels conditional on ŷt−1 (S t ) = {ŷ t−1 (s t,1 ), . . ., ŷt−1 (s t,nt )} for t = 2, . . ., T and prior w t for t = 1, ..., T , i.e.: where (•) denotes all the parameters associated with the model.This representation makes it possible to integrate out the latent variable w t independently for each fidelity level t = 1, . . ., T .

Collapsed Recursive Nearest Neighbor Co-kriging Model
We represent the multivariate Gaussian latent variable w t (S t ) as a linear model: , for i = 2, . . ., n t for t = 1, . . ., T .We set η t,i ∼ N (0, d t,i,i ) independently for all t, i, d t,1,1 = δ(w t,1 ) and d t,i,i = δ(w t,i |{w t,j ; j < i}) for i = 2, . . ., n t and t = 1, . . ., T .In a matrix form we can write w t (S t ) = A t w t (S t ) + η t , where A t is an n × n strictly lower-triangular matrix and η t ∼ N (0, D) and D is diagonal.Based on the structure of A t , we can write the covariance of each level as The NNGP prior constructs a sparse strictly lower triangular matrix A with no more than m (where m ≪ n) non-zero entries in each row resulting in an approximation of the covariance matrix T is a sparse matrix and can be computed based on O(n t m 3 ) operations.
We call the integrated version of the above model collapsed RNNC model.Specifically, after integrating out w t the proposed RNNC model can be written as: for t = 2, . . ., T , where Λt (θ t , τ t ) = Ct (θ t ) + τ 2 t I = σ 2 t Rt (ϕ t ) + τ 2 t I is the covariance matrix of the observations, Ct (θ t ) is the sparse covariance matrix with parameters θ t = {σ 2 t , ϕ t } and τ 2 t is the variance of the error ϵ t at level t.By applying Sherman-Morrison-Woodbury formula, the inverse and determinant of Λ get the computationally convenient form For simplicity let us denote Θ t = (β t , γ t , θ t , τ t ).Based on this representation, the joint posterior approximation of all the unknowns is: where L(Z t |Θ t , ŷt−1 (S t )) is the approximated likelihood using the sparse representation and t } as a set of knots of fidelity level t.This contains the observed locations that are not in the t th level but in the higher fidelity levels.The (•) represents the parameters and data necessary to produce the prediction distortion at level t − 1.Note that the prediction probability can be excluded for cases with hierarchically nested structure for the spatial locations.For each level, a Gibbs sampler can be employed to fascilitate inference based on the conditional representation p(Θ T |ŷ t−1 (S t ), Z T ) and p(ŷ t−1 (S * t )|Θ T , Z T ) which is given in Eq. (4.2).
In the special case of hierarchical nested structure for the spatial locations, we can avoid sampling from p(ŷ t−1 (S * t )|Θ T , Z T ) using the observed locations z t−1 (S * t ).We can prove that the mean and variance of the predictive distribution at level T of the collapsed RNNC is the same as the mean and variance of the predictive distribution of the NNCGP.The proof is very similar to Le Gratiet and Garnier (2014) in the sense that we just need to substitute the GP priors with the NNGP priors and add the nugget effect in each level.

Conjugate Recursive Nearest Neighbor Co-kriging Model
Both NNCGP and collapsed RNNC models rely on the MCMC inference which can be practically prohibitive when analyzing thousands or millions of spatial data sets.Following recent work by Finley et al. (2019), we propose a MCMC free procedure to achieve exact Bayesian inference at a more practical time.Because the computational efficiency of the estimation procedure in Finley et al. ( 2019) is sensitive to the number of parameters, it cannot be applied directly to our model.We utilize the RNNC model conditionally independent posterior representation to decompose the parametric space into smaller different groups based on the fidelity levels.To make MCMC free inference possible, we re-parameterize the covariance function of the collapsed recursive co-kriging model as Λt (θ t , τ 2 t ) = σ 2 t Σt , where Σt = Rt + τ 2 t I , Rt is the nearest-neighbor approximation correlation matrix, and To avoid the computational bottleneck due to the MCMC, we propose to make fast estimation (ϕ t , τ 2 t ) through a cross-validation approach for each level as well as use the prediction means of y t−1 (S t ) based on the estimated values.We estimate ŷt (S * t ) by the posterior mean ȳt (S In the case that we have nested locations, y t (s u ) for a location s u ∈ S t−1 is estimated with an empirical approach ȳt−1 (s u ) as z t−1 (s u ) and its variance is equal to the variance of the nugget effect.Given ϕ t , τ 2 t and ŷt (S t ), the convariance matrix Σt can be calculated analytically.For computational convenience, we assign an independent conjugate prior for the parameters of each level such as p(β 1 , . . ., Based on this specifications, the posterior density function can be separated for each level t such as: We can compute the full conditional density function of γ t−1 , β t , and σ 2 t as ), (4.9) were Ṽγ t−1 , μγ t−1 , Ṽβt μβt , σ 2 t Ṽβt , a * t , and b * t are given analytically in Appendix D. Note that for t = 1, γ 0 and y 0 (S t ) do not exist.Also the conditional posterior density function of β 1 and σ 2 1 are slightly different as explained in Appendix D. A K-fold cross-validation method is used for the selection of optimal values for the parameters ϕ t and τ 2 t at level t that provide best prediction performance for the model, from a group of candidates.The criteria for choosing ϕ t and τ 2 t can be the root mean square prediction error (RMSPE) over the K folds of data set.The geolocated observations of the t fidelity are partitioned into K equal size subsets.Then, one of the subsets is used as a test set and the others are used for training.The procedure is repeated K times such that each subset is used once as a test set.The computational complexity of these procedures is reduced significantly from the use of the NNGP priors in the recursive co-kriging model.
Find a confidence interval based on the quantiles of the above distributions.
step 8 Repeat steps 1-7 over all T fidelity levels.
the dimension of S * t .
Step 7, to predict at a new location requires O(m 3 ) flops.When we use parallel computing within a fidelity level for parameters (ϕ t , τ 2 t ) the computation in step 3 becomes extremely fast.The conjugate RNNC model provides an empirical estimation of spatial effect parameter ϕ t and noise parameter τt 2 within a given resolution.We note that the proposed MCMC free inference can be viewed as a sequential optimization technique which splits the parametric space into several lower dimension components where we can apply conditional independent conjugate NNGP models.

Synthetic Data Example and Real Data Analysis
We study the performance of our proposed procedures, the conjugate RNNC and the collapsed RNNC, as well as compare their performances with that of the sequential NNCGP.
The empirical study is based on one synthetic data set example with nested and one with non-nested input data sets.Also we use a real satellite data set application.As measures of performance we use the root mean squared prediction errors (RMSPE), coverage probability of the 95% equal tail credible interval (CVG(95%)), average length of the 95% equal tail credible interval (ALCI(95%)), and continuous rank probability score (CRPS) (Gneiting and Raftery, 2007).Details of these measurements are given in Appendix E. The simulations were performed in MATLAB R2018a, on a personal computer with specifications (intelR i7-3770 3.4GHz Processor, RAM 8.00GB, MS Windows 64bit).We have also included a simulation study with four levels of fidelity in the supplementary materials.

Simulation Study
We consider a two-fidelity level system represented by the hierarchical statistical model (3.1) defined on a two dimensional unit square domain with univariate observation data sets for both Z 1 and Z 2 .Let the design matrix be h(S t ) = 1, the autoregressive coefficient function be an unknown constant ζ 1 (s) = γ 1 , and exponential covariance functions.We generate two synthetic data sets for the above statistical model.The true values of the parameters are listed in Table 1 and Table 3.The data sets on the nested spatial locations consists of observations Z 1 and Z 2 from 100 × 100 grids S 1 and S 2 , respectively.The data sets, shown in Regarding the Bayesian inference, we compared the sequential NNCGP model, with the proposed collapsed RNNC model and that with the conjugate RNNC model, on both nested and non-nested data sets.We assigned similar non-informative priors for all the four models.
In Tables 1 and 3, we report the Monte Carlo estimates of the posterior means and the associated 95% marginal credible intervals of the unknown parameters using the two different NNCGP based procedures: sequential NNCGP, collapsed RNNC, along with the posterior  mean and tuned values of parameters using conjugate RNNC, with m = 10.There is no significant difference in the estimation of parameters for all MCMC based models (NNCGP and collapsed RNNC) and the true values of the parameters are successfully included in the 95% marginal credible intervals.The introduction of latent interpolants may have caused a small overestimation of τ 2 2 for all models.Instead, the conjugate RNNC is underestimating the variance of the nugget for the second fidelity level.The parameter estimations can be improved with a semi-nested or nested structure between the observed locations of the fidelity levels, and it is also shown for the auto-regressive co-kriging model in Konomi and Karagiannis (2021).The conjugate RNNC model has similar performance on estimating the  mean of the parameters compared to the NNCGP and RNNC models.However, it does not provide uncertainties regarding these estimations.
In Table 2 and Table 4

Application to High-resolution Infrared Radiation Sounder data
We We assign independent normal distribution priors with zero mean and large variances for β 0,t , β 1,t , β 2,t and γ.We assign independent uniform prior distributions U (0, 1000) to the range correlation parameters (ϕ t,1 , ϕ t,2 ) for t = 1, 2. Also, we assign independent IG(2, 1) prior distributions for the variance parameters σ 2 t and τ 2 t .For the Bayesian inference of the sequential NNCGP, we run the MCMC sampler with of 35, 000 iterations where the first 5, 000 iterations are discarded as a burn-in.For the Bayesian inference of conjugate RNNC, we consider using posterior means as the estimated values for parameters β 0,t , β 1,t , β 2,t , σ 2 t and γ; we also use posterior means as the imputation values for latent process ỹt and for the prediction values of z(s p ) at location s p ̸ ∈ St .
The prediction performance metrics of the four different methods are given in Table 5.Compared to the single level NNGP model and combined NNGP model, the sequential NNCGP model and conjugate RNNC model produced a 20-30% smaller RMSPE and their NSME is closer to 1.The sequential NNCGP model and collapsed RNNC model also produced larger CVG and smaller ALCI than the single level NNGP model and combined NNGP model.The result suggests that the NNCGP and RNNC models have a substantial improvement in terms of predictive accuracy in real data analysis too.In the prediction plots (Figure 4) of the testing data of NOAA-15, we observe that RNNC models are more capable of capturing the pattern of the testing data than single level NNGP model and combined NNGP model.This is reasonable because the observations from NOAA-14 have   ) and that of NNCGP model is O((n 1 + n 2 )m 3 ), for an MCMC iteration.However, the whole computational complexity of the conjugate RNNC model is O((n 1 + n 2 )m 3 ) with parallel computational environment, which makes it remarkably computationally efficient without losing significant prediction accuracy.This is consistent with the running times of the models shown in Table 5.
We apply the MCMC free conjugate RNNC model for gap-filling predictions based upon a discrete global grid.We chose to use 1

Summary and conclusions
We have proposed a new computationally efficient co-kriging method, the recursive nearest neighbor Autoregressive Co-Kriging (RNNC) model, for the analysis of large and multifidelity spatial data sets.In particular, we proposed two computationally efficient inferential procedures: a) the collapsed RNNC, and b) the conjugate RNNC.Regarding the collapsed RNNC, we integrate out the latent variables of the RNNC model which enables the factorisation of the likelihood into terms involving smaller and sparse covariance matrices within each level.Then, a prediction focused approximation is applied to the aforesaid model to further speed up the computation.The cross-validation using grid search on a two or three dimensional space is a computationally feasible method to estimate the hyperparameters.
Regarding the proposed conjugate RNNC, it is MCMC free and at most computationally linear in the total number of all spatial locations of all fidelity levels.We compared the proposed collapsed RNNC and conjugate RNNC with NNCGP in a simulation study and a real data application of intersatellite calibration.We observed that similar to NNCGP, the collapsed and conjugate RNNC were also able to improve the accuracy of the prediction for the HIRS brightness temperatures from the NOAA-15 polar-orbiting satellite by incorporating information from an older version of the same HIRS sensor on board the polar orbiting satellite NOAA-14.The RNNC can be viewed as a modularization approach to NNCGP model in Bayesian statistics Bayarri et al. (2009) where the analysis is done in steps rather than jointly.
The proposed procedures can be used for a variety of large multi-fidelity data sets in remote sensing with overlapping areas of observed locations.A natural extension of our model can be done based on a recently proposed sparse plus Low-rank Gaussian Process (SPLGP) (Shirota et al., 2023) who used a combination of Gaussian predictive process and NNGP in an MCMC-free framework.A natural choice for introducing the non-stationarity in the conjugate RNNC is to use non-dynamic partition methods such as (Matthew J. Heaton and Terres, 2017;Konomi et al., 2019).Moreover, we can use more complex Vechias approximations (Vecchia, 1988;Stein et al., 2004;Guinness, 2018;Katzfuss et al., 2020), similar to the NNGP, where the ordering of the data is more complicated but results in a better approximation.These Vechias approximation techniques of ordering can be applied naturally in the proposed RNNC model, however, they are out of the scope of this paper and will be investigated in future work.Next steps will include extending the proposed method in the multivariate setting by using ideas from parallel partial autoregressive co-kriging (Ma et al., 2022) and NNGP spatial factor models (Taylor-Rodriguez et al., 2018).Spherical covariance function can be used for global data set analysis (Guinness and Fuentes, 2016), however their extension to anisotropic representation is not straightforward.Still, work needs to be done in developing new strategies for tuning the hyperparameters in more complex covariance functions with multiple parameters within a fidelity level.2019) using only the fourth single level training data.The prior specification for the conjugate NNGP are the same as above.Figure 7 gives the prediction mean and the prediction standard deviations using both methods.Figure 7a gives the single layer conjugate NNGP and compared to our proposed method Figure 7 is less accurate.Figure 7b shows the single layer conjugate NNGP prediction standard deviations which is considerably larger than the prediction standard deviations using conjugate RNNC.Moreover, the computed conjugate RNNC RMSPE is 0.64 with 95% CVG 0.94 and ALCI 2.37.Instead, the computed conjugate NNGP RMSPE is 1.68 with 95% CVG 0.86 and ALCI 3.98.These comparisons shows that accounting for the multi-fidelity dependencies can improve our prediction accuracy and its variation.The computational time for our proposed method is around four times slower than the conjugate NNGP in a single level.More precisely our method take around 117 sec and the single layer conjugate NNGP 27 seconds).This is very normal to expect since the computational complexity is linear to the observed data.

Figures
Figures 2a and 2b are based on a fully non-nested input where the low fidelity observations Z 1 and the high fidelity observations Z 2 are generated at irregularly located at point in sets S 1 and S 2 of size 5000, while S 1 ∩ S 2 = ∅.In all data sets, a few small square regions from Z 2 are treated as a testing data-set, and the rest of Z 2 and Z 1 are treated as training data sets.The testing regions for the non-nested input can be seen as white boxes in Figure 2(b).
(a) Non-nested low fidelity observations (b) Non-nested high fidelity observations

Figure 2 :
Figure 2: Observations for two fidelity level structure of non-nested observed input space.White boxes indicate the testing regions.

Figure 2
Figure2and 3 provide the non-nested synthetic observations and the prediction plots from sequential NNCGP, collapsed RNNC, conjugate RNNC, combined NNGP and single level NNGP models.We observed that for the testing regions the NNCGP models provides similar prediction surfaces and all NNCGP models has the better presentation of patterns in prediction surface comparing to single level NNGP model.

Figure 3 :
Figure 3: Non-nested input observations with two fidelity level structure.Original testing data (a) along with predictions of the high fidelity level data-set by (b) Sequential NNCGP, (c) Collapsed RNNC, (d) Conjugate RNNC, (e) Combined NNCGP and (f) single level NNCGP.
(a) NOAA-15 testing data-set (b) Prediction means by Sequential NNCGP model (c) Prediction means by Collapsed RNNC model(d) Prediction means by conjugate RNNC model (e) Prediction means by single level NNGP model (f) Prediction means by combined NNGP model

Figure 5 :
Figure 5: The global prediction brightness temperature values of NOAA 15 using the MCMC free conjugate model.
pixels as grids with global spatial coverage from −70 • to 70 • N. By applying the NNCGP model, we predict gridded NOAA-15 brightness temperature data on the center of the grids, based on the NOAA-14 and NOAA-15 swath-based spatial support.The prediction plot (Figure 5) illustrates the ability of the MCMC free conjugate RNNC model to handle large irregularly spaced data sets and produce a gap-filled composite gridded dataset.The resulting global image of the brightness temperature is practically the same as the sequential NNCGP.
(a) First level fidelity observations (b) Second level fidelity observations (c) Third level fidelity observations (d) Fourth level fidelity observations (e) Fourth level fidelity testing data

Figure 6 :
Figure 6: Observations for four fidelity level structure: a) first fidelity level training data, b) second fidelity level training data, c) third fidelity level training data, d) forth fidelity level training data (white boxes indicate the testing regions), e) fourth fidelity level testing data.
Figure 7: Prediction mean and the prediction standard deviations using both methods: a) prediction mean using conjugate NNGP, b) Prediction standard deviation (sd) using conjugate NNGP, c) prediction mean using conjugate RNNC, and d) Prediction standard deviation (sd) using conjugate RNNC.
is the noiseless output.Here, ζ t−1 (s) and δ t (s) represent the scale and additive discrepancies between systems with fidelity levels t and t − 1, h t (•) is a vector of preselected bases functions, and β t is a vector of coefficients at fidelity level t.The latent random function w t (s) is modeled as a Gaussian process, mutually independent for different t; i.e. w t (•) ∼ GP (0, C t (•, •; θ t )) where C t (•, •; θ t ) is a covariance function with covariance parameters θ t at fidelity level t.Any well defined covariance function can be used The estimation, tuning and prediction procedure of conjugate RNNC model are given in Algorithm 1. Similar to NNCGP model, the conjugate RNNC model analyzes the data set of each fidelity level sequentially from the lowest level to the highest.For each single fidelity level t, the conjugate RNNC model is able to run in parallel for tuning the parameter ϕ t The Algorithm steps for the MCMC free conjugate RNNC procedure.MCMC free posterior sampling for multi-fidelity level system with T levels.step 1 Start from fidelity level 1(t = 1), construct a set L t that contains l t number of candidates of parameters ϕ t and τ 2 t .step 2 Choose a (ϕ t , τt 2 ) from L t .Split the data set of fidelity level t into K folds.step 3 Remove k th fold of data set S t , denote as S t,k , then estimate σ 2 t |Z t , ŷt−1 (S t ) with the posterior mean σ2 t = Estimate β t |σ 2 t , Z t , ŷt−1 (S t ) with the posterior mean βt = Ṽβt μβt of (4.10).Estimate γ t−1 |β t , σ 2 t , Z t , ŷt−1 (S t ) with the posterior mean γt−1 = Ṽγ t−1 μγ t−1 of (4.9).Predict test data set z t (S t,k ) by posterior mean ẑt (S t,k ) = 1 t>1 (t)g T t ) requires O(n t m 3 + n t mp 2 t ) floating point operations (flops) where p t is the dimension of β t .Step 6, to predict at new locations S * t requires O(n * t m 3 ) flops where n * t is Algorithm 1 t−1 (S t,k )γ t−1 ŷt−1 (S t,k ) + h T t (S t,k ) βt + V t,S t,k µ t,S t,k .step 5 Repeat steps 3-4 over all K folds, calculate the average root mean square prediction error(RMSPE) by RMSPE = K k=1 s=S t,k (z t (s) − ẑt (s)) 2 /n k K .step 6 Repeat steps 2-5 over all values in candidate set L t , choose the value of φt and τ 2 t that minimizes the RMSPE.Repeat step 3 on full data set S t by fixing ϕ t = φt , t step 7 For a new input location s p , predict y t (s p ) by posterior:

Table 1 :
The estimation of parameters in nested input dataset, using sequential NNCGP, collapsed RNNC and conjugate RNNC models.

Table 2 :
Performance measures for the predictive ability of the sequential NNCGP model, collapsed RNNC model and conjugate RNNC model.

Table 3 :
The estimation of parameters in non-nested input dataset using sequential NNCGP, collapsed RNNC and conjugate RNNC models.

Table 4 :
Performance measures for the predictive ability of the Sequential NNCGP model, collapsed RNNC model and conjugate RNNC model, in non-nested input.
, we report standard performance measures (defined in Appendix E) for the sequential NNCGP, collapsed RNNC and conjugate RNNC with m = 10 number of neighbours.All performance measures indicate that the collapsed RNNC model has similar predictive ability with the sequential NNCGP model.The conjugate RNNC model produced RMSPE value that is 10% larger than other NNCGP and collapsed RNNC models, but it is still significantly smaller than the RMSPE values from single level NNGP and combined NNGP.The tables also show that the running time of the collapsed RNNC model is not

Table 5 :
model our data based on the two-fidelity level conjugate RNNC model and on the twofidelity level sequential NNCGP model.Moreover, we provide comparisons with the single level NNGP model and combined NNGP model.We consider a linear model for the mean of the Gaussian processes, in y 1 (•) and δ 2 (•), with linear basis function representation h(s t ) and coefficients β t = {β 0,t , β 1,t , β 2,t } T .We consider the scalar discrepancy ζ(s) to be unknown constant and equal to γ.The number of nearest neighbors m is set to 10, and the spatial process w t is considered to have a diagonal anisotropic exponential covariance function.Performance measures for the predictive ability of sequential NNCGP, single level NNGP, combined NNGP, collapsed RNNC and conjugate RNNC models in NOAA 14 and NOAA 15 HIRS instrument data analysis.