Sonic layer depth (SLD), an important parameter in underwater acoustics, is the near-surface depth of the first maximum of sound speed in the ocean. The lack of direct observations of vertical profiles from velocimeters, or of temperature and salinity, from which sound speed and SLD can be calculated, hampers the investigation of SLD. In this study, we demonstrate SLD estimation using an artificial neural network (ANN) from surface measurements that can later be replaced with satellite observations. Surface and subsurface measurements from a central Arabian Sea mooring are used for this purpose. The estimated SLD had a root mean square error (correlation coefficient) of 11.83 m (0.84). Approximately 76% (91%) of the estimations lie within ±10 m (±20 m). SLD has also been estimated from surface parameters using the multiple regression technique (MRT). The ANN proved superior to MRT in estimating SLD from surface parameters.
 Water is an efficient medium for the transmission of sound. Sound travels more rapidly and with much less attenuation of energy through water than through air. This characteristic led to the development of submarine acoustic methods of tremendous value in navigation [Sverdrup et al., 1961]. Sonic Layer Depth (SLD) is the near-surface depth at which the first maximum of the sound speed occurs (Figure 1) and is an important parameter in underwater acoustics (http://metocph.nmci.navy.mil/KBay/soundprop.htm). SLD plays an important role in the refraction of sound rays traveling in the ocean, which in turn affects sonar detection ranges (http://metocph.nmci.navy.mil/KBay/hs∼acousticpaths.htm).
 Many surface and subsurface parameters affect the temperature and salinity profiles in the ocean, which in turn affect the sound speed and thereby change SLD. Some of the important factors are sea surface height anomalies (SSHAs), sea surface temperature (SST), wind stress, net heat flux (NHF), fresh water flux, radiation, internal waves, frontal zones, meso-scale synoptic eddies and subsurface currents. These parameters change SLD indirectly.
 The conventional approach to estimating SLD is to use the sound speed profiles (SSPs) in the ocean, obtained either by a velocimeter that measures sound speed directly or from in situ temperature and salinity profiles [Lü et al., 2003]. These observations are limited in both temporal and spatial resolution. Estimating SLD from parameters obtainable from remote sensing platforms overcomes this problem and provides an opportunity to monitor changes in SLD operationally, more frequently and at a finer spatial resolution over larger spatial scales. The main objective of this work is to demonstrate the estimation of SLD from surface parameters alone, using an artificial neural network (ANN) approach. This method also demonstrates the potential of estimating SLD from remote sensing parameters in the future. However, because remote sensing observations corresponding to the in situ measurements, particularly of SSHAs, are limited, we used the observations from a central Arabian Sea mooring as a proxy for the remotely sensed data.
2. Data and Methodology
 In the present study, we have used the observations from the central Arabian Sea mooring located at 15.5°N and 61.5°E, deployed by the Woods Hole Oceanographic Institution during 16 October 1994–22 October 1995. Rudnick et al. and Ali et al. have given the details of the instrumentation on this mooring and its observations. We selected these data for the analysis, even though the observations date from 1994–1995, as this is the only dataset in the north Indian Ocean with both meteorological and oceanographic observations in a continuous, hourly sampled time series spanning one year. The value of hourly sampling lies in the large total number of samples it provides. We do not attempt to resolve diurnal variability in SLD, since remote sampling methods that resolve the diurnal cycle in the surface forcing of the ocean are not presently available.
 The temperature and salinity measurements were available down to depths of 3025 m and 250 m, respectively. The vertical resolution of salinity is poor compared with that of temperature. We considered only those depths where temperature measurements are available throughout the study period along with the surface parameters. Thus, the 27 depths (in m) selected for temperature are 1.4, 1.8, 1.91, 2.4, 3.5, 4.5, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 80, 90, 100, 125, 175, 200, 225 and 250. Similarly, the 6 depths (in m) selected for salinity are 1.4, 10, 35, 100, 200 and 250. Owing to these quality checks, data could not be analysed for the entire months of February, August and September 1995. Hence, only nine months of data have been analysed in this study, and out of the total of 8858 hourly observations, 5281 profiles have been used. Salinity measurements were linearly interpolated to the 27 temperature depths. The UNESCO algorithm originally developed by Chen and Millero and modified by Wong and Zhu was used to estimate the sound speed from temperature (in °C), pressure (in bars) and salinity (in psu). Since this algorithm uses pressure as a variable, the depth values were converted into pressure following Leroy and Parthiot. SLDs have been computed from these SSPs. We refer to the SLD based on in situ temperature and salinity profiles as SLDIS. SLD has also been computed from the Levitus monthly climatological profiles (SLDCL hereafter) of temperature [Locarnini et al., 2006] and salinity [Antonov et al., 2006]. In addition, we computed SLD directly from surface parameters using the ANN approach, referred to as SLDAN, and using the multiple regression technique (MRT), referred to henceforth as SLDRT. The surface parameters considered for the present study are wind stress, NHF, radiation, dynamic height (DH) and SST.
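The profile processing described above can be illustrated with a minimal sketch. It makes several assumptions that are not from the study: the simplified sound speed equation of Medwin (1975), which takes depth directly, stands in for the UNESCO (Chen and Millero) algorithm, so no depth-to-pressure conversion is performed; the function names are hypothetical; the temperature and salinity values are synthetic; and a profile whose sound speed decreases from the shallowest level is treated as having zero layer depth.

```python
import numpy as np

# The 27 temperature depths and 6 salinity depths (m) listed in the text.
TEMP_DEPTHS = np.array([1.4, 1.8, 1.91, 2.4, 3.5, 4.5, 5, 10, 15, 20, 25, 30,
                        35, 40, 45, 50, 55, 60, 65, 80, 90, 100, 125, 175,
                        200, 225, 250], dtype=float)
SAL_DEPTHS = np.array([1.4, 10, 35, 100, 200, 250], dtype=float)


def sound_speed_medwin(temp_c, sal_psu, depth_m):
    """Sound speed (m/s) from the simplified Medwin (1975) equation.

    Assumption: used here only as a compact stand-in for the UNESCO algorithm.
    """
    t = np.asarray(temp_c, dtype=float)
    s = np.asarray(sal_psu, dtype=float)
    z = np.asarray(depth_m, dtype=float)
    return (1449.2 + 4.6 * t - 0.055 * t**2 + 0.00029 * t**3
            + (1.34 - 0.010 * t) * (s - 35.0) + 0.016 * z)


def sonic_layer_depth(depths, speed):
    """Depth (m) of the first (shallowest) sound speed maximum.

    Assumptions: returns 0.0 when the speed decreases from the top level
    (no sonic layer) and the deepest level when the speed never decreases.
    """
    for i in range(len(speed) - 1):
        if speed[i + 1] < speed[i]:
            return 0.0 if i == 0 else float(depths[i])
    return float(depths[-1])


# Synthetic profile: isothermal to 50 m, cooling by 0.1 degC per metre below.
temp = np.where(TEMP_DEPTHS <= 50, 27.0, 27.0 - 0.1 * (TEMP_DEPTHS - 50))
sal_sparse = np.full(SAL_DEPTHS.shape, 36.0)

# Linear interpolation of salinity onto the 27 temperature depths.
sal = np.interp(TEMP_DEPTHS, SAL_DEPTHS, sal_sparse)

speed = sound_speed_medwin(temp, sal, TEMP_DEPTHS)
sld = sonic_layer_depth(TEMP_DEPTHS, speed)
print(f"SLD = {sld} m")
```

In the isothermal layer the sound speed rises with depth through the depth term alone, so the first maximum falls at the base of that layer.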
 DH was calculated using the temperature values at all 27 depths and the interpolated salinity values at those depths. Although DH is estimated from the temperature and salinity profiles, it is used as a surface parameter because it can be replaced with SSHA available from satellite altimeters. The NHF across the air-sea interface is computed from the surface buoy measurements. The details of these flux estimations and the accuracy of the measurements are described by Weller et al. and Fischer. The methods of estimating SLDAN and SLDRT are described in the following section.
3. Artificial Neural Network and Multiple Regression Analyses
 An ANN is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANN models learn from examples through a training (learning) process and can be designed for specific applications such as pattern recognition, data classification or parameter prediction.
 For the present ANN analysis, a radial basis function (RBF) model [Broomhead and Lowe, 1988; Moody and Darken, 1989; Haykin, 2002] was used to estimate the SLD from surface parameters. The model consists of one input layer, one output layer and one hidden layer containing 379 radial (hidden) units, and its output weights are set with a pseudo-invert algorithm. The RBF network model is essentially a three-layered model with a single hidden layer of radial units. It has two layers of processing. In the first, the input is mapped onto each RBF in the ‘hidden’ layer; the RBF chosen is usually a Gaussian function. In the second, for regression problems, the output layer forms a linear combination of the hidden layer values, representing the mean predicted output. The pseudo-invert algorithm uses the singular value decomposition technique to calculate the pseudo-inverse of the matrix needed to set the weights in a linear output layer, giving the least mean squared solution. Essentially, it is guaranteed to find the optimal setting for the weights in a linear layer, minimizing the root mean square (RMS) training set error [Bishop, 1995; Press et al., 1992; Golub and Kahan, 1965]. This is the standard least-squares optimization technique.
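The two processing layers described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the configuration used in the study: the centres are drawn at random from the training inputs, a single shared Gaussian width is used, a bias column is appended, only 20 hidden units are used instead of 379, and the data are synthetic. The function names are hypothetical. `np.linalg.pinv` computes the pseudo-inverse through a singular value decomposition.

```python
import numpy as np


def gaussian_design(x, centers, width):
    """Hidden layer: one Gaussian activation per centre, plus a bias column."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((x.shape[0], 1))])


def fit_rbf(x, y, n_hidden, width, rng):
    """Pick centres (assumption: random training points) and solve the
    linear output layer in the least-squares sense via the pseudo-inverse."""
    centers = x[rng.choice(x.shape[0], size=n_hidden, replace=False)]
    phi = gaussian_design(x, centers, width)
    weights = np.linalg.pinv(phi) @ y  # SVD-based pseudo-inverse
    return centers, weights


def predict_rbf(x, centers, width, weights):
    """Output layer: linear combination of the hidden-layer values."""
    return gaussian_design(x, centers, width) @ weights


# Synthetic stand-in for five standardized surface inputs and an SLD target.
rng = np.random.default_rng(0)
x_train = rng.standard_normal((200, 5))
y_train = 40.0 + 10.0 * np.tanh(x_train[:, 0]) - 5.0 * x_train[:, 4] ** 2

WIDTH = 2.0  # assumed shared width of the Gaussian units
centers, weights = fit_rbf(x_train, y_train, n_hidden=20, width=WIDTH, rng=rng)
y_fit = predict_rbf(x_train, centers, WIDTH, weights)

rmse_fit = float(np.sqrt(np.mean((y_fit - y_train) ** 2)))
rmse_mean = float(np.sqrt(np.mean((y_train - y_train.mean()) ** 2)))
print(f"training RMSE: {rmse_fit:.2f} (constant-mean baseline: {rmse_mean:.2f})")
```

Because the bias column lets the output layer reproduce a constant, the least-squares training error cannot exceed that of predicting the training mean.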
 Performing an ANN analysis requires three sets of data, for training, verification (validation) and prediction (testing). The data set marked for training is used to train the neural network. Verification cases are used to validate the model during training so that the model does not over-fit. Once the ANN model is trained with a good amount of in situ data covering as wide a range of oceanographic regimes and conditions as possible, the output (SLD in this case) can be predicted from the input parameters alone without any regional or seasonal constraints. If data for many years were available, a few years could be used for training and the others for validation. However, in the present case we have barely a one-year dataset. Training on a few months and testing on the remaining months was not feasible, as training the model (especially the RBF network model) needs good coverage of the entire range of seasonal conditions. Hence, we are forced to use random days for training so that the entire annual cycle is covered. The remaining days are used for validation and testing. A random selection is most suited to an ANN analysis of such a dataset [Haykin, 2002; Ripley, 1997]. Richaume et al., Pozzi et al., Schroder et al., and Bourras and Liu used the random selection technique for the prediction of various parameters. Random selection of data for training and validation also has the desirable effect of eliminating the periodicity or bias that may creep in when data are selected systematically. In this analysis, the data are divided randomly into three segments. The first segment is used for training, the second for validation and the third for prediction.
 The independent parameters (inputs) for the ANN analysis are wind stress, radiation, NHF, SST and DH. The dependent parameter (target output) is SLDIS. We have, however, not considered the special cases of zero layer depth (no SLD) in either the training or the prediction of the ANN model. Hence, out of 5281, only 3367 hourly observations were analysed. About 50% of this dataset (1642 observations) was used for training, about 25% (841 observations) for verification and 25% (841 observations) for prediction. The 25% of the data (third segment) marked for prediction or testing were held back and were not used in training the model. Thus the present model is “truly” tested with in situ observations that were not used for training. Similarly, in the MRT approach, all the surface parameters in the training and validation data segments (2483 observations) are regressed linearly against SLDIS. The regression coefficients thus generated are used to predict SLDRT from the prediction input dataset.
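The random three-way division and the MRT fit described above can be sketched as follows. This is a minimal illustration, assuming ordinary least squares with an intercept as the form of the multiple linear regression; the function names, the sample size of 1000 and the synthetic coefficients are hypothetical and not from the study.

```python
import numpy as np


def random_three_way_split(n, rng, frac_train=0.5, frac_valid=0.25):
    """Shuffle indices 0..n-1 and cut them into training, validation
    and prediction segments (about 50%, 25% and 25% by default)."""
    idx = rng.permutation(n)
    n_train = int(round(frac_train * n))
    n_valid = int(round(frac_valid * n))
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]


def fit_mrt(x, y):
    """Multiple linear regression with an intercept, by least squares."""
    a = np.hstack([x, np.ones((x.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    return coef


def predict_mrt(x, coef):
    """Apply the regression coefficients (last entry is the intercept)."""
    return np.hstack([x, np.ones((x.shape[0], 1))]) @ coef


# Synthetic stand-in for five surface parameters and an exactly linear target.
rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal((n, 5))
true_coef = np.array([2.0, -1.0, 0.5, 3.0, -4.0, 45.0])
y = x @ true_coef[:5] + true_coef[5]

train, valid, test = random_three_way_split(n, rng)

# As in the text, the regression uses the training and validation segments;
# the prediction segment is held back.
fit_idx = np.concatenate([train, valid])
coef = fit_mrt(x[fit_idx], y[fit_idx])
y_test_pred = predict_mrt(x[test], coef)
print(len(train), len(valid), len(test))
```

Shuffling before cutting gives each segment samples from the whole annual cycle, which is the property the random selection is meant to secure.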
4. Results and Discussions
 In this section, we compare the hourly estimations of SLDAN and SLDRT with SLDIS only for the 841 predicted observations that were not used for training or developing the ANN model. The statistical results (Table 1) indicate that SLD can be estimated more accurately from surface observations with the ANN approach than with MRT. The absolute error mean (average of the absolute differences between estimated and in situ values), the standard deviation (SD) of the errors in the estimations (error SD) and the SD ratio (ratio of error SD to data SD) are much lower for SLDAN than for SLDRT, while the coefficient of correlation (R) is higher.
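The statistics defined above can be computed as in the following sketch. The function name and the four sample values are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which form was used.

```python
import numpy as np


def error_statistics(estimated, observed):
    """Comparison statistics between estimated and in situ SLD (m)."""
    est = np.asarray(estimated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    err = est - obs
    error_sd = np.std(err, ddof=1)  # assumption: sample standard deviation
    return {
        "abs_error_mean": float(np.mean(np.abs(err))),
        "error_sd": float(error_sd),
        "sd_ratio": float(error_sd / np.std(obs, ddof=1)),
        "r": float(np.corrcoef(est, obs)[0, 1]),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "pct_within_10m": float(100.0 * np.mean(np.abs(err) <= 10.0)),
        "pct_within_20m": float(100.0 * np.mean(np.abs(err) <= 20.0)),
    }


# Hypothetical values, for illustration only.
observed = [20.0, 40.0, 60.0, 80.0]
estimated = [25.0, 35.0, 75.0, 75.0]
stats = error_statistics(estimated, observed)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

A lower absolute error mean, error SD and SD ratio, together with a higher R, indicate the better estimate.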
Table 1. Statistical Analysis of SLDRT and SLDAN: Absolute Error Mean; Correlation Coefficient (R)
 Histograms of the errors in SLD estimations from these two methods on hourly and daily average bases are shown in Figures 2 and 3, respectively. Out of 841 hourly estimations (186 daily averages), SLD could be estimated within an error limit of ±10 m in about 76% (83%) of cases using the ANN method and in about 58% (49%) using MRT. These results also indicate the superiority of the ANN approach over MRT.
 Monthly averages of SLDIS, SLDAN, SLDRT and SLDCL are shown for the entire study period in Figure 4. These monthly means do not represent all the hourly observations of a month but only 25% of the values in that month, as we compared only the predicted dataset (25% of the entire dataset). Monthly SLDIS and SLDAN during the study period deviate significantly from climatology (SLDCL), emphasizing the need for regular monitoring of SLD rather than reliance on climatological values. SLDIS increases continuously from October 1994 to January 1995 and then decreases until April 1995. It increases again from May 1995 to July 1995. Deeper SLD is observed during December to February and shallower SLD during October–November and during April and May. Thus, SLD has a bimodal oscillation. Incidentally, mixed layer depth (MLD) also has bimodal oscillations [Swain et al., 2006, Figure 5]. The deep SLD during January 1995 is due to changes in the vertical structure of temperature and salinity caused by convective mixing. The second deep SLD, during July 1995, is caused by the strong monsoonal winds. The deeper SLD during January compared with July indicates that convective mixing dominates over the wind effect through stirring or Ekman pumping. SLDAN is closer to SLDIS and shows a similar trend, with smaller variations than those of SLDRT. SLDRT variations are significant, particularly during the pre-monsoon period of March–May.
 Monthly root mean square errors (RMSE) between SLDIS and SLDAN and between SLDIS and SLDRT were computed. The monthly RMSE between SLDIS and SLDAN varies approximately from 1.8 to 12.1 m and is less than that between SLDIS and SLDRT, which varies from 3.9 to 32.3 m (Figure 5).
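A monthly RMSE of this kind can be computed by grouping the prediction samples by calendar month, as in the sketch below. The function name, the month labels and the sample values are hypothetical and serve only as an illustration.

```python
import numpy as np


def monthly_rmse(months, estimated, observed):
    """RMSE (m) between estimated and in situ SLD for each month label."""
    months = np.asarray(months)
    err2 = (np.asarray(estimated, float) - np.asarray(observed, float)) ** 2
    return {str(m): float(np.sqrt(err2[months == m].mean()))
            for m in np.unique(months)}


# Hypothetical samples, two per month.
months = ["1994-10", "1994-10", "1994-11", "1994-11"]
estimated = [30.0, 34.0, 50.0, 44.0]
observed = [27.0, 30.0, 50.0, 50.0]

rmse_by_month = monthly_rmse(months, estimated, observed)
for month, value in rmse_by_month.items():
    print(f"{month}: {value:.2f} m")
```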
5. Summary and Conclusions
 Data obtained from the central Arabian Sea mooring located at 15.5°N and 61.5°E, deployed by the Woods Hole Oceanographic Institution during October 1994 to October 1995, have been used in the present study to estimate SLD from surface parameters. The predictions from the ANN model and MRT have been validated against in situ SLD on hourly and daily bases. The independent surface parameters considered are SST, wind stress, NHF, radiation and DH. The SLDs predicted by the ANN model compared well with those obtained from in situ profiles. Statistical comparison of SLD on hourly, daily and monthly bases shows the superiority of the ANN approach over MRT. In the hourly analysis, the RMSE between SLDAN and SLDIS is 11.83 m with R of 0.84. About 76% (91%) of the estimations from ANN lie within ±10 m (±20 m). The RMSE between the two estimates reduced to 7.7 m with R of 0.90 when daily averaged SLDs were compared. About 83% (96%) of the daily estimations lie within ±10 m (±20 m). Hourly estimations from the ANN approach have also been averaged on a monthly basis to study the monthly variations in RMSE, which ranged from 1.8 to 12.1 m. Monthly variations of SLD estimated from the ANN approach correspond well with monthly averages of SLDIS. Monthly variations during the study period differ significantly from the climatology, indicating that SLD needs to be estimated on a regular basis rather than taken from climatological values. Like MLD, SLD has bimodal oscillations, with one peak during the winter season and the other during the monsoon season. The SLD during the winter season, deepened by convective mixing, is deeper than that during the monsoon season, which is deepened by wind stress mixing or Ekman pumping. This result indicates that convective mixing dominates over the wind effect through stirring or Ekman pumping.
 Owing to the limited availability of in situ profiles and of the corresponding surface observations of parameters affecting SLD, the capability of ANN to estimate SLD could be demonstrated at only one location. Hence, the ANN model developed at this location may not yield very good results over other regions. To extend this study over a larger spatial extent, deployment of more ocean moorings such as the one used in this study is recommended. Since sufficient satellite observations of the parameters affecting SLD are now available, observations from these moorings will also help in demonstrating the use of remote sensing data to estimate SLD over larger spatial and temporal extents. Owing to the limited remote sensing observations, particularly of SSHAs from altimeters, we demonstrated the approach using in situ surface observations alone as a proxy for the remotely sensed data. Though dynamic height is computed from the subsurface temperature and salinity profiles, we considered it a surface parameter, as it can be replaced with SSHAs at a later stage.