Best hyperspectral indices for assessing leaf chlorophyll content in a degraded temperate vegetation

Abstract Extensive studies have focused on assessing leaf chlorophyll content through spectral indices; however, the accuracy is weakened by limited wavebands and coarse resolution. With hundreds of wavebands, hyperspectral data can substantially capture the essential absorption features of leaf chlorophyll; however, few such studies have been conducted on the same species in vegetation at various stages of degradation. In this investigation, we constructed complete combinations of either original reflectance or its first‐order derivative value from 350 to 1000 nm to quantify leaf total chlorophyll (Chll), chlorophyll‐a (Chla), and chlorophyll‐b (Chlb) contents. This was performed using three hyperspectral datasets collected in situ from lightly, moderately, and severely degraded vegetation in temperate Helin County, China. Suitable combinations were selected by comparing the numbers of significant correlation coefficients with leaf Chll, Chla, and Chlb contents. The combinations of reflectance difference (Dij), normalized difference (ND), first‐order derivative (FD), and first‐order derivative difference (FD(D)) were found to be the most effective. These sensitive band‐based combinations were further optimized by means of stepwise linear regression analysis and were compared with 43 empirical spectral indices frequently used in the literature. The sensitive band‐based combinations on hyperspectral data proved to be the most effective indices for quantifying leaf chlorophyll content (R² > 0.7, p < 0.01), demonstrating great potential for the use of hyperspectral data in monitoring degraded vegetation at a fine scale.

and assessment of the overall health of the vegetation, indicating its degradation status (Gottardini et al., 2014; Peng et al., 2014). This will allow us to conduct restoration and revegetation actions where they are required. Remote sensing, including hyperspectral remote sensing, is one of the most common pathways for fast and nondestructive Chll content estimation at leaf and canopy scales (Elarab, Ticlavilca, Torres-Rua, Maslova, & McKee, 2015; Houborg et al., 2015a). Numerous spectral indices have been developed to estimate leaf Chll and its composition. Hyperspectral data, with hundreds of wavebands and 1-3 nm resolution, can greatly improve prediction accuracy; they have attracted extensive attention and are regarded as a powerful means of extracting information on plant physiological parameters. Since then, extensive studies have aimed to develop better hyperspectral indices (Lu & Lu, 2015; Liang et al., 2016). With the appearance of hyperspectral satellites (Marshall & Thenkabail, 2015), hyperspectral data demonstrate great potential for ecological application. To date, most published hyperspectral indices for estimating chlorophyll content use the wavelength domain from 400 to 860 nm, based on either original reflectance or derivative values (Peng, Gitelson, Keydan, Rundquist, & Moses, 2011). Most spectral indices are only applicable to the vegetation types for which they were developed and are subject to site-specific problems. Numerous indices were developed based on purely statistical analysis. Specific wavelengths selected through this method could change from one location to another, because the absorption characteristics of leaf Chll are not considered. It is reasonable to deduce that hyperspectral indices developed on narrow bands sensitive to leaf Chll content could perform better than empirical spectral indices based solely on several bands.
Currently, the first derivative value (FD) is often used to decompose a mixed spectrum and reduce the noise in hyperspectral data (Yao et al., 2015). Many studies have demonstrated the potential of derivative spectra for estimating chemical contents of noncrop vegetation types (Chen, Li, Wang, Peng, & Chen, 2011; Cao, Wang, & Zheng, 2015). Derivative spectral indices are found to be very sensitive to Chll, and among them the first-order derivative spectra are the best predictors of Chll content (Liang et al., 2016). However, few studies have examined the performance of first-order derivative spectra or their combinations in estimating leaf Chll content across wavelengths from 400 to 1000 nm, the domain most frequently covered by spectral sensors worldwide.
Considering the above background, this study uses the entire reflectance range from 350 to 1000 nm and complete combinations of reflectance or its FD, followed by correlation and stepwise regression analysis, an approach not used before, to improve hyperspectral indices. The main aim of this study was therefore to develop suitable hyperspectral indices for estimating leaf chlorophyll content in temperate degraded vegetation. To achieve this objective, the study identifies the narrow wavebands sensitive to leaf Chla, Chlb, and Chll by comparing correlation coefficients among complete combinations of reflectance and its FD across all available wavebands. We use stepwise linear regression analysis to optimize the combinations. The newly developed indices are then compared with published empirical indices in order to select the best-performing hyperspectral indices for the estimation of leaf Chla, Chlb, and Chll status. An extensive dataset of in situ hyperspectra and leaf Chla, Chlb, and Chll contents was collected at three vegetation sites of differing degradation intensity in Helin County, Inner Mongolia, over a 2-year period (2012-2013), and was used for both calibration and validation.

| Study area
The study was conducted in Helin County, Inner Mongolia, China.
Helin County is located in the northern agro-pastoral ecotone and is characterized by flat plains, hills, and mountains of roughly equal area (Figure 1). Its highest elevation is 2031 meters, and its total area is 3401 square kilometers. Helin County has a temperate climate with distinct wet (summer) and dry (winter) seasons. Its annual average temperature is 5.6°C, with a monthly average temperature of -12.8°C in January and 22.1°C in July. The average annual precipitation is 417 millimeters, with approximately 30 millimeters in January and 103 millimeters in July. The average wind speeds are slightly higher in spring and winter than in summer and fall. The average relative humidity does not show obvious seasonal changes over the year. The semi-arid climate supports sandy vegetation, in which grasses and shrubs are predominant.

| In-situ hyperspectral and chlorophyll measurements
In-situ datasets were collected from degraded sparse-forest grassland in Helin County (Figure 1). The degradation intensity of the vegetation was classified into three levels, light, moderate, and severe degradation, according to canopy coverage, plant diversity, and soil conditions. Lightly degraded vegetation has the highest canopy coverage (76%), species diversity (richness of 32 and Shannon-Wiener index of 2.36), and soil moisture (relative weight at 20 cm soil depth, 24%), followed by moderately degraded (52%, 28 and 1.34, 16%) and severely degraded vegetation (33%, 22 and 0.88, 7%). The field measurements were col-

FIGURE 1 The land use map of Helin County, China, indicating the location of the present study

| Hyperspectral indices development and validation
In the raw data, the marginal ranges 325-350 nm and 1,000-1,075 nm of each spectrum were removed because of noise. The aim of a spectral index is to construct a mathematical combination of spectral band values that enhances the information content with regard to the parameter under study. Most published indices (Stagakis, Markos, Sykioti, & Kyparissis, 2010; Inoue, Sakaiya, Zhu, & Takahashi, 2012) are expressed as reflectance (Ri) or a first-order derivative (FD) at a given wavelength, a wavelength difference (Dij), a ratio (RR), a normalized difference (ND), or an inverse reflectance difference (ID). Thus, ten common types of indices based on both original reflectance and derivative spectra were used in this study, where R is reflectance, FD is the first-order derivative spectrum, and the suffixes (i or j) are wavelengths (nm). Over the entire wavelength domain from 350 to 1,000 nm, these indices were evaluated by correlation analysis with leaf Chll and its composition. The optimum wavelengths representing Chll, Chla, and Chlb content were identified based on the highest R² between the in-situ hyperspectral data and the leaf chlorophyll contents.
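The exhaustive band-pair search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the formulas for D, RR, ND, and ID are the forms commonly used in the literature (the original equations are not reproduced in this text), and the function names, toy spectra, and chlorophyll values are hypothetical. The FD-based variants would apply the same combinations to first-derivative values instead of reflectance.

```python
from itertools import permutations
from math import sqrt

# Assumed, commonly used forms of the two-band combinations (hypothetical names).
COMBINATIONS = {
    "D": lambda a, b: a - b,               # difference
    "RR": lambda a, b: a / b,              # ratio
    "ND": lambda a, b: (a - b) / (a + b),  # normalized difference
    "ID": lambda a, b: 1.0 / a - 1.0 / b,  # inverse difference
}

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def best_band_pair(spectra, chl, kind):
    """Try every ordered band pair (i, j); return (R^2, i, j) of the best one.

    spectra: dict mapping wavelength (nm) -> list of values, one per leaf sample.
    chl:     list of measured chlorophyll contents, one per leaf sample.
    Note: values must be nonzero where they are used as a divisor.
    """
    combine = COMBINATIONS[kind]
    best = None
    for i, j in permutations(sorted(spectra), 2):
        index = [combine(a, b) for a, b in zip(spectra[i], spectra[j])]
        r2 = pearson_r(index, chl) ** 2
        if best is None or r2 > best[0]:
            best = (r2, i, j)
    return best

# Toy data: 4 leaf samples, 3 wavebands (values made up for illustration).
spectra = {
    550: [0.12, 0.10, 0.09, 0.07],
    680: [0.06, 0.05, 0.04, 0.03],
    800: [0.45, 0.47, 0.50, 0.52],
}
chl = [20.0, 25.0, 31.0, 38.0]
r2, i, j = best_band_pair(spectra, chl, "ND")
print(f"best ND pair: ({i}, {j}) nm, R^2 = {r2:.3f}")
```

With 1 nm sampling from 350 to 1,000 nm the same loop would visit roughly 650 × 650 pairs per combination type, so a vectorized implementation would be preferable in practice.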
The sensitive bands were further filtered through stepwise multiple linear regression analysis. Stepwise multiple linear regression can reduce redundant, collinear spectral variables to a few noncorrelated latent variables, thereby avoiding the potential overfitting problems that typically arise with correlation analysis alone (Yu et al., 2013; Luo et al., 2017).
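The band-filtering idea can be illustrated with a small forward-selection sketch. It is a simplification under assumptions: the text does not state the entry and removal criteria of the stepwise procedure, so this version only adds variables (no removal step) and stops when the gain in R² falls below a hypothetical threshold `min_gain`. All names and data are illustrative.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the predictor columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]  # prepend intercept
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * t for r, t in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations
    pred = [sum(c * v for c, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((t - p) ** 2 for t, p in zip(y, pred))
    sst = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - sse / sst

def forward_stepwise(candidates, y, min_gain=0.01):
    """Greedily add the band that raises R^2 most; stop when the gain is small."""
    selected, best_r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        scores = {
            name: r_squared([candidates[s] for s in selected] + [col], y)
            for name, col in remaining.items()
        }
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break
        selected.append(name)
        best_r2 = scores[name]
        del remaining[name]
    return selected, best_r2

# Toy data: two nearly collinear derivative bands and one weakly related band.
candidates = {
    "FD_700": [1.0, 1.5, 2.1, 2.9, 3.4, 4.3],
    "FD_705": [1.1, 1.4, 2.2, 2.8, 3.5, 4.2],
    "R_550": [0.30, 0.10, 0.25, 0.12, 0.28, 0.15],
}
y = [20.0, 25.0, 31.0, 38.0, 44.0, 52.0]
selected, final_r2 = forward_stepwise(candidates, y)
print(selected, round(final_r2, 3))
```

Perfectly collinear predictors would make the normal equations singular, so a production version would need a rank check or a more robust least-squares solver.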
To evaluate the newly developed hyperspectral indices, we derived 43 frequently used empirical indices from the published literature. We compared the performance of the empirical indices with that of the newly developed hyperspectral indices using the R² value and its significance level.
The final spectral indices, which were extracted from narrow sensitive bands, passed the statistical significance test, and performed better than the empirical indices, can be regarded as global indices that sufficiently represent leaf Chll content.

| Hyperspectral curves
We first investigated the hyperspectral curves of vegetation at the various degradation intensities and estimated the degree to which the spectral responses differ. The reflectance clearly differed with degradation intensity (Figure 2). We performed a t-test for the bands at 350 and 1,000 nm, which represent the optical and near-infrared zones. For each pair of datasets, the p value was less than 0.05, indicating that the discrepancies between degradation intensities are statistically significant. Therefore, the development of new hyperspectral indices and the evaluation of empirical spectral indices were conducted separately for the three degradation intensities. Finally, the indices with high consistency and high accuracy across the three degradation intensities were selected as the best indices for predicting leaf chlorophyll content.
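A pairwise comparison of this kind can be sketched as below. The text does not say which t-test variant was used, so this sketch assumes Welch's unequal-variance test and obtains the two-sided p value by numerically integrating the t density; the reflectance values are made up for illustration.

```python
from math import exp, lgamma, log, pi, sqrt

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

# Hypothetical reflectance at one band for two degradation intensities.
light = [0.42, 0.45, 0.44, 0.47, 0.43]
severe = [0.35, 0.33, 0.36, 0.34, 0.37]
t, df = welch_t(light, severe)
p = two_sided_p(t, df)
print(f"t = {t:.2f}, df = {df:.1f}, significant at 0.05: {p < 0.05}")
```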
(1) can narrow the wavelength range, and hence may yield a more powerful indicator of plant leaf Chll content. By counting the number of significant correlation coefficients, the sensitive bands for each selected combination were identified. Using this method, a total of 28 sensitive bands were identified for leaf Chll content. We then used stepwise linear regression analysis to identify the best bands among these 28 sensitive bands, which allowed us to find the best combination of bands for each index. Once this step had been completed, four regression equations were established for each of leaf Chll, Chla, and Chlb content (Table 1).
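The counting step can be illustrated with a short sketch. It assumes the pairwise correlation coefficients have already been computed and that significance is judged against a critical value `r_crit` appropriate to the sample size; the threshold, the function name, and the coefficients below are all hypothetical.

```python
from collections import Counter

def sensitive_bands(pair_r, r_crit, top=3):
    """Rank bands by how many band pairs containing them reach |r| >= r_crit."""
    counts = Counter()
    for (i, j), r in pair_r.items():
        if abs(r) >= r_crit:
            counts[i] += 1
            counts[j] += 1
    return counts.most_common(top)

# Hypothetical correlation coefficients for band pairs (wavelengths in nm).
pair_r = {
    (550, 680): 0.41,
    (550, 705): -0.78,
    (680, 705): 0.83,
    (705, 800): -0.91,
    (550, 800): 0.30,
    (680, 800): 0.52,
}
ranking = sensitive_bands(pair_r, r_crit=0.75)
print(ranking)  # the 705 nm band appears in all three significant pairs
```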

| Empirical hyperspectral indices assessment
There are 43 frequently used empirical indices cited in previous studies. These were selected as the reference indices. Table 2 lists the correlation coefficients of the 43 empirical indices, computed from our in-situ spectral measurements, for the three degradation intensities. Based on this criterion, we considered the consistency and the number of significant correlation coefficients across the three degradation intensities. Three spectral indices were finally selected for leaf Chll estimation ((SDr − SDy)/(SDr + SDy), SDr/SDy, DVI). These indices have high consistency and good performance (Table 2). Also, the comparison of R² values between the optimized stepwise regression indices (which we derived from the complete combinations (Table 1)) and the empirical indices (Table 2) reveals that the R² values of the proposed stepwise regression indices are significantly higher than those of the best-performing empirical indices.
FIGURE 2 Mean leaf reflectance (left), the first derivative (right), and 95% confidence intervals (in light gray) for the samples collected from the lightly, moderately, and severely degraded vegetation in Helin County, Inner Mongolia, China
To conduct an empirical index-based assessment of leaf Chll content, we must establish a suitable equation for each of the three selected indices from Table 2. The linear regression equations were established (Table 3).
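Fitting such an equation amounts to an ordinary least-squares line relating an index value to measured chlorophyll content. The sketch below is illustrative only: the index values and Chll measurements are invented, and the coefficients it prints are not those of Table 3.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical calibration samples: index value and measured Chll per leaf.
index_values = [0.10, 0.20, 0.30, 0.40, 0.50]
chll_measured = [12.0, 19.5, 28.0, 36.5, 44.0]
slope, intercept = fit_line(index_values, chll_measured)

def predict_chll(index_value):
    """Apply the fitted line to a new index value."""
    return slope * index_value + intercept

print(f"Chll = {slope:.2f} * index + {intercept:.2f}")
```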

| The validation of the selected empirical indices and the newly developed hyperspectral indices
We used the newly developed (Table 1) and the selected empirical (Table 3) spectral indices to predict plant Chll content on independent test samples. From the predicted values of each leaf chlorophyll parameter and the values measured in the field survey, linear regressions were established and correlation coefficients were calculated.
In total, 21 plots were created (Figure 4) showing the relation between the predicted values and those measured in the field. In most cases, the R² of the empirical model predictions was considerably lower than that of the newly developed models. Even so, the empirical models are also statistically adequate for assessing the plant parameters, as indicated by significant correlation coefficients.
It is also interesting that leaf Chla content was predicted better than the other two leaf parameters by both the new and the empirical indices.
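The comparison of predicted and measured values can be summarized as in the sketch below. It assumes that R² is the squared Pearson correlation between predicted and measured values, which is what a simple linear regression of one on the other yields; the root-mean-square error is added as a complementary measure that the text does not report. All values are invented.

```python
from math import sqrt

def validation_stats(measured, predicted):
    """Return (R^2, RMSE) for paired measured and predicted values."""
    n = len(measured)
    mm, mp = sum(measured) / n, sum(predicted) / n
    smp = sum((a - mm) * (b - mp) for a, b in zip(measured, predicted))
    smm = sum((a - mm) ** 2 for a in measured)
    spp = sum((b - mp) ** 2 for b in predicted)
    r2 = smp ** 2 / (smm * spp)  # squared Pearson correlation
    rmse = sqrt(sum((a - b) ** 2 for a, b in zip(measured, predicted)) / n)
    return r2, rmse

# Hypothetical test samples: field-measured vs model-predicted Chll.
measured = [20.0, 25.0, 31.0, 38.0, 44.0]
predicted = [21.5, 24.0, 32.0, 36.5, 45.0]
r2, rmse = validation_stats(measured, predicted)
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```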

| DISCUSSION
It is convenient to estimate leaf Chll content by means of spectral indices derived from reflectance measured with a handheld spectrometer. Complete combinations of reflectance and its first-order derivative value were constructed across the entire band range acquired by the ASD spectrometer. The sensitive bands were identified, and the most suitable models of combined sensitive bands were then selected. The results demonstrate that the newly developed models perform better than purely empirical spectral indices in estimating plant leaf Chla, Chlb, and Chll contents.
Most previous studies developed spectral indices based on visible bands ranging from 400 to 760 nm, using either original reflectance or derivative values. This is because chlorophyll strongly absorbs light in the blue (400-500 nm) and red (600-700 nm) regions and absorbs little light in the green and orange (500-600 nm) and near-infrared regions (Houborg et al., 2015b; Beck et al., 2016; Sonobe & Wang, 2017). Wider spectra may capture more information on leaf physiological status. Our study demonstrated that several bands beyond 760 nm are also highly sensitive to leaf Chll content. The NIR reflectance, which is known to be affected by leaf anatomical structure such as leaf thickness, cell walls, and intracellular air spaces (Slaton, Hunt, & Smith, 2001), could also indicate leaf Chll content (Pastorguzman, Atkinson, Dash, & Riojanieto, 2015), more obviously at high chlorophyll concentrations (Gitelson & Merzlyak, 2003). Compared with the limited combinations of original reflectance within the visible region used in some previous studies, we constructed complete combinations of either original reflectance or its first-order derivative value across wavelengths from 350 to 1,000 nm. This increased the possibility of finding more suitable combinations than the limited wavelengths used before.
FIGURE 3 The correlation coefficient curves of combinations of Ri, Rj with leaf Chla, Chlb, and Chll content, respectively. The x-axes indicate the wavelength ranges from violet, blue, cyan, green, yellow, orange, and red to near-infrared light (350-1000 nm)
received are confounded by the influence of atmospheric effects, vegetation characteristics, and background reflectance (Daughtry, Walthall, Kim, & de Colstoun, 2000). The first-order derivative spectra, which may be calculated approximately by dividing the difference in reflectance between successive wavebands by the wavelength interval between them, can eliminate background noise and resolve overlapping spectral features (Aneece, Epstein, & Lerdau, 2017).
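The finite-difference approximation of the first-order derivative can be written in a few lines. This is a minimal sketch: assigning each derivative value to the midpoint of its two wavebands is one common convention and is an assumption here, as are the function name and the sample values.

```python
def first_derivative(wavelengths, reflectance):
    """Approximate dR/dlambda between each pair of successive wavebands."""
    mids, fd = [], []
    for k in range(len(wavelengths) - 1):
        step = wavelengths[k + 1] - wavelengths[k]  # wavelength interval (nm)
        mids.append((wavelengths[k] + wavelengths[k + 1]) / 2)
        fd.append((reflectance[k + 1] - reflectance[k]) / step)
    return mids, fd

# Hypothetical red-edge reflectance sampled at 1 nm intervals.
wavelengths = [700, 701, 702, 703]
reflectance = [0.10, 0.12, 0.15, 0.19]
mids, fd = first_derivative(wavelengths, reflectance)
print(mids)
print([round(v, 3) for v in fd])
```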
FD has been widely used in various indices for estimating vegetation parameters, with considerably higher accuracy (Chen et al., 2011; Cao et al., 2015) than other indices. In the present study, the FD-based indices were highly successful in estimating plant leaf Chll and its composition, even though our measurements were made at the leaf level, where there is no background noise or overlapping spectral features to resolve. Thus, we strongly suggest that FD should be used as an independent variable to estimate plant leaf Chll content.
High consistency in estimating plant parameters across vegetation of various degradation levels should also be a criterion for desirable spectral indices (Lu & Lu, 2015). We used three intensities of degraded vegetation to test the consistency and credibility of the proposed spectral indices. Discrepancies in reflectance and its FD tend to be more pronounced as degradation intensity increases, which may partly be explained by the greater vertical Chll distribution in lightly degraded vegetation (Gitelson, Peng, Arkebauer, & Schepers, 2014) than in severely degraded vegetation. In the severely degraded vegetation, as canopy Chll content decreases, the absorption capacity also decreases, reflected by lower reflectance values in the visible and red-edge spectrum than those in the lightly degraded vegetation. It is argued that narrow bands can capture the Chl-a reflectance red minimum and near-infrared peak and thus estimate Chl-a concentrations well (Beck et al., 2016). Our study confirmed this deduction. It is worth noting that the selection of sensitive bands by stepwise linear regression can greatly improve predictive performance. As a "full spectrum" method, stepwise linear regression not only deals efficiently with strong multicollinearity but also considers the covariance in the model response/dependent variable(s) (Yu et al., 2013; Luo et al., 2017). Therefore, it is better suited to dealing with potential confounding factors than a simple index-based approach.
The present study has considerable potential for practical application. Compared with multispectral imagery, hyperspectral data acquired with a handheld portable device have the advantages of high spectral resolution, low labor cost, and less interference from the atmosphere and the background environment. They are more suitable for repeated measurements of fine-habitat vegetation over large areas, especially croplands, grasslands, and desert vegetation.
The appearance of more and more satellite-based hyperspectral data, for example from Hyperion onboard Earth Observing-1 (EO-1), with 10-nm spectral information from 350 to 2,500 nm, and from several other sensors, extends the usage of hyperspectral data to a wider scope, such as ecorestoration, eco-condition assessment, and precision agriculture.

| CONCLUSION
Complete combinations (value at a given wavelength, wavelength difference, value ratio, normalized difference, and inverse difference) based on either original reflectance or first-order derivative spectra were developed to quantify leaf chlorophyll content and its composition using three datasets collected in situ from lightly, moderately, and severely degraded vegetation in temperate Inner Mongolia, China. The best combinations identified were further optimized by sensitive band selection and stepwise linear regression analysis and were compared with 43 empirical indices frequently used in the literature. In validation, the proposed indices proved to be the most effective for quantifying chlorophyll contents (R² > 0.7 and p < 0.01), demonstrating great potential for using hyperspectral data in vegetation physiological monitoring at a fine scale. However, these hyperspectral indices are spectrally very narrow and can be applied only when the spectrometer has a very high spectral resolution of 1-3 nm. The new understanding obtained in this study may help to improve the potential of hyperspectral data for monitoring degraded vegetation worldwide. Following these proposals, hyperspectral indices can improve the data quality of satellite and airborne imagery through scale conversion.
Future work will encompass sensors other than handheld hyperspectral devices, including satellite, airborne, and LiDAR sensors, to apply and compare plant physiological assessments of desert, grassland, and cropland vegetation. Such research will help us to better understand the dependability of hyperspectral models and to extend the scope of their application.

ACKNOWLEDGMENTS
The study was financially supported by the National Key Research

CONFLICT OF INTEREST
None declared.

AUTHOR CONTRIBUTIONS
Yu Peng designed and performed the study. Min Fan performed the analysis of hyperspectral and field survey data. Qinghui Wang, Wenjuan Lan and Yating Long collected and pre-processed the hyperspectral and field survey data. All authors discussed the results and contributed to the manuscript.