Chlorophyll-Based Model to Estimate Underwater Photosynthetically Available Radiation for Modeling, In-Situ , and Remote-Sensing Applications

al. (2020) have analyzed its performance in the global oceans Abstract Accurate estimation of the underwater light field associated with photosynthetically available radiation (PAR) is critical to compute phytoplankton growth rate and net primary production (NPP), and to assess photo-physiological response of phytoplankton, such as changes in cellular pigmentation. However, methods to estimate PAR used in many previous studies lack in accuracy, likely resulting in significant bias in light-dependent products such as NPP derived from remote sensing, model simulations, or autonomous platforms. Here we propose and validate a new model for more accurate estimation of the subsurface PAR profile which uses chlorophyll concentration as its input. Validation is performed using 1,744 BGC-Argo profiles of chlorophyll fluorescence that are calibrated with surface satellite-derived chlorophyll concentration over their lifetime. The independent verification with the float's PAR sensors confirms the accuracy of satellite chlorophyll estimate worldwide and in the Southern Ocean in particular.


Introduction
The vertical distribution of underwater photosynthetically available radiation (PAR) is a necessary input for models of net primary production (e.g., review by Behrenfeld & Falkowski, 1997) and is key to interpret physiological response of phytoplankton to light, its change in cellular pigmentation (e.g., Behrenfeld et al., 2016) or the associated solar-stimulated fluorescence (Behrenfeld et al., 2009). In the last two decades, carbon-based primary production models (CbPM, Behrenfeld et al., 2005;Westberry et al., 2008), and photoacclimation models (Fox et al., 2020), provide novel approaches to estimate marine net primary production (NPP) from satellite measurements that have also been applied to in-situ platforms (Estapa et al., 2019;Yang et al., 2020). These models derive the vertical distribution of PAR by attenuating surface PAR using an estimate of its diffuse attenuation coefficient, K d (PAR). As we show below, the methods used to determine K d (PAR) in the above studies could be significantly biased and be improved using the model proposed here.
In some studies, the diffuse attenuation coefficient at 490 nm (K d (490)) was used as a proxy of K d (PAR) for calculating the mixed-layer median light level (PAR ML , e.g., Behrenfeld et al., 2005 andYang et al., 2020). However, since the blue light around 490 nm generally penetrates deepest in clear waters (e.g., Morel, 1988), this would lead to an overestimation of light level. In another study, Behrenfeld et al. (2016) used an equation proposed by Morel et al. (2007) to compute PAR ML . This approach, however, has not been thoroughly evaluated for the full range of MLDs in the ocean (e.g., 10-500m). Lee et al. (2005, hereafter Lee05) proposed a depth-resolved K d (PAR) algorithm, based on the sea-surface inherent optical properties (IOPs). Xing et al. (2020) have analyzed its performance in the global oceans Abstract Accurate estimation of the underwater light field associated with photosynthetically available radiation (PAR) is critical to compute phytoplankton growth rate and net primary production (NPP), and to assess photo-physiological response of phytoplankton, such as changes in cellular pigmentation. However, methods to estimate PAR used in many previous studies lack in accuracy, likely resulting in significant bias in light-dependent products such as NPP derived from remote sensing, model simulations, or autonomous platforms. Here we propose and validate a new model for more accurate estimation of the subsurface PAR profile which uses chlorophyll concentration as its input. Validation is performed using 1,744 BGC-Argo profiles of chlorophyll fluorescence that are calibrated with surface satellite-derived chlorophyll concentration over their lifetime. The independent verification with the float's PAR sensors confirms the accuracy of satellite chlorophyll estimate worldwide and in the Southern Ocean in particular.

Plain Language Summary
In this study, we propose a new model to compute the underwater light field available for phytoplankton growth. The model uses as its inputs the above water radiation (which can be obtained from public databases) and the subsurface distribution of chlorophyll-a, a pigment shared by all phytoplankton that can be estimated from sensors deployed on robotic platforms and that is used by many ocean ecosystem models. We test the model accuracy using data observed by the light sensors on autonomous profiling floats and find it performs well. The comparison also highlights that estimates of surface chlorophyll concentrations from satellites are unbiased worldwide, in contrast to some published accounts.
with an extensive BGC-Argo data set and found it works well near the sea surface, but exhibits a significant underestimation in subsurface oligotrophic waters. This is not surprising as Lee05's algorithm assumes constant IOPs independent of depth, and in oligotrophic waters the presence of a deep chlorophyll maxima (DCM) increases the attenuation of light at depth. In addition, its applicability is limited, given that IOPs are not readily available in observations from most in-situ platforms and in ecosystem models.
Here we propose a new chlorophyll-based algorithm to compute the subsurface light field. The algorithm is evaluated by comparing the predicted light field to the observed underwater light field, as well as the associated diffuse attenuation coefficient, K d (PAR), based on the PAR observations from the global BGC-Argo array. In addition, we evaluate its performance relative to previous approaches in retrieving the mixed-layer median light level from ocean color remote sensing, a necessary input to some NPP models.

BGC-Argo Data Set
To provide input data as well as to evaluate the model, we use measurements from 193 BGC-Argo floats, operated from October 2012 to March 2020, with data compiled by the Global Data Assembly Center (GDAC) of the Argo program. These floats are equipped with Satlantic OCR504 radiometers and Wet Labs ECO in-vivo chlorophyll-a fluorometers. Each OCR504 has four channels, of which three channels measure downwelling irradiance (E d ) at 380 (or 443), 412, and 490 nm, while the fourth channel measures the instantaneous planar Photosynthetically Available Radiation (iPAR, the downwelling photon flux integrated from 400 to 700 nm). All profiles are classified into 10 regions (see criteria in Table S1), as shown in Figure S1, so that regional biases can be evaluated. All 35,729 raw iPAR profiles are quality controlled following the scheme detailed in Organelli et al. (2016). Finally, only Flag-1 (18,359, 51.4%) and Flag-2 (5,315, 14.9%) profiles are used in this study. The sea-surface iPAR(0 − ) is retrieved following Xing et al. (2020), and the layer-averaged diffuse attenuation coefficient at depth z, K d (iPAR, z), is defined as: The 1% light layer depth (sometimes referred to as the euphotic layer depth, e.g., Ryther, 1956), z 1% , is determined as the depth where iPAR reaches 1% of iPAR(0 − ). For evaluating K d (iPAR) within the mixed layer, MLD is calculated based on a threshold potential density difference (0.03 kg m −3 ) from a near-surface value at 10 m, following De Boyer Montégut et al. (2004).

Satellite Data
Daily 4-km-resolution remote-sensing reflectance (R rs ), and standard chlorophyll-a concentration (Chla-Sat ), provided by the MODerate resolution Imaging Spectroradiometer (MODIS) on the satellite "EOS PM (Aqua)," are used. The matchup criteria between float and satellite data is similar to Bailey and Werdell (2006): The median value within a 3 × 3 pixel box centered on each profile's position on the same date, is extracted from satellite data as the "matchup" value; Furthermore, we require that (1) the corresponding BGC-Argo observation time is within 3 h of satellite overpass; (2) there are at least 50% valid values in the box; and (3) for R rs at any band, the coefficient of variance (CV) within the box must be lower than 0.15. We obtain 1,744 valid satellite-float match-up profiles (Table S1).

Existing PAR Models
The daily PAR profile, PAR(z), is calculated from Equation 2, for a given sea surface PAR Sat , and estimated layer-averaged K d (PAR, z) profile.
Here, α represents the transmission coefficient of sun light through the air-sea interface provided by Mobley and Boss (2012). Note that here K d (PAR, z) represents the diffuse attenuation coefficient for the daily PAR which is well approximated by K d (iPAR, z) at local noon (Wei & Lee, 2013).
Given our data set, this study only evaluates the performance of our algorithm and other three published methods/models to estimate K d (iPAR, z) and iPAR(z). The first one, proposed by Behrenfeld et al. (2005) for satellite data and followed by Yang et al. (2020) for NPP estimation with BGC-Argo, used the K d (490) satellite product as the proxy for K d (iPAR). These studies implicitly assumed, (1) K d (490) at the first optical depth (z pd = 1/K d (490)) is equal to K d (iPAR), and (2) K d (iPAR) is depth-independent. Here we use K d (490) satellite product derived from the semi-analytical algorithm of Lee et al. (2013, from here on "Lee13-490").
The second one, called as "Morel07," employs the K d (490) The third model is "Lee05," a depth-dependent K d (iPAR) model which assumes depth independent IOPs in its calculation. Its inputs are a(490) and b b (490), which could be retrieved, for satellite-based applications from the Quasi-Analytical Algorithm (QAA) of Lee et al. (2002). In this study, we used its latest version QAA (QAAv6, updated in 2014), with the pure-seawater absorption coefficients from Lee et al. (2015).
Once the K d (iPAR) profile is determined, the instantaneous PAR profile (iPAR(z)) can be estimated with Equation 1, as long as the surface value (i.e., iPAR(0 − )) and appropriate sun angle are known. As for computation of the daily integrated PAR(z) estimates, we recommend that K d (iPAR) around local noon be converted to K d (PAR) following Equation 12 in Wei and Lee (2013). Additionally, one needs to be aware that while the technology used to measure PAR (and hence use it to develop algorithms) is typically measuring the planar downwelling irradiance, for NPP computation one should preferentially use scalar PAR. Wei and Lee (2013) compute, based on radiative transfer computations, a mean absolute percentage error of 10.5 between K d (PAR) and the diffuse attenuation of scalar PAR.

Chlorophyll-Based PAR Model
In this study, we propose a chlorophyll-based PAR model, which combines the clear-sky spectral irradiance model of Gregg and Carder (1990) For z = 1 m, (z−1) represents the sea surface (0 − ); finally, all E d (λ, z) at each depth are wavelength-integrated to iPAR(z), and sea-surface E d (λ, 0 − ) values are wavelength-integrated to iPAR(0 − ): where, h is Plank's constant, and c the speed of light in vacuum; then K d (iPAR, z) is determined through Equation 1. K d (iPAR, z) computed by the different models evaluated here is then used to estimate instantaneous PAR at any depth for any given iPAR(0 − ). As for the daily PAR estimate, the relationship proposed by Wei and Lee (2013, their Equation 12) should be employed to convert the local-noon K d (iPAR) to the daily K d (PAR). To account for reduction in sun light relative to the clear sky model of GC90, we use the local iPAR(0 − ) from float or PAR(0 − ) from satellite and K d (iPAR, z) derived from GCMM to compute iPAR(z) or PAR(z). This, de-facto, assumes that the differences between the PAR attenuation of cloudy skies and clear skies are similar. Indeed, our results (see below) suggest it does a reasonable job in predicting iPAR even to the depth of 1.5-fold z 1% . The R programmes of GCMM are available in https://github.com/BGC-Argo/ GCMM.
For satellite applications, the Chla profile could be assumed to be homogenous, or a vertical profile is estimated from surface Chla (Uitz et al., 2006; Text S1 and Figure S3). For in-situ platforms (e.g., BGC-Argo or glider), the input could be the observed Chla profile. In this study, "GCMM surf " represents the satellite-Chla-based method, with inputs of Chla Sat and homogenous assumption; "GCMM Uitz " represents another satellite-Chla-based method, with inputs of Uitz06's estimated Chla profile from Chla Sat ; and "GCMM prof " uses the the float-observed Chla profiles, whose quality-control procedure is shown in Text S2.

Statistical Metrics
The statistical metrics used in this study, include the root mean square error (ε), mean error (δ), unbiased root mean square error (σ), and relative mean error (δ%), defined as: Here, M i represents the float-measured value, E i represents the satellite-retrieved or model-estimated value, and n is the number of observations. ε represents the total deviation/difference, δ represents the systematic bias, σ represents the variance, and δ% represents the relative system bias.

Retrieval of Mixed-Layer Median iPAR
Mixed-layer median iPAR requires an estimation of a representative K d (iPAR), that is, K d (iPAR, MLD/2). Here we compare estimates from different models using the BGC-Argo iPAR data (note that the iPAR and Chla are measured by different sensors on the floats). The IOPs-based model (Lee05) performs best (Figure 1a), with lowest error (ε = 0.024 m −1 ) and negligible systematic bias (δ = 0.002 m −1 ). GCMM surf model also works well (Figure 1b), with a little higher bias, and similar ε and σ to Lee05. A consistency test (Figure S4) shows a negative correlation between its retrieval error and the float-observed K d (iPAR) (r = −0.66), indicating a tendency of overprediction in clear waters (lower K d ) and underprediction in turbid waters (high K d ). The main areas with this issue include the Mediterranean Sea and Black Sea, suggesting the cause to be a CDOM influence on K d (PAR) retrieval in the GCMM model. It is noteworthy that the other models show a similar issue to different extents ( Figure S4), implying that a potential for CDOM-driven bias exists in all models. A test of Lee05 and GCMM surf sensitivity to solar zenith angle shows that both have XING AND BOSS 10.1029/2020GL092189 5 of 11 good performance for zenith angles varying between 20° and 60° ( Figures S5-S6); GCMM surf has a slight overprediction at low zenith (<20°, Figure S5b), while Lee05 exhibits overprediction at high zenith (>60°, Figure S6c).
Using Morel07's empirical relationship (for 2z pd , Equation 5 in this study, Equation 9 in Morel et al., 2007) displays an underestimation of about 16% on average K d (iPAR, MLD/2) (Figure 1c), and with higher variability (σ = 0.031 m −1 ). This is likely due to the fact that the relationship was determined at 2z pd , without depth-dependence. We also test Morel07's empirical relationship for z pd (their Equation 9 in Morel et al., 2007), which performs similarly with ε = 0.032 m −1 (figure not shown). The use of K d (490) as a proxy for K d (iPAR, MLD/2) performs worst, underestimating by about 49% (Figure 1d). It follows that studies using K d (490) for primary production models (e.g., Fox et al., 2020;Yang et al., 2020) underestimated K d (iPAR, MLD/2) significantly and thus likely overestimated iPAR, PAR, as well as NPP.

Retrieval of the Vertical Distribution of iPAR
The good performance of Lee05 and GCMM surf in retrieving K d (iPAR, MLD/2) is not only due to their depth-resolving characteristics for different MLDs, but also because of the assumption of homogenous IOPs and Chla in the mixed layer (which is consistent with expectation). However, the later assumption is likely wrong below the MLD, particularly in the presence of DCM which is ubiquitous in oligotrophic waters. More Chla below the ML would attenuate light faster than near the surface. Thus, we expect to see an underestimation of K d (iPAR) and overestimation of iPAR at subsurface in oligotrophic waters, for both surface-based Lee05 and GCMM surf models (as shown in Xing et al., 2020).
An example of a profile observed in the Western Mediterranean Sea is provided in Figure 3. The MLD is at 15 m, z 1% reaches ∼55 m, and DCM appears at ∼38 m ( Figure 3a). Uitz06-estimated Chla profile has a similar DCM depth to the observed one, but with much lower Chla concentration at DCM (Figure 3a). The observed iPAR profile (Figure 3b) is characterized by two layers with larger attenuation, one near the sea surface and the other around the DCM. The first one is driven by the faster attenuation of the red light due to absorption by water (Lee 2009;Morel et al., 1988) and the second one is due to the higher chlorophyll-a concentration, which increases both absorption and backscattering losses. All four methods estimate iPAR well near sea surface (Figure 3b), in other words, all resolve the issue of faster loss of red light near the surface. However, Lee05 and GCMM surf do not resolve the DCM, and thus result in a uniform K d (iPAR) below the MLD, shown as a straight line on the semi-logarithmic plot (Figure 3b). Consequently, both models underestimate K d (iPAR) and overestimate iPAR near the DCM. By contrast, GCMM prof calculates the vertical change of K d (iPAR) based on the float-observed Chla with inhomogeneous distribution, and thus provides an estimated iPAR profile more consistent with the observed one (Figure 3b). Lee05 and GCMM surf overestimate iPAR below about 0.6z 1% while GCMM prof performs well throughout the 1.5-fold z 1% layer (Figure 3c).
The relative errors (δ%) for all profiles are presented in Figure 4. They are first classified into two types, denoted as the "Mixed Type" (maximal Chla within the ML) and "DCM Type" (maximal Chla value below the ML). In Mixed-Type waters (371 profiles in total), the maximal Chla is located within the ML, with relatively shallow z 1% , and overall, all four models estimate the iPAR profile well (Figure 4a). Lee05 has a slight underestimation, with lowest relative errors (δ% = 12.8%) from surface to 1.5z 1% ; GCMM surf and GCMM Uitz exhibit an overestimation from 0.2z 1% ; GCMM prof performs better than other two GCMM models (δ% = 16.8%), but with a slight overestimation at depth from 0.8z 1% to 1.5z 1% . It implies that Lee05's homogenous assumption works down to the 1.5-fold 1% light level depth for a mixed profile.
As for the DCM-Type waters (1,334 profiles in total), Lee05 and GCMM surf overestimate iPAR from half z 1% (Figure 4b), and reach an error of ∼100% at z 1% , and even more than 300% at 1.5z 1% (not shown in Figure 4b). GCMM Uitz demonstrates an improvement relative to GCMM surf , as the former estimates a DCM below the MLD, but the latter assumes no DCM at all. However, because Uitz06 is established based on a statistical analysis of observational data, and thus with an estimated Chla profile representing a global mean for a given sea-surface Chla, it can neither account for regional differences in DCM magnitude and depth, nor generate an abrupt Chla change driven by short-term physical and biological processes (e.g., meso-/submesoscale processes and synoptic-scale events). Nevertheless, GCMM Uitz performs significantly better than GCMM surf (δ% = 30.8%, compared to 119.8%), exhibiting an overestimation below ∼0.8z 1% , and ∼100% of relative error at 1.5-fold z 1% . This emphasizes the difficulty of obtaining accurate estimates of the light intensity profile from satellites. Although DCM depth and magnitude often display some statistical relationships with surface Chla (e.g., Mignot et al., 2011;Uitz et al., 2006), it is difficult to accurately derive the subsurface Chla distribution from surface observation. Comparatively, GCMM prof performs well down to 1.5-fold z 1% , XING AND BOSS 10.1029/2020GL092189 7 of 11 with a mean relative error of 5.3%, suggesting a good consistency between iPAR and Chla profile, which are independently measured by the BGC-Argo floats.
Regional analysis reveals that GCMM prof outperforms the other algorithms in all oligotrophic waters (mostly with DCM-type profiles), for example, subtropical gyres ( Figure S7a), eastern and western Mediterranean Sea ( Figures S7b-S7c), as well as the Transition area ( Figure S7d). In the subtropical gyres ( Figure S7a), although GCMM Uitz has a lowest systematic bias (δ% = 26%), GCMM prof works better in the upper 1% light level layer. Lee05 outperforms all three GCMM models at high latitudes (where there are more Mixed-type profiles), for example, the Southern Ocean ( Figure S7g) and subpolars ( Figure S7h). In the Black Sea (Figure S7f) and Arctic Ocean ( Figure S7i), although none works well, GCMM prof displays the best statistical performance.
Besides, the good performance of GCMM surf model suggests that, (1) satellite Chla could be used for the retrieval of mixed-layer median K d (iPAR) (Figure 1b) and near-surface iPAR (Figure 4) with the GCMM model; and (2) it proves, although indirectly, that no significant regional bias exists for the satellite estimate of Chla retrieved with the OCI algorithm. This is of particular interest in the Southern Ocean ( Figure S8a), where the global satellite Chla algorithm have long been challenged (e.g., Dierssen & Smith, 2000;Johnson et al., 2013;Kahru & Mitchell, 2010) though others (Haëntjens et al., 2017;Marrari et al., 2006) suggested they are unbiased. We test the GCMM surf model's sensitivity in the SO by multiplying and dividing the OCI Chla product by a factor of 2, as the input to retrieve K d (iPAR, MLD/2), and find significant bias (+22.5% and −13.8%; Figures S8b-S8c). This suggests that the model is sufficiently sensitive to the input Chla, and thus useful to evaluate the possible system bias of Chla algorithms. Note that in this analysis, the float-observed and satellite-derived subsurface light fields are mutually independent, except for sharing the same MLDs observed by floats.

Summary
Existing models for subsurface light field are evaluated in this study, with the recent BGC-Argo database. We find that the Lee05 model provides the most accurate iPAR(z) retrieval near sea surface and within the mixed layer, and we recommend it be employed for ocean color satellite-based mixed-layer photoacclimation and primary production models. Models using K d (490) as the proxy for K d (iPAR) and the formulation of Morel07 significantly underestimate iPAR attenuation resulting in an overestimation of iPAR. Lee05 works well not only near the sea surface, but also within the 1.5-fold 1% light level layer, except for DCM-Type waters. In such waters, for example, the subtropical gyres and Mediterranean Sea, a DCM appears remarkably at subsurface, leading to an obvious overestimation of iPAR from half z 1% to 1.5 z 1% (relative error of over 300%).
The chlorophyll-based PAR model, GCMM, proposed here, can be used for both remote-sensing (with the homogenous assumption as in Lee05, GCMM surf ) and in-situ platforms with depth-resolved data (GCMM prof ). While Lee05 outperforms GCMM surf in general, it requires inputs seldom available from in-situ platforms. GCMM prof performs best for in-situ platforms that resolve the vertical structure of chlorophyll and can easily be implemented in ecosystem models where chlorophyll-a concentration is included. We test the Uitz06 model to provide a Chla-profile estimate from the satellite Chla and then merge it with GCMM model (i.e., GCMM Uitz ), which shows an improvement of estimated iPAR when compared with GCMM surf . However, the Chla profiles retrieved were often significantly biased. Improvement in estimating vertical profile from surface observations may come from taking more environmental data into consideration (e.g., wind, temperature) and addtional in-situ Chla data (e.g., BGC-Argo). Besides, the novel machine learning techniques are potentially helpful to better retrieve underwater light fields from surface observations. XING AND BOSS 10.1029/2020GL092189 9 of 11 . Averaged relative error (δ%) at each depth normalized by z 1% , for four estimated iPAR by the model satellite IOPs-based Lee05 (blue), satellite Chla-based GCMM surf (green), satellite Chla and Uitz06-based GCMM Uitz (purple) and float Chla-based GCMM prof (red), for Mixed Type (maximal FChla locates within the mixed layer) (a) and DCM Type (maximal FChla locates below the mixed layer) (b) waters, respectively. The numbers in legend represent the relative error (δ%) averaged from surface to the 1.5-fold 1% light level depth (1.5z 1% ).
While it may seem that using Chla and PAR is a step backward in comparison to using IOPs and spectral radiometry, the reality is that IOPs are scarcely measured globally (except for b bp (700)) and are also scarcely parameterized into biogeochemical models (for exceptions see IOCCG, 2020). In addition, both PAR and Chla are considered mature products from Ocean Color (as of today, no ocean color mission provides spectral surface irradiance as a product and IOPs are only distributed as products by a handful of missions). Hence, it is important that, the community that parameterizes biogeochemical processes based on Chla and PAR, uses unbiased and validated relationships such as we provide here. Besides the inclusion of vertical structure, future improvement should include conversions of planar downwelling iPAR to scalar PAR (e.g., Frouin et al., 2018;Mobley & Boss, 2012;Wei & Lee, 2013), and the potential sources of uncertainty in the present GCMM model (including the Chla-K d relationship variability at different solar zenith angles and under heavy clouds, and the change of average cosine under clouds relative to the clear-sky condition [Frouin et al., 2018]), to provide even less biased inputs to NPP algorithms.