Detecting the Non‐Separable Causality in Soil Moisture‐Precipitation Coupling With Convergent Cross‐Mapping

Soil moisture‐Precipitation (SM‐P) coupling, especially the causality from SM to P (SM → P), is an important and highly debated topic. The causal inference from observational data provides a statistical approach for this issue, where the experimental research is infeasible. Various causal inference methods exist, each assuming distinct underlying systems for the targeted variables: pure stochastic or deterministic dynamic system (DDS), which means that these methods detect different types of causality: separable or non‐separable. Acknowledging the inherent deterministic dynamic nature of SM‐P coupling, this study employs a DDS‐based method: Convergent Cross‐mapping (CCM), to detect their non‐separable causality. Centering around SM‐P coupling, we also detect the causality between SMs in shallow and deeper layers, and the causality between evapotranspiration (ET) and SMs in all layers. Notably, before applying CCM, a preconditional procedure is required: verifying the DDS nature of targeted variables P, SM, and ET. Key findings of this research include: (a) In SM‐P coupling, only SMs in shallow layers but not deeper layers could render causalities toward P, while P renders causalities toward SMs in all layers. (b) The time‐delay of causality from SM1 to P in spring/summer is around 4–6 days, and that from P to SM1 is around 2–4 days. (c) Inside SMs, shallow SMs have obvious causalities downwards to deeper SMs, but deeper SMs seem harder to render causalities upwards to shallow SMs, explaining why they hardly render causalities to P. In summary, this study furnishes an indispensable DDS‐based complement to stochastic‐based methodologies for SM‐P coupling.


Introduction
As a vital land surface parameter, soil moisture (SM) affects the overlying weather condition by influencing water and energy fluxes, like supplying water vapor (Yang et al., 2016), partitioning radiative energy into latent and sensible heat fluxes (Seneviratne et al., 2010), inducing moist convection and lateral convergence (Pielke, 2001), potentially leading to cloud formation (Ek & Holtslag, 2004;Taylor et al., 2011Taylor et al., , 2012)), and eventually affecting precipitation (P).Soil moisture-Precipitation (SM-P) coupling is therefore an important topic, especially the causality from SM to P (SM → P), that is, how P changes as SM is modified, which helps to understand how water exchanges between land and atmosphere, bearing implications for weather forecast and future climate change adaptation (Koukoula et al., 2021;Sehler et al., 2019;Tuttle & Salvucci, 2016;Wei & Dirmeyer, 2012).
Although controlled (interventional) experiments provide a robust framework to understand causality (Pearl & Mackenzie, 2018), such approaches are ethically, practically, and/or economically infeasible for large-scale SM-P coupling with intricate physical processes (Runge, Nowack, et al., 2019).Hence, inferring the causality from the observational data, that is, time series of SM and P, is an alternative avenue (Pearl, 2009a(Pearl, , 2009b;;Runge, Bathiany, et al., 2019), where the goal is to statistically quantify the causal effect of SM on P (Runge et al., 2023).In the land-atmosphere system to which SM-P coupling belongs, observational-based causal detection proves instrumental in enhancing our understanding of physical mechanisms and contributes to the improved construction of models, facilitating more accurate forecasts (Imbens & Rubin, 2015;Seneviratne et al., 2010;Tuttle & Salvucci, 2017).
The characteristic of purely stochastic system is separability: the information about a "cause" is independent to its "effect" and can be removed simply by eliminating the "cause" from the system (Yang et al., 2018).Hence, separability is the key requirement of those stochastic-based methods (Sugihara et al., 2012).Whereas the separability is not satisfied in a DDS (even with noise): information about a "cause" will be redundantly present in its "effect" and cannot formally be removed from the system (Deyle & Sugihara, 2011), and thus stochastic-based methods like GC or SCM are no longer valid in a DDS, but only CCM applies (Breston et al., 2021).Therefore, detecting the nature of the underlying system of the targeted variables is essential before choosing the appropriate tool for causal detection.
Stochastic-based methods like GC have found widespread application in hydro-meteorological analysis (Galytska et al., 2023;Gupta & Jain, 2020, 2021;Silva et al., 2021;Sun et al., 2016), especially in assessing the impact of SM on P (Luo et al., 2023;Salvucci et al., 2002).However, arguments exist supporting that Earth's systems lean toward a deterministic dynamic nature (Boers et al., 2020;Rajagopalan et al., 2019;Tesch et al., 2023).In the real-world land-atmosphere systems, particularly, variables are intricately interwoven and coupled (Díaz et al., 2022), where separability may not be satisfied and non-separability prevails instead (Leng et al., 2020).As thus, the causality of SM → P might not be captured well by stochastic-based studies, which underscores the indispensability of applying CCM (Gao et al., 2023).
To this end, we select four study regions and, in each region, we employ CCM to detect the causalities from SMs in different layers (shallow and deeper SMs) to P, respectively, and their causal time-delay (i.e., the time it takes for the causal information to flow from "cause" to "effect," or the causal response time).Simultaneously, recognizing the non-separable nature of causal information between SM and P, to fully understand SM-P coupling, we investigate P → SM besides SM → P, that is, the causality from P to SMs in different layers and the causal response time.Furtherly, to ensure a comprehensive and nuanced exploration centering around SM-P coupling, we use CCM to detect the causalities between deeper and shallow SMs, and the causalities between evapotranspiration (ET) and SMs in different layers.

Deterministic Dynamic System (DDS)
As discussed, a critical prerequisite for the application of CCM is to ascertain the DDS nature of the underlying system governing the targeted variables.Although the nature of SM-P coupling is theoretically consistent with a DDS (D'Odorico & Porporato, 2004;Qing et al., 2023;Zhou et al., 2021a), a rigorous proof is still necessary before applying CCM, telling it apart from purely stochastic system.This section delves into the concept and the formal definition of a DDS, elucidating the non-separable causal connections among variables within such a system, exploring its properties, elucidating how to confirm DDS nature from individual time series.

Definition of DDS and Phase Space
A DDS is mathematically defined as follows: In Equation 1, for a variable (e.g., X in Equation 1.1) in a DDS, its value at t is decided by its previous value and those of others (e.g., Y and Z) at t τ jointly.This is how variables causal-connected in a DDS: deterministically governed by those underlying equations f, g, and u.These equations also imply non-separability, for example, it is generally unable to remove Y and Z from Equation 1.1 by deformation or derivation; also see Leng et al. (2020) for the rigorous concept of non-separability.By contrast, we also provide the definition of GC in Text S1 in Supporting Information S1, to demonstrate the separable case among variables.
A DDS is often analyzed in phase space (Tarasova et al., 2020;Wendi et al., 2018).For example, considering the three variables X(t), Y(t), and Z(t) linked by Equation 1, we construct a three-dimensional space whose axes represent them respectively, and each point in this 3-D space represents a certain phase.Let's put a little ball "m" in this space, which can be regarded a simple 3-D DDS: m(x(t), y(t), z(t)).Governed by the underlying equations and with time progression, "m" will be moving in this phase space, depicting the dynamic evolution of the system.Meanwhile, each axis starts to generate a single time series X(t), Y(t), or Z(t), indicating the dynamic evolution of each individual variable.Lorenz (1963) utilized three real-world meteorological variables: convective overturning, horizontal temperature, and vertical temperature as X(t), Y(t), and Z(t), connected by a set of nonlinear differential equations, to simulate the DDS of the Convective Motion in weather systems (Figure 1a).Over time, the trajectory of the "m" gradually forms a butterfly shaped geometric construction in phase space, called a manifold, which is defined as the attractor of "m," since it looks like that "m" is attracted by this manifold and never escapes from it.
The attractor is the foundation of DDS and a distinctive characteristic differentiating it from stochastic process.If X(t), Y(t), Z(t) in "m" were three stochastic processes, the trajectory of "m" will also form a manifold in phase space, but it is not an attractor, since a real attractor is driven by the interaction among variables, that is, the underlying equations, which means that it is essentially deterministic, not stochastic, despite its strange geometric construction.Specifically, an attractor has the following features distinguishing itself from stochastic process: 1.For the manifold of a DDS's attractor, the trajectory of any individual variable cannot be separated from it.
However, for the manifold of stochastic processes in phase space, the trajectory of any variables could be separated from it, since this manifold is simply a linear superposition of different stochastic processes, and different trajectories are independent.2. The attractor only takes up a finite sub-region of the phase space, because the DDS "m" has a spontaneous shrinking tendency during its evolution (Milnor, 1985), which means that the trajectory of "m" cannot fill the entire 3-D phase space.However, if variables in "m" are stochastic processes, "m" has a probability of appearing anywhere in the phase space, and this probability follows some distribution.3.As a finite subset of the phase space, the manifold has its own dimension, termed fractal dimension (Mandelbrot, 1983)-a special geometrical structure, which is often smaller than the dimension of the phase space.Simply speaking, in panel (e) left, when a time series is reconstructed into a certain dimensional phase space by PSR (ascending dimensions transform), some points that were the nearest neighbors in a lower-dimensional space no longer remain as the nearest as the dimension increases (like the three red points were reckoned as nearest neighbors in 2-D world, but not any more in 3-D world.Based on this, FNN method calculates the fraction of false nearest neighbor, and increases the dimension to see whether the fraction decrease: if it shows a clear decreasing trending (like the blue line in panel (e) right), it means there is clear deterministic characteristics of deterministic dynamic system, otherwise not (red line in panel (e) right).FNN method finds the optimal E when the fraction of FNN starts to be stable at a certain embedding dimension.Reader are directed to Kennel et al. (1992) for the rationale of FNN method.
For example, the Lorenz system is in a 3-D phase space, whereas its butterfly attractor has a 2.06 fractal dimension.Notably, only the manifold of an attractor has a deterministic fractal dimension, whereas not for the manifold of stochastic process.
The attractor is a collection of all phases that "m" could reach during its evolution, and in this sense, attractors can be understood as a way to summarize the system's dynamic information (Javier, 2020), including how variables causally interact in the system during its evolution.Also, not every attractor is as complex as a "butterfly": we provide another example of a simpler attractor in Figure 1b, considering only two variables (2-dimensional) that are linearly causal-linked, and their attractor is a "circle," implying that the trajectory of a DDS (i.e., its dynamic evolution) could also be stable and periodic.In extreme cases, the circle attractor will shrink into a fixed-point attractor, implying an extremely stable DDS, where the time series are relatively flat without large fluctuation.
The difference between butterfly, circle, and fixed-point attractors will be explained in the following sections.Sugihara et al. (2012) denoted a hallmark of DDS, termed "state-dependent," where a pair of variables are positively correlated at some times but appear unrelated or even negatively correlated at other times, that is, a time-varying correlation (Grziwotz et al., 2018;Lima et al., 2008;Sugihara, 1994;Sugihara & May, 1990).Explained in phase space, as the DDS "m" is evolving on its attractor, the correlation between its internal variables is also changing (i.e., time-varying), and the correlation coefficient depends on the position of "m" on this attractor (Deyle, May, et al., 2016;Petchey, 2016).The "state-dependent" hallmark explains why correlation is not the evidence of causality in a DDS: the causality is fixed by underlying equations, but the correlation is constantly changing.Such a time-varying correlation serves as an implication of DDS (Anderson et al., 2008;Tsonis et al., 2015;Ushio et al., 2018).

Takens Theorem, Phase Space Reconstruction (PSR), and Shadow Attractor
In reality, we can only obtain multiple discrete time series, but it is unknown whether there is a DDS with deterministic underlying equations connecting them, or they are simply stochastic processes.However, Takens Theorem (Takens, 1981) denoted that if a time series is from a DDS, an approximation of the deterministic attractor's manifold of this DDS could be restored from it.The method Phase Space Reconstruction (PSR) is applied for this purpose (Shu et al., 2021;Sivakumar et al., 2002;Tao et al., 2018).We turn to the Lorenz system to elucidate the procedure of PSR (Figure 1c): focusing on one of its time series X(t), we make three copies of it, but each copy has a time-lag of τ compared to its predecessor, that is, X(t), X(t τ), and X(t 2τ).The resulting set <X(t), X(t τ), X(t 2τ)> is embedded and reconstructed in a new 3-D phase space.Similarly, placing a small ball in this new phase space initiates movement, gradually forming a manifold akin to the Lorenz attractor <X(t), Y(t), Z(t)>: also a geometric "butterfly" (with slight distortion).
This reconstructed "butterfly" (marked as Mx) is referred to as the shadow attractor of the original (marked as Ms).
Takens Theorem ensures that such a shadow attractor is a diffeomorphism of the original.As diffeomorphism, shadow attractors have two important features: they "replicated" the dynamic information from their original attractor (Rosenstein et al., 1994;Uzal et al., 2011;Wasserman et al., 2022), and they share similar fractal dimension with their original attractor.Likewise, a shadow attractor My can also be reconstructed from Y(t), or Mz from Z(t): they are all diffeomorphic to Ms. Figure 1d provides an intuitive grasp of diffeomorphism via common geometric figures, but since the concept of diffeomorphism is complex, this research shall not involve detailed discussion; in this study, readers are only expected to understand the above two important features of a diffeomorphism.
It is worth noting that, to guarantee a diffeomorphism reconstructed during PSR, the dimension of the reconstructed shadow attractor, that is, the embedding dimension E (the number of copies for one time series, e.g., E = 3 in the shadow attractor of Lorenz system) should be given an optimal choice (Takens, 1981); and usually, an optimal E satisfies that E ≥ 2D + 1, where D is the dimension of the original DDS (the number of variables in this DDS, also called the complexity of the DDS, e.g., D = 3 in Lorenz system).Additionally, another parameter in a shadow attractor: time-lag τ (step length of the historical trace of X(t)) should also be selected carefully (Wang et al., 2019).A rather small τ would cause information overlap, whereas a rather large τ could render information loss (Kim et al., 1999).Here is a straightforward interpretation of τ: when τ = 1, one point on the shadow attractor Mx is <X(t 1 ), X(t 2 ), X(t 3 )>; when τ = 3, it will be <X(t 1 ), X(t 4 ), X(t 7 )>, etc.

Water Resources Research
10.1029/2023WR034616 ZHAO ET AL.
There are several methods to find the optimal E and τ, and in this study, we apply two methods: False Nearest Neighbor method (FNN, Figure 1e; Kennel et al., 1992) and autocorrelation function, to determine E and τ respectively.Readers are directed to Yasmin and Sivakumar (2018), Adenan et al. (2017) for detailed calculating metrics.It is worth noting that in the case of Lorenz system, its original and shadow attractors are both butterfly manifolds with the same dimension of three (i.e., E = D = 3), but in fact, the dimension of the shadow attractor is not necessarily equal to the original one to satisfy diffeomorphism: they could have different dimensions (E≠D).
In general, Takens Theorem, that the original deterministic dynamic attractor could be restored from its time series, serves as one important principle of verifying the DDS nature of a single variable: if a variable is from a DDS, a shadow attractor diffeomorphic to the original is supposed to be restored from it.However, as mentioned, a manifold in phase space could be reconstructed from any time series including stochastic processes via PSR.
According to Taken Theorem, only the time series from a DDS will have an optimal E, distinguishing themselves from stochastic processes.Thus, the existence of an optimal E serves as a piece of evidence that the targeted variables (time series) are from a DDS.A simplified schematic of FNN and instructions for determining an optimal E are provided in Figure 1e (Rhodes & Morari, 1997), with a comprehensive introduction of FNN available in Text S2 in Supporting Information S1.
Meanwhile, a shadow attractor should have a similar fractal dimension with its original attractor, and as mentioned, the manifold of a real attractor should have a deterministic fractal dimension.Therefore, we could calculate the fractal dimension of a shadow attractor with the method of Correlations Dimension (CD; Kawamura et al., 1998;Sivakumar, 2004Sivakumar, , 2014;;Wang & Gan, 1998), and for a manifold reconstructed with a time series from stochastic process, no stable result of CD will be calculated; that is, we could conduct multiple computations of CD, and if considerable discrepancy exists among the results of each calculation, this time series can be regarded as arising from a stochastic process.In this sense, the stable CD value of a shadow attractor also serve as a proof of DDS nature.In light of the complexity, the concept of fractal shall not be discussed in detail in this research, and readers are directed to Ahmadi et al. (2011), Grassberger andProcaccia (1983), andProcaccia (1988) for further information.

Chaos of DDS
Apart from the reconstructed shadow attractor with optimal E and its stable value of CD, a special property of DDS termed chaos (Ghorbani et al., 2018;Sivakumar, 2000Sivakumar, , 2009Sivakumar, , 2017) ) can also help to verify the DDS nature of targeted variables.As illustrated in Lorenz system (Figure 1a), its original attractor takes the form of a butterfly; by contrast, an attractor could also be very simple, for example, the "circle" in Figure 1b.The "butterfly" is termed a strange attractor, while the "circle" is termed a common attractor, and the extreme case of common attractor is fixed-point.In short, DDS with a strange attractor is regarded as a chaotic one, while circle and fixed-point common attractors indicate non-chaotic DDS.The strangeness of an attractor is reflected in its geometrical structure: simply speaking, the trajectory of "m" in a common attractor is periodic and predictable (one closed line), whereas in a strange attractor, the trajectory will be aperiodic and unpredictable (there are usually countless trajectories).Accordingly, the time series from a common attractor is periodic, while the time series from a strange attractor is aperiodic.
The strange attractor of a chaotic DDS has a peculiar feature: the rate of separation between two infinitesimally close trajectories on it grows exponentially, rather than converging.For example, in Figure 2a, two points "p" and "q" were initially close-located on the attractor.With time marching, their trajectories are initially similar, but after a while, they start to drift apart, and their position on attractor will be greatly different.By contrast, if "p" and "q" is in a common attractor like a circle (Figure 2b), their distance will remain unchanged as time goes on.In the extreme case of a fixed-point attractor, the trajectories of "p" and "q" will converge until extremely stable (Figure 2c).The separation or convergence of "p" and "q" in an attractor could be reflected on its time series as a divergent or convergent trend, and Lyapunov exponent (λ) is used to evaluate the convergent/divergent trend on individual time series (Di et al., 2019;Wolf et al., 1985): a time series with a positive value of λ indicates a divergent trend as in Figure 2a; a time series with λ = 0 indicates a stable trend as in Figure 2b; and a time series with λ < 0 indicates a convergent trend as in Figure 2c.A DDS usually has multiple time series, and each time series has a λ.If this DDS is non-chaotic, all of its time series should have a λ ≤ 0. However, for a chaotic DDS, at least one positive λ is supposed to exist out of all the λ(s); that is, λ ≤ 0 might also be calculated from some of its time series, but their existence does not affect its overall chaos.

Water Resources Research
10.1029/2023WR034616 ZHAO ET AL.
Besides, the numerical values of CD can also be used to distinguish a chaotic DDS from a non-chaotic one.A chaotic DDS often has a finite non-integer CD value (e.g., the fractal dimension of the butterfly manifold in Lorenz system is 2.06), while a non-chaotic DDS often has an integer CD (e.g., the fractal dimensional of circle attractor is 1).It is worth noting that, however, the shadow attractors reconstructed from different time series in the same DDS might have different value of CD, and some shadow attractors of a chaotic DDS might even have an integer CD, since the behavior of individual time series could be different even though they are from the same DDS.
A time series from a DDS, irrespective of chaotic or non-chaotic, will have a stable λ, whereas the λ of a time series from stochastic process is non-stable.This is similar to CD: irrespective of positive or negative, as long as the stable value of λ is calculated from a time series, it suggests that this time series is from DDS; otherwise, it is likely to be stochastic.Simply put, the existence of optimal E, the existence of stable CD and λ all serve as principles of confirming the DDS nature of a time series, distinguishing it from stochastic process.
Although we could distinguish a DDS from stochastic process simply based on stable values of CD and λ, it is still necessary to further investigate whether it is a chaotic DDS or a non-chaotic one based on the specific values of CD (integer or non-integer) and λ (positive or negative), since chaos might bring extra challenges to the identification of the system's nature.Chaos could be easily oversimplified and confused with stochastic process (Chang et al., 2017;Ma et al., 2014Ma et al., , 2018;;Stavroglou et al., 2020), since they would both produce aperiodic, irregular, and seemingly unpredictable time series, which are hard to distinguish (Bowers et al., 2012;Hsieh et al., 2005;Sangoyomi et al., 1996).Chaos emerges in a DDS when nonlinearity plays a significant role in this system (May, 1976;Ombadi et al., 2021); however, it does not change the deterministic nature of this system, irrespective of the magnitude of its nonlinearity; that is, chaos is fundamentally deterministic, distinguishing itself from stochastic process (Chennaoui et al., 1990;Elshorbagy et al., 2002;Karunasingha & Liong, 2018;Khatibi et al., 2012).
In reality, the targeted time series with causality commonly exhibit aperiodicity and irregularity, and thus understanding how chaos is generated within a DDS carries additional importance in avoiding the pitfall of conflating it with stochastic behavior, enhancing the rationality of applying CCM and improving the reliability of corresponding causal results from CCM.Additionally, as previous studies have noticed that chaos might affect soil-atmosphere interactions (Koster et al., 2004(Koster et al., , 2006)), the investigation of chaos in this study would also help us to see if the causal detection with CCM will be affected when the target variables are in a chaotic DDS. Figure 2d provides an overall schematic of distinguishing stochastic and DDS, and chaotic and non-chaotic DDS via CD and λ.

Measure of Causality in DDS: Convergent Cross-Mapping (CCM)
Standard CCM was invented by Sugihara et al. (2012) to detect the non-separable causality between variables in a DDS.Based on standard CCM, Ye et al. (2015) proposed an extended CCM which covers the deficit of the standard type, and detects the causal response time.

Standard CCM
Similar to the analogy that the original attractor is the "summary" of a DDS, we can think as Mx and My as summaries of X(t) and Y(t), and CCM examines the causality between X(t) and Y(t) based on Mx and My (Mønster et al., 2017; Figure 3a).Standard CCM has two key requirements: Cross-mapping and Convergence (Tsonis et al., 2018).

Cross-mapping:
The basic idea of Cross-mapping detecting causality is to use one shadow attractor to estimate the other (Figure 3b): 1. On Mx, for one single point X t′ , use a minimum Euclidean distance to find its three nearest neighbors X tα , X tβ , X tγ .2. According to the same timestamp, find the counterparts of X tα  3b for a clearer visualization.What is counter-intuitive about CCM is that it uses "effect" to estimate "cause": CCM believes that only the information of "cause" gets stored in "effect," not vice versa, so that it is only able to better estimate the nearest neighbors of "cause" by "effect," not vice versa; that is, the mapping skill will only be good after Cross-mapping from "effect" to "cause."Building on this idea, the mapping direction is reverse to the causal information direction in CCM: if the nearest neighbors in Mx remain the "nearest" after cross-mapping to My (X → Y), it means that the causal information has been transferred and stored from Y ("cause") to X ("effect") (Y → X ).An illustration of this idea is presented in Figure S1 in Supporting Information S1.In this sense, CCM is different from the idea of GC: X causes Y if we can predict Y better given X.
Convergence: For variables with causalities in DDS, the longer our observation period is (the more data gathered), the better one variable will be estimated from the other.Explained by attractor, we can imagine the attractor will get denser with more observation (i.e., longer time series), since the system (e.g., "m") will gradually fill in the gaps in the manifold.If the observation was short, the manifold of an attractor would be sparse with lots of gaps, and the nearest neighbors found on it would be inaccurate, since the real nearest neighbors might be in those gaps.
Stated another way, as manifold goes denser, we can expect the accuracy of estimated Ŷt′ from Mx to improve.

Water Resources Research
10.1029/2023WR034616 ZHAO ET AL.That is, there will be a convergent tendency of mapping skill as the time series goes on.On the other hand, if one variable has no causality on the other, improving their manifolds (i.e., prolonging the time series) will not translate to improvement in estimation (i.e., non-convergence).Therefore, CCM requires enough samples from the target time series to form an abundant "library" to illustrate the convergence.Figure 3c illustrates the convergence (blue line) and non-convergence (red line).The final convergent value indicates the strength of causality (e.g., blue line is convergent to 0.8 in Figure 3c).Besides, we need to use stochastically generated series as surrogate data, apply CCM on them, and compare the CCM results with those based on original data.This is a necessary procedure for the significance test of CCM.Details are provided in Text S3 in Supporting Information S1 (Bonotto et al., 2022).
Repeat the above procedures of Cross-mapping and Convergence from the other direction Y → X, and thus the causality from both directions X → Y and Y → X could be detected.The causality between X and Y can be unidirectional (Figure 3c) or bidirectional (Figure 3d).When X(t) and Y(t) do not share a same DDS, standard CCM will only return two non-convergent lines.Readers are directed to the animation provided by Sugihara Lab (2015) for the most straightforward understanding about CCM.

Extended CCM
It is worth noting that the above standard CCM detects an instantaneous causality, but in reality, the expression of causality from one variable to another usually takes time and/or will last for some time: x t+T = f(x t ,y t ), that is, the possible causal response time.Thus, the performance of CCM using the instant value of X estimating Y: X(t) → Y (t), might not be as good as that of using the historical value of X estimating Y: X(t) → Y(t + T ).That is, the best performance of CCM (i.e., maximum of CCM skill) might not lie in the instantaneous causality (where time-delay is zero).Given this, Ye et al. (2015) proposed an extended CCM considering several time-delays between two sample variable time series.The basic idea of extended CCM is similar to that of cross-correlation, which is to consider some "dislocations" between the 1:1 correspondence of two time series when applying CCM: to see if these CCM skills would be different, and the time-delay with the maximum of CCM skill is estimated as the possible causal response time.It is worth noting that in some cases, there might be multiple highest CCM skills with similar numerical values in a row, together forming a highest "plateau" (e.g., 0, +1, +2, and +3 time-delays all have the max-values).Such a case implies the causal information will last for a while after occurring (a persistence of causality).
In addition, Sugihara et al. (2012) denoted a defect of standard CCM: when the nonlinear connection between variables reaches a certain strength, the targeted variables are referred to as nonlinearly strong-coupling, and standard CCM would be "deceived" in such a case.Specifically, when X and Y are strong-coupled, but X receives unidirectional causality from Y and no causality from X to Y in turn, standard CCM will still detect a causality from X to Y with similar strength to that from Y to X, and this "fake" causality could even pass the statistical significance test.With such a fake causality, CCM erroneously infers that the causality between X and Y is bidirectional (X⇄Y), but in fact, it is only unidirectional (Y → X ).That is to say, under the context of strongcoupling, when we find a standard CCM result like Figure 3d, it is still hard to tell whether it is a true bidirectional causality (both X → Y and Y → X are true), or only a unidirectional causality (either X → Y or Y → X, one of them is fake).Ye et al. (2015) provides a simple solution to distinguish them in extended CCM: apart from the positive time-delays in Equation 2, some negative time-delays should also be considered: Positive time-delays in Equation 2 indicate the causal information from a past value of X to a future value of Y (Figure S2b in Supporting Information S1), whereas negative time-delays in Equation 3 indicate the causal information from a future value of X to a past value of Y (Figure S2c in Supporting Information S1).Based on the CCM results of both positive and negative time-delays, extended CCM proposed that, if the best CCM skill (maxvalue) appears in the negative time-delays, it means that CCM detects some causal information from a future value to a past value, which philosophically breaks the law that "the cause always precedes the effect in time," and therefore, such a causality from "future" to "past" is a fake one.According to extended CCM, the true causality could only show up in zero (instantaneous causality) or positive time-delay zone.In practice, using extended CCM, we should first determine the position of the best CCM skill, and if it is in the negative time-delay zone which means that this causality is a fake one, there is no need to further consider its possible causal response time.
The schematic of extended CCM is shown in Figure 3e, which is basically a collection of standard CCM results in different time-delays, and two corresponding results are illustrated as template in Figures 3f and 3g: when the max-value of both lines are shown in positive time-delays zone (Figure 3f), it means that the causality from both directions (X → Y and Y → X ) are true, and thus the causality in X → Y is indeed bidirectional; when the max-value of one line is shown in negative time-delays zone whereas the other one is in positive time-delays zone (Figure 3g), it means that only one causality (i.e., blue line) is true whereas the other one (orange line) is not, and thus the causality in X → Y is unidirectional.It is worth noting that the max-value of a fake causality could still be statistically significant (like the yellow dots in Figure 3g).Readers are directed to Ye et al. (2015), Huang et al. (2020), andCoufal et al. (2017) for further understanding, including why it indicates a fake causality when the max-value of CCM skills shows in negative time-delays zone.

Research Areas and Data
This research involves P, SM(s) (in different layers), and ET as the targeted variables.The data are from the ERA5-land global reanalysis data set (Muñoz-Sabater, 2019a, 2019b) with a spatial resolution of 0.1°, including both the monthly and daily (a) volumetric soil water (SM from layer 1-4 (0-7 cm, 7-28 cm, 28-100 cm, and 100-289 cm beneath the surface respectively), (b) total precipitation (P), and (c) total ET data.The monthly data are selected during 1950-2022, 876 months in total; The daily data are selected during 1959-2022, 23,356 days in total.The reason for using ERA5-land global reanalysis data set is that, although there are a large number of ground weather station data and satellite monitoring data, they are not evenly distributed geographically, and some of them were not observed (e.g., 0), or mistakenly observed (e.g., marked as ±99,999).These raw data may have stochastic noise in time series, which could distort the nonlinear dynamic information within and affect the accuracy of our results.Using Kalman filtering, different kinds of estimation and interpolation, ERA-5 reanalysis data set could retain the features of original nonlinear dynamic information in time series as much as possible, making it a good material for causal detection with CCM.
This research will focus on the Pearl River Basin (PRB, China, 21°31′-26°49′N, 102°14′-115°53′E) as the main research area, and for a universal applicability, we also take another three research areas: Wei River Basin (WRB, China, 33°30′N-37°31′N, 103°30′E-110°30′E), Nebraska (America, 40°20′-43°50′N, 96°10′-104°30′W), and Medjerda River (Algeria, 35°30′N-37°40′N, 7°30′E-10°00′E) as supplementary, to see if the methodology applied in this research would remain effective for SM-P coupling in different areas, and what are the similarities or differences among these different climatic conditions including arid (Nebraska), semiarid (WRB; Medjerda River), and humid (PRB).The gridded data are aggregated to the basin scale by taking the spatial average of the gridded data for each research area, and therefore, within one area, each variable (P, SM1-4, or ET) will only have one single corresponding time series.Due to the complication of causal detection and analysis and the CCM methodology per se, at this stage, we shall only focus on four areas instead of conducting an analysis of the spatial distribution pattern of causal on a large scale (e.g., globally), and we shall consider a broader analysis if this research is proven successful in these four areas.
The reason for specifically choosing these four areas for research instead of others is that, compared with local climatic conditions, we hypothesize that the characteristics of underlying DDS (i.e., chaotic or not) might be a more significant factor affecting causal detection.All of these four areas happen to have been conducted research related to chaos using PSR, λ, and/or CD, regarding relevant hydro-meteorological variables (P, SM, or ET), or the causal detection with CCM between the variables (Dai et al., 2020;Di et al., 2019;Kardi et al., 2020;Shi, Zhao et al., 2022;Zhou et al., 2021bZhou et al., , 2021c)), which, therefore, could be helpful for evaluating and validating our finding, so that our conclusions would be corroborated and universal when compared with previous studies.In short, this research is currently designed as theory-oriented; and this research can be conducted on a broader scale (e.g., globally) if the validity of our theory can be assessed based on these four areas.

Results and Discussions
In Section 5.1, we will demonstrate how to recognize and confirm the DDS nature from the targeted time series P, SM(s), and ET (e.g., the time series of all targeted variables in PRB are illustrated in Figure 4a), followed by the identification of chaos, as the preconditions for applying CCM.From Section 5.2 to 5.6, we will detailly demonstrate how to detect causality among the targeted variables via standard and extended CCM.

DDS Nature and Chaos
In the summary of Tuttle and Salvucci (2016), we noticed that some past studies regarding the relationship between SM and P had findings similar to the "state-dependent" hallmark of a DDS: the feedback signs and statistical strengths between SM and P varies over the same regions, even using the same data (Alfieri et al., 2008;Ferguson et al., 2012;Guillod et al., 2014;Zeng et al., 2010).For example, during 2002-2011 period, Welty and Zeng (2018) found that the correlation between antecedent SM and subsequent afternoon P in U.S. Southern Great Plains varies in magnitude, and shifts between positive and negative.Therefore, in our study, we use Pearson correlation analysis with a sliding-window (12-month length) to briefly illustrate the time-varying correlation between the time series of P and SM in layer-1 (SM1) in four areas, from 1950 to 2022, 876 months in total.As shown in Figure 4b, in four areas, the numerical values of the Pearson correlation coefficient between P and SM1 constantly vary as time goes on, that is, a time-varying pattern, suggesting a "state-dependent" feature of DDS.
Particularly, the time-varying patterns in WRB and Nebraska are the most obvious, as their correlation coefficients oscillate between +1 and 1.The time-varying correlations suggest a "state-dependent" feature and an underlying DDS nature for P, SM(s), and ET in each area, and also denote the fact that correlation-based analysis cannot be applied for causal inference.
Table 1 summarizes the results of optimal E and τ, CD, and λ of all variables in four areas.FNN and autocorrelation function calculate that E = 6 and τ = 3 are the optimal embedding dimension and time-lag for the reconstruction of most of the time series in four areas.Based on FNN, we clearly found the existence of optimal E for all variables, indicating an underlying deterministic dynamic attractor.Moreover, the results suggest that P, SM(s), and ET within one research area share similar E and τ (most of them are 6 and 3, respectively), which means that it takes similar parameters E and τ in shadow attractors to reserve the original dynamic information, and implies that these individual variables potentially share the same underlying DDS (arise from the same original attractor).Regarding the CD and λ of three types of variables (P, SM, and ET) in all four areas (Table 1), all of them are stable, indicating a DDS nature underlying these variables.In summary, the existence of optimal E, stable values of CD and λ found among the time series of P, SM(s), and ET in four areas, which recognized and confirmed their DDS nature, distinguishing it from stochastic process, allowing us to further apply CCM to detect their causality.
After finding the stable values of CD and λ that verify the underlying DDS nature, we could further infer the chaos of the DDS based on the specific values of CD and λ.As shown in Table 1, most of these CDs are finite noninteger, indicating chaotic behaviors.Only the CD of P in PRB is an integer: 2.00 (marked in red in Table 1), but this does not affect the overall chaos of their underlying DDS.In terms of λ, focusing on all variables in one area, both positive and negative λ(s) are found, indicating a chaotic DDS of all variables in this area.The coexistence of positive and negative λ(s) is found in all four areas, and specifically, most of SMs in deeper layers (SM3/4) have negative λ(s) (marked in red in Table 1, except the SM3 in Medjerda River with a positive λ), whereas P, ET, and SMs in shallow layers (SM1/2) have positive λ(s) in all four areas.In summary, chaotic DDS nature is identified for four areas.
Finally, it should be emphasized that not all time series from a chaotic DDS have λ > 0, and a time series with λ < 0 is not necessarily from a non-chaotic DDS with fixed-point attractor.In our study, we found that most SM3 and SM4 (i.e., SM(s) in deeper layers) in four areas are found to have λ < 0, indicating a convergent trend on them.However, the shadow attractors reconstructed from SM3/4 still have finite non-integer CDs indicating chaotic behaviors.This is because that CD could still take a fractal value despite the existence of negative λ (Molz & Hyden, 2006) and identify the chaotic behaviors of SM(s).Thus, combining CD and λ, we could infer that the underlying DDS in each area is chaotic, only that SM3/4 convergent trend; or we could rephrase as despite the convergent trend of SM3/4, their underlying DDS is still chaotic.Essentially, CD and λ evaluate chaos from different perspectives, which are not comparable, and in terms of the identification and evaluation of chaos, it is emphasized that a combination of various methods is needed in practical research, since a single method may not fully reveal the nature of chaos.This is also proven by P in PRB, where P has an integer CD (2.00) but with λ > 0 (0.68).Also, different variables have different natures, which is another the reason why various methods should be applied in combination instead of only one (Di et al., 2023).But limited by the length and scope of this study, only these two methods are employed to investigate chaos in this research.

Causality in SM-P Coupling
After proving the DDS nature of targeted variables, in this section, we use CCM to investigate the causality in SM-P coupling, using the shadow attractors reconstructed with the optimal E and τ.The analysis of CCM is conducted in four research areas between the monthly data of P and SMs in each layer (1-4).First, standard CCM is conducted without considering time-delay to detect the instantaneous causality (0 time-delay) in SM-P coupling.As stated, standard CCM determines the existence of causality if there is a convergence in mapping skill, but meanwhile, it risks fake causality.Given this, temporarily, only the CCM results between SM1 and P in four areas are shown, only to illustrate the signature of convergence (Figure 5), and more thorough analysis for SMs in all layers and P will be conducted with extended CCM later to address the risk of fake causality.
Take Figure 5a (PRB) for example,: the blue line indicates a convergent tendency of mapping skill with the increase of time series, which characterizes the existence of causality from P to SM1; the convergent red line indicates the existence of causality from SM1 to P. Based on Figure 5, standard CCM infers that there is a bidirectional instantaneous causality between P and SM1 in PRB, WRB, and Medjerda River, since the blue and red lines are both convergent to a certain value above 0 in Figure 5a, 5b, and 5d.However, Figure 5c suggests that, the instantaneous causality in SM1-P in Nebraska is unidirectional from P to SM1, since only the blue line (P → SM1) is convergent (around 0.3), but the red line is around 0.
Considering the risk of fake causality from a strong-coupling situation, it should be noticed that in Figure 5a, 5b, and 5d, it seems that the instantaneous causalities exist in both directions (P → SM1 and SM1 → P) given that both red and blue lines are convergent to a value above 0, but yet the possibility remains that one of the instantaneous causalities (either P → SM1 or SM1 → P) is a fake one, meaning that the real causality between P and SM1 might actually be only a unidirectional one.Meanwhile, it should also be noticed that, even though the instantaneous causality in Nebraska (Figure 5c) seems not to exist according to standard CCM, its real causality might be found with certain time-delays.Therefore, it is necessary to conduct extended CCM.
Both negative and positive time-delays from 8 months to +8 months are considered in extended CCM.The specific calculation is that, the CCM will be applied between SM and P with time-delays of 1-month, 2-month, …, and 8-month respectively, and for each application, the final convergent value (i.e., CCM skill like those in Figure 5) will be collected to illustrate.A positive time-delay in CCM in P → SM represents the causality from past P to future SM, whereas a negative one represents the causality from future P to past SM, which are defined as Equations 4.1 and 4.2) respectively:  Note.CD takes a fractal value despite existence of negative Lyapunov Exponents, for example, SM3 and SM4 in most areas (Molz & Hyden, 2006).The data marked in red are analyzed in research.
The same principle applies to SM → P, which are defined as Equations 5.1 and 5.2 respectively: From a philosophical perspective, "cause" always happens before "effect" chronologically; that is, a real causality is not supposed to exist in negative time-delays.Therefore, as long as the max-value of CCM skill shows up in 0 or positive time-delay zone (in Equation 4.1 or Equation 5.1), it can be regarded as a true causality, in correspondence with the logic of "cause happens before effect."However, when it shows up in negative time-delay zone (in Equations 4.2 and 5.2), it is regarded as a fake causality from a unidirectional strong-coupling case, since it is impossible and illogical for the causal information to flow from future to past.Simply speaking, extended CCM focuses on the position of the max-value of CCM skill.
Figure 6 shows the results of extended CCM with different time-delays between P and SM in each layer (1-4) in four areas.Take the results in PRB as an illustration (Figures 6a-6d): focusing on the positions of max-values of CCM skills, that is, the signatures of extended CCM results, we found that, for the CCM results between P and shallow SMs (layer 1-2), the max-value of CCM skill in P → SM (blue line), and that from the other way around SM → P (red line), are both located 0-month and 1-month in positive time-delay zone (Figures 6a and 6b), which indicates that the causalities between P and SM in shallow layers (1 and 2) are mutual (i.e., real bidirectional causality instead of unidirectional strong-coupling case, similar to the template in Figure 3f).On the other hand, the signatures of the extended CCM results between P and deeper SMs (layer 3-4) indicate that P has a real Red and blue lines in one subplot indicate the instantaneous causality between two variables with reverse directions respectively.It is worth noting that merely based on such signatures, it is unable to distinguish real bidirectional causality between two variables from unidirectional strong-coupling (i.e., one of the convergent lines might be a fake causality).Therefore, extended CCM is needed for the following analysis.
unidirectional causality to deeper SMs, given that the max-values of CCM skills from P to SM3 and SM4 are found in 0-month and 1-month in positive time-delay zone (blue but that from the other way around is located in 1 month in negative time-delay zone (red line), which means that the causalities in SM3 → P and SM4 → P are fake, and thus, deeper SMs are only receiving a unidirectional causality from P (Figures 6c and 6d) without rendering causality in turn to P, similar to the template in Figure 3g., it can be distinguished between the real bidirectional causality and unidirectional strong-coupling.Besides, the causal response time and its lasting time can be also detected.For example, in panel (a), the causality in SM1-P in Pearl River Basin is analyzed: the max-value of orange line lies in the +1 in positive time-delay, indicating that SM1 would render the maximum strength of causality to P 1 month after it, which means that the time-delay between SM1 and P is 1 month.Similarly, the max-value of blue line also shows up in +1 in positive time-delay, which suggests that the causality from P to SM1 also has a time-delay of 1 month.In general, extended CCM focuses on the position of max-value of CCM skills: negative or positive time-delay zone.
After conducting the same procedure of extended CCM in the rest three areas, we found that they have similar signatures of CCM results to PRB, all shown in Figure 6.For a summary, in four areas, P has bidirectional causalities with shallow SMs (layer 1 and 2) but unidirectional causalities toward deeper SMs (layer 3 and 4).It can also be rephrased: while P has a causality toward each layer of SMs, only SMs in shallow layers could have a reverse causality to P. Such a finding is consistent with that of Cao et al. (2021).Also, it is worth noting that, despite the fact that the instantaneous causality from SM1 to P in Nebraska is not clear, as suggested by the red dots convergent to 0 in Figure 5c, its real causality actually lies in the time-delay of 1-month, as suggested by the orange line is around 0.3 in 1 time-delay in Figure 6i.
We can also infer the possible causal response time from the extended CCM results.For example, in Figure 6d, PRB, the max-value of blue line shows in 1-month positive time-delay, which suggests that the possible causal response time of SM4 to P in PRB is 1 month.However, we notice that the max-values of blue lines found in Figures 6a-6c all seem to appear in both 0-month (instantaneous) and 1-month time-delay, with very little discrepancy, and given that the CCM skills in further time-delays (2-month and more) are much smaller than the max-values in 0-month and 1-month, we assume that the real causal response time of SM to P lies in somewhere middle between 0 and 1-month; that is, it is supposed to be measured in daily units: less than 1 month.Such a signature can be also found on orange lines which indicates the causal response time of P to SM.However, due to that the temporal resolution in CCM results is 1-month in this section, we shall use daily data to detect the more precise causal response time later.
To make sure that the causality with time-delays between P and SM detected by extended CCM are statistically significant, we assess the statistical significance of max-value of CCM skills with time-delays using surrogate data.For example, Figure S3 in Supporting Information S1 uses box plots to show that the CCM results are significant in SM-P coupling in PRB compared with the surrogate time series.It is worth noting that, however, given the impacts of strong-coupling, only using surrogate data for significance test is not enough; this is because that such a fake causality could also pass the significance test given the defect of standard CCM.We must conduct extended CCM with time-delays.In other words, compared with testing its statistical significance, the position of max-value CCM skill detected by extended CCM is an even more important evidence: it proves the authenticity of causality.

Causal Response Time in SM-P Coupling
Due to the temporal scale of monthly data, we have only inferred that the time-delay of causality in SM-P coupling is supposed to be less than 1 month in last section.To detect more precise causal response time, in this section, daily data is used in extended CCM.Since only the shallow layers of SM was found to have causality on P, this research uses daily data of SM1 and P. It is worth noting that when the time series of P is on a daily basis, there are too many 0 samples consecutively existing in the time series, especially in autumn and winter (i.e., it does not rain every day), which will affect the accuracy of PSR and CCM, since the dynamic information will be broken when a time series involves too many 0 samples continuously.Besides, CCM requires the samples of time series (i.e., library) to be abundant enough to thoroughly retrieve the original attractor in order to fulfill the requirement of convergence; otherwise, the reconstructed shadow attractor with insufficient samples will be very "sparse," and the nearest neighbors found on it will be inaccurate.Previous research suggested that the selected time series for CCM should be more than 100 samples.Based on these, this research selects the time series of P and SM1 in spring and summer (from March to August) when precipitation is relatively abundant and there are fewer 0 samples in the time series of P, and the library size (the total number of samples in time series used in PSR) includes at least 100 samples (X t1 :X t100 ), up to 180 samples (X t1 :X t180 ).
In PRB, we conducted CCM on the time series in those years where the precipitation is relatively abundant in spring and summer, and we found that the results of CCM in most years follow a similar pattern: as shown in Figure 7a, the max-values of CCM from SM1 to P form a "plateau" at [0, 1, 2, 3, 4], indicating that the causality from SM1 to P is a 4-day distributed delay; in other words, the causality from SM1 to P can last up to 4 days.On the other hand, the causality from P to SM1 only lasts for 2 days, indicating that the causality from P to SM1 is prompter.Likewise, for most years in WRB (Figure 7b), the causality of SM1 → P could last for 5 days, and the reverse causality (P → SM1) is only up to 3 days.In Nebraska (Figure 7c), the causality of SM1 → P is up to 6 days, and the reverse causality (P → SM1) is up to 4 days.In Medjerda River (Figure 7d), the causality of SM1 → P is up to 5 days, and the reverse causality (P → SM1) is up to 4 days.Using another information-based method for the feedback in SM → P, Lou et al. (2021) found a similar time-delay of 3-10 days in summer in Illinois, denoting that the causal response time of SM → is indeed on a daily scale, corroborating our findings.
It is important to note that extended CCM does not provide the exact causal response time, but rather an estimate.
In addition, given the defect of standard CCM regarding strong-coupling cases, not every time-delay with a final convergent value >0 implies causality, even though they might all pass the significant test with surrogates.Therefore, extended CCM appeals to the focus on the position of the max-value of CCM skills, which could imply two important things: (a) It conforms to the philosophical law that "cause" happens before "effect," so it implies a true causality.(b) It also implies a potential causal response time.For those positive time-delays with a CCM skill >0 but not the highest value, it might suggest that, in reality, the causal information flow from "cause" to "effect" can always happen, which is not strictly concreted in a specific time point (with highest CCM skill), but only the time point with the highest CCM skill has the strongest causal information flow.Anyway, according to Ma et al. (2017), in this study, we only regard those time-delays with the highest value of CCM skill followed by a clear declining slope as an estimate of the causal propagation time, that is, the time witnessing the strongest causal information flow.

Causal Strength of SM → P and P → SM
Previous sections mainly focused on the convergence in CCM results which indicates the existence of causality (in standard CCM), and the position of the max-value of CCM skill which distinguishes the true causality from a one and detects the causal response time (in extended CCM).Besides these two signatures, the final convergent value of CCM skill (of a true causality) is also important: it numerically indicates the relative strength of causality.As shown in Figures 6 and 7, we could spot an interesting feature of CCM results for SM-P coupling, that some of the final convergent values in SM → P (yellow dots on orange lines) are larger than those from the other direction P → SM (blue dots on blue lines), implying that the strength of causality from SM to P is slightly greater than those from P to SM.Such a finding seems counter-intuitive.In general, during a precipitation event, water enters the soil as the recharge by infiltration, and P serves as a direct "cause" (recharge) to SM: with higher rainfall, more water is likely to enter the soil.This physical process (mainly infiltration) occurs very quickly and promptly, in a relatively short time frame (hourly).By contrast, in such a short temporal frame, the impact from SM to P might not even be identified; that is, the non-separability in SM-P coupling might not hold.In this sense, naturally, a higher strength of causality is expected for P → SM compared with SM → P.
It is worth noting that, however, the analysis perspective of this research is not fixed on the single precipitation event; instead, we focus on the causal interaction and feedback between P and SM in the long run; that is, this research is conducted under a much larger temporal context: the time series applied in this research is either on the monthly or daily scale.Under the long-term scale, the interaction and feedback between P and SM are interdependent and non-separable, so the finding "SM → P > P → SM" in terms of causal strength is logically reliable from the perspective of DDS.However, the factors influencing the relative strengths between these two reciprocal procedures SM → P and P → SM remains unknown at this stage, which requires further investigation based on more research areas.We assume that these factors might include regional climate conditions, especially the type of precipitation (e.g., heavy or shower rainfall), and the type of soil (e.g., sandy or clay soil).

Causalities Among SMs in Different Layers
This section detects the causalities among SMs in each layer (i.e., 1↔2, 1↔3, 1↔4, 2↔3, 2↔4, 3↔4, six types in total).We will use PRB as the main area for illustration, and then provide a summary of all four areas.Figures 8a-8f shows the results of extended CCM with different time-delays among SMs in each layer (1-4) in PRB.
The analysis is similar to the previous section, which uses extended CCM to find the positions of max-values of CCM skills.For SM1, we found that the causality between SM1 and SM2 (Figure 8a) and that between SM1 and SM3 (Figure 8b) both seem to be mutual, whereas the causality between SM1 and SM4 is unidirectional from layer 1 to 4 (SM1 → SM4; Figure 8c); only notice that the max-value of CCM skill from SM3 → SM1 (Figure 8b) is not exactly in the positive time-delay zone, but actually lies in 0 time-delay (i.e., the instantaneous causality).For SM2, the causality between SM2 and SM3 also seem to be a mutual one (Figure 8d), whereas that between SM2 and SM4 is unidirectional from SM2 to SM4 (Figure 8e); similarly, notice that the max-value of CCM skill from SM3 → SM2 (Figure 8d) is also not exactly in the positive time-delay zone but in 0 time-delay.For SM3, the causality between SM3 and SM4 (Figure 8f) is unidirectional.Finally, for SM4, it seems to have no causality to any upper layer of SMs but only receive causalities unidirectionally from them.As mentioned, it is the position of the max-value of CCM skills that extended CCM focuses, and the results demonstrate a general tendency: the causality always exists from a shallow SM toward a deeper-layered one, which is indicated by the fact that the max-values of blue lines always exist in 0 or positive time-delay zone; whereas the causality from the other way around seems to less obvious with the layer going deeper, especially when the SM in layer 4 barely has no causality upwards, indicated by a general tendency that the max-values of orange lines moves toward negative time-delay zone.In short, downward causalities in SMs outstrip the upward ones.
Previously in SM-P coupling, we found that only SM1/2 has causalities to P, but no causalities were found from SM3/4 to P. In this section, we might have an explanation for this from the interactions among SMs: SM3/4 can hardly have causalities upwards to SM1/2; otherwise, there will also be some causalities found from SM3/4 to P, since SM1/2 would contain the causalities from SM3/4 and then pass it to P indirectly.
It is worth noting that the causality from SM3 → SM1 and SM3 → SM2 have a max-value at 0 time-delay, neither in a positive nor a negative time-delay zone; yet such a case can be still regarded as a true causality, since the strongest CCM skill did not appear in negative time-delay zone.Also, if Figures 8a-8c are compared in a row, the tendency that the max-value of orange lines moving from the positive toward the negative time-delays zone (passing the 0 time-delay) is obvious, indicating that as the layer of SM goes deeper, causality upwards will become weaker.As for the actual time-delay in each causality, most final convergent values of CCM are found peaking at 1-month, and that from SM3 to SM4 could be 3-month (i.e., a "plateau"), thus we assume that the response time among SM with different layers could be longer than that between P and SM. that to what extent it can be reckoned as a real causality, instead of a strong-coupling; for example, the color shades of red arrows are lighter for the causalities from SM3 to SM1/SM2, this is based on the fact that the orange lines in panels and (d) has its max-value in 0 time-delay instead of completely in positive time-delay zone, and there is a tendency that the max-value of orange line moves from positive to negative time-delay zone.Such patterns of causality are similar in all four research areas, based on which we can summarize: P always renders causal impacts to SMs, and SMs in shallow layers always render causal impact to SMs in deeper layers, whereas SMs in deeper layers can hardly render any causal impact to SMs in shallow layers or P.
After adopting extended CCM considering different monthly time-delays in the rest three research areas, we can summarize the causality patterns between P and SM, and among SMs in different layers.Based on the results, we found that the signatures of CCM (i.e., the positions of max-values of CCM) are all similar in four research areas (statistical significance of the max-value CCM skill are all proven using surrogates).Given that it is only the signatures of CCM (the positions of max-values) that we care about to infer the causality, the whole CCM results for the rest three research areas will not be demonstrated in case of redundancy; instead, we summarize all the CCM signatures between P and SMs, and among SMs in different layers in all four research areas, and show them in Figure 8g.Such a summary that, the causalities between P and SM, and the causalities among SMs in different layers, all follow the basically identical patterns as Figure 8g in four research areas, even though with different local climatic conditions.In other words, the conclusions about causality signatures seem to remain unchanged, regardless of local climatic conditions.The CCM results among SM(s) all pass the significant test with surrogates.

Causality Between ET and SMs
The water molecule escapes from soil mainly via ET and affects P afterward, and thus ET plays an indispensable role in SM-P coupling (Wei & Dirmeyer, 2012).Figuring out the causality between SM and ET will help to understand the causality in SM-P coupling.Therefore, we also apply extended CCM with time-delays between ET and SMs in each layer (1-4), in PRB first (Figure 9).For the SM1 and SM2, their causalities with ET are shown as Figures 9a and 9b, which clearly indicate that the causalities between SM1/2 and ET are bidirectional, and the causal strength from SM1/2 to ET is stronger than the other way around.Also, given that the position of max-value lies in 0 time-delay in both subfigures, it could be inferred that the causalities between SM1/2 and ET are instantaneous.However, for the causalities between SM3/4 and ET (Figures 9c and 9d), the extended CCM results seem very strange: they have similar periodic fluctuations, ranging from 8 months to +8 months, where there is not a clear peaking point with max-value of CCM as the ideal time-delay, different from the previous results of in SM1/2-ET.Based on such results, unfortunately, it is unable to detect either the direction of causality between SM3/4 and ET, or the actual time-delay among them; that is, CCM seems invalid for the interaction between ET and SMs in deeper layers.
Furtherly, we conducted extended CCM on SM-ET interactions for the rest three research areas.After summarizing all CCM results of SM-ET interactions in four areas, we found a pattern: when the λ of SM is positive (λ > 0), the CCM results signature will follow a similar pattern as Figures 9a and 9b, from which the time-delay less than 1-month and the direction of causality can be clearly spotted: there is mutual causality between ET and SMs; the CCM results of SM-ET interactions all pass the significant test with surrogates.However, when the λ of SM is negative (λ < 0, for most SM3 and SM4 in four regions), the CCM results signature of SM-ET interaction will follow a similar pattern as Figures 9c and 9d, from which the actual time-delay and the direction of causality cannot be clearly identified.
The causal interaction between SM and ET has been always recognized as bidirectional (An et al., 2018), which is also proven in our findings (Figures 9a and 9b).However, what is more interesting in our finding is that the λ of SMs (implying a convergent or divergent trend) seems to play a part affecting the causality SM-ET interaction.However, by contrast, in the extended CCM results in SM-P coupling, we did not find any fluctuating patterns like Figures 9c and 9d, despite the fact that SMs play important roles in both SM-P and SM-ET.At this stage, however, the reason behind this phenomenon is still unknown in our study.
In general, based on the signatures of extended CCM in SM-ET interaction (i.e., fluctuating or not), we could notice that the causality in SM-ET interaction has a close relationship with the chaotic behaviors of SMs indicated by λ.Meanwhile, it seems to have fewer connections with local climatic conditions, since such signatures are observed in four research areas.At this stage, it is a limitation that we fail to detect further underlying reason how and why the CCM results in SM-ET interactions are affected by λ of SMs, and especially, why the λ of SMs did not affect the CCM results in SM-P coupling.In addition, we cannot rule out the possibility that the strange fluctuating signatures of extended CCM in SM-ET interaction are affected by the imperfection of the algorithm per se (Bartsev et al., 2020).Yet, such a limitation requires a more comprehensive investigation based on more research areas.

Concluding Remarks
Highlighting the deterministic dynamic nature of SM-P coupling and its internal non-separable causality, the primary motivation of this research is to provide an indispensable complement to the previous studies for this issue by employing CCM.Meanwhile, we denoted that it is necessary to recognize and confirm the DDS nature of targeted variables before applying CCM.Particularly, as the most important property of DDS, the existence of chaos in both variables (their time series) should be elucidated, since it not only verifies their DDS nature but also prevents the confusion with stochastic process.The major findings of this research are summarized as follows: 1.The DDS nature and, especially, the existence of chaos were detected from each individual time series of P, SM, and ET via multiple methods, proving that the DDS is a suitable underlying model for SM-P coupling and the SM-ET interaction and distinguishing it from pure stochastic, which provides the precondition for the application of CCM. 2. For the causality in SM-P coupling, we found that only those causalities between P and shallow SMs are mutual (i.e., P affects shallow SMs, meanwhile shallow SMs also render reverse causality on P); whereas the causality from P to deeper SMs are unidirectional, without clear evidence for the reverse causality from deeper SMs toward P being found.Besides, some of the causal strength of SM → P are actually larger than those from P → SM. 3.For the causality among SMs in different layers, in general, shallow SMs could always render causality to deeper SMs downwards, whereas deeper SMs could hardly render reverse causality upwards; such a tendency is more obvious with the depth going deeper, especially when SM4 could barely render no causality upwards to SM1, SM2, or SM3.This finding is consistent with the last one: SM3/4 cannot have causality on P because they can hardly have causality on SM1/2 as a channel.4. The daily time series of P and SM1 is applied to detect the more precise causal response time in SM-P coupling.Due to the limitation of the time series samples and CCM, only the data in spring and summer were selected.Summarizing all four research areas, we found that the causal response time from SM1 to P in spring/summer seems to be 4-6 days, and that from P to SM1 in spring/summer seems shorter: 2-4 days.5.For the causality in SM-ET interactions, they have two patterns across the four areas, which depends on the λ of SMs.Therefore, we hypothesized that the causalities in SM-ET seem being under strong influences from the nonlinear chaotic behaviors of SMs, instead of local climatic condition; however, as a limitation, the reason behind this finding is still unknown.
By summarizing all CCM results in the four research areas, we observe similar signatures of causality in SM-P coupling, among SMs in different layers, and in SM-ET interactions.This suggests that these causalities may not be solely controlled by local climatic conditions.Instead, we propose that they might be influenced by the chaotic behaviors of time series, especially the causalities in SM-ET interactions, where the chaotic behaviors of time series of SM indicated by λ seem to play an important role.However, at this stage, this remains as only an assumption, the reason of which is unknown and requires further investigation based on a broader spatial scale.Still, we could speculate that it is the local climatic conditions that affect the nonlinear chaotic behaviors in the DDS of local SM-P coupling and other interactions, subsequently influencing their internal causality.
Additionally, some extensions and limitations of this research regarding the DDS and stochastic descriptions of hydro-meteorology, along with the concepts of attractor and chaos under the hydro-meteorological context, are provided in Text S4 in Supporting Information S1.

Figure 1 .
Figure1.(a) Lorenz system, its underlying nonlinear governing equations, and its attractor with a butterfly manifold.(b) A simpler case of attractor with a circle manifold along with underlying governing equations and time series.It is worth noting that the underlying equations are linear, which means that such a system is a linear one (the nonlinearity in this system is zero).(c) The procedure of Phase Space Reconstruction (PSR): using one of the time series from an original attractor as material.The purpose of PSR is to reconstruct a shadow version which is diffeomorphic to its original attractor.(d) A simple illustration of diffeomorphism with common geometric figures: the connection among A, B, C, D (or A′, B′, C′, D′) remains unchanged from a triangle to a sphere module, which means that they are diffeomorphic.(e) The schematic of False Nearest Neighbors (FNN) method.Simply speaking, in panel (e) left, when a time series is reconstructed into a certain dimensional phase space by PSR (ascending dimensions transform), some points that were the nearest neighbors in a lower-dimensional space no longer remain as the nearest as the dimension increases (like the three red points were reckoned as nearest neighbors in 2-D world, but not any more in 3-D world.Based on this, FNN method calculates the fraction of false nearest neighbor, and increases the dimension to see whether the fraction decrease: if it shows a clear decreasing trending (like the blue line in panel (e) right), it means there is clear deterministic characteristics of deterministic dynamic system, otherwise not (red line in panel (e) right).FNN method finds the optimal E when the fraction of FNN starts to be stable at a certain embedding dimension.Reader are directed toKennel et al. (1992) for the rationale of FNN method.

Figure 2 .
Figure 2. (a) Strange attractor of a chaotic deterministic dynamic system (DDS), and its time series with λ > 0. (b) Circle common attractor of a non-chaotic DDS, and its time series with λ = 0. (c) Fixed-point attractor of a non-chaotic DDS, and its time series with λ < 0. (d) A general schematic of how to distinguish DDS nature and stochastic process, and distinguish a chaotic DDS from non-chaotic one, based on CD and λ.

Figure 3 .
Figure 3. (a) Illustration of relationships between the original and shadow attractors.(b)The principle of Cross-mapping: using nearest neighbors and mapping to determine whether there is causal information between two variables.This property of remaining as nearest neighbors after mapping is the evidence of causal information (b, top); otherwise, they would not remain the nearest (b, bottom).Also notice that the directions of mapping and causal information are inverse.After detecting the causal flow from X(Mx) on Y(My), we proceed the same mapping to detect that from Y(My) to X(Mx) by reversing the direction.(c) and (d) Convergence, which is illustrated in the expected outcomes of standard Convergent Cross-mapping (CCM) between X and Y: the convergence of CCM results (blue line in panels c and d) indicates the existence of causality, whereas non-convergence (red line in panel c) indicates the non-existence of causality.When both red and blue lines are convergent (d), the causality in X → Y is mutual; whereas when only one line is convergent (c), the causality in X → Y is unidirectional (X → Y in panel (c).It is worth noting that these two possible results of standard CCM indicates that X and Y belong to the same one underlying DDS; otherwise, we will only have two non-convergent lines (like the red line in panel (c)).(e) The schematic of extended CCM, which is basically a collection of all results of standard CCM, only in different time-delays, and extended CCM focuses on the position of max-value of CCM skills of each standard CCM.(f) and (g) indicate the expected results of extended CCM with time-delays, from which we can infer whether the interaction between two variables is a true bidirectional causality (e) or merely a unidirectional strong-coupling case (f).

Figure 4 .
Figure 4. (a) Individual time series of each variables in Pearl River Basin.(b) Pearson correlation analysis with a sliding-window (12-month length) between P and SM1 in four research areas illustrates the "state-dependent" phenomenon, which is regarded as a hallmark for deterministic dynamic system.

Figure 5 .
Figure 5. Parts of instantaneous causality (SM1-P in four areas) are provided for a brief illustration of "convergence" in standard Convergent Cross-mapping (CCM).Red and blue lines in one subplot indicate the instantaneous causality between two variables with reverse directions respectively.It is worth noting that merely based on such signatures, it is unable to distinguish real bidirectional causality between two variables from unidirectional strong-coupling (i.e., one of the convergent lines might be a fake causality).Therefore, extended CCM is needed for the following analysis.

Figure 6 .
Figure6.All the results of extended Convergent Cross-mapping (CCM) with time-delays between P and soil moisture in each layer (1-4) in four areas.The blue lines indicate the causalities from P to SM(1-4), and the orange lines indicate the reverse causalities from SM(1-4) to P. We use Figures3f and 3gas templates for interpretation.Yellow and dark-blue points indicate biggest final convergent values among all time-delays, and according to their positions (positive or negative), it can be distinguished between the real bidirectional causality and unidirectional strong-coupling.Besides, the causal response time and its lasting time can be also detected.For example, in panel (a), the causality in SM1-P in Pearl River Basin is analyzed: the max-value of orange line lies in the +1 in positive time-delay, indicating that SM1 would render the maximum strength of causality to P 1 month after it, which means that the time-delay between SM1 and P is 1 month.Similarly, the max-value of blue line also shows up in +1 in positive time-delay, which suggests that the causality from P to SM1 also has a time-delay of 1 month.In general, extended CCM focuses on the position of max-value of CCM skills: negative or positive time-delay zone.

Figure 7 .
Figure 7. Results of extended Convergent Cross-mapping with time-delays between P and SM1 in four research areas on daily scale: (a) PRB; (b) WRB; (c) Nebraska;(d) Medjerda River.It is worth noting that these figures only provide an estimation for the potential causal response time (causal time-delay).Also notice that in Pearl River Basin, Wei River Basin, and Nebraska, the causal strength of SM → P is actually slightly larger than that of P → SM.

Figure 8 .
Figure 8. (a)-(f) Results of extended Convergent Cross-mapping (CCM) with time-delays among soil moisture in each layer (1-4) in Pearl River Basin.The interpretation of each lines in all figures are similar to those in Figure 6.The signatures of the extended CCM results illustrated in this figure are similar in all four areas.(g) Causalities between P and different-layer SMs (SM1-SM4) summarized from in four research areas.Blue arrows indicate the causalities from vertical variables (blue) toward horizontal variables (red); Red arrows indicate the causalities from vertical horizontal variables (red) toward variables (blue); The shades of color indicatethat to what extent it can be reckoned as a real causality, instead of a strong-coupling; for example, the color shades of red arrows are lighter for the causalities from SM3 to SM1/SM2, this is based on the fact that the orange lines in panels and (d) has its max-value in 0 time-delay instead of completely in positive time-delay zone, and there is a tendency that the max-value of orange line moves from positive to negative time-delay zone.Such patterns of causality are similar in all four research areas, based on which we can summarize: P always renders causal impacts to SMs, and SMs in shallow layers always render causal impact to SMs in deeper layers, whereas SMs in deeper layers can hardly render any causal impact to SMs in shallow layers or P.

Figure 9 .
Figure 9. Results of extended Convergent Cross-mapping (CCM) with time-delays between SM(1-4) and evapotranspiration (ET) in Pearl River Basin.It is worth noting that depending on λ of SM(s), there are two patterns of the extended CCM results between ET and SMs in four areas: when soil moisture (SM) has a λ > 0, its causality with ET would have a similar pattern with (a)/(b); whereas SM has a λ < 0, its causality with ET would have a similar pattern with (c)/(d).
, X tβ , X tγ on My: Y tα , Y tβ , Y tγ .If Y tα , Y tβ , Y tγ also remain as the nearest neighbors of Y Estimate an Ŷt′ using Y tα , Y tβ , Y tγ and the minimum Euclidean distance.4. Use the same timestamp t′ of X t′ to find the real Y t′ on My. 5. Calculate the correlation between the estimated Ŷt′ and the real Y t′ , which is the numerical value of mapping skill: ρ (Deyle & Sugihara, 2011), and conduct the statistical significance test.
Sugihara et al. (2012) from Mx, it means that there is causal information from Y(t) to X(t) (Figure3b, top); otherwise not (Figure3b, bottom).3.The best number of nearest neighbors is E + 1 according toSugihara et al. (2012); only three nearest neighbors are used in Figure

Table 1
Results of Optimal E and τ, CD, and λ for Each Research Area