The Augmented Jump Chain

Modern methods of simulating molecular systems are based on the mathematical theory of Markov operators, with a focus on autonomous, equilibrated systems. However, non-autonomous physical systems and non-autonomous simulation processes are becoming increasingly important. A representation of non-autonomous Markov jump processes as autonomous Markov chains on space-time is presented. Augmenting the spatial information of the embedded Markov chain by the temporal information of the associated jump times yields the so-called augmented jump chain. The augmented jump chain inherits the sparseness of the infinitesimal generator of the original process and therefore provides a useful tool for studying time-dependent dynamics even in high dimensions. Furthermore, possible generalizations and applications to the computation of committor functions and coherent sets in the non-autonomous setting are discussed. After deriving the theoretical foundations, the concepts are illustrated with a proof-of-concept Galerkin discretization of the transfer operator of the augmented jump chain applied to simple examples.


Introduction
The last decade of theoretical treatment of simulation methods was characterized by the analysis of autonomous Markov processes. The uniform concept of Markov operators and infinitesimal generators was investigated for these purposes, which has led to a rich development of analysis tools in mathematics. In order to benefit from these tools also in the non-autonomous case, a broader uniform theoretical framework is required that deals with non-autonomous as well as autonomous methods and processes (not only) in molecular simulation. Although the underlying equation of motion, typically a differential equation with a first-order derivative of the time variable, might be highly nonlinear, the mathematical object that accounts for the transfer of probability densities of system states is a linear transfer operator (the "adjoint" continuous counterpart of a transition matrix of a Markov chain). The formulation of Markov processes in terms of transfer operators has proven to be a powerful tool for their analysis. Techniques like transition path theory, [3] reaction coordinates, [4,5] coarse graining, [2] clustering, [6] and coherent set analysis [7,8] are just a few methods building on this formalism.
However, the computational cost of these approaches grows with increasing numbers of states and quickly becomes infeasible for high-dimensional problems. Their corresponding formulation in terms of infinitesimal generators or rate matrices [9,10] promises to alleviate computational costs by making use of the sparse structure of many real-world problems, where instantaneous state changes are restricted by a locality assumption. We want to be able to exploit this sparsity also for non-autonomous processes: if one were to find a generator-like object for non-autonomous processes, then corresponding methods could be transferred directly. Physical models mostly refer to self-contained systems that can be isolated in the laboratory and which, therefore, allow for the analysis of autonomous processes, often after equilibration of the system. However, if we want to study the influence of external forcing (e.g., of external control), transient dynamics, or the production rate of catalytic cycles, then non-autonomous (i.e., having a time-dependent, changing law of state evolution) and non-equilibrium systems play an important role.
While there are extensions to non-stationary regimes, [8,11] we do not know of any such approach inheriting the sparseness of the generator and thus facilitating the analysis of high-dimensional complex systems.
In this article we focus on Markov jump processes, which are memoryless stochastic processes, continuous in time and discrete in space, that have been successfully used in reaction kinetics, queueing theory, Markov state models, and network analysis.
Aiming at a sparse approach to non-autonomous dynamics, we develop a novel representation of time-dependent Markov jump processes. Inspired by recent developments in physics [12,13] which look at time as emerging from the order of events rather than as a constantly evolving exogenous entity, we look at the process as a series of jumps in space and time, such that every change that takes place in the system is a change in the spatial and in the temporal domain. Formally this amounts to an extension of the ideas of the embedded Markov chain [14] or semi-Markov processes [15] to the time-dependent setting and will lead us to an autonomous process in space-time, the augmented jump chain. In contrast to semi-Markov processes, however, we have an explicit time-dependence resulting in non-homogeneous exponential holding times, whereas semi-Markov processes become time-dependent implicitly due to (autonomous) non-exponential holding times. Despite these differences, recent large deviation results [16,17] might provide a link between the metastable/coherent sets considered in this article and large deviations for the empirical measure.
A realization of the augmented jump chain, consisting of a sequence of space-time points, corresponds to the time-continuous trajectory of the original process. Imagine tracking an ensemble of particles, all starting at the same time in their individual spatial states. We observe their respective jumps which take place in space and time. The transfer operator (of the augmented jump chain) maps the distribution of such an ensemble to the distribution after its next jump while retaining the local nature of the original process: particle states still only jump to their "neighboring" states. Although this operator evolves the classical time in a concurrent manner, we can reconstruct the whole family of Perron-Frobenius operators (for each fixed time) by means of an iterative procedure which we will denote as synchronization. What is more interesting though is that we can compute the action of its dual, the Koopman operator, directly by solving a linear boundary value problem akin to the Chapman-Kolmogorov equation. This linear problem furthermore resembles the computation of classical committor functions and we show how it naturally leads to an extension of the committor framework to the non-autonomous regime with the Koopman operator being a special case of such a non-autonomous committor. We conclude by deriving a (sparse) finite-time Galerkin projection of the transfer operator and applying it to two illustrative examples.

Background
In this section, we will introduce the notation and recall some basic results needed for the subsequent sections.
Let the set $\mathbb{X} = \{x_i\}_{i=1,\dots,N}$ denote a finite state space and $\{X_t\}_{t \in \mathbb{T}}$ a time-continuous Markov chain (also called Markov jump process) on $\mathbb{X}$, with $\mathbb{T} = \mathbb{R}_0^+$ denoting the time domain. It is well known [2] that this process can be described by means of its associated stochastic transition kernel

$$k(x, s, y, t) = \mathbb{P}[X_t = y \mid X_s = x],$$

denoting the conditional transition probabilities. This kernel gives rise to a family of important transfer operators, the propagator (or Perron-Frobenius operator)

$$(\mathcal{P}^{s,t} f)(y) = \sum_{x \in \mathbb{X}} k(x, s, y, t)\, f(x)$$

and the adjoint Koopman operator

$$(\mathcal{K}^{s,t} g)(x) = \sum_{y \in \mathbb{X}} k(x, s, y, t)\, g(y).$$

These two are adjoint in the sense that $\langle \mathcal{P}^{s,t} f, g \rangle = \langle f, \mathcal{K}^{s,t} g \rangle$, with $\langle \cdot, \cdot \rangle$ denoting the corresponding dual pairing. This equality illustrates that evolving a density $f$ forward in time via $\mathcal{P}$ and measuring the observable $g$ in the future is the same as pulling the observable $g$ back in time via $\mathcal{K}$ and applying it to the current state $f$. Therefore the propagator and the Koopman operator are also called forward and backward transfer operators, respectively. Note that we are explicitly interested in time-dependent (non-autonomous) processes, and as such the above objects in general depend on both the starting time $s$ and the end time $t$. In contrast to the time-independent (autonomous) regime, where the transfer operators merely depend on the elapsed time $t - s$ and thus form a one-parameter semigroup $\mathcal{P}^{t-s} := \mathcal{P}^{s,t}$, the non-autonomous pendant does not allow for such a simple construction.
We can nevertheless define the time-dependent infinitesimal generator at each time $t$ by

$$Q(t) = \lim_{h \to 0} \frac{\mathcal{P}^{t, t+h} - I}{h}.$$

We can denote the generator as a matrix $Q(t) = (q_{ij}(t))$ composed of the transition rates from state $x_i$ to $x_j$,

$$q_{ij}(t) = \lim_{h \to 0} \frac{k(x_i, t, x_j, t+h) - \mathbb{1}_{i=j}}{h},$$

with $\mathbb{1}$ denoting the indicator function. We furthermore introduce the shorthand notation for the outbound rate

$$q_i(t) := -q_{ii}(t) = \sum_{j \neq i} q_{ij}(t),$$

where the latter equality follows from the fact that our system is (probability) mass conserving. In the case of autonomous systems, that is, $Q(t) \equiv Q$, we will denote these quantities simply by $q_{ij}$ and $q_i$. The generator is of special interest for systems which satisfy the so-called locality assumption, that is, states only interact with a few other states, since in that case the generator can be represented as a sparse matrix. This diminishes the computational cost in the analysis of many real-world systems, e.g., spatial diffusion processes, where particles can only jump to spatially neighboring cells.
The definition of the infinitesimal generator motivates the formal linear equation

$$\partial_t \mathcal{P}^{s,t} = \mathcal{P}^{s,t} Q(t), \qquad \mathcal{P}^{s,s} = I,$$

with $I$ denoting the identity operator. For the autonomous case where $Q(t) \equiv Q$, we indeed know that $\mathcal{P}^t = e^{tQ}$. This has very useful applications in practice: since $\mathcal{P}^t$ and $Q$ are related via the exponential map, their eigenvectors are the same. Hence they share many statistics, such as their invariant distributions.
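This correspondence is easy to check numerically. The following sketch, with an arbitrary illustrative three-state generator, verifies that $e^{tQ}$ is a stochastic matrix and that a distribution in the kernel of $Q^\top$ is invariant under the propagator:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical autonomous generator on three states (rows sum to zero).
Q = np.array([[-2.0, 1.5, 0.5],
              [ 1.0, -1.0, 0.0],
              [ 0.2, 0.3, -0.5]])

P = expm(2.0 * Q)                      # propagator for elapsed time t = 2

# Rows of the transition matrix sum to one (probability mass conservation).
assert np.allclose(P.sum(axis=1), 1.0)

# A distribution pi with pi Q = 0 is invariant under P^t for every t.
w, V = np.linalg.eig(Q.T)
pi = np.real(V[:, np.argmin(np.abs(w))])
pi = pi / pi.sum()
assert np.allclose(pi @ P, pi, atol=1e-8)
```

Here densities act as row vectors from the left, matching the propagator convention above.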
One might hope to extend this relationship to the non-autonomous regime by replacing the exponent $tQ$ with its integrated analogue $\int_s^t Q(r)\, dr$, [18] but $\mathcal{P}^{s,t} = \exp\big(\int_s^t Q(r)\, dr\big)$ does not hold for noncommutative $Q(t)$. There exist perturbative approaches to the solution of this problem, such as the Dyson and Magnus series, adjusting for the noncommutativity by computing nested commutators, but these will in general not remain sparse. We will tackle the problem from the perspective of the jump chain (also called embedded Markov chain) and extend it to the time-dependent regime while still inheriting the sparse structure of $Q(t)$.
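The failure of the integrated-exponent ansatz can already be seen for a piecewise-constant generator; the two illustrative "phases" below are hypothetical and chosen only to be non-commuting:

```python
import numpy as np
from scipy.linalg import expm

# Two hypothetical generator phases that do not commute.
Q1 = np.array([[-1.0, 1.0], [0.0, 0.0]])   # only A -> B possible
Q2 = np.array([[0.0, 0.0], [1.0, -1.0]])   # only B -> A possible
assert not np.allclose(Q1 @ Q2, Q2 @ Q1)

# True propagator over [0, 2] with Q(t) = Q1 on [0,1) and Q2 on [1,2),
# densities acting as row vectors from the left:
P_true = expm(Q1) @ expm(Q2)
# Naive "integrated exponent" exp(int_0^2 Q(t) dt) = exp(Q1 + Q2):
P_naive = expm(Q1 + Q2)

# The two disagree, illustrating why the naive ansatz fails.
assert not np.allclose(P_true, P_naive)
```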
Let us therefore recall the classical construction of the jump chain.

Definition 1. Let X t be a Markov jump process.
For $n = 0, 1, 2, \dots$ define the jump times of $X_t$ to be

$$J_0 = 0, \qquad J_{n+1} = \inf\{t > J_n : X_t \neq X_{J_n}\}$$

if the infimum is attained, or $\infty$ otherwise. The corresponding holding times are defined as

$$H_n = J_n - J_{n-1}.$$

Furthermore define the jump chain (also called the embedded chain) of $X_t$ to be

$$Y_n = X_{J_n}.$$

This construction decomposes the original jump process $X_t$ into two components: the temporal component in form of the jump times $J_n$, which amount to the times at which $X_t$ changes its state, as well as the spatial component in form of the jump chain $Y_n$, which keeps track of these states. The holding times, that is, the differences between the jump times, amount to the time the process remains in the same position.
The original process can then be reconstructed by

$$X_t = Y_{n(t)}$$

with the jump count given by

$$n(t) = \max\{n : J_n \leq t\}$$

and $J_n = \sum_{i \leq n} H_i$. The following theorem characterizes both components explicitly in terms of the infinitesimal generator for the case of an autonomous process:

Theorem 1 ([19]). Let $X_t$ be an autonomous Markov jump process with infinitesimal generator $Q = (q_{ij})$.

Then the jump chain $Y_n$ is a Markov chain with transition probabilities

$$p_{ij} = \frac{q_{ij}}{q_i} \quad \text{for } i \neq j, \qquad p_{ii} = 0.$$

Furthermore the holding times $H_1, H_2, \dots$ are independent exponential random variables with parameters $q_{Y_0}, q_{Y_1}, \dots$, respectively.
Using this decomposition for sampling, that is, drawing the next state from the Markov chain $Y_n$ and the exponentially distributed holding time $H_n$, leads to the well-known Gillespie (stochastic simulation) algorithm [20] for sampling from Markov jump processes.
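A minimal sketch of this sampling scheme, with an arbitrary two-state generator for illustration (all names are ours, not a reference implementation of [20]):

```python
import numpy as np

def gillespie(Q, x0, t_max, seed=0):
    """Sample one trajectory of an autonomous Markov jump process.

    Q     : (N, N) generator matrix, rows summing to zero.
    x0    : initial state index.
    Returns the jump times J_n and jump chain states Y_n up to t_max.
    """
    rng = np.random.default_rng(seed)
    x, t = x0, 0.0
    times, states = [t], [x]
    while True:
        q_x = -Q[x, x]                    # outbound rate q_i
        if q_x == 0:                      # absorbing state: no more jumps
            break
        t += rng.exponential(1.0 / q_x)   # holding time ~ Exp(q_i)
        if t > t_max:
            break
        p = Q[x].clip(min=0.0) / q_x      # jump probabilities q_ij / q_i
        x = rng.choice(len(p), p=p)
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

Q = np.array([[-1.0, 1.0], [2.0, -2.0]])  # illustrative two-state generator
J, Y = gillespie(Q, x0=0, t_max=50.0)
```

Note that the jump distribution puts zero weight on the current state, so every step of the embedded chain changes the state.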

The Augmented Jump Chain
In this section we describe the construction of the main object of this study, the augmented jump chain for non-autonomous processes. Similar to the jump chain of autonomous processes, we decompose the process into its spatial and temporal parts by conditioning either on a specific time or location. Unlike in the autonomous regime, however, both parts now explicitly depend on time. By combining both components, that is, augmenting the spatial with the temporal component, we arrive at an autonomous process on space-time, represented by a new transfer operator, the jump operator $\mathcal{J}$, encoding the original process $X_t$. We then show how to use this operator to reconstruct the classical, non-autonomous transfer operators $\mathcal{P}^{s,t}$, $\mathcal{K}^{s,t}$ and discuss a more general application to time-dependent committors.

Definition 2. Define the augmented jump chain to be the tuple $(Y, J)_n := (Y_n, J_n)$,
where the jump chain and jump times are defined as in Definition 1.
We call this the augmented jump chain since its state space is that of the original process $X_t$ (or its jump chain $Y_n$) augmented by the time component. Note, however, that unlike in classical augmentation schemes (e.g., the augmentation of non-autonomous differential equations) the "internal" time component $J_n$ does not evolve linearly with the "external" time $n$ of the augmented jump chain, see Figure 1.
The augmented jump chain now gives us a tool to analyze the time-continuous, spatially discrete Markov process $X_t$ by means of a discrete-time Markov chain $(Y, J)_n$ on the product space $\mathbb{X} \times \mathbb{T}$, that is, to look at the process on a per-jump basis. Analogous to the autonomous case we can pass back and forth between the two representations, either by the definition of the augmented Markov chain (15) or by evaluating the jump chain (12) at the time-corresponding jump counts (13).
Due to the time-dependent structure of the process $X_t$, the transition rules change compared to the autonomous case (Theorem 1):

Theorem 2. The augmented jump chain $(Y, J)_n$ is a time-homogeneous (autonomous) Markov chain on $\mathbb{X} \times \mathbb{T}$ with transition kernel

$$k\big((x, s), (y, t)\big) = \tilde{q}_{xy}(t)\, q_x(t)\, \exp\Big(-\int_s^t q_x(\tau)\, d\tau\Big)$$

for $s < t$, and $k = 0$ otherwise, with $\tilde{q}_{ij}(t) := q_{ij}(t)/q_i(t)$ being the time-dependent equivalent of the jump probabilities (14). The corresponding transfer operator is given by the jump operator

$$(\mathcal{J} f)(y, t) = \sum_{x} \int_0^t k\big((x, s), (y, t)\big)\, f(x, s)\, ds$$

and its adjoint by

$$(\mathcal{J}^\dagger g)(x, s) = \sum_{y} \int_s^\infty k\big((x, s), (y, t)\big)\, g(y, t)\, dt.$$

Proof. Since $J_{n+1} > J_n$ by definition, we have $k = 0$ for $s \geq t$. Let us therefore consider the case $s < t$.
Since $X_t$ is Markovian, the jump location at a specific jump time depends solely on the generator at that time, so similar to the autonomous case we have

$$\mathbb{P}\big[Y_{n+1} = y \mid J_{n+1} = t,\, (Y, J)_n = (x, s)\big] = \tilde{q}_{xy}(t).$$

Unlike in the autonomous case, the jump times now depend on the time-dependent rates. We therefore replace the homogeneous exponential distribution with its non-homogeneous counterpart, which is also known from the risk of mortality/hazard function (c.f. appendix):

$$\mathbb{P}\big[J_{n+1} \in dt \mid (Y, J)_n = (x, s)\big] = q_x(t)\, \exp\Big(-\int_s^t q_x(\tau)\, d\tau\Big)\, dt.$$

Putting these together, we end up with the desired result. □

The theorem allows us to sample realizations of the augmented jump chain by successively generating samples from the transition kernel, drawing the jump time from the inhomogeneous exponential distribution followed by the jump location from the embedded Markov chain at that time. This procedure for sampling from time-dependent Markov jump processes is also known as the temporal Gillespie algorithm. [21]

Having the transition kernel, it is natural to look at the associated transfer operators, which in this case evolve space-time densities. In the following subsections we will show how they enable us to reconstruct the transfer operators $\mathcal{P}$, $\mathcal{K}$ of the original process $X_t$.
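Such inhomogeneous holding times can be sampled by thinning: propose candidate events from a homogeneous Poisson process with a rate bound and accept each with probability $q_x(t)/q_{\max}$. The following sketch is illustrative (our own names and rate functions, not the reference implementation of [21]), and assumes a known global bound $q_{\max}$ on the outbound rates:

```python
import numpy as np

def next_jump(q_of_t, x, s, q_max, t_max, rng):
    """Draw the next jump time t > s for state x with time-dependent
    outbound rate q_x(t) <= q_max, by thinning a homogeneous Poisson
    process of rate q_max; returns inf if no jump occurs before t_max."""
    t = s
    while t <= t_max:
        t += rng.exponential(1.0 / q_max)       # candidate event
        if rng.random() < q_of_t(x, t) / q_max: # accept w.p. q_x(t)/q_max
            return t
    return np.inf

def temporal_gillespie(Q_of_t, q_max, x0, t_max, seed=1):
    """Sample the augmented jump chain (Y_n, J_n) up to time t_max.
    Q_of_t(t) returns the generator matrix at time t."""
    rng = np.random.default_rng(seed)
    q_of_t = lambda x, t: -Q_of_t(t)[x, x]      # outbound rate q_x(t)
    x, t = x0, 0.0
    chain = [(x, t)]
    while True:
        t = next_jump(q_of_t, x, t, q_max, t_max, rng)
        if t > t_max:
            break
        Qt = Q_of_t(t)
        p = Qt[x].clip(min=0.0) / -Qt[x, x]     # jump distribution at time t
        x = rng.choice(len(p), p=p)
        chain.append((x, t))
    return chain

# Illustrative two-state model: A -> B at rate 1 for t < 4, B -> A afterwards.
def Q_of_t(t):
    return (np.array([[-1.0, 1.0], [0.0, 0.0]]) if t < 4
            else np.array([[0.0, 0.0], [1.0, -1.0]]))

chain = temporal_gillespie(Q_of_t, q_max=1.0, x0=0, t_max=8.0)
```

The accepted events are distributed according to the inhomogeneous exponential density above; a state whose rate vanishes simply rejects all candidates until its rate becomes positive again.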
Let us denote all space-time distributions $f \in L^1(\mathbb{X} \times \mathbb{T})$ which have all their mass at a single time-slice $t_0$ as spacelike. Given some spacelike initial distribution $f$ for the augmented jump chain, $(Y_0, J_0) \sim f$, its subsequent space-time states are distributed according to $(Y_n, J_n) \sim \mathcal{J}^n f$.

Reconstruction of the Propagator
The application of the jump operator $\mathcal{J}$ to a spacelike initial density returns the density of the locations of its next jump events in space-time. While the initial density's location in time was fixed by construction, its image under $\mathcal{J}$, that is, the location of the next jump, is spread out in time; one may regard the result as desynchronized. This raises the question of what can be said about the distribution at a future fixed time-slice $\mathbb{X} \times \{t\}$. Starting from the jump-activity, the superposition of all subsequent jumps, and accounting for the probability to remain in place (i.e., not jump) until the target time, we return to the synchronized view by reconstructing the classical propagator $\mathcal{P}$ from the augmented jump chain.
Definition 3. The jump-activity $E : L^1(\mathbb{X} \times \mathbb{T}) \to L^1(\mathbb{X} \times \mathbb{T})$ is given by

$$E = \sum_{n=0}^{\infty} \mathcal{J}^n.$$

Starting with a spacelike distribution $f$, the corresponding jump-activity $Ef$ is the density of all induced jump events, similar to the activity of a Geiger counter over time. In the general case $Ef$ can be interpreted as the density of jumps induced by a superposition of spacelike distributions.
Note that $E$ admits the form of a Neumann series, that is, $E = (\mathrm{Id} - \mathcal{J})^{-1}$.
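In a discretized setting this identity is easy to verify; the matrix below is an arbitrary substochastic stand-in for a discretized jump operator, not the discretization derived later:

```python
import numpy as np

# Hypothetical discretized jump operator on six space-time cells, scaled
# to be strictly substochastic so the Neumann series converges.
rng = np.random.default_rng(2)
J = rng.random((6, 6))
J *= 0.9 / J.sum(axis=1, keepdims=True)   # row sums 0.9 < 1

f = np.zeros(6)
f[0] = 1.0                                # spacelike initial density

# Jump activity via the truncated Neumann series E f = sum_n f J^n ...
Ef, term = f.copy(), f.copy()
for _ in range(200):
    term = term @ J                       # density after one more jump
    Ef += term

# ... matches the closed form E = (Id - J)^(-1), acting from the left.
assert np.allclose(Ef, f @ np.linalg.inv(np.eye(6) - J))
```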

Definition 4. Define the survival probability from time $s$ to time $t$ as

$$S(x, s, t) = \exp\Big(-\int_s^t q_x(\tau)\, d\tau\Big)$$

and the synchronization operator $S_t$ by

$$(S_t f)(x) = \int_0^t S(x, s, t)\, f(x, s)\, ds.$$

The synchronization operator takes a space-time density and projects it onto a specific time by weighting each point with its probability to survive until that time. Starting from a spacelike density we are now in the position of constructing all consequent jumps and synchronizing them to a specific time, thereby reconstructing the action of the classical propagator (Perron-Frobenius operator):

Theorem 3. For a spacelike density $f$ concentrated at time $s \leq t$,

$$\mathcal{P}^{s,t} f = S_t(E f).$$

Proof. The probability to be in point $x$ at time $t$ is equal to the sum over all jump indices $n$ of the probabilities to jump to $x$ at some time before $t$ and to stay there until $t$, that is,

$$\sum_n S_t(\mathcal{J}^n f) = S_t \sum_n \mathcal{J}^n f = S_t(E f). \qquad \square$$

Reconstruction of the Koopman Operator
Instead of reconstructing the propagator by computing all possible jumps, as done in the section above, we can solve for the transition kernel of the process $X_t$ directly. Similar to the Kolmogorov backward equation, we will transport the transition kernel $k(x, s, y, t)$ for fixed $y, t$ backwards in time. This enables us to obtain the propagator by solving a family of boundary value problems. Furthermore, we can compute its adjoint, the Koopman operator, by solving just a single boundary value problem (BVP).

Theorem 4. Let

$$f_{y,t}(x, s) := \mathbb{P}[X_t = y \mid X_s = x].$$

Then $f_{y,t}$ satisfies the inhomogeneous linear boundary value problem

$$f_{y,t}(x, s) = S(x, s, t)\, \delta_{xy} + \sum_z \int_s^t k\big((x, s), (z, r)\big)\, f_{y,t}(z, r)\, dr, \qquad s \leq t,$$

with $\delta_{xy}$ denoting the Kronecker delta.
Proof. Define $n^* := \max\{n : J_n < t\}$ to be the last index of the jump chain before crossing time $t$. Using the law of total probability we can decompose the probability $f_{y,t}$ into the cases of either staying or jumping at least once before $t$. The staying part reduces to the survival probability times the Kronecker delta, $S(x, s, t)\, \delta_{xy}$. For the jumping part we condition on the first jump, which goes to some $(z, r)$ with kernel density $k((x, s), (z, r))$, and from there the process reaches $y$ at time $t$ with probability $f_{y,t}(z, r)$; the latter identification holds due to $(Y, J)_n$ being homogeneous.
Putting it all together, and treating the special case $s = t$ (where no jump can occur and hence $f_{y,t}(x, t) = \delta_{xy}$), we arrive at the stated boundary value problem. □

Since $f_{y,t}$ is just the transition kernel (1) of the original process for fixed $(y, t)$, we can represent the propagators $\mathcal{P}$ and the Koopman operators $\mathcal{K}$ in terms of $f_{y,t}$ as

$$(\mathcal{P}^{s,t} f)(y) = \sum_x f_{y,t}(x, s)\, f(x), \qquad (\mathcal{K}^{s,t} g)(x) = \sum_y f_{y,t}(x, s)\, g(y).$$

Note that the evaluation of the propagator requires the solution of the BVP (31) for each $y$, which corresponds to solving for the fundamental matrix of the system. The evaluation of the Koopman operator, on the other hand, can be computed by solving a single BVP:

Corollary. The function $g_t(x, s) := (\mathcal{K}^{s,t} g)(x)$ satisfies

$$g_t(x, s) = S(x, s, t)\, g(x) + \sum_z \int_s^t k\big((x, s), (z, r)\big)\, g_t(z, r)\, dr.$$

Proof. This follows immediately by integration of the product of BVP (31) and $g$ over $y$ and the linearity of $\mathcal{J}^\dagger$. □
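In a discrete setting the single Koopman BVP becomes one linear solve. The following toy sketch uses a hypothetical four-cell substochastic jump matrix (not the discretization of the later sections) to illustrate the backward relation $g = Jg + \text{surv} \cdot g_{\text{end}}$:

```python
import numpy as np

# J[a, b]: probability that a particle in space-time cell a makes its next
# jump into cell b; only "forward in time" entries, hence substochastic.
J = np.array([[0.0, 0.3, 0.5, 0.1],
              [0.0, 0.0, 0.2, 0.6],
              [0.0, 0.0, 0.0, 0.7],
              [0.0, 0.0, 0.0, 0.0]])
surv = 1.0 - J.sum(axis=1)               # probability to survive until t
g_end = np.array([1.0, 0.0, 1.0, 0.0])   # indicator observable g at time t

# Backward relation g = J g + surv * g_end  <=>  (I - J) g = surv * g_end:
g = np.linalg.solve(np.eye(4) - J, surv * g_end)

# g[a] is the expected value of g at time t conditional on starting in a.
assert np.all((0 <= g) & (g <= 1))
```

Because the jump matrix only points forward in time, $I - J$ is triangular, so in larger (sparse) instances this solve reduces to a cheap back-substitution.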

Connections to Committor Functions
Formally, the approach above is very similar to the computation of committor functions $c(x)$, giving the probability to hit some set $A$ before some other set $B$ conditioned on starting in $x$. Classically, the stationary committor function for sets $A, B \subset \mathbb{X}$ is the function $c : \mathbb{X} \to [0, 1]$ satisfying the boundary value problem

$$(Qc)(x) = 0 \quad \text{for } x \notin A \cup B$$

with prescribed boundary values $c|_A \equiv 1$ and $c|_B \equiv 0$. [2] This approach was recently extended to non-autonomous dynamics for finite-time and periodic systems. [22] Generalizing furthermore to time-dependent target sets, it may be useful to think of committor functions on space-time. Indeed, the Koopman operator (39) applied to an indicator function of some set $G \subset \mathbb{X}$ can then be interpreted as such a generalized committor function, that is, the probability to hit the space-time set $A = G \times \{t\}$ before $B = (\mathbb{X} \setminus G) \times \{t\}$ (see sketch 3 of Figure 2). By generalizing the BVP (39) to a wider class of boundary values, we may be able to compute such space-time committors, that is, committors of non-autonomous systems with time-dependent target sets $A, B \subset \mathbb{X} \times \mathbb{T}$.
Depending on the choice of these targets, this may then allow us to compute many interesting quantities, such as the stationary committor, finite-time hitting probabilities, or arbitrary space-time committors (Figure 2).
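The stationary case already illustrates the computational pattern: fold the boundary values into the right-hand side and solve a linear system on the remaining states. The birth-death chain below is a hypothetical illustration with unit nearest-neighbor rates:

```python
import numpy as np

# Stationary committor for a hypothetical 1D birth-death chain: solve
# (Qc)(x) = 0 off A and B with c = 1 on A and c = 0 on B.
N = 5
Q = np.zeros((N, N))
for i in range(N - 1):
    Q[i, i + 1] = 1.0                  # jump right at unit rate
    Q[i + 1, i] = 1.0                  # jump left at unit rate
np.fill_diagonal(Q, -Q.sum(axis=1))    # rows sum to zero

A, B = [N - 1], [0]                    # hit the right end before the left
inner = [i for i in range(N) if i not in A + B]

c = np.zeros(N)
c[A] = 1.0
rhs = -Q[np.ix_(inner, A)] @ c[A]      # boundary values folded into the rhs
c[inner] = np.linalg.solve(Q[np.ix_(inner, inner)], rhs)
# For this symmetric chain the committor is linear: [0, 0.25, 0.5, 0.75, 1].
```

The space-time committors discussed above would replace $Q$ by the discretized jump operator and the sets $A$, $B$ by space-time cells, with the survival terms handled as in the Koopman BVP.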
Besides extending the boundary to appropriate space-time domains, special care has to be taken regarding the distinction between jumping and staying (33), as well as with respect to the dependence of the survival probabilities $S$ on the boundary. Thus, while the jump chain formalism offers a sparse formulation (c.f. Section 4.2) and is therefore preferable to the classical time-augmentation for computational reasons, the derivation in this framework is more technical due to its asynchronous nature, and it may therefore be beneficial to develop the space-time committor theory in the classically time-augmented setting first. This, however, is beyond the scope of this work and will be addressed in future research.

Connections to Coherence
In the context of stationary Markov processes, metastabilities, that is, regions of space $A \subset \mathbb{X}$ which are almost-invariant under time-evolution, have proven to be a very useful notion for gaining understanding of a system as well as for its dimensionality reduction. Extending this approach to the time-dependent regime, the analogue of metastability is given by coherence. [8] A set $A \subset \mathbb{X}$ is forward-backward coherent if there exists a set $B \subset \mathbb{X}$ such that

$$\mathcal{K}^{s,t} \mathbb{1}_B \approx \mathbb{1}_A \quad \text{and} \quad \mathcal{K}^{t,s}_{-} \mathbb{1}_A \approx \mathbb{1}_B,$$

where $\mathcal{K}^{t,s}_{-}$ is the appropriately defined Koopman operator of the backward process. This definition asserts that $A$ stays "coherent" under time-evolution from $s$ to $t$ in the sense that the space regions $A$ and $B$ at times $s$ resp. $t$ have an almost-certain one-to-one correspondence. Note that forward-backward coherence also implies that (almost) no mass in set $B$ came from outside of set $A$.
The augmented jump chain naturally gives rise to a further possible notion of coherence in terms of almost-invariant space-time regions, described by functions $f$ on space-time which (approximately) do not lose mass under the jump operator, $f \lesssim \mathcal{J}^\dagger f$. While this only implies what we would call forward coherence, this notion may suffice for many applications, and a similar construction involving a backward operator to study forward-backward coherence should pose no difficulties.
The coherent function $f$ allows for the interpretation as a probability density for a space-time region belonging to (observing) the coherent regime described by $f$. If a point has high density, that is, probably belongs to the coherent regime, this probability will not decrease with the temporal evolution, that is, the point will likely stay in that coherent regime.
We easily see that these functions are not unique, e.g., by adding probability in "the future": $f'(x, s) = f(x, s)$ if $s < T$ and $f'(x, s) = 1$ otherwise. This, however, weakens the notion of the corresponding coherent regime, since from time $T$ on everything belongs to it. So there is a whole family of coherent functions, and depending on the context they may allow to model many requirements, leading to optimization problems such as finding the spatially "most concentrated" coherent function losing the least amount of mass per time, or the "most certain" function coming from some source and hitting some target region in space-time.
Moreover, due to its integral treatment of time, the augmented jump chain allows not only for coherence with respect to fixed starting and end times but may also allow finding coherent regimes on the intrinsic time scales of the process. Finally, it might be interesting to decompose space-time into coherent regimes to obtain a coarse-grained description of the system.

Numerical Discretization
The jump operator acts on the space-time $\mathbb{X} \times \mathbb{T}$, which due to the continuity of time is an infinite space. In order to allow for numerical computations we will discretize the space-time $\mathbb{X} \times \mathbb{T}$ and the jump operator $\mathcal{J}$. In the case of spatially sparse generators, their sparsity will carry over to the matrix representation of $\mathcal{J}$.
A straightforward approach would be to discretize time into $M$ intervals $T_l := [t_{l-1}, t_l)$ and to compute the transition probabilities between the resulting space-time cells, assuming all particles start at a fixed time point $t_k$ in each interval. Note, however, that this loses the information about the distribution of particles inside an interval: one can interpret it as shifting all particles that jump into a time interval to the beginning of that interval. In order to compensate for that error, we will work with a Galerkin discretization onto indicator functions of these intervals (also called Ulam discretization):

Definition 6. Partition the finite time-interval $[0, T]$ into $M$ disjoint intervals $T_k := [t_{k-1}, t_k)$ and let $\chi_{il}$ denote the indicator function of the space-time cell $\{x_i\} \times T_l$. Define $\hat{\mathcal{J}} : \mathbb{V} \to \mathbb{V}$ to be the Galerkin projection of $\mathcal{J}$ onto $\mathbb{V} = \mathrm{span}\{\chi_{il}\}_{1 \leq i \leq N;\, 1 \leq l \leq M}$, with entries

$$\hat{\mathcal{J}}_{ikjl} = \frac{\langle \chi_{jl}, \mathcal{J} \chi_{ik} \rangle}{\langle \chi_{ik}, \chi_{ik} \rangle}.$$

These entries correspond to the assumption of a uniform prior for the starting time of the particles inside the intervals. The following proposition shows how to compute the entries assuming a finite time horizon and a generator which is piecewise constant on each time interval:

Proof. We have

$$\hat{\mathcal{J}}_{ikjl} = \frac{1}{\tau} \int_{T_k} \int_{T_l} q_{ij}(t)\, \exp\Big(-\int_s^t q_i(r)\, dr\Big)\, dt\, ds,$$

where $\tau$ denotes the uniform interval length and $q^{(m)}$ the constant rates on interval $T_m$. For $k < l$ we decompose the integral in the exponent over the time intervals,

$$\int_s^t q_i(r)\, dr = q_i^{(k)}(t_k - s) + \sum_{m=k+1}^{l-1} q_i^{(m)} \tau + q_i^{(l)}(t - t_{l-1}).$$

Furthermore, computing

$$\frac{1}{\tau} \int_{T_k} e^{-q_i^{(k)}(t_k - s)}\, ds = \frac{1 - e^{-q_i^{(k)} \tau}}{q_i^{(k)} \tau}$$

and similarly for the $\int_{T_l}$ part leads us to

$$\hat{\mathcal{J}}_{ikjl} = \tilde{q}_{ij}^{(l)}\, \frac{1 - e^{-q_i^{(k)} \tau}}{q_i^{(k)} \tau} \Big( \prod_{m=k+1}^{l-1} e^{-q_i^{(m)} \tau} \Big) \big(1 - e^{-q_i^{(l)} \tau}\big).$$

In the case of $k = l$ we have to take care that the arrival time must be larger than the initial time ($t_0 > t_1$ implies $k(x_i, t_0, x_j, t_1) = 0$) and we hence compute

$$\hat{\mathcal{J}}_{ikjk} = \tilde{q}_{ij}^{(k)} \Big( 1 - \frac{1 - e^{-q_i^{(k)} \tau}}{q_i^{(k)} \tau} \Big).$$

For $k > l$ it follows that $\hat{\mathcal{J}}_{ikjl} = 0$. □

Using a space-major indexing scheme we can rearrange the discretization into a matrix $J = (J_{ab})_{a,b \in \{1,\dots,NM\}}$, as illustrated in Figure 3. Since the Galerkin projection of the adjoint is the transpose of the Galerkin projection, the matrix $J$ corresponds to $\mathcal{J}$ as well as $\mathcal{J}^\dagger$ when applying vectors from the left resp. the right side.
[Figure 3: The sparsity structure in each time-block corresponds to that of the generator at that time. The triangular block structure arises since particles only move forward in time.]

We would like to note that this is a very crude proof-of-concept discretization, providing the means to compute the above-posed problems numerically. The assumption of piecewise constant inhomogeneity $Q(t)$ may be dropped when solving the corresponding integrals (53) either analytically or by quadrature. In the case of varying implicit timescales $0 < q_i \ll q_j$ we expect adaptive time-discretizations to be of aid. Since the survival times are exponentially decaying, a cutoff may reduce complexity for long time-horizon calculations. As always with Galerkin methods, one can adapt this method with different ansatz functions. [23] Although these are important questions, the discretization is not the focus of this manuscript and we defer them to later research.
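The Galerkin entries can also be obtained by brute-force quadrature of the transition kernel, which makes a convenient cross-check of the closed-form expressions. The sketch below is our own illustration (names and grid sizes hypothetical), assuming $Q(t)$ is constant on each of $M$ uniform time cells:

```python
import numpy as np

def jump_matrix(Qs, tau, nq=8):
    """Discretized jump operator by midpoint quadrature of the kernel
    q_ij(t) exp(-int_s^t q_i(r) dr), with a uniform prior for the starting
    time s inside its cell.  Qs is a list of M generator matrices, one per
    time cell of width tau; returns the (N*M) x (N*M) space-major matrix."""
    M, N = len(Qs), Qs[0].shape[0]
    cell = lambda t: min(int(t / tau), M - 1)
    q = lambda i, t: -Qs[cell(t)][i, i]           # outbound rate q_i(t)
    J = np.zeros((N * M, N * M))
    mids = (np.arange(nq) + 0.5) * tau / nq       # midpoints within a cell
    for i in range(N):
        for k in range(M):
            for s in k * tau + mids:              # uniform prior over T_k
                for l in range(k, M):             # only forward in time
                    for t in l * tau + mids:
                        if t <= s:
                            continue
                        r = np.linspace(s, t, 24, endpoint=False) + (t - s) / 48
                        surv = np.exp(-sum(q(i, u) for u in r) * (t - s) / 24)
                        for j in range(N):
                            if j != i:
                                # space-major index: row i*M+k, column j*M+l
                                J[i * M + k, j * M + l] += (
                                    Qs[cell(t)][i, j] * surv * tau / nq**2)
    return J

# Two-state switching-rate example on [0, 8) with four time cells:
Q1 = np.array([[-1.0, 1.0], [0.0, 0.0]])   # A -> B possible, B absorbing
Q2 = np.array([[0.0, 0.0], [1.0, -1.0]])   # roles reversed
J = jump_matrix([Q1, Q1, Q2, Q2], tau=2.0)
```

The resulting matrix is nonnegative, substochastic, block-triangular in time, and zero wherever the corresponding rate $q_{ij}(t)$ vanishes, mirroring the sparsity discussion below.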

Sparseness and Complexity
We constructed the augmented jump chain with the goal of sparsity in mind. The transition kernel (16) of the jump chain is given in terms of the rates $\tilde{q}_{ij}(t)$. Therefore the sparsity of the infinitesimal generator, a property very common in many applications, is inherited by this representation. This is also reflected in our discretization: whenever $q_{ij}(t_l)$ is zero, $\hat{\mathcal{J}}_{ikjl}$ and the corresponding entry in the matrix $J$ are zero as well.
While the matrix $J$ is much bigger ($NM \times NM$) than, e.g., the generator of an autonomous system ($N \times N$), some increase in complexity is to be expected when going from the autonomous to the non-autonomous regime. We hence might compare our approach to the classical augmentation of the transfer operator. The classical augmentation leads to a band-diagonal block matrix where the first off-diagonal blocks are composed of the transition matrices between the individual time points $t_k$. While the number of non-zero blocks, $\mathcal{O}(M)$, is much smaller than in our suggested approach, $\mathcal{O}(M^2)$, each of these blocks is dense. This difference becomes crucial when considering very big, sparse systems, such as diffusion or molecular dynamics on high-dimensional spaces: using a regular grid of $L$ subdivisions in each of the $D$ space dimensions, we end up with $N = L^D$ Markov states. However, since each of those interacts only with its $2D$ neighbors, the generator has only $2DL^D$ nonzero entries. Therefore, while the augmented transition matrix has $\mathcal{O}(ML^{2D})$ non-zero entries, the augmented jump chain matrix $J$ has only $\mathcal{O}(M^2 DL^D)$. Although regular grids are not feasible in high dimensions, this example shows how the cost of the additional blocks is outweighed by their respective sparsity in higher dimensions, a property we expect to hold as well for more sophisticated discretizations.
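The comparison above amounts to simple counting; the numbers below are illustrative choices of $L$, $D$, $M$:

```python
# Back-of-the-envelope nonzero counts from the text: a D-dimensional grid
# with L points per axis gives N = L^D states, a generator with 2D L^D
# nonzero entries, and M time cells.
def nnz_classical(L, D, M):
    return M * L**(2 * D)           # O(M) dense L^D x L^D blocks

def nnz_jump_chain(L, D, M):
    return M**2 * 2 * D * L**D      # O(M^2) blocks, each as sparse as Q

# In high dimension the jump chain wins despite having more blocks ...
assert nnz_jump_chain(10, 6, 50) < nnz_classical(10, 6, 50)
# ... while for small D the classical augmentation can be cheaper.
assert nnz_classical(10, 1, 50) < nnz_jump_chain(10, 1, 50)
```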

Numerical Examples
In this section, we will first illustrate the developed concepts on a simple time-dependent two-state model and then compute basic error statistics for the jump operator discretization of the overdamped Langevin dynamics in a 2D potential landscape.

A Simple Two-State Model
For the first example, we consider two states, $\mathbb{X} = \{A, B\}$, on the time interval $\mathbb{T} = [0, 8]$. The dynamics of the jump process at each time is fully determined by the respective rates of the transitions $A \to B$ and $B \to A$. Aiming for a non-autonomous but simplistic example, we define the process to consist of two phases: in the first half of the time interval it is possible to transition from $A$ to $B$ at rate 1 whereas $B$ is absorbing, and in the second half we reverse the roles:

$$q_{AB}(t) = 1,\ q_{BA}(t) = 0 \quad \text{for } t < 4, \qquad q_{AB}(t) = 0,\ q_{BA}(t) = 1 \quad \text{for } t \geq 4.$$

We then compute the Galerkin discretization of the jump operator as in Section 4.1. Partitioning the time interval into $M = 8$ uniform intervals, we obtain the jump matrix $J$ depicted in the left of Figure 4. Since we used space-major ordering for the space-time states, each $2 \times 2$ block represents the transitions from and into a time-slice, whereas the position inside the blocks determines the spatial start- and end-positions (see also Figure 3). Looking at the upper row of blocks, we observe that for the initial two time blocks the dominant transitions are those from state $A$ to $B$. This switches in the second half of the time interval, that is, for the blocks on the right half of the matrix: trajectories that started at time 0 in state $B$ will most likely jump after $t = 4$, when $B$ is no longer absorbing. We can also recognize the exponential decay of the probabilities with time. The following rows of blocks encode the behavior for jumps starting from later times and mimic the qualitative behavior of the top row, although with different densities. Starting from an initial distribution, we are now in the position to look at the induced jump activity, its synchronization, and the resulting Koopman operator. Let us start with a space-time distribution $f \in \mathbb{R}^{2 \times 8}$ with all mass in state $A$ in the initial time interval, that is, $f_{x,t} = \delta_{x,A} \mathbb{1}_{t \in [0,1)}$. We then compute the jump activity from Equation (24), truncating the sum at $n = 100$ for reasons of computability.
The top right of Figure 4 depicts the resulting activity $Ef$, which can be understood as the amount of space-time jumps happening into each space-time cell, akin to a Geiger counter. Looking at the first time interval, we can recognize the initial density (corresponding to the 0th jump) in state $A$ as well as the following jumps to $B$. The intensity decays with time until $t = 4$, since fewer and fewer particles remain available for the transition $A \to B$, whereas the other direction is inhibited by the zero rate. This changes at $t = 4$, where we switch the reaction rates and observe a similar pattern in the reverse direction. Note the asymmetry between the two temporal halves of the experiment: due to the discretization we do not start exactly at time $t = 0$ but uniformly in the first time cell. This is why the activity in $B$ during $t \in (0, 1)$ is lower than in $A$ during $t \in (4, 5)$. This, however, could be alleviated by a finer temporal discretization of the initial condition.

Diffusion Process with Changing Temperature
In order to illustrate the applicability to molecular dynamics, we now consider a diffusion process with a drift induced by a potential, where we reduce the temperature in time, akin to the process in simulated annealing. More precisely, we consider the overdamped Langevin equation in ℝ²,

dX_t = −∇V(X_t) dt + √(2 β(t)⁻¹) dW_t,

with a triple-well potential V with two deep wells at (−1, 0), (1, 0) and a shallow well at (0, 3/2) as in ref. [22]. Here W_t denotes standard Brownian motion and β(t) is the time-varying inverse temperature, or "coldness".
We discretize the state space on the domain [−2, 2] × [−1, 2] by dividing it into a square grid of n_x = 9 horizontal and n_y = 7 vertical points. In order to obtain the spatially discrete jump process approximation to the originally space-continuous process, we use the square-root approximation (SQRA). [10] The SQRA estimates a generator matrix on the space of states identified with the grid points by linearly interpolating the potential between neighboring points and calculating the resulting rates for a given temperature. It is called SQRA since it can be expressed in terms of the square root of the Boltzmann weights as follows:

Q_ij = Φ A_ij exp(−β (V_j − V_i) / 2),    i ≠ j,

where A_ij denotes the adjacency matrix of the grid points, V_i the potential at grid point i, and the diagonal Q_ii is set to satisfy row sum zero. The factor Φ amounts to the transition rate in a flat potential and depends on the inverse temperature β as well as the spatial grid size h by Φ = β⁻¹ h⁻². [24] For the time domain, we chose T = [0, 2], which we subdivide into n_t = 6 uniform time cells of size ΔT = 1/3, and we impose an annealing protocol by starting with a high temperature in the first half, β(t) = 1 for t ∈ [0, 1), and decreasing it in the second half, β(t) = 10 for t ∈ [1, 2].
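The SQRA construction above can be sketched in a few lines of Python. This is a minimal illustration assuming a rectangular grid with nearest-neighbour adjacency; the grid shape and test potential below are our own choices, not the exact setup of the experiment.

```python
import numpy as np

def sqra_generator(V, beta, h):
    """SQRA generator matrix on a rectangular grid.

    Off-diagonal rates Q_ij = Phi * exp(-beta * (V_j - V_i) / 2) for
    adjacent grid points; the diagonal is set so that rows sum to zero.
    """
    ny, nx = V.shape
    n = ny * nx
    v = V.ravel()
    Q = np.zeros((n, n))
    Phi = 1.0 / (beta * h ** 2)          # transition rate in a flat potential
    for i in range(ny):
        for j in range(nx):
            a = i * nx + j
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < ny and 0 <= jj < nx:
                    b = ii * nx + jj
                    Q[a, b] = Phi * np.exp(-beta * (v[b] - v[a]) / 2)
    Q[np.diag_indices(n)] -= Q.sum(axis=1)   # enforce zero row sums
    return Q
```

Since Q_ij = Φ exp(−β(V_i + V_j)/2) / exp(−βV_i), the Boltzmann weights π_i ∝ exp(−βV_i) satisfy detailed balance π_i Q_ij = π_j Q_ji by construction, which is one reason this square-root form is convenient.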
The left of Figure 5 shows the corresponding discretization of the space-time jump operator J. We can recognize the high-temperature regime in the left half of the matrix by the rather uniform distribution of transition probabilities inside each block, as well as by the fast timescale of the reactions, indicated by a high amount of temporal self-transitions in the diagonal blocks with quickly decaying transitions to future time blocks (almost none on the second off-diagonal). On the other hand, the right half of the matrix encodes the behavior of the low-temperature regime. The distribution of transitions inside each block is more peaked, as the potential-induced drift dominates the now small noise. We also see that the process has slowed down, since we have more transitions to future blocks on the off-diagonal, corresponding to particles that remain in place for longer times.
Whereas the matrix has (n_x n_y n_t)² = 142884 entries, only 4620 of these are nonzero, leading to a sparsity factor of 3.1%. The sparsity pattern is depicted on the right of Figure 5.
The approximation error of the spatial discretization of the process by means of the SQRA is discussed in ref. [25]. We can analyze the approximation error of the temporal Galerkin approximation by comparing the reconstructed propagator (Section 3.2, [26]) to the exact propagator of the Markov jump process, obtained from the matrix exponential of the (piecewise constant) generator Q, by means of the L² operator norm of their difference at the end time t = 2. Figure 6 shows the resulting error for our example for temporal step sizes between 0.01 and 1, and we observe convergence of close to order 1.
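For a piecewise-constant generator, the exact propagator used as reference above is simply the product of matrix exponentials over the constant pieces. The following sketch illustrates this composition on the two-state switching example from the first experiment (a truncated Taylor series stands in for a proper matrix-exponential routine to stay dependency-free; for larger or stiffer generators one would use a library implementation instead):

```python
import numpy as np

def expm_series(A, terms=40):
    """Matrix exponential via truncated Taylor series.

    Adequate for small matrices with moderate norm, as here; not a
    general-purpose implementation."""
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    return E

# Piecewise-constant generators of the two-state switching example:
# A -> B at rate 1 in the first phase, B -> A at rate 1 in the second.
Q1 = np.array([[-1.0, 1.0], [0.0, 0.0]])
Q2 = np.array([[0.0, 0.0], [1.0, -1.0]])

# Exact propagator over [0, 8]: compose the exponentials of the two
# phases of length 4 each.
P = expm_series(4.0 * Q1) @ expm_series(4.0 * Q2)
```

The rows of P sum to one, and the entry P[0, 0] recovers the probability of starting and ending in A, namely exp(−4) (never left A) plus (1 − exp(−4))² (jumped to B in phase one and back in phase two).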
Although these examples mainly serve to display the concept of the space-time augmented jump chain and merely recompute already known quantities, they also illustrate its main strength, that is, dealing with non-autonomous processes in a sparse way while encoding structural properties such as the mixing behavior and timescales of the underlying problem.

Conclusion
We extended the known representation of autonomous Markov jump processes as an embedded Markov chain (Theorem 1) to the non-autonomous regime. Augmenting the state space with the time dimension allows us to encode the temporal dependence of the embedded chain in the new space-time state space; we therefore end up with a time-independent representation of the system. While augmentation is a common technique for non-autonomous systems, the novelty of our approach is that we only look at the jump events themselves. This allows us to move from a non-autonomous continuous-time Markov process to an autonomous discrete-time Markov chain (Theorem 2), albeit on a more complex state space. We call this Markov chain the augmented jump chain and characterize it through its transition kernel and evolution operator, the jump operator.
This approach leads to an interesting perspective on time: whereas classically time progresses uniformly, we now have a description in which the process jumps through time concurrently. While it is possible to revert to the classical picture through a synchronization, that is, by assigning a membership along each space-fiber of the augmented system towards a specific time point in uniform time, it is interesting to see that many problems can be tackled in the augmented regime directly. We showed how the evaluation of the Koopman operator, that is, the evolution of an observable through time, can be carried out directly in the "desynchronized" regime in the form of an inhomogeneous linear boundary value problem on space-time (Corollary 5). This problem structurally resembles the one for the computation of committor functions in stationary systems.
We discuss connections of our representation to the computation of committors for time-independent target sets under non-autonomous dynamics. The time-augmented perspective furthermore allows for a natural extension to a wide class of time-dependent targets and eventually a non-autonomous committor theory. We also discuss the application of the augmentation to the theory of coherence, where it seems to provide a promising view on capturing time-invariant structures.
The defining principles of our proposed approach are twofold. For one, the well-known technique of augmentation allows us to treat non-autonomous systems and extend common notions of analysis (committors, metastability) in a unifying way to the time-dependent regime. The other, however, is far less understood: by focusing on the jump events as the main principle of evolution, in contrast to the usual focus on time, we arrive at a description where classical time evolves concurrently. We show how this leads to a representation inheriting the sparsity of the infinitesimal generator. This in itself may prove very useful for the computational analysis of (especially high-dimensional) non-autonomous systems. However, interpreting the concurrency as an uncoupling of different timescales requires further research, and we believe it may become a cornerstone of the analysis of complex dynamics with multiple timescales.
All in all, we hope for the augmented jump chain to enhance the numerical capabilities for complex systems on the applied side, as well as to open doors to new perspectives on time-dependent jump processes on the theoretical side.

A.1. The Non-Homogeneous Exponential Distribution
Although what we call the non-homogeneous exponential distribution may very likely already be known, e.g. in the field of survival analysis, we could not find any published references. We therefore present a short derivation based on an answer on Stack Exchange [27].

Define the non-homogeneous exponential distribution (NED) with rate q : ℝ⁺ → ℝ⁺ by the cumulative distribution function (CDF)

F(t) = P(T ≤ t) = 1 − exp(−∫₀ᵗ q(s) ds),

where T is the NED-distributed random time. Note that F indeed is a CDF: it is non-decreasing with F(0) = 0, and F(t) → 1 for t → ∞ provided ∫₀^∞ q(s) ds = ∞. Its derivative is the probability density function (PDF)

f(t) = F′(t) = q(t) exp(−∫₀ᵗ q(s) ds).

Now consider the conditional probability of an event in [t, t + Δt) given survival up to time t,

P(T < t + Δt | T ≥ t) = (F(t + Δt) − F(t)) / (1 − F(t)),

and its rate, that is, the limit for Δt → 0,

lim_(Δt→0) P(T < t + Δt | T ≥ t) / Δt = f(t) / (1 − F(t)) = q(t).

The homogeneous exponential distribution (HED) with rate q is the special case of the NED with constant rate q(t) ≡ q. Hence the NED has the same conditional rate q(t) as the HED for an event occurring at each time t, that is, it is the consistent generalization to non-autonomous rates. Furthermore, the survival probability satisfies

P(T > t) = 1 − F(t) = exp(−∫₀ᵗ q(s) ds).
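The CDF above can be inverted directly to sample NED times: draw a unit exponential E = −log(1 − U) and solve ∫₀ᵗ q(s) ds = E for t. For a piecewise-constant rate this reduces to a simple scan over the cells. The following sketch is our own illustration (the rate discretization is an assumption, chosen to match the piecewise-constant rates used in the examples):

```python
import numpy as np

def sample_ned(q_vals, dt, rng):
    """Sample T with P(T > t) = exp(-int_0^t q(s) ds) for a
    piecewise-constant rate: q_vals[k] holds on [k*dt, (k+1)*dt).

    Returns np.inf if the event does not occur within the covered horizon.
    """
    target = rng.exponential()        # E = -log(1 - U) ~ Exp(1)
    acc = 0.0                         # accumulated integral of q
    for k, q in enumerate(q_vals):
        inc = q * dt
        if acc + inc >= target:       # event falls inside this cell
            return k * dt + (target - acc) / q
        acc += inc
    return np.inf
```

For q ≡ 0 on an initial interval this correctly produces no events there, matching the absorbing phases of the examples above: the survival probability stays at 1 until the rate becomes positive.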