Time‐inconsistent contract theory

This paper investigates the moral hazard problem in finite horizon with both continuous and lump-sum payments, involving a time-inconsistent sophisticated agent and a standard utility-maximiser principal. Building upon the so-called dynamic programming approach in Cvitanić et al. (2018) and the recently available results in Hernández and Possamaï (2023), we present a methodology that covers the previous contracting problem. Our main contribution consists of a characterisation of the moral hazard problem faced by the principal. In particular, it shows that under relatively mild technical conditions on the data of the problem, the supremum of the principal's expected utility over a smaller, restricted family of contracts is equal to the supremum over all feasible contracts. Nevertheless, this characterisation yields, as far as we know, a novel class of control problems that involve the control of a forward Volterra equation via Volterra-type controls, and infinite-dimensional stochastic target constraints. Despite the inherent challenges associated with such a problem, we study the solution under three different specifications of utility functions for both the agent and the principal, and draw qualitative implications from the form of the optimal contract. The general case remains the subject of future research. We illustrate some of our results in the context of a project selection contracting problem between an investor and a time-inconsistent manager.

In this paper, we are interested in the moral hazard contracting problem between a principal and an agent with time-inconsistent preferences. A principal-agent problem pertains to the optimal contracting between two parties: the principal, who is interested in hiring the agent, offers a contract; provided the agent accepts, he can influence a random process, the outcome, via his actions. A key feature in these models is the amount of information available to the principal when designing the contract. There are three classical cases studied in the literature: risk-sharing with symmetric information, hidden action, and hidden type. We are only concerned with the first two in this work.
In the risk-sharing scenario, also referred to as the first-best, both parties have the same information and have to agree on how to share the underlying risk. The principal thus has all the bargaining power, i.e. she offers the contract and dictates the agent's actions; the agent is compelled to follow or else he would be severely penalised. In the case of hidden actions, the principal is imperfectly informed about the agent's actions, either because they are too costly to monitor or simply unobservable. Consequently, the principal expects to receive a second-best utility compared to the risk-sharing case. As the agent is allowed to take actions that are not in the principal's best interest, this situation is also referred to as moral hazard, and incentives play a crucial role. Indeed, the principal hopes to influence the agent's actions by offering an appropriate contract.
In the case of a traditional (time-consistent) agent, a common feature of these models is that their resolution boils down to standard stochastic control theory. Indeed, in light of the principal's bargaining power, the first-best case is always cast as a stochastic control problem for a single individual, the principal, who chooses both the contract and the actions under the participation constraint. On the other hand, the second-best problem, being a two-stage Stackelberg game, requires solving the agent's problem for any given fixed contract before moving on to the principal's problem. In principle, this creates a much more complicated structure. Since the introduction of the continuous-time model, it took some time for the literature to develop a general approach showing that the second-best problem, too, reduces to a standard stochastic control problem.
The study of moral hazard problems in continuous time has its roots in the seminal paper of Holmström and Milgrom [45]. In this model, the principal and the agent have CARA utility functions, and the agent's effort influences the drift of the output process, the solution to a controlled diffusion, but not the volatility. The resulting optimal contract is a linear function of the aggregate output. The model in [45] drew great attention as the resolution of the seemingly more complicated continuous-time formulation was actually much more tractable, could be rigorously justified, and provided useful explicit solutions for the economic analysis. These were typically harder to reach in most of the discrete-time models that dominated the existing literature, see Laffont and Martimort [49] for an overview. Following up on [45], Schättler and Sung [71, 72] studied the validity of the so-called first-order approach, while Sung [78, 79] provided extensions to the case of diffusion control and hierarchical structures. The linearity of the optimal contract, a feature also present in [78], is further studied in Müller [62, 63] for the first-best problem, in Hellwig and Schmidt [40] and Hellwig [39] for the interplay between the discrete-time and continuous-time models, and in Sung [80, 81] for a robust setting. Notably, Williams [90] and Cvitanić, Wan, and Zhang [16] characterise the optimal contract for general utilities by means of the so-called stochastic maximum principle and forward-backward stochastic differential equations (FBSDEs for short).
Nevertheless, it was not until the approach in Sannikov [68, 69] became available that the study of the moral hazard problem was, once again, reinvigorated and finally arrived at the methodical program presented in Cvitanić, Possamaï, and Touzi [17, 18]. In a nutshell, this method leverages the dynamic programming principle and the theory of backward stochastic differential equations (BSDEs) to reformulate the principal's problem as a standard optimal stochastic control problem with an additional state variable, namely the agent's continuation utility. This methodology has been extended to several scenarios, including random-horizon contracting, see Lin, Ren, Touzi, and Yang [55], ambiguity features from the point of view of the principal, as in Mastrolia and Possamaï [60] and Hernández Santibáñez and Mastrolia [44], a principal contracting a finite number of agents, see Élie and Possamaï [25], several principals contracting a common agent, see Mastrolia and Ren [61], a principal contracting a mean field of agents [26], and applications in optimal electricity demand-response contracting, see Aïd, Possamaï, and Touzi [2], or Élie, Hubert, Mastrolia, and Possamaï [27]. The road map suggested by this approach is quite clear: (i) identify the generic dynamic programming representation of the agent's value process, (ii) express the contract payment in terms of the value process, (iii) optimise the principal's objective over such payments.
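To fix ideas, in the time-consistent case step (i) of this road map typically takes the following generic shape. The display below is a standard template from the BSDE literature, written with generic symbols (U_A for the agent's utility, c for his cost, σ and b for the output coefficients); it is a sketch, not the exact equation of any of the cited papers.

```latex
% Agent's continuation utility Y as the solution of a BSDE, with Z its
% first-order sensitivity to the output X, and H the agent's Hamiltonian.
Y_t = U_A(\xi) + \int_t^T H_r(X, Z_r)\,\mathrm{d}r - \int_t^T Z_r \cdot \mathrm{d}X_r,
\qquad
H_r(x, z) := \sup_{a \in A}\big\{ \sigma_r(x) b_r(x, a)\cdot z - c_r(x, a) \big\}.
```

Step (ii) then amounts to writing ξ = U_A^{-1}(Y_T) for controlled processes Y of the above form, and step (iii) to optimising the principal's objective over their parameters (Y_0, Z).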
All in all, the previous literature is particular to the contracting problem between two (or more) standard utility maximisers, while there is a growing need for models able to explain the behaviour of agents who fail to comply with classical rationality assumptions. Indeed, there is clear evidence of such attitudes in a number of applications, from consumption problems to finance, from crime to voting, and from charitable giving to labour supply, see Rabin [67] and DellaVigna [19] for detailed reviews. The distinctive feature in these situations is that human beings do not necessarily behave as perfectly rational decision-makers. In reality, their criteria for evaluating their well-being are, in many cases, a lot more involved than the ones considered in the classic literature. In light of the methodology introduced in [18], the recently available results in Hernández and Possamaï [43] unveil the possibility of extending this blueprint to cover the moral hazard problem between a principal and a sophisticated time-inconsistent agent. This is the task we seek to accomplish in this paper. Time-inconsistency is, in general terms, the fact that marginal rates of substitution between goods consumed at different dates change over time, see Strotz [77], Laibson [50], O'Donoghue and Rabin [64, 65]. For example, the marginal rate of substitution between immediate consumption and some later consumption is different from what it was when these two dates were seen from a remote prior date. In many applications, this introduces a conflict between 'an impatient present self and a patient future self', see Brutscher [11]. In mathematical terms, this translates into stochastic control problems in which the classic dynamic programming principle, in other words the Bellman optimality principle, is not satisfied. Time-inconsistency was first mentioned in [77], where three different types of agents are described: the pre-committed agent does not revise his initially decided strategy;
the naive agent revises his strategy without taking future revisions into account; the sophisticated agent revises his strategy taking possible future revisions into account, and by doing so makes his strategy time-consistent. The comprehensive study of sophisticated agents started with Ekeland and Pirvu [23], see also Ekeland and Lazrak [21, 22], which later became the starting point of the general Markovian theory developed by Björk, Khapko, and Murgoci [7]. Nonetheless, none of these approaches could handle the typical non-Markovian problems that would necessarily arise in contracting problems involving a principal and a time-inconsistent agent. Hernández and Possamaï [43] provided a probabilistic formulation able to handle such non-Markovian problems, and this is the framework we build upon. At this point, we notice the first stark difference between the classic time-consistent case and ours: the problem of the agent is, in general, linked to the solution of an infinite family of equations, namely a BSVIE, as opposed to one, a BSDE. Nevertheless, the agent's preferences elucidate a connection at the terminal time t = T between the terminal values of the BSVIE and the terminal payment offered by any admissible contract. This is the crucial insight allowing us to restrict our attention from the family of admissible contracts to a carefully tailored family of contracts for which the agent's value process admits a dynamic programming representation capturing the Volterra nature of the agent's reward. Extrapolating from the time-consistent case, the restricted family of contracts is defined in terms of a family of first-order sensitivities of the agent's value process to the output. For this family of contracts, we show that the principal identifies the equilibrium action for the agent as the maximiser of the associated Hamiltonian. Nevertheless, echoing the agent's time-inconsistent preferences, the resulting principal's problem is, in general, far from being a standard stochastic control problem.
Our main contribution, namely Theorem 3.9, consists of a characterisation of the moral hazard problem faced by the principal when contracting a sophisticated time-inconsistent agent. In particular, it shows that under relatively mild technical conditions on the data of the problem, the supremum of the principal's expected utility over the restricted family of contracts is equal to the supremum over all feasible contracts. Nevertheless, this characterisation yields, as far as we know, a novel class of control problems. These problems involve the control of a forward Volterra equation via Volterra-type controls, and stochastic target constraints. One of the novel features of our result is that the dynamics of this process involve the diagonal value of both the forward Volterra process and the Volterra control, see Definition 3.5. In addition, the stochastic target constraint arises due to the time-inconsistent preferences of the agent, see (3.6) and Remark 3.7.
Despite the inherent challenges of this class of problems, we study the solution to the moral hazard problem under three different specifications of utility functions for both the agent and the principal. For instance, for non-separable reward functionals, we find that if both the agent and the principal have exponential utility functions and the agent's reward is given by the discounted value of his utility, the problem reduces to a standard control problem, see Section 4.1 and Proposition 4.3. This is a feature we also see in the risk-sharing (or first-best) contracting examples between the principal and a time-inconsistent agent that we present in Section 2. The second example considers a risk-neutral principal and a risk-neutral agent with a separable reward functional. In this case, our analysis shows that it is possible to reduce the complexity of the problem. Indeed, we can exploit the structure of the problem to formulate an ansatz for the principal's problem, for which we present a result in the spirit of a verification theorem, see Section 4.2 and Proposition 4.10. In the last example, we go back to the first setting, but now the agent's (exponential) utility is taken on the discounted income. This simple modification highlights the intrinsic difficulties of the general case, and we are able to solve the problem of the principal for a class of contracts smaller than the one prescribed by the restricted family of contracts in Theorem 3.9, see Section 4.3 and Proposition 4.17. The general case remains the subject of future research.
Regarding the qualitative implications of our results, we can mention the following: (i) from a methodological point of view, unlike in the time-consistent case, the solution to the moral hazard problem does not reduce, in general, to a standard stochastic control problem. Nevertheless, the solution to the risk-sharing problem between a utility-maximiser principal and a time-inconsistent sophisticated agent does, see Section 2. This suggests a stark difference between the first-best and second-best problems as soon as the agent is allowed to have time-inconsistent preferences; (ii) a second takeaway from our analysis is associated with the so-called optimality of linear contracts. These are contracts consisting of a constant part and a term proportional to the terminal value of the state process, as in the seminal work of [45]. This was also the conclusion of Carroll [12] in a two-stage time-consistent model in which the principal demands robustness, in the sense of evaluating admissible contracts by their worst-case performance over the unknown actions the agent might take. Similar results were obtained by [80, 81] and [60] in the continuous-time setting. Moreover, the results in Abi Jaber and Villeneuve [1] show that the optimal contract remains linear when the output is driven by a Gaussian Volterra process (instead of a Brownian motion). We study two examples that can be regarded as (time-inconsistent) variations of [45], which we refer to as discounted utility, see Section 4.1, and utility of discounted income, see Section 4.3. In the former case, by virtue of the simplicity of the source of time-inconsistency, we find that the optimal contract is linear. In the latter case, we find that the optimal contract is no longer linear unless there is no discounting (as in [45]). Our point here is that slight deviations from the model in [45] seem to challenge the virtues attributed to linear contracts, and this suggests that they would typically cease to be optimal in general for
time-inconsistent agents; (iii) lastly, we comment on the non-Markovian nature of the optimal contract. It is known that, beyond the realm of the model in [45], the optimal contract in the time-consistent scenario is, in general, non-Markovian in the state process X, see [18]. Indeed, we find the same result, see Proposition 4.10, in the case of an agent with separable time-inconsistent preferences, see Section 4.2. As such, we believe this is a manifestation of the agent's time-inconsistent preferences.
Let us illustrate some of our results, see Section 1 for precise definitions. Fix a time horizon T > 0 and consider a sophisticated time-inconsistent agent with risk-neutral preferences. That is, if the agent enters into a terminal payment contract ξ with the principal, the agent seeks an equilibrium strategy α⋆ ∈ E(C). The principal is a risk-neutral utility maximiser. That is, among all the admissible contracts ξ ∈ C of F_T^X-measurable random variables satisfying the agent's participation constraint V_0^A(ξ, α) ≥ R_0, she maximises her expected payoff. The agent is time-inconsistent in light of the general discounting function f appearing in his reward. We summarise and illustrate some typical discounting models next; in all the illustrations in this section we take T = 50. The most widely used discounting model in classic economics is the exponential discount function f_e. This model captures the empirical evidence that future utils are worth less than current utils, yet its instantaneous discount rate (IDR), given by −f′(t)/f(t), is constant over time. To be able to accommodate the fact that consumers have both a short-run preference for instantaneous gratification and a long-run preference to act patiently, Ainslie [3] introduced the hyperbolic discounting model. Hyperbolic discounting generates the so-called self-control problem: the discount function declines at a faster rate in the short run than in the long run, depending on the value of α, whereas γ plays the role of a baseline discounting rate. This qualitative property is even more evident in the so-called quasi-hyperbolic discounting model f_q introduced by [50]. This model exhibits the short-run impatience of the hyperbolic model, but for long time horizons its instantaneous discount rate resembles that of the exponential model. For this discounting, β measures the value the agent gives to future periods, whereas λ measures the agent's additional valuation of present periods. Once again, γ is the baseline discounting rate. We can see these observations in both the IDR column of the accompanying table and the plot of the three models.
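For concreteness, standard continuous-time parametrizations of these three models, together with the instantaneous discount rates they induce, read as follows. These are the usual textbook forms, stated here as an assumption: the paper's exact definitions may differ in parametrization.

```latex
% Exponential, hyperbolic and quasi-hyperbolic discount functions,
% with IDR(t) = -f'(t)/f(t) computed for each parametrization.
\begin{align*}
f_e(t) &= e^{-\gamma t},
  & \mathrm{IDR}_e(t) &= \gamma,\\
f_h(t) &= (1+\alpha t)^{-\gamma/\alpha},
  & \mathrm{IDR}_h(t) &= \frac{\gamma}{1+\alpha t},\\
f_q(t) &= \big(\beta + (1-\beta)e^{-\lambda t}\big)e^{-\gamma t},
  & \mathrm{IDR}_q(t) &= \gamma + \frac{\lambda(1-\beta)e^{-\lambda t}}{\beta + (1-\beta)e^{-\lambda t}}.
\end{align*}
```

Under these forms, IDR_h declines over time at a speed governed by α, while IDR_q starts at γ + λ(1 − β) and decays back to the baseline γ for long horizons, matching the qualitative descriptions above.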
Let us recall that the problem faced by an agent seeking to maximise his reward is time-consistent if and only if he discounts future utils/rewards with the exponential model. Thus, taking f_h or f_q in the above formulation leads to genuinely time-inconsistent problems, for which sophistication has an inherent impact on the equilibrium actions followed by the agent under the optimal contract. In addition, it is easy to see that both f_h and f_q recover f_e in suitable limits of their parameters. For any sufficiently regular discounting model, including the ones just discussed, we find in Proposition 4.10 that the associated optimal contract consists of a constant C(R_0), depending on the agent's reservation utility, plus a second term which reflects the non-Markovian nature of the terminal payment mentioned in (iii) above, and which is inherently related to the non-exponential discounting structure. Conversely, whenever f = f_e the term z⋆(t)/f(T − t) becomes constant, leading to the well-known optimal contract that is linear in the terminal value of the output process.
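Assembling the terms just mentioned, the optimal contract plausibly has the following shape; this display is only a sketch put together from the objects named in the surrounding discussion (the precise statement is Proposition 4.10):

```latex
% Constant part pinned down by the reservation utility, plus a
% non-Markovian integral term involving the deterministic map z^\star
% and the discount structure.
\xi^\star = C(R_0) + \int_0^T \frac{z^\star(t)}{f(T-t)}\,\mathrm{d}X_t.
```

When f = f_e the integrand is constant, so ξ⋆ is an affine function of the terminal output, recovering the linear contracts of [45].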
We now turn our attention to the agent's equilibrium action under the optimal contract. We illustrate this in Figure 2 below. The three columns study the above model with f_h, f_q and f_q for different values of α, β and λ, respectively. In light of the connection with exponential discounting mentioned above, all columns include f_e(t) in blue, which serves as a true time-consistent baseline comparison model. The first row presents the value of the discounting functions and confirms the limits pictorially. The second row presents the associated instantaneous discount rates, IDR, and the last row does so for the equilibrium actions. Let us first note that the shape of the equilibrium action under exponential discounting is intuitively expected: it is convex and increasing, with a shape inversely proportional to the exponential discounting term, reflecting that the discounted optimal effort should remain constant under the optimal strategy. Let us now look at the case of hyperbolic discounting agents. We see that as α increases, the sophisticated agent increases his level of effort along the equilibrium during the initial stages. In addition, the rate at which the equilibrium effort changes over time (convexity/concavity) is positive for small values of α and negative for large values, e.g. α_4 and α_2, respectively. This means that as the time-inconsistency intensifies, sophistication causes the agent to exert larger levels of effort at the initial stages of the game. This reflects how sophistication can help overcome procrastination.
We now look at the quasi-hyperbolic agent in the centre and right columns. As β decreases, the agent gives less weight to future periods and values present over future gratification more. In other words, the time-inconsistency intensifies, and the sophisticated agent decides to postpone some effort to future periods. In this scenario, despite sophistication, the agent cannot overcome procrastination. Lastly, as λ decreases, the agent puts less weight on the present period, where his time-inconsistency is more acute, so that the inconsistency lessens. We nevertheless find that even though the initial effort decreases, and procrastination dominates for the first decreasing values of λ, namely λ_1 and λ_2, as the agent's valuation of the present becomes significantly small, λ_3 and λ_4 respectively, his effort increases, overcoming procrastination and reaching the time-consistent level of effort. We believe that the different behaviours of the equilibrium effort under quasi-hyperbolic discounting can be reconciled by looking at the associated IDR plots: for fixed t, the IDR is monotonically decreasing in β, whereas there are values of t for which the IDR oscillates as λ decreases.
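The monotonicity claims in this paragraph can be checked numerically. The snippet below uses the standard continuous-time quasi-hyperbolic parametrization f_q(t) = (β + (1 − β)e^{−λt})e^{−γt}; this parametrization, and the sample parameter values, are our assumptions and may differ from the paper's exact definitions.

```python
import math

# Quasi-hyperbolic discounting in continuous time (an assumed, standard
# textbook parametrization; the paper's exact f_q may differ):
#   f_q(t) = (beta + (1 - beta) * exp(-lam * t)) * exp(-gamma * t),
# with instantaneous discount rate IDR(t) = -f_q'(t) / f_q(t).

def f_q(t: float, beta: float, lam: float, gamma: float) -> float:
    return (beta + (1.0 - beta) * math.exp(-lam * t)) * math.exp(-gamma * t)

def idr_q(t: float, beta: float, lam: float, gamma: float) -> float:
    # -f'/f in closed form for the parametrization above.
    core = beta + (1.0 - beta) * math.exp(-lam * t)
    return gamma + lam * (1.0 - beta) * math.exp(-lam * t) / core

lam, gamma = 0.5, 0.05

# At any fixed t, the IDR is monotonically decreasing in beta, matching
# the qualitative observation in the text.
t = 5.0
idrs = [idr_q(t, b, lam, gamma) for b in (0.2, 0.4, 0.6, 0.8)]
assert all(a > b for a, b in zip(idrs, idrs[1:]))

# Short-run impatience: the IDR at t = 0 exceeds the long-run baseline
# rate gamma, and decays back to gamma for long horizons, as for f_e.
assert idr_q(0.0, 0.5, lam, gamma) > gamma
assert abs(idr_q(100.0, 0.5, lam, gamma) - gamma) < 1e-6
```

Under this parametrization, at fixed t > 0 the map λ ↦ IDR(t) first rises and then falls, which is consistent with the oscillation observed in the IDR plots as λ decreases.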
We leave the comprehensive study of these behaviours as the subject of future research. In particular, it would be interesting to study the extension of our results to the so-called instant gratification model in Harris and Laibson [37], which implements the discount function f_ig obtained as the limit f_q −→ f_ig as λ −→ ∞; this is beyond the scope of this document. The rest of this article is organised as follows. Section 1 takes care of the formulation of the problem, and Section 2 presents the solution to the first-best problem under three different specifications of preferences. The common feature in these examples is that the problem boils down to solving a standard stochastic control problem. Section 3 introduces our general approach to the second-best problem and presents the proof of Theorem 3.9. Section 4 is devoted to the analysis of three examples under different specifications of time-inconsistent preferences for the agent. Lastly, we include an Appendix collecting some new results for time-inconsistent control problems with BSVIE rewards, as well as other technical results.
Notations: R_+ denotes the set of non-negative real numbers. Let (E, ‖·‖_E) be an arbitrary finite-dimensional normed space. Given a positive integer p and a non-negative integer q, C_q(E, R^p) will denote the space of functions from E to R^p which are at least q times continuously differentiable. In the case q = 0, of continuous functions, we drop the dependence on q and write C([0, T], R^p). By I_n we denote the identity matrix of R^{n×n}. S_n^+(R) denotes the set of n × n symmetric positive semi-definite matrices. Tr[M] denotes the trace of a matrix M ∈ R^{n×n}.
With (E, ‖·‖_E) as above, for a σ-algebra F, P_meas(E, F) denotes the space of F-measurable, E-valued functions, and P²_meas(E, F) denotes its subspace of square-integrable functions. Likewise, for a filtration F, P_pred(E, F) (resp. P_prog(E, F), P_opt(E, F), P_meas(E, F)) denotes the set of E-valued, F-predictable (resp. F-progressively measurable, F-optional, F-adapted and measurable) processes.

Problem statement
We fix two positive integers n and d, which represent respectively the dimension of the process controlled by the agent, and the dimension of the Brownian motion driving this controlled process.We fix a time horizon T > 0, and consider the canonical space Ω := C([0, T ], R n ), with canonical process X, and whose generic elements we denote x.We reserve the notation x and x to denote R-valued variables.
We let F be the Borel σ-algebra on Ω (for the topology of uniform convergence), and we denote by F^X := (F_t^X)_{t∈[0,T]} the natural filtration of X. We let A be a compact subspace of a finite-dimensional Euclidean space (typically A is a subset of R^k for some positive integer k), where the controls will take values.

Controlled state equation
We fix a bounded Borel-measurable map σ : [0, T] × Ω −→ R^{n×d} and an initial condition x_0 ∈ R^n, and assume that there is a unique solution, denoted by P, to the martingale problem for which X is an (F^X, P)-local martingale such that X_0 = x_0 with probability 1, and d⟨X⟩_t = σ_t(X)σ_t(X)^⊤ dt. Enlarging the original probability space if necessary (see Stroock and Varadhan [76, Theorem 4.5.2]), we can find an R^d-valued Brownian motion B such that X = x_0 + ∫_0^· σ_r(X) dB_r. We now let F := (F_t)_{t∈[0,T]} be the P-augmentation of F^X, which we assume is right-continuous. We recall that uniqueness of the solution to the martingale problem implies that the predictable martingale representation property holds for (F, P)-martingales, which can be represented as stochastic integrals with respect to X (see Jacod and Shiryaev [46, Theorem III.4.29]). We also mention that the right-continuity of F guarantees that (F, P) satisfies the Blumenthal zero-one law, and consequently all F_0-measurable random variables are deterministic. Let us note that these assumptions are standard in the existing literature on the continuous-time principal-agent problem.
We can then introduce our drift functional b : [0, T] × Ω × A −→ R^d, which is assumed to be Borel-measurable with respect to all its arguments. Let us recall that for any A-valued, F-predictable process α satisfying a suitable integrability condition, we can define a probability measure P^α on (Ω, F_T) whose density with respect to P is given by the corresponding Doléans-Dade stochastic exponential. Moreover, by Girsanov's theorem, the process B^α := B − ∫_0^· b_r(X, α_r) dr is an R^d-valued, (F, P^α)-Brownian motion. Let us emphasise that we are working under the so-called weak formulation of the problem. This means that the state process X is fixed and, in contrast to the typical strong formulation, the Brownian motion and the probability measure are not fixed. Indeed, the choice of α corresponds to the choice of the probability measure P^α, and thus impacts the distribution of the process X.
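The ingredients just described can be sketched as follows. This is a standard weak-formulation template; the paper's exact integrability condition on α, omitted here, is what guarantees that the stochastic exponential is a true martingale.

```latex
% Change of measure and resulting output dynamics under P^alpha.
\frac{\mathrm{d}\mathbb{P}^\alpha}{\mathrm{d}\mathbb{P}}
  := \mathcal{E}\bigg(\int_0^{\cdot} b_r(X, \alpha_r)\cdot \mathrm{d}B_r\bigg)_T,
\qquad
B^\alpha := B - \int_0^{\cdot} b_r(X, \alpha_r)\,\mathrm{d}r,
% so that, by Girsanov's theorem, B^alpha is an (F, P^alpha)-Brownian
% motion and the output satisfies
\mathrm{d}X_t = \sigma_t(X)\big(b_t(X, \alpha_t)\,\mathrm{d}t + \mathrm{d}B^\alpha_t\big).
```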

The agent's problem
We aim to cover various specifications of time-inconsistent utility functions for the agent. To motivate our formulation, let us start with an informal discussion of the typical nature of the reward functionals assigned to the agent in contract theory. A contract C consists of a tuple (π, ξ), where π belongs to a set of F-predictable processes and ξ is an F_T-measurable random variable. At the intuitive level, a contract consists of a flow of continuous payments π := (π_t)_{t∈[0,T)} and a terminal compensation ξ. The class of admissible contracts Ξ is introduced later in Section 1.3, after imposing some integrability requirements.
Given a contract C, the value received by a time-inconsistent agent at the beginning of the problem from choosing an action α typically takes the form of an expected utility under P^α, where U_A denotes the agent's utility function and C^α_{t,T} denotes the cumulative net cost functional. We highlight that the generic dependence of both U_A and c on t accounts for the sources of time-inconsistency. In the classic literature, utilities are usually classified under two categories, namely separable and non-separable. For instance, in the separable case the agent's value takes the familiar form of an expected utility of the terminal payment net of the cumulative costs, which, by the Blumenthal zero-one law, satisfies V_0^A(ξ, α) = Y_0^{0,α}, for Y_0^{0,α} the initial value of the first component of the pair (Y^{0,α}, Z^{0,α}) solution to an associated BSDE. Moreover, in the (time-consistent) case in which the agent discounts exponentially with constant factor ρ, i.e. U_A(t, x) = e^{−ρ(T−t)} U_A(x) and c_t(s, x, p, a) = e^{−ρ(t−s)} c_t(x, p, a), the agent's value admits a recursive representation. The previous representation corresponds to a so-called recursive utility, particularly known as standard additive utility, see Epstein and Zin [29]. Let us remark that an analogous argument holds in the case of the non-separable exponential utility, and refer to El Karoui, Peng, and Quenez [24] for more examples of recursive utilities. Intuitively, a recursive utility can be viewed as an extension of the classic separable or non-separable utilities in which the instantaneous utility depends on the instantaneous action α_t and the future utility via Y_t^{0,α}. Extrapolating these ideas, we may arrive at considering reward functionals of the form V_0^A(ξ, α) = Y_0^{0,α}, where the pair (Y^α, Z^α) satisfies the BSVIE (1.2). By letting both U_A and h depend on t, we allow for general discounting structures and incorporate time-inconsistency into the agent's preferences. Moreover, the previous discussion shows that this formulation encompasses time-inconsistent recursive utilities too.
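A generic type-I BSVIE of the kind just described reads as follows. This display is a template consistent with the surrounding discussion, written with generic data; the paper's equation (1.2) fixes the exact specification.

```latex
% Family of backward equations indexed by the source of inconsistency s;
% the agent's value at time t is the diagonal V_t^A = Y_t^t.
Y_t^{s} = U_A(s, \xi)
  + \int_t^T h_r\big(s, X, Y_r^{s}, Z_r^{s}, \pi_r, \alpha_r\big)\,\mathrm{d}r
  - \int_t^T Z_r^{s}\,\mathrm{d}X_r,
\qquad (s, t) \in [0, T]^2.
```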
Remark 1.1. In a Markovian framework, time-inconsistent agents whose reward functional is given by (1.2) have been considered in Wei, Yong, and Yu [89], Wang and Yong [83], and Hamaguchi [35]. In these works, the dynamics of the controlled state process are given in strong formulation and, following the game-theoretic approach, they consider a refinement of the notion of equilibrium in [23] suitable to each of their settings. In this work, we use BSVIEs to model the agent's reward and extend the non-Markovian framework proposed in [43].
Let us now present this formulation properly. We define the set of admissible actions A (recall A is compact) as the collection of A-valued, F-predictable processes, and assume we are given jointly measurable mappings h and U_A such that (y, z) ↦ h_t(s, x, y, z, p, a) is uniformly Lipschitz-continuous, i.e. there exists some C > 0 such that, for all (s, t, x, p, y, ỹ, z, z̃, a), |h_t(s, x, y, z, p, a) − h_t(s, x, ỹ, z̃, p, a)| ≤ C(|y − ỹ| + |z − z̃|).

Remark 1.3. Let us comment on the previous assumptions. The first condition guarantees that we can identify units of utility with terminal contract payments: the utility U_A(T, ·) can be inverted so as to recover, from the terminal value of the agent's reward, the payment ξ. The second assumption guarantees sufficient regularity, with respect to the variable source of inconsistency, of the data prescribing the agent's reward.
We assume the agent has a reservation utility R_0 ∈ R below which he refuses to take the contract. The agent is hired at time t = 0, and the contracts C offered by the principal, who can only access the information generated by the state process X, are assumed to provide the agent with a flow of continuous payments and a compensation at the terminal time T. Thus, we denote by C^o, see Section 3.1 for the definition of the integrability spaces, the collection of contracts C = (π, ξ) ∈ Π × Ξ for appropriate families Π and Ξ. If hired, the agent chooses an effort strategy α ∈ A, and at any time t ∈ [0, T] his value, from time t onwards, from performing α is given by the diagonal of the solution, where the pair (Y^α, Z^α) satisfies the BSVIE (1.2). We recall that V^A(C, α) is commonly referred to in the literature as the continuation utility, and we always interpret its value at the terminal time as U_A(T, ξ), the utility of the terminal payment. Given the choice of reward, the problem of the agent is time-inconsistent. We therefore assume that the agent is a so-called sophisticated time-inconsistent agent who, aware of his inconsistency, can anticipate it, thus making his strategy time-consistent. Consequently, the problem of the agent can be interpreted as an intra-personal game, in which he is trying to balance all of his preferences and searches for sub-game perfect Nash equilibria. We recall the definition of an equilibrium strategy introduced in [43], see further comments in Remark 1.6. Let {α⋆, α} ⊆ A, t ∈ [0, T], and ℓ ∈ (0, T − t], and consider the deviation strategy that plays α on [t, t + ℓ) and reverts to α⋆ afterwards. We say α⋆ is an equilibrium if, for any ε > 0, there exists ℓ_ε > 0 such that no such deviation improves the agent's value at time t by more than εℓ whenever ℓ ≤ ℓ_ε. Given a contract C, we call E(C) the set of all equilibria associated with C.
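In symbols, and anticipating the lim inf reformulation mentioned in Remark 1.6, the equilibrium condition can be sketched as follows; this is a template in the spirit of [43], with α ⊗_{t,ℓ} α⋆ denoting the strategy that plays α on [t, t + ℓ) and α⋆ afterwards, and the paper's Definition 1.4 giving the exact statement.

```latex
% Local first-order optimality of alpha^star against small deviations:
\liminf_{\ell \searrow 0}
  \frac{V_t^A(C, \alpha^\star) - V_t^A(C, \alpha \otimes_{t,\ell} \alpha^\star)}{\ell}
  \;\ge\; 0, \qquad \forall\, t \in [0, T],\ \alpha \in \mathcal{A}.
```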
As such, the agent's goal, given a contract C guaranteed by the principal, is to choose an effort that aligns with his sophisticated preferences, i.e. to find α⋆ ∈ E(C). In contrast to the case of a classic time-consistent utility maximiser, for a time-inconsistent sophisticated agent there could be more than one equilibrium, with potentially different rewards, see for instance [51]. In this work, we restrict our attention to the set of contracts inducing a unique equilibrium; see additional comments on this point in the following remark. All in all, for C ∈ C^o inducing a unique equilibrium, we can define the agent's value along this equilibrium.
Remark 1.6. (i) In the non-Markovian framework, the strategy devised in [43] builds upon the approach in [7] to study rewards given by conditional expectations of non-Markovian functionals. This approach is based on decoupling the sources of inconsistency in the agent's reward, and requires introducing the terms ∂_s U_A and ∇h into the analysis, see Appendix B for details. The integrability condition in the definition of Π × Ξ guarantees that the BSVIE (1.2) is well-defined. We also mention that Theorem B.3 generalises the extended dynamic programming principle obtained in [43] to the case of rewards given by (1.2) and equilibrium actions as in Definition 1.4.
(ii) The previous definition of equilibrium can be regarded as a reformulation, via the lim inf, of the classic definition in [23]. Indeed, it follows from Definition 1.4 that given (iii) Lastly, we expand on the necessity of focusing our attention on contracts that lead to a unique equilibrium. The need for this restriction is inherent to contract theory models involving a game-theoretic formulation at the level of the agent. Indeed, in either the case of a finite number of competitive interacting agents seeking a Nash equilibrium, see Élie and Possamaï [25], or a continuum of players seeking a mean-field equilibrium, see Élie, Mastrolia, and Possamaï [26], multiple equilibria may exist. In such cases, the existence of a Pareto-dominating equilibrium, one under which no agent receives a worse reward than under any other equilibrium, is by no means guaranteed. In the context of contract theory, this means that, whenever two equilibria provide different values to different players, there is no clear rule at the level of the agent's problem for deciding which one should be selected. As giving control of this decision to the principal makes little practical sense, one way to bypass this issue is to focus on contracts that lead to a unique equilibrium, as we do here.
Anticipating our analysis in Section 3.1, we mention that this assumption is intimately related to the well-posedness of a fairly novel class of BSVIEs. In the Lipschitz setting of this paper, we present conditions on the data of the problem under which this is the case for any C ∈ C o, see Assumption 3.2 and Remark 3.3. As such, this is not a particularly stringent assumption in our context.

The principal's problem
We now present the principal's problem. We let C ⊆ C o be the set of admissible contracts, defined by In this manner, any contract C ∈ C is implementable, that is, there exists an equilibrium strategy α ⋆ ∈ E(C) for the agent's problem.
The principal has utility functionals U P : Ω × R −→ R and u p : [0, T ] × Ω × R × A −→ R, and solves the problem Remark 1.7. We point out that we have assumed the principal is a standard utility maximiser. This is because, in our opinion, the crux of the problem lies in identifying a proper description of the problem of the principal when contracting a time-inconsistent sophisticated agent. In the case of a time-consistent agent, [18] identifies this description as a standard stochastic control problem with an additional state variable. Therefore, in the case of a classic time-consistent agent and a time-inconsistent principal, following [18], one expects the problem of the principal to boil down to a non-Markovian time-inconsistent control problem with an additional state variable. As studied in [43], such problems are characterised by an infinite family of BSDEs, analogous to the PDE system of [7] in the Markovian case.

The first-best problem
In the first-best, or risk-sharing, problem, the principal chooses both the effort and the contract for the agent, and is only required to satisfy the participation constraint. To provide appropriate characterisations of the solution to several examples, we focus on a particular class of reward functionals for the agent. We recall that our goal is to study the second-best problem introduced in the previous section. As such, despite its inherent interest, the results in the current section serve mainly as a reference point for the general analysis we conduct in Section 3. Moreover, the following specification is covered by the general formulation presented in Section 1, see Remark 2.1, and is yet rich enough to cover examples of both separable and non-separable utilities. We highlight that in the next two examples we consider contracts consisting of only a terminal payment, i.e. C = ξ.
Let us assume the agent has a given increasing and concave utility function U o A : R −→ R, and Borel-measurable discount functions g and f, defined on [0, T ] and taking values in (0, +∞), with g(0) = f (0) = 1, which are assumed to be continuously differentiable with derivatives g ′ and f ′. Lastly, we have Borel-measurable functionals k and c, defined on [0, T ] × Ω × A and taking values in R +.
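To see why non-exponential discount functions of this kind generate time-inconsistency, consider a toy numerical comparison; the specific hyperbolic form g(t) = 1/(1 + t), the payment amounts, and the rate in the exponential benchmark are our illustrative assumptions, not the paper's specification.

```python
import numpy as np

# Illustrative discount functions (not the paper's data): hyperbolic
# g(t) = 1/(1 + t), with g(0) = 1, versus exponential discounting e^{-rho*t}.
# Under non-exponential discounting, the ranking of a smaller-sooner payment
# (100 at delay d) against a larger-later one (110 at delay d + 1) flips as
# d grows: the hallmark of time-inconsistent preferences.

def g(t):
    return 1.0 / (1.0 + t)

def pref_hyperbolic(d):
    # > 0: prefers the sooner payment; < 0: prefers the later one
    return np.sign(g(d) * 100 - g(d + 1) * 110)

def pref_exponential(d, rho=0.1):
    # the ratio of discounted values is independent of d, so no reversal occurs
    return np.sign(np.exp(-rho * d) * 100 - np.exp(-rho * (d + 1)) * 110)

print(pref_hyperbolic(0), pref_hyperbolic(10))    # ranking reverses with d
print(pref_exponential(0), pref_exponential(10))  # ranking never changes
```

A plan that looks optimal from the vantage point of time 0 is therefore abandoned by later selves, which is exactly the phenomenon the sophisticated-agent formulation addresses.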
We then specify the agent's continuation utility by where Regarding the principal, we assume she has her own utility function U o P : R −→ R, concave and strictly increasing, so that V P = sup where Γ : R n −→ R denotes a mechanism by which the principal collects the values of the n different coordinates of the state process X.
Regarding the principal, our specification corresponds to U P (x, x) = U o P (Γ(x T ) − x). Let us mention that, to facilitate the resolution of the following examples, we assumed that U P depends only on the terminal value x T. This allows us to use the dynamics of X as given in Section 1.1. We highlight that this assumption is not necessary in the general analysis of the second-best problem we present in Section 3.
We now move on to characterising the solution to the first-best problem in the case of a time-inconsistent agent with both separable and non-separable utility functions. Anticipating the result, we highlight that in the first-best case, the problem of the principal reduces to a standard stochastic control problem.

Non-separable utility
We recall that the CARA utility function, commonly known as the exponential utility, constitutes the stereotypical example of a non-separable utility. We then consider (2.1) under the choice c = 0, We then have that The value of the principal is thus obtained through the following constrained optimisation problem Note that the concavity (resp. convexity) of both U o A and U o P (resp. of a −→ k o t (x, a)), together with the fact that (A, C) is a convex set, implies that V P FB is a concave optimisation problem. The Lagrangian associated with this problem, where ρ ∈ R + denotes the multiplier of the participation constraint, is For the convenience of the reader, we recall that the dual problem V P,d FB, which is an unconstrained control problem, is in general an upper bound for V P FB and is defined by where we use the convention sup ∅ = −∞. As is commonplace for convex problems, the next result exploits the absence of a duality gap, i.e. V P FB = V P,d FB, to compute the value of V P FB. It uses the following notation.
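The duality argument can be checked numerically on a static toy problem; the utilities, cost, and reservation level below are our own illustrative assumptions (a risk-neutral principal and a square-root-utility agent), not the paper's CARA specification, but the Lagrangian mechanics are the same: the unconstrained dual majorises the primal, and concavity closes the gap.

```python
import numpy as np

# Toy first-best problem (illustrative assumptions, not the paper's data):
# a risk-neutral principal pays xi >= 0 and recommends effort a >= 0 to an
# agent with utility sqrt(xi), effort cost a^2/2 and reservation utility R0.
#   primal:  max_{a, xi}  a - xi   s.t.  sqrt(xi) - a^2/2 >= R0.
# Since the objective decreases in xi, the participation constraint binds.
R0 = 0.5
a = np.linspace(0.0, 3.0, 30001)
primal = np.max(a - (R0 + a**2 / 2) ** 2)

# dual: minimise over rho > 0 the unconstrained Lagrangian maximum
#   D(rho) = sup_xi [rho*sqrt(xi) - xi] + sup_a [a - rho*a^2/2] - rho*R0
#          = rho^2/4 + 1/(2*rho) - rho*R0
rho = np.linspace(0.05, 5.0, 30001)
dual = np.min(rho**2 / 4 + 1 / (2 * rho) - rho * R0)

print(primal, dual)  # the two values agree: no duality gap
```

The grid search is crude but sufficient here: both optimisers are interior and the objective is smooth, so the gap between the grid values is dominated by the (absent) duality gap.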
Moreover, if α ⋆ is an optimal control for V cont , then an optimal contract is given by ξ ⋆ (ρ ⋆ , α ⋆ ).

Separable utility
We consider the case k = 0 and g = 1 in (2.1), and assume a −→ c(t, x, a) is convex for any (t, x) ∈ [0, T ] × Ω. The agent's reward from time t ∈ [0, T ] onwards is given by The value of the principal is thus obtained through the following constrained optimisation problem The Lagrangian associated with this problem is U o A and U o P are such that the mapping ξ ⋆ (x, ρ), given as the solution to Moreover, suppose the pair (α ⋆ (ρ ⋆ ), ξ ⋆ (X T , ρ ⋆ )) is feasible for the primal problem, where α ⋆ (ρ) (resp. ρ ⋆ ) denotes the maximiser in V cont (ρ) (resp. in the above problem), which we assume to exist. Then, there is no duality gap, i.e.
the optimal contract is given by ξ ⋆ (X T , ρ ⋆ ). (ii) Then, the problem of the principal is given by the solution to the standard control problem Moreover, for α ⋆ ∈ A an optimal control of this problem, Ĉ(α ⋆ ) contains all the optimal contracts for the principal, e.g. the deterministic contract Remark 2.4. We remark that the assumption on the utility functions in Proposition 2.3 is fairly mild. Indeed, it is immediately satisfied, for instance, in either of the following scenarios is concave, strictly increasing and satisfies the following conditions

The second-best problem: general scenario

In this section, we return our attention to the second-best problem faced by the principal. We will exploit the theory of type-I BSVIEs. Consequently, we first introduce suitable integrability spaces.

Integrability spaces and Hamiltonian
Following [42, Section 2.2], to carry out the analysis we introduce the spaces To make sense of the class of systems considered in this paper we introduce some extra spaces.
• Given a Banach space (I, ∥ · ∥ I ) of E-valued processes, we define For instance, S 2,2 denotes the space of Lastly, we introduce the space Remark 3.1. The second set of spaces are suitable extensions of the classical ones, with norms tailor-made for the analysis of the systems we will study. Some of these spaces have been previously considered in the literature on BSVIEs, e.g. [42] and [83]. Of particular interest is the space H 2,2, which allows us to define a good candidate for (Z t t ) t∈[0,T ] as an element of H 2, see [35].

Characterising equilibria and the BSDE system
Building upon the results in [43], where only the case of an agent with separable utility was considered, we wish to obtain a characterisation of the equilibria associated with any C ∈ C. For this we must introduce the Hamiltonian functional Our standing assumptions on H are the following.
Remark 3.3. Let us comment on the previous set of assumptions. Even in the non-Markovian setting of this document, the problem faced by a sophisticated agent is related to a system of equations instead of just one, see [43]. This raises many issues, among which is the possible multiplicity of equilibria with different values.
where the processes Y (C), Z(C) come from the solution to the following infinite family of BSDEs, which for any s ∈ [0, T ] satisfies, P-a.s. ) Moreover, we have that Given that Assumption 1.2 guarantees that x −→ U A (s, x) is invertible for every s ∈ [0, T ], we also have that We recall that the diagonal process (Z t t ) t∈[0,T ] is well defined for elements of H 2,2, see Section 3.1. (ii) Links between time-inconsistent control problems and a broader class of BSVIEs have been identified in the past. The first mention of this link appears, as far as we know, in the concluding remarks of Wang and Yong [87]. The link was then made rigorous independently by [43] and by Wang and Yong [83]. In our setting, in light of (3.2), such an equation appears as the one satisfied by the reward of the agent along the equilibrium. As such, the pair (Y s t (C), Z s t (C)) (s,t)∈[0,T ] 2 solves a so-called extended type-I BSVIE, which for any s ∈ [0, T ] satisfies We highlight that this BSVIE involves the diagonal processes (Y t t (C), Z t t (C)) t∈[0,T ] and that, in light of [42, Theorem 4.4], the solutions of (3.1) are in correspondence with those of (3.4).

The family of restricted contracts
In light of our previous observation, namely (3.3), we next introduce a family of restricted terminal payments, which we denote Ξ, and we let C denote the associated class of contracts. For any contract in this family, we can solve the associated time-inconsistent control problem faced by the agent. Moreover, we will show that any admissible contract available to the principal admits a representation as a contract in C. Consequently, the principal's optimal expected utility is not reduced if she restricts herself to offering, and optimising over, contracts in this family.
In order to define the family of restricted contracts, we next introduce the process Y y0,Z,π, which for a suitable process Z will represent the value of the agent. This is a preliminary step based on the observation, see (3.3), that the value of the agent at the terminal time T coincides with the payment offered by the contract. To alleviate the notation, let us set With this, it is natural to consider the class of contracts C := Π × Ξ, where Ξ denotes the set of terminal payments of the form U The main novelty of our argument, compared to that of the time-consistent case, is the fact that (3.6) imposes a constraint on the elements Z ∈ H 2,2.
Remark 3.6. (i) We highlight that H 2,2 is independent of the choice of π ∈ Π and that establishing H 2,2 ≠ ∅ is inherently associated with the existence of solutions to (3.4). For results on type-I BSVIEs we refer to [42] and [86].
(ii) The process Y y0,Z,π denotes a solution to a so-called forward stochastic Volterra integral equation (FSVIE, for short). However, this is not a classic FSVIE in the sense that, in addition to Y s,y0,Z,π, the diagonal process (Y t,y0,Z,π t ) t∈[0,T ] appears in the generator. For completeness, Appendix C includes a suitable well-posedness result.
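The role of the diagonal feedback can be illustrated on a deterministic toy analogue of such a two-parameter family; the kernel, the linear generator, and the closed-form benchmark below are our own assumptions, chosen so that the scheme can be checked against an exact solution, and the example is not equation (3.5) itself.

```python
import numpy as np

# Toy deterministic analogue (illustrative, not the paper's equation (3.5)):
# a two-parameter family Y_t^s = y0 + int_0^t K(s, r) * Y_r^r dr, in which the
# diagonal (Y_r^r) feeds back into every curve of the family. With the kernel
# K(s, r) = exp(-(s - r)), the diagonal D(t) = Y_t^t solves the linear Volterra
# equation D(t) = y0 + int_0^t e^{-(t - r)} D(r) dr, whose exact solution is
# D(t) = y0 * (1 + t)  (differentiate the equation to see D'(t) = y0).

y0, T, N = 2.0, 1.0, 4000
h = T / N
t = np.linspace(0.0, T, N + 1)

# left-point Euler scheme for the diagonal Volterra equation
D = np.empty(N + 1)
D[0] = y0
for i in range(1, N + 1):
    D[i] = y0 + h * np.sum(np.exp(-(t[i] - t[:i])) * D[:i])

# once the diagonal is known, each non-diagonal curve s -> Y_t^s is a plain integral
err = abs(D[-1] - y0 * (1.0 + T))
print(err)  # O(h) discretization error
```

The point of the sketch is structural: the whole family is pinned down by its diagonal, which is precisely why the diagonal processes play such a prominent role in the well-posedness analysis.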
(iii) As mentioned at the beginning of this section, we chose to work with a representation of the agent's value as opposed to the value of the contract itself. This determines the form of the terminal payments in the definition of Ξ and provides a quite general and comprehensive approach. For instance, one could have chosen to represent the value of ξ directly for an agent with a time-inconsistent exponential utility. This would have produced a version of (3.1) whose generators have quadratic growth in Z and whose analysis is more delicate than in the Lipschitz case. See, for instance, Wang, Sun, and Yong [84], Fan, Wan, and Yong [30], and Hernández [41] for the study of quadratic BSVIEs. We recall that taking that approach in the time-consistent scenario requires, at the very least, assuming the contracts have exponential moments of sufficiently large order. Our approach avoids this thanks to the growth conditions in Assumption 1.2. However, one cannot expect to avoid such restrictions for problems that are inherently quadratic.
Remark 3.7. We would like to highlight the nature of the constraint (3.6). Indeed, for any ξ ∈ Ξ satisfying (3.6), it holds that ξ = U Moreover, we emphasise that this constraint arises precisely because of time-inconsistency. Indeed, going back to the time-consistent, i.e. exponential discounting, scenario presented in Section 1.2, it is not hard to see that e −ρs Y s,α t = e −ρu Y u,α t, P-a.s., for any (u, s, t) ∈ [0, T ] 3. Thus, (3.6) as well as the stochastic target constraint (3.7) are automatically fulfilled in the time-consistent, exponential discounting, scenario.
In light of our previous remarks, as a preliminary step we must verify that (3.5) uniquely defines Y y0,Z,π. At the formal level, the following auxiliary lemma says that the integrability conditions on the pair (π, Z) guarantee this. Lemma 3.8. Let Assumption 1.2 and Assumption 3.2 hold. Given (π, y 0 , Z) ∈ Π × I × H 2,2, there exist unique processes (Y y0,Z,π , ∂Y y0,Z,π ) ∈ S 2,2 × S 2,2 such that Y y0,Z,π satisfies (3.5) and ∂Y y0,Z,π satisfies Proof. Let us first argue the result for Y y0,Z,π. Note that the integrability of (π, Z) ∈ Π × H The result follows from Proposition C.5. The second part of the statement is a consequence of Proposition C.6 and the integrability of π ∈ Π.
We are now ready to state our main result; in words, it guarantees that there is no loss of generality for the principal in offering contracts of the form given by Ξ.
This shows ξ ∈ Ξ. Let us argue that C ∈ C o in the sense of Definition 1.5, i.e. that C leads to a unique equilibrium. In light of Assumption 3.2, Theorem B.5 and Theorem B.6, it suffices to establish that C leads to a solution of (3.1). Let us recall that, by [42, Theorem 4.4], the solutions of (3.1) are in correspondence with those of (3.4). We now simply note that (3.10) defines a solution. Thus To conclude that C ∈ C, note that by Theorem B.6, V A 0 ( C) = y 0 0, so that y 0 0 ≥ R 0 guarantees the participation constraint is satisfied.
In view of Theorem 3.9, the problem of the principal involves controlling, via (π, Z) ∈ Π × H 2,2, the processes (X, Y y0,Z,π ). The dynamics of X are given, in weak formulation, by where , Z r r , π r dr is a P ⋆ (Z, π)-Brownian motion, and those of Y y0,Z,π are given by We highlight that, on top of the Volterra nature of both the state process Y y0,Z,π and the control Z, the constraint (3.6) must be satisfied. Moreover, building upon the discussion in Remark 3.7, we see that the problem of the principal corresponds to a stochastic target control problem for FSVIEs with Volterra controls. Indeed, the principal (i) controls the forward Volterra process (Y s,y0,Z,π t ) (s,t)∈[0,T ] 2 ; (ii) does so with Volterra-type controls (Z s t ) (s,t)∈[0,T ] 2, recalling that both (Z t t ) t∈[0,T ] and (Z s t ) t∈[0,T ] impact the dynamics; and (iii) faces a state process Y y0,Z,π subject to the stochastic target constraint (3.7).
The literature on controlled FSVIEs began, to the best of our knowledge, with Chen and Yong [14], where the authors studied the control of FSVIEs by means of a stochastic maximum principle. A recent milestone in the study of this problem is Viens and Zhang [82] where, via a dynamic programming approach, the authors arrive at a path-dependent HJB equation. Nevertheless, in all of these works the control is an unconstrained stochastic process. Thus the approach of [82] is inoperable here, as it covers neither (ii) nor (iii) above.
Regarding the study of stochastic target control problems, the seminal works are due to Soner and Touzi [74,75], where the state process is a controlled SDE. We also remark on the recent extension to targets in the Wasserstein space by Bouchard, Djehiche, and Kharroubi [10], which shows the possibility of extending the original approach to infinite-dimensional target problems like the one faced by the principal, namely (iii) above. Particularly important to our analysis are the results in Bouchard, Élie, and Imbert [9] on optimal control problems with stochastic target constraints. Indeed, this work provides the blueprint that needs to be extended to the Volterra case in order to obtain (infinite-dimensional) HJB-type PDEs characterising the problem of the principal. As the reader may notice, this seems, in general, to be quite a challenging task. Therefore, we concentrate for now on simpler cases in which we can transfer the stochastic target constraint on Y y0,Z,π into a more manageable constraint on the controls Z directly. The general case will be the subject of future research.
As a motivation for our approach in the following examples, we recall two facts. First, the flow of continuous payments (π t ) t∈[0,T ] enters the reduced problem of the principal as a standard control on the drift, which raises no major challenges in the analysis; we will therefore omit it from the following examples and consider contracts consisting of only a terminal payment, i.e. C = ξ. Second, for classic separable utilities with exponential discounting it is known, see Remark 3.7, that the Volterra nature of the state process Y y0,Z,π becomes redundant. Indeed, in this scenario it suffices to describe (Y t t ) t∈[0,T ] in order to characterise the entire family. This motivates the study of H 2,2 under particular specifications of utility functions for both the agent and the principal, in the hope of being able to (a) reduce the complexity of the set H 2,2 ; and (b) exploit its particular structure to formulate an ansatz for the problem of the principal. This is exactly what we do in the following sections.

The second-best problem: examples

Agent with discounted utility reward
As an initial example, let us consider the scenario of Section 2.1 under the additional choice g = 1, which implies that K s,α t,T does not depend on s ∈ [0, T ]. Thus, we have Under this specification, (3.1) reduces significantly. Indeed Remark 4.1. (i) We highlight that the absence of a cumulative cost in the agent's reward functional, i.e. c = 0, together with the choice g = 1, makes the driver in the second family of BSDEs independent of the variable s, i.e. ∇h = h. Moreover, it coincides with the functional maximised in the Hamiltonian H.
(ii) We remark that in this scenario the non-exponential discount factor, i.e. the time-inconsistent preferences, does not add much to the problem. Even though the agent's continuation utility changes by a factor, the optimal/equilibrium control–state pair coincides for both problems. Our aim in presenting this example is to illustrate how the technique presented in Section 3.3 is consistent with the results known in the case of a time-consistent agent.
The next result provides a drastic simplification of the infinite-dimensional system introduced in Section 3.2. This is due to the particular form (4.1) of the reward of the agent.

Lemma 4.2. (i)
Let ξ ∈ C and let the agent's reward be given by (4.1). Then, (3.1) is equivalent to the BSDE , where H 2 denotes the family of Z ∈ H 2 satisfying ∥Y y0,Z ∥ S 2 < ∞ where, for any Proof. It is immediate from (3.1) that, P-a.s. Thus and, for any Altogether, this shows that (3.1) reduces to the equation in the statement. The result then follows since we can trace back the argument and construct a solution to (3.1) starting from a solution to the BSDE in the statement.
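Once the infinite family collapses to a single scalar BSDE, it can be approximated by standard backward induction. The following toy sketch uses a binomial-tree approximation of the Brownian motion with a linear driver chosen by us purely so that the output can be checked against a closed form; it is not the paper's driver h.

```python
import numpy as np

# Toy numerical sketch (illustrative driver, not the paper's): backward
# induction for the scalar BSDE
#   Y_t = xi + int_t^T (-rho * Y_r) dr - int_t^T Z_r dW_r
# on a binomial approximation of W. For this linear driver the exact value is
# Y_0 = exp(-rho * T) * E[xi], which the explicit scheme reproduces up to O(dt).

rho, T, N = 0.5, 1.0, 200
dt = T / N
sq = np.sqrt(dt)

# terminal condition xi = W_T^2; node k at the final level has W = (2k - N)*sq
w = (2 * np.arange(N + 1) - N) * sq
y = w**2                               # E[xi] = N*dt = T exactly on the tree

for _ in range(N):
    y = 0.5 * (y[1:] + y[:-1])         # conditional expectation step E_i[Y_{i+1}]
    y = y * (1.0 - rho * dt)           # explicit driver step: + dt * (-rho * y)

Y0 = y[0]
print(Y0, np.exp(-rho * T) * T)        # scheme value vs. exact value
```

The process Z is recovered, if needed, from the spread between the up and down nodes at each step; we omit it since only Y 0 is compared here.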

Principal's second-best solution
In the following, we will exploit the so-called certainty equivalent, i.e. the relation ξ = (U o A ) −1 (V A T (ξ, α)) between the contract and the terminal value of the value function. The benefits of this are twofold: it provides an expression that can be substituted directly into the principal's criterion, and it removes Y y0,Z from the generator of the expression representing the contract, in exchange for a term which is quadratic in Z. For this we need to introduce some extra notation.
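The mechanism behind this trade-off can be sketched under the illustrative assumption of a CARA utility $U^o_A(x) = -\mathrm{e}^{-\gamma x}$ (our choice for the sketch, matching the exponential-utility setting of this section). If $\mathrm{d}Y_t = -h_t\,\mathrm{d}t + Z_t\,\mathrm{d}W_t$ and $\hat Y_t := (U^o_A)^{-1}(Y_t) = -\gamma^{-1}\ln(-Y_t)$, then Itô's formula gives

```latex
\mathrm{d}\hat Y_t
= -\frac{1}{\gamma Y_t}\,\mathrm{d}Y_t
  + \frac{1}{2\gamma Y_t^{2}}\,\mathrm{d}\langle Y\rangle_t
= \frac{h_t}{\gamma Y_t}\,\mathrm{d}t
  + \frac{\gamma}{2}\,\hat Z_t^{2}\,\mathrm{d}t
  + \hat Z_t\,\mathrm{d}W_t,
\qquad
\hat Z_t := -\frac{Z_t}{\gamma Y_t}.
```

Rewriting $h_t/(\gamma Y_t)$ in terms of $(\hat Y_t, \hat Z_t)$ then removes $Y$ from the generator, at the price of the quadratic term $\tfrac{\gamma}{2}\hat Z_t^{2}$, which is exactly the trade-off described above.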
The maximiser â ⋆ (t, x, z) ∈ A is defined, as before, by the relation H t (x, z) = h t (x, z, â ⋆ (t, x, z)), and λ ⋆ t (x, z), k o⋆ t (x, z) are defined accordingly.
Proposition 4.3. The problem of the principal can be represented as the following standard control problem T is given by the terminal value of Proof. We first note that, in light of Lemma 4.2, we may replace the optimisation over H (ii) In a Markovian setting in which the dependence of the data on the path X is via the current value, we see from the controlled dynamics for X and Y y0,Z that the problem boils down to computing V. Employing the standard dynamic programming approach, we obtain that the relevant term for this problem is given for where In the following proposition, whose proof is available in Appendix D, we study the case n = 1, so that This result is equivalent to solving the HJB equation in Remark 4.4.
Proposition 4.5. Let the principal and the agent have exponential utilities with parameters γ P and γ A, respectively. Let ), and assume that (i) the maps σ, λ ⋆ and k o⋆ do not depend on the x variable; is an optimal solution to the principal's second-best problem and Remark 4.6. To close this section, we present a few remarks: (i) comparing the results of Proposition 4.5 and Proposition 2.2, we see that, as expected, the solutions to the second-best and first-best problems do not coincide in general; (ii) if we bring ourselves back to the setting of [45], i.e. b t (x, a) = a/σ, σ t (x) = σ, k o t (x, a) = ka 2 /2, we have This recovers the result for the case of a risk-neutral principal, i.e. γ P = 0, presented in [45]. The optimal contract and the respective rewards differ by a factor which depends on the discount factor and the agent's risk-aversion parameter; (iii) following up on the previous comment, we add that the optimal contract takes the form of a Markovian rule. Moreover, it is linear. This is consistent with the seminal work [45] and with the conclusion of [12], in which the robustness of such policies was studied. Nevertheless, as we will see in Section 4.3, this appears to be a consequence of the simplicity of the source of time-inconsistency considered in this section.

Agent with separable utility
We consider the scenario in Section 2.2, i.e.
The mappings H t (x, z), a ⋆ (t, x, z), λ ⋆ t (x, z), c ⋆ t (x, z), and the probability In this section, we aim to gain a deeper understanding of the family H 2,2 under the previous specification of preferences for the agent. In particular, we want to understand how the elements of the family (Y s,Z , Z s ) s∈[0,T ] are related to each other. In light of Assumption 1.2 and Assumption 3.2, for any Z ∈ H 2,2 we denote by M s,Z the (F, P ⋆ (Z))-square-integrable martingale where We also recall that P ⋆ (Z) is the unique solution to the martingale problem for which X has characteristic triplet (λ ⋆ , σσ ⊤ , 0). Thus, the representation property holds for (F, P ⋆ (Z))-martingales (see [46, Theorem III.4.29]) and we can introduce the unique F-predictable process Z s,Z such that sup s∈[0,T ] E P ⋆ (Z) and, in light of (3.11), The next lemma, proved in Appendix D, presents relationships satisfied by the family (Y s,Z , Z s ) s∈[0,T ] and shows how we can use them to obtain another characterisation of H 2,2 and Ξ. and Remark 4.8. (i) In the exponential discounting case, i.e. f (t) := e −ρt for some ρ > 0, we have Thus, δ ⋆ = 0 and the result of Lemma 4.7 simplifies to Therefore, this implies that in the non-exponential discounting case, the term is exactly the correction due to time-inconsistency.
(ii) We also remark that the choice of Z 0 in the constraint defining the family Z is arbitrary. Indeed, it could be replaced by any other element Z u of the family Z ∈ H 2,2.

Principal's second best solution
Thanks to Lemma 4.7, we have now proved the following. Proposition 4.9. The problem of the principal can be represented as the following control problem where We remark that, contrary to the example in Section 4.1, Proposition 4.9 reduces the problem of the principal to a non-standard control problem. Indeed, we have to optimise over H •, a family of infinite-dimensional controls which has to satisfy a novel type of constraint, namely (4.2). Nonetheless, under additional assumptions on the model, we can proceed with the resolution.
As in Section 4.1.1, we focus on the case n = 1, so that Proposition 4.10. Let n = 1, and let the principal and the agent be risk-neutral, i.e.
where for any Moreover, assume the mapping x) with linear growth. Then, V P = x 0 − R0 f (T ) + U 0, where the pair (U, V ) denotes a solution to the BSDE In addition, let and suppose • define a solution to the second-best problem and the optimal contract is given by (ii) Suppose the maps λ ⋆ and c ⋆ do not depend on the x variable and that for any t Then, a solution (y ⋆ 0 , Z ⋆ ) ∈ [R 0 , ∞) × H • to the second-best problem is given by Moreover, the associated optimal contract is given by Proof. Let us show (i). As both the agent and the principal are risk-neutral, we have An upper bound V(y 0 ) is obtained by ignoring (4.2). In this scenario, the mapping H in the statement is the Hamiltonian and, by classical arguments in control theory, see El Karoui, Peng, and Quenez [24], its value is given by U 0, where (U, V ) is as in the statement. We are left to show that this bound is attained. For this we must verify that Z ⋆ ∈ H •.
On the one hand, note that the integrability of Z together with Assumption 1.2 guarantees Therefore, by [86, Theorem 3.5], there exists a unique solution On the other hand, under the integrability assumption on Z, we have that for every s ∈ [0, T ] defines a P ⋆ (Z)-square-integrable martingale. Thus, there exists a family of processes ( Z s ) s∈[0,T ] such that Moreover, in light of (1.2), we have that Z ∈ H 2,2. Therefore, by uniqueness of the solution From this, arguing as in Lemma 4.7, we obtain that Z ⋆ satisfies (4.2).
We now argue (ii). Note that we can find an upper bound for V P. Indeed, we have We now show that the pair (y ⋆ 0 , Z ⋆ ) given in the statement is a feasible solution that attains V P,⋆. To verify feasibility, note that, by assumption, z ⋆ (•) is deterministic, and so is Z ⋆. Thus, it is straightforward from the definition that Z ⋆ ∈ H •. Lastly, it follows by definition that under (y ⋆ 0 , Z ⋆ ) the upper bound V P,⋆ is attained.

Remark 4.11. (i) Let us now present a formal argument regarding our choice in the previous result for solving (4.3). Suppose for simplicity that the maps σ, λ ⋆ and c ⋆ do not depend on the x variable, so that the dynamics of the state variables are given by Moreover, suppose the value function v(t, x, y) is regular enough so that Itô's formula yields, P ⋆ (Z)-a.s.
Let us highlight the presence of both Z t t and Z 0 t in the last term. From this we see, formally, that for general U o A and U o P the process (Z t t ) t∈[0,T ] alone is not sufficient to obtain the solution of (4.3). Moreover, recall that we cannot choose Z t t and Z 0 t independently, due to the constraint (4.2). Lastly, under the assumptions of Proposition 4.10, one expects, intuitively, that Remark 4.12. We close this section with a few remarks.
(i) It is worth mentioning that, even in the setting of Proposition 2.3.(ii), the optimal contract is neither linear nor Markovian. Moreover, from the expression describing the optimal contract, we see that this is entirely due to the presence of the discounting structure, which is the source of time-inconsistency.
(ii) It follows from Proposition 2.3 that for risk-neutral preferences the utility of the principal is the same in both the first-best and the second-best problem, and that the optimal second-best contract is also optimal there. This is a typical result for time-consistent risk-neutral agents, and it would certainly be worth studying whether it remains true for more general specifications of U o P and U o A. In light of Remark 4.11, this question further motivates the study of the general class of non-standard control problems introduced by Theorem 3.9.

Agent with utility of discounted income
We now consider the scenario of Section 2.1 under the additional choice f = 1. We then have In the context of (3.1), this corresponds to Remark 4.13. (i) The problem introduced by (4.4) is time-inconsistent even in the case of exponential discounting, i.e. g(t) = e −ρt, t ∈ [0, T ], for some ρ > 0. This is due to the exponential utility U A. Indeed, the BSDE representation allows us to interpret the reward of the agent as a recursive utility in which the terminal value is discounted at rate g(T − s), whereas the generator discounts at rate g(t − s). It is known, see Marín-Solano and Navas [59, Section 4.5], that even in the case of exponential discounting the problem becomes time-inconsistent as soon as the rates at which the terminal value and the running reward are discounted differ. We also recall that the case of no discounting, i.e. g(t) = 1, corresponds to the seminal work of Holmström and Milgrom [45].
(ii) Let us note that h exhibits both of the features of the examples in Sections 4.1 and 4.2, that is, the second term includes both the discount factor and the y variable. We highlight that a key element in Proposition 4.10 was the fact that the dynamics of Y s,y0,Z were given in terms of (y 0 , Z) without Y y0,Z appearing on the right-hand side. Consequently, the presence of y in h forces us to begin by changing variables to the certainty equivalent for the problem of the agent, i.e. from Z to Z as we denote below. In this way, we remove y 0 from the dynamics of Y y0,Z at the expense of the mapping δ ⋆, which we use to identify an auxiliary martingale, becoming quadratic in the new variable Z. On the one hand, this creates a subtle issue when trying to establish a correspondence between the natural integrability of the variables Z and Z, and will ultimately prevent us from obtaining a complete characterisation of the family H 2,2. On the other hand, the quadratic term does not correspond to the diagonal values of the control variable Ẑ. This makes the approach of Section 4.2, namely Proposition 4.10, inoperable and forces us to restrict ourselves to a suitable subclass that is amenable to the analysis.
As one might expect after our analysis in Section 4.1, the process Y y0,Z in the definition of H 2,2 becomes more amenable to analysis when working in terms of the certainty equivalent. For this, we introduce, for The maps â ⋆ (t, x, z), , and the probability P ⋆ (z) are defined accordingly.
Moreover, inspired by Section 4.2, we introduce the mapping δ ⋆ given, for The following result is the analogue of Lemma 4.7; we defer its proof to Appendix D. (i) There exists a family of processes ( Y s,y0,Z , Z s ) s∈[0,T ] such that for every s ∈ [0, T ] is a square-integrable (F, P ⋆ (Z))-martingale, then where Z s,Z denotes the term in the representation of M s,Z.
Remark 4.15. (i) We highlight that, in contrast to the analysis presented in Sections 4.1 and 4.2, the previous result does not provide an equivalent representation of the set H 2,2. This is intimately related to the square-integrability condition on the process M s,Z required in (iii) above, and to the fact that (z, z) (ii) As a sanity check, let us verify the coherence of the previous system with the analysis of the previous section. In the following, we omit the dependence on X and assume ), we have that for any s ∈ [0, T ], P-a.s.
In the limit tending to y, we see that the previous equation induces the corresponding one in Lemma 4.7.

Principal's second-best solution
Let us highlight that, in contrast to Section 4.2, the analysis in the previous section does not provide a full characterisation of H^{2,2} for rewards given by (2.4). This is principally due to the integrability required of the variable Z, induced by the certainty equivalent, in order to apply the methodology devised in Section 4.2; see Lemma 4.14. Nevertheless, given that the current example generalises the previous two, we build upon the structure of those optimal solutions to propose a family over which the optimisation in the problem of the principal can be carried out.
We focus on the case n = 1 and pay special attention to the class H ⊆ H^{2,2} of processes Z ∈ H^{2,2} for which, given the pair (y_0, Z) ∈ I_0 × H^{2,2} and Y^{y_0,Z} given by (3.5), there exists a pair of predictable processes (η, ζ) providing the required representation. From Theorem 3.9 and (4.2) we then obtain the corresponding value. We remark that the previous definition implicitly requires that, for any t ∈ [0, T], the mapping s ↦ η_{s,t} be differentiable.
(ii) In addition, provided k^{o⋆}(x, z) does not depend on x, it is easy to verify that H ≠ ∅. In light of the previous lemma, for Z ∈ H the corresponding representation holds. This implies that H includes, in particular, all the processes Z that are induced by deterministic pairs (ζ, η). Indeed, for such a class of processes, M^{s,Z} is deterministic and Z^{s,Z} = 0, and consequently η_{s,t} = g(T − s)/g(T − t) provides a non-trivial element of H. The previous argument also holds in the case of exponential discounting, in which, we recall, the agent's problem remains time-inconsistent.
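For instance, with exponential discounting g(t) = e^{−βt}, β > 0 (an illustrative special case), the deterministic family above reads:

```latex
\eta_{s,t} = \frac{g(T-s)}{g(T-t)} = \frac{e^{-\beta(T-s)}}{e^{-\beta(T-t)}} = e^{\beta(s-t)},
\qquad
\partial_s \eta_{s,t} = \beta\, e^{\beta(s-t)} = \beta\, \eta_{s,t},
```

so that s ↦ η_{s,t} is indeed differentiable, as required in the definition of H.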
The following result characterises the solution to V^P. Its proof is available in Appendix D.
Proposition 4.17. Let the principal and the agent have exponential utility with parameters γ_P and γ_A, respectively, and assume that: (i) the maps σ, λ^⋆ and k^{o⋆} do not depend on the x variable; (ii) for any (t, η) ∈ [0, T] × R, the map G : R → R satisfies the stated condition. Then

Moreover
(i) Let V^{P,o} denote the restriction of V^P to the subclass of H with deterministic η; then the optimal deterministic contract is given by the corresponding family. (ii) In the case (γ_A, γ_P) = (0, 0), i.e. the case of a risk-neutral principal and agent, the solution to V^{P,o}, and consequently of V^P, agrees with the value given by Proposition 4.10, and the optimal family Z is deterministic.
Remark 4.18.We close this section with a few remarks.
(i) The solution to the problem of the principal for the general class of restricted contracts induced by H^{2,2} escaped the analysis presented above. As detailed in Remark 4.13.(ii), this is due to subtle integrability issues arising when trying to identify an appropriate reduction of H^{2,2}, and to the quadratic nature of the generator when working in terms of the certainty equivalent. We believe this echoes the intricacies of the non-standard class of control problems introduced in Theorem 3.9.
We highlight that: (a) in contrast to [45], for any type of discounting structure (including exponential discounting) the previous expression, and consequently the optimal action, is neither linear nor Markovian. This corroborates our comment in Remark 4.13.(i), in the sense that even in the case of exponential discounting the problem of the agent remains time-inconsistent; (b) in the case of no discounting, i.e. g = 1, when we return to the model of Remark 4.6, the previous expression coincides with the linear contract specified by [45]. This shows that, even if possibly suboptimal, the optimal contract in the class H at least captures the optimal contract when the problem becomes time-consistent again.

Appendix A Proofs of Section 2
Proof of Proposition 2.2. Let (ρ, α) ∈ R_+ × A be fixed and optimise the mapping C ∋ ξ ↦ L(α, ξ, ρ) ∈ R. An upper bound for this problem is obtained by optimising x-by-x. This leads us to define, for any (α, ρ) ∈ A × R_+ fixed, the candidate ξ^⋆(ρ, α). To show that the upper bound induced by ξ^⋆(ρ, α) is attained, it suffices to note that ξ^⋆(ρ, α) ∈ C by assumption.
Substituting back into L(ρ, α, ξ), and noting that the resulting function is strictly convex in ρ, the first-order condition gives ρ^⋆ as in the statement.
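For the reader's convenience, the pointwise optimisation can be sketched as follows; this is a standard computation with generic concave utilities U_P and U_A, reconstructed here rather than copied from the paper's display:

```latex
\xi^\star(\rho,\alpha)(x) \in \arg\max_{\xi \in \mathbb{R}}
\big\{\, U_P(x-\xi) + \rho\, U_A(\xi) \,\big\},
\qquad \text{with first-order condition} \qquad
U_P'\big(x-\xi^\star(x)\big) = \rho\, U_A'\big(\xi^\star(x)\big).
```

This is the classical Borch rule: at the optimum, the ratio of the two parties' marginal utilities is constant, state by state.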
We are only left to show that V^P_FB = V^{P,d}_FB, i.e. that there is no duality gap. For this, it suffices to verify that (ξ^⋆(ρ^⋆, α^⋆), α^⋆) is primal feasible, i.e. that it satisfies the participation constraint, which indeed holds.

Proof of Proposition 2.3. We argue (i). Let (ρ, α) ∈ R_+ × A be fixed and optimise the mapping C ∋ ξ ↦ L(α, ξ, ρ) ∈ R. An upper bound for this problem is obtained by optimising x-by-x. This defines the mapping ξ^⋆(x, ρ).
As before, the fact that ξ^⋆(ρ, α) ∈ C guarantees the upper bound is indeed attained. Substituting into L(ρ, α, ξ) we obtain V^{cont}(ρ) and the corresponding equality for V^{P,d}_FB. Now, to obtain the absence of a duality gap, we must verify that there exists a solution to the dual problem that is primal feasible. This is exactly the additional assumption in the statement.
We now consider (ii). In this case, we can solve V^P_FB directly. In light of U_P(x) = U_A(x) = x, note that for fixed α ∈ A the principal's reward is linear and strictly decreasing in E^{P^α}[ξ], and therefore she is indifferent between contracts that have the same expectation. Consequently, she optimises over classes of feasible contracts sharing the same expectation. Now, for fixed α ∈ A, any feasible contract satisfies the participation constraint, and our previous comment implies that for given α the principal is indifferent between contracts in Ĉ(α).
Note that Ĉ(α) ≠ ∅. Indeed, one may take the deterministic contract ξ^⋆(α). Plugging this back into the principal's utility, we get the expression for V^P_FB in the statement.
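The indifference argument can be made explicit; writing J_P for the principal's criterion (assumed notation), linearity gives:

```latex
J_P(\xi,\alpha) = \mathbb{E}^{\mathbb{P}^{\alpha}}\big[X_T - \xi\big]
= \mathbb{E}^{\mathbb{P}^{\alpha}}[X_T] - \mathbb{E}^{\mathbb{P}^{\alpha}}[\xi],
```

so for fixed α the criterion depends on ξ only through its expectation, and the binding participation constraint pins this expectation down at the agent's reservation value.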
Under Assumption B.1, [42, Lemma 6.1] guarantees that for any α ∈ A the relevant object exists, which ultimately implies the absolute continuity of the corresponding mapping. We begin by stating the following auxiliary result.
It is then also clear, again from (B.4), that this property is preserved at every iteration and thus in the limit.
Step 1: From the definition of equilibria, we have the inequality for any α ∈ A. Recall that, for any ρ ∈ T_{0,T}, Y satisfies the corresponding identity. In light of the arbitrariness of α ∈ A, we obtain the claim. Step 2: Let us note that, in light of Step 1, the chain of equalities holds, where the second equality holds in light of [66, Lemma 3.5], as [66, Assumption 1.1] holds under Assumption 1.2.
Iterating the previous argument, we obtain that the identity holds P-a.s.
Step 3: Let i ∈ {0, …, n_ℓ − 1}. In light of Assumption 1.2, the stability of the system of BSDEs defined by (B.1) and (B.2), see [42, Proposition 6.4], yields the existence of a constant C > 0 such that the estimate holds. By choosing an appropriate partition Π_ℓ and applying the dominated convergence theorem, we obtain the desired convergence; going back to (B.5), we obtain the claim. Lastly, we show that the equality is attained by α^⋆ ∈ E. Indeed, this follows from (B.3).

Let us recall the Hamiltonian H associated with h. Our standing assumptions on H are the following.
there is C > 0 such that the bound holds for any (t, x, p, y, ỹ, z, z). With this, we introduce the system (H), defined for any s ∈ [0, T]. We say that (Y, Z, Y, Z, ∂Y, ∂Z) ∈ H is a solution to the system whenever (H) is satisfied. In light of Theorem B.3, given α^⋆ ∈ E, it is reasonable to associate the value along the equilibrium with a BSDE whose generator is given, partially, by H. This is the purpose of the next result.

Theorem B.5 (Necessity).
We now note that under Assumption B.
Moreover, the second part of the statement of Theorem B.3 implies the corresponding property of α^⋆. Consequently, (Y^{α^⋆}, Z^{α^⋆}) and (∂Y^{α^⋆}, ∂Z^{α^⋆}) define a solution to the second and third equations in (H), respectively.
We close this section with a verification theorem for equilibria.

Theorem B.6 (Verification).
Proof. We verify the definition of an equilibrium.
It then follows that, for Y a solution to (C.1), there exists a constant C > 0 such that the stated estimate holds. Moreover, for Y^i solutions to (C.1) with data (Y^i_0, h^i, Z^i), i ∈ {1, 2}, there exists C > 0 such that the corresponding stability estimate holds.

Proof. Let us observe that the continuity of the application s ↦ ‖∆Y^s‖_{S^2} implies the claim. With this, the proof of both statements can be obtained following the lines of [93, Theorem 3.2.2 and Theorem 3.2.4].

We note that Y^n ∈ S^{2,2} for n ≥ 0. Indeed, the result holds for Y^0, and the process (Y^{t,0}_t)_{t∈[0,T]} ∈ L^{1,2} is well defined. Inductively, this follows in light of Assumption C.1 and the fact that Z ∈ H^{2,2}.

We now establish a result regarding the differentiability of (C.1). We first note that, P-a.s., the upper bound holds. Let us now show the upper bound is attained. Indeed, the integrability assumption on z^⋆ guarantees that Z^⋆ ∈ H^2. We conclude that ξ^⋆ ∈ Ξ, where ξ^⋆ denotes the contract induced by R_0 and Z^⋆, is optimal as it attains V^{P,⋆}. This concludes the proof.

We now verify Y^{y_0,Z} ∈ S^2.
We also note that the continuity of s ↦ f(t − s) implies the continuity of s ↦ ‖N^{s,Z}‖_{S^2}. Moreover, ‖Y^{0,y_0,Z}‖²_{S^2} < ∞ guarantees, by definition, that ‖Y^{y_0,Z}‖² is finite, where the third inequality follows from the fact that Z ∈ H^•. We conclude Z ∈ H^{2,2}.

D.3 Proof of Lemma 4.14
Let us note that (ii) and (iii) are argued as in Lemma 4.7. We now argue (i). Let (y_0, Z) ∈ I × H^{2,2}. Note that, given Y^{s,y_0,Z}, in light of the regularity of y^s_0 and of the generator, it is possible to define ∂Y^{s,y_0,Z} such that the required identity holds.

Definition 1.5.
C^o denotes the family of contracts C ∈ C^0 that lead to a unique equilibrium, i.e. E(C) = {α^⋆}.

Remark 2.1.
(i) As commented above, the previous type of reward is covered by the formulation via BSVIEs (1.2) and satisfies Assumption 1.2; it corresponds to a particular choice of the data.

Proposition C.5.
We are now ready to establish the well-posedness of (C.1). Let Assumption C.1 hold. Then there is a unique solution to (C.1). Proof. Uniqueness follows from Lemma C.4. For existence, we use a Picard iteration argument. Let Y^{s,0}_• = Y^s_0, s ∈ [0, T], and define, for t ∈ [0, T],

Y^{s,n+1}_t = Y^s_0 + ∫_0^t h_r(s, X, Y^{s,n}_r, Y^{r,n}_r) dr + ∫_0^t Z^s_r • dX_r.
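As a deterministic toy analogue of this Picard scheme (illustrative only: the actual iteration is for the stochastic Volterra system (C.1) and carries a stochastic integral term; all function names below are ours):

```python
import numpy as np

def picard_volterra(h, y0, T=1.0, n_grid=2001, n_iter=60, tol=1e-10):
    """Solve y(t) = y0 + int_0^t h(r, y(r)) dr by Picard iteration,
    discretising the integral with the trapezoidal rule."""
    t = np.linspace(0.0, T, n_grid)
    dt = t[1] - t[0]
    y = np.full(n_grid, y0, dtype=float)  # initial guess Y^0 = y0
    for _ in range(n_iter):
        integrand = h(t, y)
        # cumulative trapezoidal integral of h(r, y^n(r)) over [0, t]
        integral = np.concatenate(
            ([0.0], np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * dt))
        )
        y_new = y0 + integral
        if np.max(np.abs(y_new - y)) < tol:  # contraction has converged
            y = y_new
            break
        y = y_new
    return t, y

# Sanity check: for h(r, y) = -y the fixed point is y(t) = exp(-t).
t, y = picard_volterra(lambda r, y: -y, y0=1.0)
err = np.max(np.abs(y - np.exp(-t)))
```

On [0, T] the n-th iterate differs from the fixed point by at most a constant times T^n/n!, which is the same contraction estimate underlying the convergence of the scheme in the proof.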
Under this set of assumptions, we are able to show, see Appendix B, that the claim holds for any C ∈ C^{2,2}. Let Z ∈ H^2. The result then follows from Lemma 4.2 by applying Itô's formula to U. Let us highlight the main message behind Proposition 4.3. When the agent's reward is given by (4.1), the principal's second-best problem reduces to a standard control problem. This is a drastic simplification of the result in Theorem 3.9 and a consequence of the particular form of the agent's reward.
defined by ∇h_t(s, x, u, v, y, z, p, a) := ∂_s h_t(s, x, y, z, p, a) + ∂_y h_t(s, x, y, z, p, a) u + ∂_z h_t(s, x, y, z, p, a) v. The previous definitions hold x-by-x.
Let Assumption C.1 hold and let Y ∈ S^2 be the solution to (C.1). There is a unique process ∂Y ∈ S^{2,2} that satisfies (C.4), t ∈ [0, T], P-a.s., s ∈ [0, T], and such that ∫_0^s ∂Y^u du = Y^s − Y^0 in S^2(R). Proof. Note that, given the pair (Y, Z) ∈ S^{2,2} × H^{2,2}, Assumption C.1 guarantees there is C > 0 such that the required bound holds. We now note that Assumption C.1.(ii) guarantees that u ↦ ∇h_t(s, x, u, y, u) is Lipschitz uniformly in (s, t, x, y, u). Therefore, Proposition C.5 guarantees there is a unique solution ∂Y ∈ S^{2,2}. The second part of the statement follows arguing as in [42, Lemma 6.1], in light of the stability result in Lemma C.4 and the fact that Z ∈ H^{2,2}.