Can starving start-ups beat fat labs? A bandit model of innovation with endogenous financing constraint

Is there any such thing as too much capital when it comes to the financing of innovative projects? We study a principal–agent model in which the principal chooses the scale of the experiment, and the agent privately observes the outcome realizations and can privately choose the novelty of the project. When the agent has private access to a safe but non-innovative project, the principal starves the agent of funds to incentivize risk-taking. The principal quickly scales up after early successes, and can tolerate early failures. If the principal is equally informed about the outcome, then the agent is well-resourced, resembling a large research and development department.


I. Introduction
Is there any such thing as too much capital when it comes to the financing of innovative projects? Why do small start-ups make most breakthroughs, even though giant enterprises supply most research and development (R&D) spending (Baumol, 2010)? Anecdotal evidence suggests that more financial resources do not necessarily lead to more and better innovation. It is surprising how many projects run by well-resourced and competent teams fail, despite pursuing ideas that eventually prove successful. Nokia, for example, had always been an adaptive company and hardly a technological laggard: its engineers built a prototype of an internet-enabled phone with a touch-screen at the end of the 1990s. 1 Nokia spent almost four times as much on R&D as Apple did, 2 but saw its market share fall from 40 percent in 2007 to 11 percent in 2014. Similarly, because of hefty investment in research, Kodak developed one of the first digital cameras in 1975, and launched a photo-sharing web site in 2001. 3 However, in January 2012, the company filed for bankruptcy. Conversely, many successful innovative firms have started small, with limited resources, but enjoyed high growth rates after a few years. Airbnb, Dropbox, and Reddit are among the 1,000 start-ups funded by Y Combinator, an American seed accelerator: Sam Altman, the president, considers frugality "incredibly important for start-ups" 4 and Jessica Livingston, a co-founder, underlines that "you don't want to give the founders more than they need to survive". 5 Fred Wilson, a venture capitalist with early investments in Twitter and Tumblr, writes on his blog that "less money raised leads to more success". 6 According to Mr Altman, this is due to the "focusing effect of limited resources" (emphasis added; see the article cited in footnote 4), and Ms Livingston adds that "being lean forces you to focus" (emphasis added; see the article cited in footnote 5).
In this paper, we are interested in understanding the mechanisms through which frugality can help the innovation process. When does an innovative project benefit from a lack of resources?
We offer a two-period principal-agent model of innovation investments, in which a representative investor can use the scale of an experiment to incentivize an entrepreneur, with limited liability, to be more innovative, and to learn the business potential of a new project. To study how to incentivize the agent to focus on innovation, while emphasizing that the principal might lack technical competence, we assume that the agent can privately choose between exploring a new technology that involves a small chance of a breakthrough and a high risk of failure, and exploiting an old technology with a predictable outcome.
1 See the article by James Surowiecki published in The New Yorker on 3 September 2013, "Where Nokia went wrong" (available online at http://www.newyorker.com/business/currency/where-nokia-went-wrong).
2 See the article by Anton Troianovski and Sven Grundberg published in The Wall Street Journal on 18 July 2012, "Nokia's bad call on smartphones" (available online at http://www.wsj.com/articles/SB10001424052702304388004577531002591315494).
3 See the article "The last Kodak moment?" published by The Economist on 14 January 2012 (available online at http://www.economist.com/node/21542796).
4 See the article by Eugen Kim published in Business Insider UK on 24 March 2015, "A warning to startups from the head of Silicon Valley's most important startup factory" (available online at http://uk.businessinsider.com/sam-altman-raising-too-much-money-early-is-bad-2015-3).
5 See the article by Tad Friend published in The New Yorker on 10 October 2016, "Sam Altman's manifest destiny. Is the head of Y Combinator fixing the world, or trying to take over Silicon Valley?" (available online at http://www.newyorker.com/magazine/2016/10/10/sam-altmans-manifest-destiny).
6 See Fred Wilson's blog post of 18 September 2013, "Maximizing runway can minimize success" (available online at http://avc.com/2013/09/maximizing-runway-can-minimize-success/).
To study the trade-off between increasing the scale of the project to reap higher profits, and starving the agent for incentive reasons, we assume that expected revenues increase in the amount of capital invested. Finally, because entrepreneurs usually have a better idea than investors of whether a new technology works, or whether a new product answers the customers' needs, we assume that the outcome realizations are not directly observable by the principal. We focus on three cases: (i) the full-information case, where the informed principal observes both the outcome and the technology used by the agent; (ii) the relationship-based financing case, where the hands-on principal has the technical competence to distinguish between different technologies but cannot observe the outcome realizations; (iii) the arm's length financing case, where the hands-off principal can observe neither the type of technology nor the outcome realizations.
We find that innovation is best incentivized by starvation contracts (i.e., contracts that entail a small scale initially but potentially high growth later on) when the agent is better informed than the principal about outcome realizations, and, at the same time, the agent has private access to a safe but non-innovative project. Indeed, providing the agent with less capital (a) minimizes the agent's incentives to embezzle revenues, and (b) incentivizes risk-taking. A hands-on investor only worries about motive (a): no matter the technology used, truthful reporting on the outcome realizations is incentivized by using a combination of financing constraints, punishments for failure, and rewards for success. Only a hands-off investor financing an innovative agent needs to consider motive (b) as well: because the predictable old technology is initially more likely to succeed, starving an innovative agent makes a deviation to the conventional work method less appealing.
We also find that starvation contracts can be welfare improving. As a measure of social welfare, we consider the total expected payoff of the match between the investor and the entrepreneur. Comparing the optimal innovation contracts under arm's length and relationship-based financing reveals the following differences. On the one hand, an innovative entrepreneur faces initially stronger credit constraints under arm's length financing than under relationship-based financing, and this decreases social welfare. On the other hand, to prevent the agent from resorting to the safer project, the hands-off principal needs to tolerate early failures and reward long-term successes, as new technologies are more volatile and might take longer to become successful. As a consequence, entrepreneurs engaging in riskier projects retain access to capital even after a failure in the first period, whereas they would have been terminated under a hands-on principal; additionally, successful innovative ventures are significantly scaled up at the beginning of the second period. Both of these effects increase social surplus. We show that there exists a parameter space for which the positive effects are greater than the negative effect. Therefore, expected welfare can increase in the degree of informational asymmetry between entrepreneurs and financiers.
We derive some implications of our model for firm dynamics. The relationship between a firm's innovative activities and its growth rate is not straightforward in the empirical literature. Our model suggests that such a relationship is influenced by the type of financing the firm has access to, or equivalently by the degree of informational asymmetry between the principal and the agent, and by the riskiness of the project. In particular, we find that innovative firms start smaller and grow faster than conventional firms only under arm's length financing, and more significantly so for riskier projects. As far as we know, this finding has not been tested empirically yet.
Our results show that the best way to incentivize innovation varies across different economic environments. When the principal is equally (or better) informed about the outcome of the experiment, the agent is well resourced, resembling a large R&D department. Conversely, lean, fast-growing startups are optimal when the researchers can spend their time on safe but non-innovative alternatives and when they have a better idea than investors about whether a new technology answers the customers' needs. Thus, our model suggests a possible explanation for why Nokia failed to develop a modern mobile OS and subsequently lost the smartphone battle with Apple. "At Apple the top managers are engineers, [whereas] there was no real software competence in the top management team [at Nokia]". 7 Analogously for the demise of Kodak, there was a separation between research, finance, and business functions, each reporting up through their own hierarchies: the research ranks at Kodak "used to be a closed society, where some researchers kept their records in locked safes", and "researchers . . . never used to see [the business units] at all". 8 Both the "lack of technical competence among top managers [at Nokia]" (emphasis added; see the article cited in footnote 7), and the separation between researchers and business managers at Kodak meant that managers could not evaluate whether the projects being pursued were really aimed at long-term success, and thus hefty investments failed to give proper incentives to the research teams. 9 Perhaps scarcer resources, coupled with this hands-off management, or a large budget under the supervision of tech-savvy managers, would have worked better. Conversely, Apple rightly chose to develop the iPhone and iOS in-house, pairing a large budget for its R&D department with the technical competencies of its managers, but, for example, decided to buy an intelligent personal assistant software from the start-up Siri Incorporated.
The remainder of the article is organized as follows. Section II puts our model in the context of the relevant theoretical literature, and our theoretical predictions in the context of the empirical literature. Section III outlines the technical details of the model, and provides the full-information case. Section IV derives optimal arrangements under the relationship-based financing case. Section V derives the optimal contracts offered by a hands-off principal. Section VI uses numerical simulations to derive implications for firm dynamics and social welfare. Finally, Section VII concludes.

II. Related Literature
In this section, we first discuss how our model fits in the related theoretical literature on innovation investments, then we describe how our theoretical results find support in the empirical literature.
Many recent theoretical papers have studied the incentives for experimentation using repeated principal-agent models. In Bergemann and Hege (1998, 2005), Hörner and Samuelson (2013), and Halac et al. (2016), both sides learn about the unknown quality of the project, but the agent can distort the principal's perception of profitability by shirking or privately reducing the amount invested. 10,11 In these papers, parties wait for the arrival of a single success, and the agent has only one work method to produce it; moreover, the moral hazard is on the agent's actions (effort or investment). Conversely, in our model, a success is valuable but does not end the relationship and can be produced through different methods; also, we focus on moral hazard on the method pursued. 12 As a result, the terms of our optimal contracts vary depending on the method being incentivized.
More generally, private information on the method pursued allows us to focus on the tension between exploration and exploitation, similarly to Manso (2011). 13 In his two-period model, which is the closest to ours, the agent, uncertain about the true distribution of payoffs from the available actions, can choose among shirking, using a new untested action, and using a well-known process. An agency problem arises because the three actions are associated with different private levels of effort. The optimal contract can reward early failures, and compensation depends not only on total performance but also on its path.
Whereas the model of Manso (2011) and related models apply better to mature firms, our focus is on understanding which contracts are better suited to small independent start-ups. The main differences are that start-ups are characterized by minimal resources and hands-off management by investors. In Manso (2011), the size of the project is assumed away, and agents might be uninformed about their performance. In contrast, in our model, the scale of the project is endogenously chosen by the principal, and the termination decision is made partly based on information that the agent provides to the investors. 14 As a consequence, Manso-style contracts (i.e., well-resourced firms) are optimal when the principal and the agent are equally informed about the outcome realizations, but resource-starved start-ups are optimal when the agent is better informed.
9 However, our model does not explain why agents were given the wrong incentives in the first place.
10 Bergemann and Hege (2005) also compare the provision of funds under arm's length (unobservable investments) and relationship-based financing (observable investments), and find that the funding volume is always higher under the former. In their model though, the benefit from increased informational asymmetry comes from the lack of commitment: when investment is unobservable, the principal becomes more pessimistic after a deviation and is thus able to commit to a finite stopping time, reducing the agent's ability to renegotiate at favourable terms and thus her incentives to delay investments into the project through deviations. Similar mechanisms appear in Crémer (1995) and Hörner and Samuelson (2013).
11 In Halac et al. (2016), the principal-agent model involves both adverse selection on agent's ability and dynamic moral hazard. Related are also the experimentation frameworks of Drugov and Macchiavello (2014), with both adverse selection and moral hazard, Bouvard (2014), who considers only adverse selection on the quality of the agent, Gomes et al. (2016), who introduce two-dimensional adverse selection, and Bobtcheff and Levy (2017), where the agent has private information on the quality of learning. Outside of a pure experimentation framework, see also Gerardi and Maestri (2012), who study dynamic incentives for information acquisition under both moral hazard and adverse selection, and DeMarzo and Sannikov (2017), who study private learning under moral hazard. None of these papers considers moral hazard on the choice between exploration and exploitation, which is one of the main ingredients of our model.
12 In Section V, we consider a dynamic two-dimensional moral hazard problem as the agent has private information on both the outcomes and the method pursued. The cash-flow diversion model we employ follows from Bolton and Scharfstein (1990) and Clementi and Hopenhayn (2006), but, more generally, our paper is also related to the fast-growing dynamic financial contracting literature, which uses the optimal contract framework to analyze and characterize theoretically the contractual relationship between firms and investors under a variety of agency problems. The main characteristic that we share with (most of) this literature is the fact that limited liability protects the agent, and thus inefficient downsizing and termination arise for incentive reasons.
13 Ederer (2016) extends this framework to a multi-agent setting and Klein (2016) studies a related model in continuous time.
14 Drugov and Macchiavello (2014) consider the case where the firm can "start small", without this affecting the informativeness of the experimentation. Because the amount invested is smaller, this can be useful to avoid the embezzlement of funds, similarly to our Section IV. However, the size of the experiment is exogenous in their framework. Moreover, their agent has only one method to produce a success, whereas two methods are available in our model. As a consequence, in our model, the principal reduces (increases) the scale of the project to elicit the riskier (safer) option.
Many of our theoretical results find support in the empirical literature. First, evidence that the combination of tolerance for early failure and reward for long-term success is effective in motivating innovation is provided by Azoulay et al. (2011) and Tian and Wang (2014), using naturally occurring data, and Ederer and Manso (2013), by exogenously varying compensation schemes in an experimental setting. In the theoretical literature, similar results are reached by Holmström (1989) and Manso (2011). 15 Secondly, Atanassov (2016) finds that firms with a greater proportion of arm's length financing have more and better (i.e., more cited) patents, perhaps because this allows greater flexibility and tolerance to experimentation than relationship-based bank financing. Thirdly, while there is no general consensus, many empirical articles support the so-called "less money, better innovation" argument (Hall et al., 2016); that is, financing constraints can have a disciplinary effect on innovative investments, by limiting the moral hazard problem and forcing firms to focus on more productive and value-enhancing innovation. Almeida et al. (2013) find that firms that are more likely to be constrained generate more patents and citations per unit of R&D investment and per employee; in Li (2011), a positive relation between R&D investment and stock returns exists only among financially constrained firms. Dryden et al. (1997), Nickell and Nicolitsas (1999), and Musso and Schiavo (2008) show that financing constraints are positively related to productivity growth. 16 Keupp and Gassmann (2013) provide partial support for the hypothesis that financial constraints spur radical innovations. Schäfer et al. (2015) find that family businesses are more likely to be constrained but have the same level of innovation outcomes as non-family firms, perhaps due to more efficient resource utilization. Lahr and Mina (2018) find that, while innovation activities seem to cause financial constraints, the reverse effect appears negligible.
15 This is consistent with headlines such as "Fail to Succeed" (see the article by Matt Cowan published in Wired UK on 25 April 2011, available online at http://www.wired.co.uk/magazine/archive/2011/05/features/fail-to-succeed), "Fail Often, Fail Well" (see the article published by The Economist on 14 April 2011, available online at http://www.economist.com/node/18557776), and "Google's Greatest Strength May Be the Luxury of Failure" (see the article by Steve Rosenbush published in The Wall Street Journal on 17 May 2013, available online at http://blogs.wsj.com/cio/2013/05/17/googles-greatest-strength-may-be-the-luxury-of-failure/). Indeed, the tolerance of failure has become a dominant theme in Silicon Valley, where, for example, FailCon, a conference about embracing failure, was launched in 2009. Whilst this has been an annual event for several years, it was recently cancelled because "failure chatter is now so pervasive in Silicon Valley that a conference almost seems superfluous" (see the article "Wearing your failures on your sleeve" by Claire Martin published in The New York Times in 2014, available online at http://www.nytimes.com/2014/11/09/business/wearing-your-failures-on-your-sleeve.html?_r=0).
16 In the theoretical literature, a similar argument is advanced by Jensen (1986) and Aghion et al. (1999).
Finally, we find that the relationship between innovation investments and firm growth depends on the type of financing the firm has access to, or, equivalently, on the degree of informational asymmetry between the financiers and the entrepreneur. As far as we know, this theoretical prediction has not been tested empirically. The applied literature suggests that the relationship between innovation and firm growth is usually positive but not straightforward (Coad, 2009). For example, Coad and Rao (2008) note that the relationship between innovative activities and firm growth is positive among the fastest growing firms, but it can be negative for others; Demirel and Mazzucato (2012) find that R&D boosts growth only for a subset of small pharmaceutical firms; Segarra and Teruel (2014) and Mazzucato and Parris (2015) argue that the effect of R&D on firm growth differs between industries and competitive environments.

III. The Model
An entrepreneur (the agent) has the ability to operate risky projects but has no wealth of her own. A representative investor (the principal) has the necessary deep pockets but lacks the entrepreneurial ability. Together they form a firm to run the projects. Both the entrepreneur and the investor are risk neutral and discount future cash flows using the same discount factor, normalized to 1. Both are able to commit to a long-term contract, in the sense that if they both sign the contract, they abide by it in every circumstance. We assume that the entrepreneur has limited liability and a reservation payoff equal to zero, so that she will never voluntarily quit.
There are two periods and two possible outcomes in each period. At the beginning of each period, the principal provides a certain amount of working capital. 17 The collaboration terminates at the end of the second period. Once the firm is formed, the agent can decide whether to rely on a well-known project $C$ (for conventional) or to explore a new technology $N$ (for novel or new) whose probability of success is unknown. 18 Both methods yield revenues that are subject to shocks and that increase with the amount of working capital invested. In particular, the well-known project is successful with probability $\pi_C$ and fails with the remaining probability $1 - \pi_C$, yielding zero revenues. If successful, the outcome of the project is positive and is given by $R(k_t) = A k_t^{\alpha}$, where $A > 0$, $\alpha \in (0, 1)$, and $k_t$ represents the amount of working capital invested in the project in period $t$ (its scale). 19 Similarly, the new work method yields either $R(k_t)$ or 0. However, following Manso (2011), we assume that the novel work method has an exploratory nature: its probability of success $\pi_N$ is unknown. Moreover, when the agent starts experimenting with this new project, she is less likely to succeed than when she relies on the well-known project. Nevertheless, if the experimentation leads to a success, the new method becomes perceived as better than the conventional work method. 20 This is formalized as follows.
Assumption 1. $E[\pi_N] < \pi_C < E[\pi_N \mid H, N]$.
Here, $E[\pi_i]$ and $E[\pi_i \mid s, i]$ denote, respectively, the unconditional expectation of $\pi_i$ and the conditional expectation of $\pi_i$ given outcome $s \in \{H, L\}$ on action $i \in \{C, N\}$. We indicate success with $H$ (for high outcome) and failure with $L$ (for low). Moreover, we assume that the probability of success of the conventional project is known.
17 The results do not qualitatively change if, additionally, a fixed initial investment is needed (e.g., to acquire an enabling asset).
18 In Online Appendix B, we introduce a private cost incurred by the agent when employing the novel approach. This complicates the analysis but does not qualitatively change the conclusions.
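As an illustration of the belief structure described above, the following minimal sketch assumes a two-point prior on $\pi_N$: the new technology is either "good" or "bad". This prior is an assumption made purely for illustration, since the model does not specify one. The function name and the values `prior_good`, `p_good`, and `p_bad` are hypothetical; only $\pi_C = 0.4$ is taken from the parametrization used for the figures.

```python
def expected_success_probabilities(prior_good, p_good, p_bad):
    """Return (E[pi_N], E[pi_N | H, N], E[pi_N | L, N]) under a two-point prior.

    Hypothetical setup: the novel project succeeds with probability p_good
    if the technology is good and p_bad if it is bad.
    """
    prior_bad = 1.0 - prior_good
    # Unconditional expected probability of success.
    mean = prior_good * p_good + prior_bad * p_bad
    # Bayes' rule: probability that the technology is good after each outcome.
    post_good_after_success = prior_good * p_good / mean
    post_good_after_failure = prior_good * (1.0 - p_good) / (1.0 - mean)

    def posterior_mean(post_good):
        return post_good * p_good + (1.0 - post_good) * p_bad

    return (
        mean,
        posterior_mean(post_good_after_success),
        posterior_mean(post_good_after_failure),
    )


PI_C = 0.4  # known success probability of the conventional project

# Hypothetical beliefs about the novel project.
e_n, e_n_h, e_n_l = expected_success_probabilities(
    prior_good=0.3, p_good=0.8, p_bad=0.1
)

print("E[pi_N]        =", round(e_n, 4))
print("E[pi_N | H, N] =", round(e_n_h, 4))
print("E[pi_N | L, N] =", round(e_n_l, 4))
print("novel starts worse, looks better after a success:",
      e_n < PI_C < e_n_h)
```

With these hypothetical numbers, the novel project is initially less likely to succeed than the conventional one, but a first-period success raises the posterior above $\pi_C$, while a failure lowers it further.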
Note that using project $i$ only gives information on the probability of success of the same project, $\pi_i$; that is, $E[\pi_j \mid s, i] = E[\pi_j]$ for $j \neq i$. Following Manso (2011), we define an action plan as $ijz$, where $i$ is the first-period action, $j$ is the second-period action in the case of success, and $z$ is the second-period action in the case of failure. The total expected payoff of the match between investor and entrepreneur by following action plan $ijz$ is given by
$$W_{ijz} = E[\pi_i] R(k^{1}_{ijz}) - k^{1}_{ijz} + E[\pi_i] \left\{ E[\pi_j \mid H, i] R(k^{H}_{ijz}) - k^{H}_{ijz} \right\} + \left(1 - E[\pi_i]\right) \left\{ E[\pi_z \mid L, i] R(k^{L}_{ijz}) - k^{L}_{ijz} \right\},$$
where $k^{1}_{ijz}$ represents the scale of the project in period 1 under the action plan $ijz$, and $k^{H}_{ijz}$ and $k^{L}_{ijz}$ are the scales of the project in period 2 contingent on success or failure, respectively. Hereafter, we refer to $W_{ijz}$ as a measure of social welfare. We consider only two action plans, $CCC$ and $NNC$, which are usually defined as "exploitation" and "exploration" in the bandit problem literature. 21 Exploitation refers to the repetition of the well-known project $C$, while exploration consists of trying the new project $N$ in the first period, sticking to it in the case of success, but returning to the conventional project in the case of failure.
19 Many of our theoretical results are proven for a general function $R$ that is continuous, differentiable, strictly concave, strictly increasing, and uniformly bounded from above (see Online Appendix A.2). The parameter values used to produce the figures in the main paper are $\pi_C = 0.4$ and $\alpha = 0.33$, and $A$ is chosen such that the first-best scale of the firm when using the conventional project is $k^{FB} = 10{,}000$. The qualitative results do not hinge on this specific parametrization. Simulations are run using numpy (van der Walt et al., 2011); graphs are drawn in matplotlib (Hunter, 2007). We used Python 2.7.
20 Even if the unconditional probability of success is expected to be lower under the novel project, exploration might be valuable because it allows entrepreneur and investor to acquire information about the probability of success in the second period. Obviously, if $E[\pi_N] \geq \pi_C$, then the conventional project would never be pursued in the first period (but it might be pursued after a failure on the novel project).
For ease of reading, we relegate the analytical steps and the proofs to Online Appendix A.2, together with the closed-form solutions.

Benchmark: The Full-Information Case
Before introducing the moral hazard problems, we solve the bandit problem in isolation. We assume that information is costless and readily available to a benevolent social planner who wishes to maximize the expected net amount of output produced. Thus, solving this single agent's problem will give us the first-best levels of investment and the efficient choice between exploration and exploitation. 22 At the optimum, the expected marginal benefit of providing one unit of working capital must be equal to its constant marginal cost. Under exploitation, the unique solution, $k^{FB}_{CCC}$, is pinned down by the following first-order condition (FOC)
$$\pi_C R'(k^{FB}_{CCC}) = 1,$$
where the prime indicates the first derivative. Similarly, under exploration, the first-best levels of capital must satisfy the following FOCs:
$$E[\pi_N] R'(k^{1}_{NNC}) = 1, \qquad E[\pi_N \mid H, N] R'(k^{H}_{NNC}) = 1, \qquad \pi_C R'(k^{L}_{NNC}) = 1.$$
Given Assumption 1 and the strict concavity of $R$, it follows that
$$k^{1}_{NNC} < k^{L}_{NNC} = k^{FB}_{CCC} < k^{H}_{NNC}.$$
Under exploitation, the efficient scale of the project is the same across periods and outcomes. When it is socially optimal to explore, the lender provides a lower amount of capital than under exploitation in the first period because the expected probability of success is lower. If the project fails, the agent stops experimenting and reverts to the well-known project in the second period, using the unconstrained efficient amount of capital. Conversely, after a success, the novel approach becomes perceived as more likely to succeed than the well-known one, thus it is efficient to provide more capital.
21 In Online Appendix A.2, we show that any other action plan is dominated by one of these.
22 The informed social planner's problem is equivalent to a setting in which there is no information asymmetry on the cash-flow realization between principal and agent, independent of the investor's ability to observe the action plan chosen by the agent (see Online Appendix A.3). In Online Appendix B.1, we tackle a problem similar to that in Manso (2011), where the project-selection stage is the private information of the agent, who incurs a private cost, but the outcome of the project is public information. We confirm the result of Manso (2011) that the agent will not be compensated for a success in the first period, but might be compensated for an early failure. We show that in such a setting, the first-best levels of capital are always provided. In general, when the contract can be made contingent on outcome realizations, there is no need to distort the capital provision.
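For the parametric revenue function $R(k) = A k^{\alpha}$, the condition that expected marginal revenue equals the unit marginal cost, $p R'(k) = 1$ for a success probability $p$, has a closed-form solution. The sketch below uses the parametrization reported for the figures ($\pi_C = 0.4$, $\alpha = 0.33$, and $A$ chosen so that the first-best scale under the conventional project is 10,000). The beliefs about the novel project (`E_N` and `E_N_H`) are hypothetical values chosen to satisfy Assumption 1, and the function names are illustrative rather than taken from the paper's simulation code.

```python
PI_C = 0.4        # success probability of the conventional project
ALPHA = 0.33      # curvature of the revenue function
K_FB = 10_000.0   # first-best scale under the conventional project

# A is backed out from PI_C * A * ALPHA * K_FB**(ALPHA - 1) = 1.
A = K_FB ** (1.0 - ALPHA) / (PI_C * ALPHA)

# Hypothetical beliefs about the novel project (not from the paper).
E_N = 0.31     # E[pi_N]
E_N_H = 0.64   # E[pi_N | H, N]


def revenue(k):
    """Revenue after a success, R(k) = A * k**ALPHA."""
    return A * k ** ALPHA


def first_best_scale(p):
    """Solve p * R'(k) = 1 for k when R(k) = A * k**ALPHA."""
    return (p * A * ALPHA) ** (1.0 / (1.0 - ALPHA))


def period_surplus(p, k):
    """Expected revenue net of the capital advanced in one period."""
    return p * revenue(k) - k


def first_best_welfare(p_first, p_after_success, p_after_failure):
    """Two-period expected surplus at the first-best scales (no discounting)."""
    k1 = first_best_scale(p_first)
    k_high = first_best_scale(p_after_success)
    k_low = first_best_scale(p_after_failure)
    return (
        period_surplus(p_first, k1)
        + p_first * period_surplus(p_after_success, k_high)
        + (1.0 - p_first) * period_surplus(p_after_failure, k_low)
    )


k1_nnc = first_best_scale(E_N)
k_high_nnc = first_best_scale(E_N_H)
k_low_nnc = first_best_scale(PI_C)

w_ccc = first_best_welfare(PI_C, PI_C, PI_C)   # exploitation
w_nnc = first_best_welfare(E_N, E_N_H, PI_C)   # exploration

print("exploration scales (period 1, after H, after L):",
      round(k1_nnc), round(k_high_nnc), round(k_low_nnc))
print("W_CCC =", round(w_ccc), " W_NNC =", round(w_nnc))
```

Which plan yields the higher surplus depends on the hypothetical beliefs supplied; the ordering of the scales, by contrast, follows from Assumption 1 alone.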

IV. The Relationship-Based Financing Case
Here, we add one moral hazard problem to the model of the previous section. The agency problem arises because it is impossible to make the contract explicitly contingent on realized outcomes, as such outcomes are private information for the agent. 23 However, we assume that the investor has the technical competence to distinguish between the two technologies: the principal can observe and verify the work method employed by the entrepreneur, and thus the type of project is contractible. Because in this setting the principal acquires significant information about the firm, we refer to this situation as relationship-based financing, and to such an investor as hands-on. In Section V, we relax this assumption and compare the optimal contracts.
23 One possible interpretation is that only the entrepreneur can observe the outcome of the project. Another potential interpretation is that the cash-flow realizations are observable but not verifiable. However, while in our two-period model the two formulations have identical repercussions, in a multi-period model the former would lead to asymmetric information about the probability of success under $N$, whereas the second interpretation would not. Another formulation of the problem would be to consider an observable and contractible outcome but a non-verifiable use of investment. In Clementi and Hopenhayn (2006), the two formulations turn out to be identical. Also, see DeMarzo and Fishman (2007), who construct a model of firm dynamics that is able to encompass a variety of agency problems.
At time 0, the investor makes a take-it-or-leave-it offer to the entrepreneur. 24 This offer consists of a contract $\sigma$ that specifies the capital advances $k = \{k^{1}, k^{H}, k^{L}\}$ and the agent's repayment at the end of period 1 after a reported success, $\tau^{H}$. 25 These terms can be contingent on all information provided by the entrepreneur, which consists of a series of reports on the outcome realizations. As these are privately observed by the agent, she can under-report them, diverting the excess cash flow for her own consumption. 26 Because limited liability protects the agent, the repayment cannot exceed the first-period revenue:
$$\tau^{H} \leq R(k^{1}). \qquad (5)$$
The contracting problem is reminiscent of a truth-telling equilibrium of a direct mechanism, and indeed the contract must elicit truthful reporting. The relevant incentive-compatibility constraint, equation (6), is the one imposing truthful reporting in the good state at the end of the first period. 27 This condition requires the continuation payoff of the entrepreneur when she reports the outcome realization truthfully to be at least as high as the payoff from diversion. We also normalize to zero the outside option of the agent: the participation constraint is implied by the incentive-compatibility and limited-liability constraints.
24 Thus, all bargaining power lies in the hands of the investor. While this might not be too far from reality for an emerging firm requiring venture capital, it is more likely that neither party has all the bargaining power. We leave this interesting extension to future research.
25 In theory, there are six repayments to consider, two at the end of the first period and four at the end of the second period, each conditional on a particular history. However, in the second period, the entrepreneur will always report the outcome associated with the lower cost. This implies that the second-period repayment must be independent of the second-period realization, but it can be history-dependent. However, limited liability implies that, conditional on a low report, the agent pays nothing. It follows that the only non-zero repayment is $\tau^{H}$. Note that, similarly to Clementi and Hopenhayn (2006), we are excluding from the analysis the possibility that the repayments are negative: numerical simulations can be used to confirm that this is without loss of generality, as positive repayments would be used anyway for incentive reasons.
26 While we interpret the diversion as tunneling (or embezzling), other activities could fit the model, such as the agent receiving non-monetary benefits from projects that benefit her at the expense of the principal.
27 Because the repayments in period 2 are independent of the outcome in period 2, we can disregard the incentive-compatibility constraint for period 2. We also note that there is no incentive for the agent to report the high outcome when the low outcome is actually realized, as the entrepreneur in that case does not have any funds to make the corresponding transfer to the principal. In Section V, the actions taken by the agent will not be observable. As a consequence, additional incentive-compatibility constraints will ensure that neither tunneling nor the alternative action is chosen by the agent.
The optimal contract under relationship-based financing maximizes the expected profits of the investor, given in equation (7), subject to the limited-liability constraint in equation (5) and the incentive-compatibility constraint in equation (6). Control variables are the levels of capital and the repayment in period 1 after a success. 28 The main characteristics of the optimal exploitation and exploration contracts offered by a hands-on principal are summarized in the following proposition.
28 We use V i j z to indicate the expected profits of the entrepreneur.

Proposition 1 (Optimal contract under a hands-on principal). The optimal contract offered by a hands-on principal σ i j z is such that (a) the agent repays the entire outcome at the end of the first period, (b) the firm is terminated after a failure, (c) the firm is credit constrained in both periods, and (d) the firm grows after a success.
This proposition shows that the optimal contracts that motivate exploration and exploitation resemble a standard pay-for-performance incentive scheme, similar to the contracts used to motivate an agent in costly state verification, cash-flow diversion, and repeated effort models. First, they both require back-loading of the agent's rewards. The agent just receives a utility equal to the first-period return in case of success. Indeed, the optimal contract under unobservable outcomes is such that, after a success in the first period, the expected payoff in period 2 acts as the carrot that persuades the agent to part with the first-period return, and in fact it is exactly equal to this value. Second, the firm is always terminated following a failure. The possibility of terminating the project for incentive reasons is well recognized in the literature, and it is a tool that is used by the principal in both Bolton and Scharfstein (1990) and Clementi and Hopenhayn (2006). Cornelli and Yosha (2003) emphasize that the option to abandon a project is essential, as an entrepreneur will most likely never quit a failing project as long as others are providing the capital. Both the credible threat to abandon a venture (Sahlman, 1990; Kerr et al., 2014) and ex ante staging (Kaplan and Strömberg, 2003, 2004) are important components of the relationship between entrepreneurs and venture capitalists.
Finally, the amount of working capital provided when information is asymmetric is always lower than the first-best levels, so we define the firm as being credit constrained in both periods. As in Clementi and Hopenhayn (2006), credit constraints arise endogenously in response to the moral hazard problem, and are relaxed following a success. 29 The optimal contracts derived in this section might not be incentive compatible against alternative action plans if the investor can observe neither the output nor the actions taken by the agent. We investigate this issue in Section V.

V. The Arm's Length Financing Case
In this section, we consider the full double moral hazard model: we assume that the principal can observe neither the outcome realization nor the agent's work method. Compared with the model in the previous section, the investor has less information, so we refer to this case as arm's length financing, and to such an investor as hands-off. We show that the contracts derived in Section IV are usually not incentive compatible when the principal can observe neither the outcome nor the work method, as additional incentive-compatibility constraints might be binding. Thus, we derive the optimal arm's length exploration and exploitation contracts. Finally, we discuss the results.
It will prove useful to distinguish between two forms of exploration: we call exploration "moderate" if the probability of two consecutive successes is higher under the novel approach than under the conventional work method, and "radical" otherwise.
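The distinction can be illustrated with a minimal Python sketch. It assumes that the conventional method succeeds with the same probability in each period, so that two consecutive successes under it occur with the square of that probability. The function names and the value 0.4 for the conventional success probability are hypothetical and chosen only for illustration; the probabilities 0.3, 0.7, and 0.5 are the ones reported for the figures in Section VI.

```python
def classify_exploration(p_first, p_second_given_success, p_conventional):
    """Label exploration as 'moderate' or 'radical', following the text.

    p_first: unconditional success probability of the novel approach, E[pi_N].
    p_second_given_success: E[pi_N | H, N], success probability after a success.
    p_conventional: success probability of the conventional method, assumed
        here to be the same in both periods.
    """
    two_successes_novel = p_first * p_second_given_success
    two_successes_conventional = p_conventional ** 2
    if two_successes_novel > two_successes_conventional:
        return "moderate"
    return "radical"


def moderate_threshold(p_first, p_conventional):
    """Boundary value of E[pi_N | H, N]; exploration is moderate strictly above it."""
    return p_conventional ** 2 / p_first


PI_C = 0.4  # hypothetical value, not taken from the paper

print(classify_exploration(0.3, 0.7, PI_C))
print(classify_exploration(0.3, 0.5, PI_C))
print(round(moderate_threshold(0.3, PI_C), 3))
```

Under this assumed value, the first parameter pair is classified as moderate and the second as radical, which matches the labels attached to them in Section VI.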

Optimal Exploration Contract
The optimal exploration contract under the hands-on principal of Section IV is not incentive compatible if the entrepreneur can rely on the well-known project without getting caught. Indeed, under the putative hands-on contract, the agent receives a utility equal to the first-period return in the case of success, but only with the probability of success in period 1. This probability of success is higher under C, so she can obtain the same payoff with a higher probability by using the conventional project and tunneling the outcome if a success occurs. This is therefore a profitable deviation.
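The argument can be checked with a short numerical sketch. It assumes, as Proposition 1 states, that the agent's continuation value after a reported success equals the first-period return and that the firm is terminated after a reported failure. The function names and all numerical inputs are hypothetical.

```python
def payoff_explore_truthfully(p_novel, first_period_return):
    # On path: success arrives with probability p_novel, and the continuation
    # value after a reported success equals the first-period return.
    return p_novel * first_period_return


def payoff_exploit_and_tunnel(p_conventional, first_period_return):
    # Deviation: run the conventional project, keep the first-period return
    # whenever it succeeds, and forgo any continuation value.
    return p_conventional * first_period_return


# Hypothetical inputs, for illustration only.
P_NOVEL, P_CONVENTIONAL, RETURN_K1 = 0.3, 0.4, 1.0

gain = (payoff_exploit_and_tunnel(P_CONVENTIONAL, RETURN_K1)
        - payoff_explore_truthfully(P_NOVEL, RETURN_K1))
print(gain > 0)
```

The gain is positive whenever the conventional project is initially more likely to succeed than the novel one, whatever the scale of the first-period return.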
The maximization problem of a hands-off principal wanting to implement exploration must thus account for the following potential deviations. 30 First, the agent can decide to use the novel approach in period 1 but divert the funds in the case of success; the associated incentive-compatibility constraint is equation (9). Secondly, the constraint in equation (10) ensures that the agent does not want to deviate by relying on the conventional approach at time 1, with the intention of tunneling the positive outcome in case of success. 31 Finally, the investor avoids exploitation by imposing the constraint in equation (11). The following proposition summarizes the main characteristics of the optimal exploration contract offered by a hands-off principal.

Proposition 2 (Optimal exploration contract under a hands-off principal). The optimal exploration contract offered by a hands-off principal σ N N C is such that (a) the agent repays the entire outcome at the end of the first period, (b) the firm is terminated after a failure only if exploration is moderate, (c) the firm grows after a success, and (d) the firm is credit constrained in both periods. Moreover, (e) the firm can grow after a failure, (f) the firm is bigger after a success than after a failure, and (g) the hands-off principal provides less capital in the first period than the hands-on principal.
When actions are unobservable, front-loading the agent's reward would incentivize exploitation, as success is initially more likely under the conventional work method. Moreover, after a success in the first period, the second-period outcome provides additional information about the first-period action, as the expected probability of success with the new work method in the second period depends on the action taken by the agent in the first period. To incentivize exploration, it is therefore optimal to delay compensation to the second period.
Proposition 2(a) allows us to simplify the incentive-compatibility constraint in equation (11), which ensures that the agent does not want to exploit, to the inequality in equation (12). When exploration is moderate, this inequality is satisfied by any non-negative value of k L N N C and k H N N C . Given that k L N N C enters with a negative sign in the right-hand side of all other constraints, it is then optimal to terminate the firm after a failure. Conversely, when exploration is radical, a positive value of k L N N C is needed to satisfy constraint (12), the one associated with exploiting, but this comes at the cost of making constraint (9), the one associated with tunneling, stricter. On the one hand, a larger scale after a failure in the first period makes it more attractive for the agent to deceptively report a low outcome, thus incentivizing the diversion of funds. On the other hand, the probability of failure is higher initially when using the novel method, and thus rewarding an early failure dissuades the agent from exploiting.
Thus, to prevent the agent from resorting to the safer project, the hands-off principal rewards the agent for early failure and long-term success, balancing the use of k L N N C and k H N N C depending on the form of exploration. 32 When exploration is moderate, the principal prefers to incentivize it through k H N N C , because two consecutive successes are a clear signal of the use of the novel approach. When exploration is radical, the principal uses a combination of k H N N C and k L N N C . As the probability of two consecutive successes under the novel approach decreases (i.e., exploration becomes very radical), failure in the first period becomes a stronger and stronger signal that the agent explored (even stronger than two consecutive successes), and thus a bigger k L N N C is provided. However, it is not optimal to use only k L N N C because it incentivizes the agent to tunnel the funds. In general, the firm is credit constrained in both periods but the scale of the firm increases following a success. Indeed, a successful firm is significantly scaled up: this prevents tunneling and also allows the principal to delay compensation to the second period.
32 As in Manso (2011), the agent's reward is contingent on the performance path and not only on the number of successes. In particular, if we compare total compensation when performance is LH rather than H L, we see that an agent who recovers from a failure receives a compensation at least as high as one who obtains a short-lived success. When performance is LH, the agent receives 0 under moderate exploration (because the firm is terminated) or R(k L ) under radical exploration. When performance is H L, the agent receives 0 under both moderate and radical exploration, as the agent repays the entire outcome in the first period, and produces zero in the second. In Manso (2011), it might be the case that even an agent who fails twice receives a higher compensation than one who succeeds only in the first period. Here, both receive zero as there is no outcome in the case of failure.
Finally, the hands-off principal provides less capital in the first period than the hands-on principal: the agent is starved initially. The hands-off principal prefers to start small, not only to minimize potential losses but also because starving the agent of funds incentivizes risk-taking: because the conventional approach is initially more likely to succeed, providing less capital in the first period minimizes the opportunity cost of employing the novel method by making a deviation to the conventional work method less appealing. 33
The increased information asymmetry can be socially valuable, as the fact that the principal must provide additional incentives to the agent can increase social welfare. As explained above, there are two reasons for which the total amount of capital can increase in the informational asymmetry, and both arise because the hands-off principal needs to deter exploitation. First, the hands-off principal is less likely to inefficiently terminate a project after early failure, as harsh punishment for early failure discourages risky ventures. Second, the hands-off principal tends to reward long-term success more lavishly, through a bigger scale in the second period after a success in the first period. In the dark area of Figure 1, a larger scale after success and/or a positive scale after a failure more than compensate for the initially smaller scale under arm's length financing. Thus, this represents the region of the probabilities of success of the novel approach for which social welfare is higher when the informational asymmetry has increased, W N N C > W N N C . Figure 2 shows that a hands-off principal is always worse off than a hands-on investor, S N N C < S N N C , given the additional informational asymmetry. 34 The presence of additional deviations means that the agent might require a higher surplus under arm's length financing, V N N C , than under relationship-based financing, V N N C , to implement the action plan chosen by the principal.
33 As the principal provides the right incentives by reducing k 1 N N C and increasing k L N N C , an unsuccessful firm grows when exploration is very radical. See Online Appendix A.2 for a more formal analysis.
34 In our model, a principal would always like to have the additional information regarding the project selection stage, as this information is costless. In reality, the choice between becoming a hands-on rather than a hands-off principal is probably endogenous (e.g., it might depend on the willingness of the principal to invest in acquiring the technical competence).

Optimal Exploitation Contract
Consider the optimal exploitation contract derived in Section IV: following a success, the hands-on principal receives the entire outcome in period 1, and rewards the agent by providing a positive amount of capital to be invested in period 2; the firm is terminated following a failure in the first period. The expected payoff of the agent is π C π C R(k H C C C ). Conversely, an agent now can deviate in period 1 by using the novel approach. If she succeeds at the end of period 1, she can repay the investor with the outcome of the exploration and run N again in period 2, 35 to obtain an expected payoff of E[π N ]E[π N |H, N]R(k H C C C ). Clearly, the contract of Section IV is still incentive compatible when innovation is radical; the additional layer of information asymmetry does not matter because the agent does not want to explore anyway. Under moderate exploration, however, the optimal exploitation contract of Section IV cannot be enforced, as the additional incentive-compatibility constraints are not satisfied.
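The comparison between the two payoffs can be made concrete with a small sketch. It takes the two expressions in the preceding paragraph at face value and evaluates them for illustrative numbers. The function name, the conventional success probability of 0.4, and the normalization of the second-period return to 1 are assumptions made here for illustration; the novel-approach probabilities are those reported for the figures in Section VI.

```python
def exploitation_contract_payoffs(p_conventional, p_novel, p_novel_after_success,
                                  second_period_return):
    """Agent's expected payoffs under the hands-on exploitation contract.

    Returns a pair: the payoff from exploiting as intended, and the payoff from
    deviating to the novel approach in both periods while repaying truthfully.
    """
    exploit = p_conventional * p_conventional * second_period_return
    explore = p_novel * p_novel_after_success * second_period_return
    return exploit, explore


PI_C = 0.4        # hypothetical conventional success probability
RETURN_KH = 1.0   # hypothetical second-period return after a success

for label, p_after_success in [("moderate", 0.7), ("radical", 0.5)]:
    exploit, explore = exploitation_contract_payoffs(PI_C, 0.3, p_after_success, RETURN_KH)
    print(label, "deviation profitable:", explore > exploit)
```

Because the second-period return multiplies both payoffs, the ranking depends only on the probabilities of two consecutive successes, which is why the deviation matters only under moderate exploration.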
Formally, a hands-off principal proposing an exploitation contract needs to incentivize the agent to avoid three deviations. 36 First, the agent must weakly prefer to report a high outcome in the first period rather than tunneling it (equation (13)). Secondly, the principal needs to incentivize the agent to exploit, rather than explore (equation (14)). Lastly, the constraint in equation (15) ensures that the agent does not want to use the novel approach in the first period and divert the funds if a success occurs. The following proposition summarizes the main characteristics of the optimal exploitation contract under a hands-off principal.

Proposition 3 (Optimal exploitation contract under a hands-off principal).
When exploration is radical, the optimal exploitation contract does not depend on the observability of the work method, and thus the same exploitation contract is offered under arm's length and relationship-based financing, σ rad C C C = σ C C C . When exploration is moderate, the optimal exploitation contract offered by a hands-off principal σ mod C C C is such that (a) part of the agent's compensation is front-loaded to the first period, (b) the firm is terminated after a failure, (c) the firm might shrink after a success, and (d) the firm is credit constrained in both periods. Moreover, (e) the hands-off principal provides less capital in the second period than the hands-on principal.
35 Note that the agent could, alternatively, tunnel the outcome, foregoing any payoff in period 2. However, given Assumption 1, this is associated with a lower payoff.
36 The agent actually has access to other combinations of the projects, and thus more incentive-compatibility constraints should be considered. However, given Assumption 1, we can disregard them from the maximization problem as they will never bind.
Below, we focus on the moderate exploration case. Two observations simplify the maximization problem. First, note that an increase in R(k 1 C C C ) − τ H C C C relaxes the two constraints associated with exploration, (14) and (15). As a consequence, the limited-liability constraint does not bind. Indeed, when the principal wants the agent to exploit, he must pay the agent an extra premium in case of success in the first period, as the conventional approach is initially more likely to succeed. Secondly, k L C C C enters with a negative sign in the left-hand side of all constraints, as rewarding the agent for a first-period failure incentivizes both tunneling and exploration: threat of termination following a failure is a common feature of the optimal exploitation contract, independent of the assumptions on the degree of information asymmetry.
As before, the firm is credit constrained in both periods. However, it might be the case that it receives more capital from the hands-off investor, who incentivizes the agent to exploit by providing an extra premium for early success. This is done by increasing the level of working capital in the first period. Conversely, reward for late success is always discouraged, as it incentivizes exploration: the hands-off principal provides less capital in the case of success than a hands-on principal, but still a positive amount, which, paired with the threat of termination, is used to avoid tunneling of the outcome after a success. 37 Depending on which of these two effects dominates, social welfare can increase in the degree of informational asymmetry. This is summarized in Figure 3: when exploration is moderate, there is a region of the probabilities of success of the novel approach for which welfare is higher under arm's length financing, W mod C C C > W C C C . When the probability of two consecutive successes under the novel approach is very high, this does not happen, as exploration becomes very attractive, and incentives to exploitation are provided by reducing the total amount of capital invested.
The increase in social welfare is not a Pareto improvement. Figure 4 shows that, as one would expect, losing information is costly for the principal, and thus the increase in social welfare is just a consequence of the increase in the agent's surplus.
37 As exploration becomes more and more moderate, the principal needs to provide less k H C C C and more k 1 C C C , with the consequence that a successful firm might actually shrink. See Online Appendix A.2 for a more formal analysis.
[Figure caption, partially recovered: darker shades are associated with higher values. In the white area, V C C C ≤ V C C C . Dashed lines represent zero contours. In both panels, the dotted line separates moderate exploration (above) from radical exploration (below).]

VI. Implications: A Numerical Example
This section complements the above analysis with some implications of the model for firm dynamics and social welfare. We mostly rely on numerical results, but some of these implications are established formally.

Firm Dynamics
We start by highlighting some of the implications of our model for firm size and growth. As noted above, we refer to the level of working capital invested as a measure of the scale of the firm. We compare the scale of the firm in the first period and in the second period after a success under different contracts and types of exploration. 38
Corollary 1. Under a hands-on principal, a conventional firm grows faster than an innovative firm. Conversely, an innovative firm grows faster than a conventional firm under a hands-off principal. Riskier innovative firms under hands-off principals grow the fastest.
Figure 5 shows that, under a hands-on principal, the dynamics of the two types of firm, conventional and innovative, are substantially similar to one another. Both moderate and radical innovative projects start small, and are substantially scaled up only after a success is reported. Similarly, the optimal exploitation contract (which does not depend on the probabilities of success of the novel approach) involves a relaxation of the financing constraints following a success. Indeed, stricter credit constraints at the beginning of the relationship, together with a reward for success and a punishment for failure, minimize the agent's incentives to embezzle revenues. Interestingly, a conventional project under a hands-on principal starts smaller and grows faster than an innovative venture.
Figure 6 shows that the implications for firm dynamics are substantially different if both the outcomes and the action plan are the private information of the agent. Under both radical and moderate exploration, an innovative firm receives less capital in the first period than the corresponding conventional firm: starving an innovative agent not only limits losses and disincentivizes stealing, but also minimizes the agent's incentive to resort to the conventional project (thus incentivizing risk-taking). Moreover, following a success, growth rates are substantially greater for small, successful, and innovative firms than for conventional firms.
As far as we know, the theoretical prediction that the effect of innovation investments on firm growth differs depending on the type of financing the firm has access to has not been tested empirically. However, the empirical literature has recognized that the impact of innovation on growth is indeed different for different types of firms (e.g., Del Monte and Papagni, 2003; Coad and Rao, 2008; Demirel and Mazzucato, 2012; Segarra and Teruel, 2014; Mazzucato and Parris, 2015).
38 The unconditional probability of success of the novel approach used to generate the figures in this subsection is E[π N ] = 0.3. For "moderate", the conditional probability of success of the novel approach following a success in the first period is E[π N |H, N ] = 0.7, while for "radical" it is E[π N |H, N ] = 0.5.

Contract Offered and Welfare Implications
Because we have given all the bargaining power to the principal, the contract offered in equilibrium will be the one that, for given probabilities of success, maximizes the principal's surplus. Figure 7(b) shows that a hands-off principal's surplus from offering an exploration contract exceeds the surplus from an exploitation contract only when the probability of two consecutive successes under N is quite high. By comparing this with Figure 7(a), we can see that the region of probabilities for which an exploration contract is offered by the principal is smaller when the informational asymmetry increases. The reverse is obviously true with regard to the exploitation contract. This suggests that financing innovations is relatively more difficult under arm's length financing than under relationship-based financing, as only those innovative projects perceived as more likely to succeed (less risky) have access to funds. 39 However, when offering an exploration contract, a hands-off principal might need to provide more capital to prevent the agent not only from tunneling but also from resorting to the more predictable old technology. Therefore, on the one hand, innovation is harder to finance when financiers have less information, but on the other, the increased degree of informational asymmetry means that entrepreneurs whose projects are perceived as more productive not only are allowed to explore but also might be less constrained in the amount they can borrow in the long run, and this is socially efficient. This is summarized in Figure 8: the set on the right is the intersection between the region of probabilities for which an exploration contract delivers higher social welfare under a hands-off principal than under a hands-on principal (Figure 1), with the region of probabilities for which an exploration contract is always offered (Figure 7(b)). Thus, it shows that there exists a region of probabilities for which an exploration contract is offered in equilibrium, and for which social welfare is higher when the informational asymmetry increases. We can also see that an equivalent set exists when the principal wants the agent to exploit: for the set on the left, the equilibrium exploitation contract is such that social welfare is higher when the informational asymmetry is greater.
39 In Online Appendix A.2, we also show that under relationship-based financing, the surplus of the principal is a constant share (1 − α)/(2 − α) of total welfare. As a consequence, the hands-on principal always offers the exploration contract when it is constrained efficient to do so. Conversely, this is not true under arm's length financing.
Figure 8. The set on the right represents probabilities for which the principal always offers the exploration contract and welfare is higher under a hands-off principal. The set on the left represents probabilities for which the principal always offers the exploitation contract and welfare is higher under a hands-off principal. The dotted line separates moderate exploration (above) from radical exploration (below).

VII. Conclusions
When does an innovative project benefit from a lack of resources? In this paper, we have offered a two-period principal-agent model, where innovation is modeled as experimentation with untested actions, riskier than a conventional approach. We have studied the relationship between an entrepreneur and a potential financier under different degrees of information asymmetry. We have considered and compared three cases: the full-information case, in which the entrepreneur and the investor have access to the same information; the relationship-based financing case, in which the entrepreneur is better informed than the principal on the outcome of the production process; and the arm's length financing case, in which the entrepreneur privately chooses the novelty of the project and can embezzle revenues for her own consumption. We focused on the trade-off between increasing the scale of the project to reap higher profits and starving the agent for incentive reasons.
In the full-information case, the entrepreneur is well resourced, and experimentation occurs unless the agents are sufficiently pessimistic about the probability of success of the novel approach. An innovative project starts smaller than a conventional one, given that the initial probability of success is lower. In the relationship-based financing case, the amount of working capital that the principal is willing to provide is reduced, as financing constraints arise endogenously to minimize the agent's incentives to divert the outcome realizations. In the arm's length financing case, fewer innovative projects are funded, and the principal further starves an innovative agent, as this incentivizes risk-taking and minimizes the agent's incentive to resort to the safer conventional approach. Moreover, under the optimal contract, the innovative firm is significantly scaled up after a success in the first period, and an innovative entrepreneur can retain her access to capital even after a failure. This has the counterintuitive consequence that decreasing the principal's information (while keeping the agent's information constant) can potentially increase social welfare: resource-starved innovative start-ups can be socially optimal when researchers are better informed.
Our results show that innovation can be incentivized both in small independent start-ups and in-house by mature firms, but the best way to do so varies across different economic environments. The agent is well resourced (perhaps resembling a large R&D department) when the investor is equally (or better) informed about the outcome of the innovation process; conversely, starvation contracts (perhaps resembling fast-growing start-ups) are optimal when the researchers can focus on safer but non-innovative alternatives and have a better idea than investors of whether a new technology answers the customers' needs. Consider, for example, the hugely successful Google Maps, and the now scrapped Google Wave. Both were conceived by Lars and Jens Rasmussen. Where 2, the start-up that would become Google Maps, was based in the spare bedroom of one of the co-founders (Copeland and Savoia, 2011) and had minimal resources. According to an article published on CNN Labs, 40 the Rasmussen brothers only had $16 between them when they sold their app to Google. After joining Google, they insisted on creating a start-up-like team within Google to develop Wave; moreover, unlike other in-house projects, they were allowed almost limitless autonomy, secrecy, and plentiful resources (Copeland and Savoia, 2011). Nevertheless, the project was scrapped after a long runway. Our model suggests that one of the reasons for the success of Maps and the failure of Wave lies in the contrasting incentives. The frugal Where 2 provided the optimal high-stakes situation for the development of Maps (see the article cited in footnote 40). Conversely, the combination of secrecy and abundant resources did not work well for Wave: perhaps a large budget coupled with hands-on management by Google, or the same level of independence but with scarcer resources, could have worked better.
In this paper, we have neglected many potential distortions in order to maintain tractability, and future work could try to incorporate them (e.g., the presence of limited commitment, and the possibility that principal and agent could have different discount factors or degrees of risk aversion). 41 More interestingly, one could remove the assumption that the functional form for the outcome in the case of success is the same between conventional and novel projects: a perhaps more natural way of modeling the problem would be to let the successful outcome increase with the riskiness of the project. This is particularly relevant when studying the incentives for truly radical innovations, but one can argue that, in such cases, innovation should be perceived as ambiguous. Furthermore, in our model, the scale of the project both influences the expected revenues of the firm, and shapes the incentives to the agent, but it does not affect learning; more realistically, one could assume that more is learned from larger experiments. Moreover, our two-period model is not well suited to study the optimal number of experiments. One could try to develop a fully dynamic problem, where the firm operates for multiple periods, capital is long-lived and potentially irreversible, and the innovative projects evolve through different phases, with research at each step depending on the outcomes of previous phases. Additionally, by including adverse selection, one could provide the entrepreneur with the possibility to signal her ability, perhaps allowing firms' past patenting activities to alleviate the presence of financing constraints. Finally, following Kiyotaki and Moore (1997), there has been an increasing interest in incorporating the optimal contract framework into a general equilibrium model. We leave these interesting extensions to future research.
41 Moreover, we have taken as given the distinction between hands-on and hands-off principals. Future work could, conversely, allow the principal to observe the project selection stage for a cost. Venture capitalists, for example, are often able to gather more information about the firm's prospects than traditional investors, but this requires time, effort, and expertise (Gompers and Lerner, 2004; Kaplan and Strömberg, 2004). Perhaps, another important element that is missing in the model is the crucial advisory role that venture capitalists provide to start-ups (Gompers and Lerner, 2004), but this comes with additional agency problems that one should consider (see Casamatta, 2003).

Supporting Information
Additional supporting information may be found online in the Supporting Information section at the end of the article.