Lifting the veil: Using a quasi-replication approach to assess sample selection bias in patent-based studies

Research summary: Patent data is a valued source of information for strategy research. However, patent-based studies may suffer from sample selection bias given that patents result from within-firm selection processes and hence do not represent the full population of inventions. We assess how incidental and nonincidental data truncation resulting from firm-level and inventor-level selection processes may result in sample selection bias using a quasi-replication approach, drawing on rich qualitative data and a novel, proprietary dataset of all 40,000 invention disclosures within a large multinational firm. We find that accounting for selection both reaffirms and challenges past work, and discuss the implications of our findings for work on the microfoundations of exploratory innovation activities and for strategy research drawing on patent data.

Managerial summary: Much of what is known about innovation in general, and in particular about what makes inventors prolific, comes from studies that use patent data. However, many ideas are never patented, meaning that these studies may not in reality talk about ideas or inventions, but only about patents. In this paper, we examine the question of whether patent data can accurately be used to represent inventions by using data on all inventions generated within a large multinational firm to explore how and to what degree the selection processes behind firms' patenting decisions may lead to important differences between the two. We find that accounting for selection changes many previously given managerial implications; for example, we show how junior inventors may often not get the credit they deserve.
In general, sample selection may be the result of two issues, both of which may independently lead to sample selection bias. The first is data truncation in the dependent variable, meaning the inability to observe its full range. For patents, we may think, for example, of the "novelty of an invention" as the dependent variable, with both low- and high-novelty inventions potentially being selected out (as illustrated by the solid black and grey boxes, respectively, in Figure 1). Second, incidental truncation implies that observations are selected into or out of the sample given the value of another variable (as illustrated by the shaded grey boxes in Figure 1) (e.g., Certo et al., 2016; Clougherty, Duso, & Muck, 2016; Heckman, 1976).
Data truncation from below should be considered the norm for all inventive activities, which mainly comprise first attempts, sketches, drafts, or prototypes, most of which the firm will not pursue further (Cooper, 2001). Hence, when patent-based studies theorize, for example, on the topic of learning from failure, most past inventive attempts would actually be unobserved. Also, truncation from below may lead to an overestimation of the benefits of adopting risky practices; that is, those practices that exhibit higher variability in outcomes (Denrell, 2005), like the adoption of cross-disciplinary inventor teams (Fleming, 2004), in which inventions may well have lower average quality but higher variance (see also Fleming, 2001). If we are unable to observe their failed attempts (i.e., ideas not leading to patents), we may erroneously conclude that cross-disciplinary teams are more likely to produce breakthroughs (i.e., highly cited patents). Moreover, there is the issue of truncation from above. Indeed, sample selection bias may be particularly severe if high-quality inventions were filtered out, as when industry-wide appropriability regimes speak against patenting (Teece, 1986) but patent data is still used to try to explain the origins of highly valuable inventions.
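The intuition behind this truncation argument can be illustrated with a small simulation of our own (the distributions and the patenting threshold are hypothetical, chosen purely for illustration): a high-variance "risky" strategy with a lower true mean quality can look superior once everything below the threshold is truncated away.

```python
import random

random.seed(7)

def draw_inventions(mean, sd, n=100_000):
    """Simulated invention quality scores for one team type (arbitrary units)."""
    return [random.gauss(mean, sd) for _ in range(n)]

def mean_of(xs):
    return sum(xs) / len(xs)

# Conservative teams: higher mean quality, low variance.
# Cross-disciplinary ("risky") teams: lower mean quality, high variance.
safe = draw_inventions(mean=1.0, sd=1.0)
risky = draw_inventions(mean=0.9, sd=2.0)

THRESHOLD = 0.0  # inventions below this are never filed (truncation from below)
safe_patented = [q for q in safe if q >= THRESHOLD]
risky_patented = [q for q in risky if q >= THRESHOLD]

print(f"true mean quality:     safe={mean_of(safe):.2f}, risky={mean_of(risky):.2f}")
print(f"observed mean quality: safe={mean_of(safe_patented):.2f}, "
      f"risky={mean_of(risky_patented):.2f}")
# The true ordering (safe > risky) reverses in the patented-only sample:
# truncation removes the risky teams' many failures, so the high-variance
# strategy appears more successful than it actually is.
```

A researcher observing only the patented sample would conclude that risky teams produce better inventions on average, even though the opposite is true in the full population.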
Incidental truncation, in turn, may not only result from a firm-specific appropriability strategy, but also from biases of agents tasked with determining which ideas to put forward for patenting. Here, research across a variety of domains, including studies of United States Patent and Trademark Office (USPTO) patent examiners, has shown how agents purposively or unconsciously adjust their selection processes depending on factors such as the novelty of ideas, their links or personal similarity to the inventors, or inventors' track record (see, e.g., Boudreau, Guinan, Lakhani, & Riedl, 2016; DiPrete & Eirich, 2006; Ferguson & Carnabuci, 2017; Franke et al., 2006; Goldin & Rouse, 2000; Merton, 1973; Reitzig & Sorenson, 2013).
In sum, without accounting for selection and its potential drivers, we cannot know if studies drawing on patents as proxies for inventions or ideas are valid. Worse, a multitude of different factors may simultaneously lead to data and incidental truncation, meaning that even if researchers theorize that a specific selection mechanism may not be at play in their setting, another type of selection may still be present. For example, in samples that include patent filings with close to zero novelty, which could be argued to be less affected by the issue of truncation from below, researchers still cannot make any inferences about how varying firm-internal strategies may have led to the selection of those particular inventions for filing.

FIGURE 1 How selection in the patenting process may lead to sample selection bias in patent-based studies
Hence, in this paper, we undertake a quasi-replication (Bettis, Ethiraj, Gambardella, Helfat, & Mitchell, 2016; Tsang & Kwan, 1999) of one patent-based study representing a larger stream of literature drawing on patent data and potentially suffering from the issues we have identified above. By testing whether, how, and why this paper's findings change once we control for sample selection, we hope to help clarify threats to the robustness and generalizability of the use of this data source in strategy research and to shed light on some of the sources of selection bias in patent studies. Audia and Goncalo's (2007) (hereafter: AG) study of inventors' past successful creative efforts as a driver or barrier for the subsequent generation of ideas is an ideal candidate for our endeavor. It is a field-specific exemplar of work inquiring into the microfoundations of exploratory research (i.e., when, why, and how inventors produce novel inventions) using patent data. As such, AG shares the core assumptions that this stream of literature makes when sampling on patents and the issues these will create: not only does this study center on learning from past activity (issue 1: potential under-sampling of low-novelty inventions), it also specifically looks at the production of inventions that depart from familiar knowledge (issue 2: potential under-sampling of high-novelty inventions), a process which may further be influenced by individual preferences or firm strategy (issue 3: incidental truncation). In quasi-replicating this paper, we hope to shed light on the issue of sample selection in patent-based studies more generally while looking at key questions this specific literature is grappling with, such as whether failed attempts lead individuals to create more novel breakthroughs; whether and how (if at all) individuals learn from failure; when, why, and how individuals produce more novel inventions; and what role their past success plays in these contexts.
Our attempt to quasi-replicate AG's study rests on our ability to observe the full population of all ideas from which a set is selected to be filed for patent protection. We managed to obtain access to the full set of inventions (more than 40,000) from a company in the information and communication technology (ICT) area, which we refer to as Venus. We further supplement this data with interviews as well as observational data on Venus' patent production process.
To our own surprise, even when controlling for sample selection, our results largely corroborate key findings of prior work in that we find no major changes in signs or significance of AG's theorized main effects. However, accounting for selection leads to substantial changes in coefficient magnitudes: once selection is accounted for, cohort effects become as important as past success in predicting future patenting and increase in importance when predicting divergent inventions. In our post-hoc analyses, we further highlight that these effects increase substantially when specifically focusing on subsamples of our data in which data or incidental truncation is likely to be strongest. We discuss what these findings imply for research using patent data, derive suggestions for practitioners, and provide recommendations for how future work may further alleviate the sample selection problem.

| POTENTIAL SAMPLE SELECTION ISSUES IN WORK STUDYING THE MICROFOUNDATIONS OF EXPLORATORY RESEARCH
Patents are the outcomes of at least two selection processes: not only do patent examiners judge whether a patent filing meets the requirements of novelty, inventive step, and industrial application, but prior to that, many firms have set up internal selection processes to decide which of their inventions to put forward as patent applications (Cooper, 2001; Tanimura, 2018). Each of these decisions may introduce sample selection in patent data.
With examiner decisions, we can control for potential biases of selecting agents when we know which individual examiner worked on a patent (Ferguson & Carnabuci, 2017). We may also simply refer to patent filings rather than granted patents as measures of firms' inventive activity and circumvent this specific source of sample selection altogether. We argue, however, that patent filings, one of the main data sources used in studies of the microfoundations of exploratory research, may already feature significant sample selection resulting from data truncation and incidental truncation. In turn, if this selection were systematic (i.e., if the relationship of interest differs in the truncated sample from that in the full population), sample selection would bias regression coefficients.
How should these issues affect the study we quasi-replicate as an exemplar of the vast literature enquiring into the microfoundations of exploratory research using patent data? The AG study suggests that inventors' past successful creative efforts act as a barrier for the subsequent generation of divergent ideas. They found that (a) inventors with a strong track record develop more inventions because they become faster and more efficient at generating new ideas. However, (b) their past experience also becomes an obstacle for the generation of divergent ideas, as successful inventors tend to reapply the same heuristics and to draw from familiar knowledge sets.
The authors themselves put forward how data truncation may threaten the validity of their findings, stating: "we were not able to observe creative ideas that were not patented. Many of these unpatented ideas probably did not exceed the threshold of novelty necessary to obtain a patent. However, firms are also known to protect important inventions by using trade secrets and copyright. This feature of the data should suggest caution in the interpretation of the results. However, unless there is a systematic bias, the results should be unaffected" (p. 13, emphasis added). Yet, as we argued, patent data may be systematically biased on novelty because of data truncation from below as well as above: inventions with low and high novelty may not be observed. Indeed, we often think of the filing of a patent as a simple threshold function: the more novel the invention, the more likely the firm will file it as a patent. Then, truncation from below results from inventors exploring new technological areas being more likely to fail (initially) or to produce inventions that do not pass the novelty threshold required for patenting. Accordingly, AG, like most work studying the effect of inventors' past experience on their future output (e.g., Carnabuci & Operti, 2013; Conti, Gambardella, & Mariani, 2014), can only observe learning from patent filings but not from all inventive activity. Truncation from above can instead be attributed to the fact that when inventors engage in research activities that deviate from what they have done in the past, they might produce inventions not only new to them, but also new to the company. Yet, as evaluators of inventions have been documented to be generally (i.e., irrespective of firm strategy) and systematically biased against novelty (Criscuolo, Dahlander, Grohsjean, & Salter, 2016; Ferguson & Carnabuci, 2017), research drawing on patent data will likely under-sample divergent creative efforts.
Regarding incidental truncation, both companies as well as inventors may engage in systematic selection. In particular, firms should patent strategically, implying they would increasingly file patents in pre-established areas to enable continuous exploitation (Gambardella, Giuri, & Luzzi, 2007), to exploit existing complementary assets (Teece, 1986), to participate in industry-wide cross-licensing (Hall & Ziedonis, 2001), or to build patent thickets that would deter others' entry (von Graevenitz, Wagner, & Harhoff, 2013). As a result, firms should select against ideas that do not fit their strategy.
Yet, in technology areas new to them, firms may be unable to make informed decisions about which appropriability mechanism to use. For example, if firms do not possess the knowledge needed to properly evaluate whether an invention fulfils the three criteria for patenting (novelty, inventive step, and industrial applicability), we would not expect to find a systematic bias in the resulting patented inventions: patent filing decisions would be taken almost at random. Indeed, in their comparative assessment of hobbyist and firm-based inventors, Dahlin, Taylor, and Fichman (2004) found that hobbyist inventors are overrepresented in both tails of the patent quality distribution. We would argue that this is because they are less strategic and less knowledgeable in their patent filings.
Regarding inventor attributes driving incidental truncation, we expect that, similar to firms' adoption of risky practices, individuals with a higher risk propensity may (incorrectly) be viewed as disproportionally successful if only a share of the distribution of inventions they produce appears in patent data: most of their actual failures may never appear in patent filings. This consideration may affect a large series of studies on what makes for a prolific inventor or how they learn over time (e.g., Kaplan & Vakili, 2015;Singh & Fleming, 2010). In turn, individual risk propensity should be of particular importance in the context of AG if it were to affect inventors' likelihood of pursuing divergent ideas.
In sum, data truncation from above and below as well as incidental truncation may potentially introduce sample selection into work drawing on patent data to study the microfoundations of exploratory research. In what follows, using the example of AG, we first want to establish whether we can identify actual selection bias. Then, we will try to explore where potential selection bias may have come from; that is, whether it is the result of data truncation, incidental truncation, or both.

| RESEARCH CONTEXT, DATA, AND REPLICATION RESULTS
To this end, we collaborated with a large, multinational, non-U.S.-based ICT company, which we will call Venus for reasons of confidentiality. Venus provided us with access to its entire range of inventions made by members of the organization and allowed us to examine how these inventions were generated, evaluated, and managed.

| Qualitative insights into the invention screening process
We carried out 28 exploratory interviews with inventors, internal experts who assess new invention disclosures and maintain Venus' patent portfolios ("patent engineers"), technology experts, managers of legal and IP departments, and directors of R&D sites, and took part in numerous formal and informal meetings. One of the authors also spent 40 days observing a team of patent engineers.
In conducting our qualitative investigation, we learned that, as is common in many technology-based companies (e.g., Tanimura, 2018), all employees in Venus were legally obliged to document all their inventions and to submit them to an electronic repository system. Inventors had a further incentive to submit all their ideas because their performance was evaluated on the basis of both the number of invention disclosures submitted as well as the number selected for patenting.
Once an inventor submitted an invention, it was assigned, depending on its technological domain, to one of 12 patent boards, teams of accomplished patent engineers and dedicated IP staff. The head of that patent board then assigned the invention to a specific patent engineer based on the underlying core technology (usually inferred from the title or abstract) and current patent engineer workload, without making a substantive assessment of the complexity or patentability of the invention disclosure. The selected patent engineer performed prior art searches to establish whether the invention was novel and contained an inventive step, and assessed whether the invention could be useful to the firm either by incorporation into a (new) product or service or as a means of production or service provision. Patent engineers combined their insights into a recommendation to the patent board as to what to do with the invention. These recommendations were usually accepted. Cases in which patent engineers were overruled almost exclusively related to situations in which patent board members held superior knowledge about potential synergies across technological areas.
In turn, for each invention, the respective patent board made one of the following four decisions:

A. The invention is deemed not novel or does not contain an inventive step; therefore, Venus does not acquire the rights to this invention and the invention is thus "given" to the inventor.

B. The invention contains an inventive step but is not seen as currently useful to Venus, which does not file a patent but keeps the rights to the invention for potential future patenting.

C. The invention has been judged to be novel and useful and Venus proceeds with applying for patent protection in one or several patent offices.

D. The invention is considered novel and useful, but Venus will keep the invention secret.
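These decision rules can be sketched as a simple classification function (our stylized rendering of the process described above, not Venus' actual system), which also makes explicit which classes can ever surface in patent data:

```python
def patent_board_decision(novel_and_inventive: bool, useful: bool,
                          keep_secret: bool) -> str:
    """Map a patent board's assessment of an invention disclosure to one of
    the four decision classes A-D (a stylized rendering, for illustration)."""
    if not novel_and_inventive:
        return "A"  # rights returned ("given") to the inventor
    if not useful:
        return "B"  # rights retained, shelved for potential future patenting
    if keep_secret:
        return "D"  # novel and useful, but protected by secrecy
    return "C"      # novel and useful: file for patent protection

# Only class C inventions (plus the minority of class B inventions later
# filed) can ever appear in patent data; classes A and D, and most of B,
# are invisible to patent-based studies.
print(patent_board_decision(novel_and_inventive=True, useful=True,
                            keep_secret=False))  # → "C"
```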
This classification scheme should already lead to data truncation. Researchers drawing on patent data will not be able to observe the failure cases captured by class A, which the company believes do not present an inventive step (almost 50% of cases). Similarly, we see truncation from above by Venus deciding not to patent some inventions despite considering them novel and potentially even of high quality, as captured by classes B and D. While only a very small proportion of inventions are kept secret in Venus (less than 1% in D), almost 10% of inventions fall under class B, in that they do contain an inventive step but are not (yet) deemed useful for Venus.2 Eventually, only 15% of these class B inventions are patented by the firm, so most of these inventions will never appear in patent data. Yet, when inventions in class B are patented, they are equally likely to be of high quality (proxied by the number of forward citations received by USPTO granted patents) as those inventions in class C, which are all patented.

| Quantitative data, sample comparison, and variables
To test our hypotheses on potential sources of sample selection, we gathered a series of variables from Venus' firm-internal invention database. Specifically, we were given full access to roughly 40,000 invention disclosures:3 all inventions ever made by the firm's employees or contractors.
Compared with AG's study, our sample covers a similar number of years (≈20) but ends in 2010 rather than 1998; thus, there is an overlap of only 5 years between the data used in the two studies.4 While AG examined the effect of the past successful experience of a sample of 372 inventors active in the hard-disk drive industry working for different firms, our sample of inventors comprises Venus employees working across a diverse range of technologies connected to the ICT industry. Finally, our experience variables relate only to the inventor's experience within Venus.
Following recommendations for high-quality replication (Bettis et al., 2016), we first need to show that we can reproduce AG's key findings despite these differences in sample composition by limiting our data to only those inventions that resulted in a granted USPTO patent. Then, to test for selection, we adopt Heckman's (1976) two-step approach. We then observe (a) whether selection matters (i.e., whether there is correlation between the error terms in the selection and the outcome equations) and (b) whether accounting for selection changes the outcomes (i.e., whether the coefficients capturing hypotheses in AG's study change in direction, magnitude, or level of significance). In implementing the two-stage Heckman approach, were we to employ Cox and Poisson models in the second stage, their respective error terms would not be normally distributed. We therefore used a full information maximum likelihood (FIML) estimator, which allows us to control for selection in an exponential Weibull survival model, for the analysis of the effects of past successful experience on future success (see Boehmke, Morey, & Shannon, 2006 and the DURSEL routine in Stata). To analyze the effect of past successful experience on the generation of divergent ideas, we log-transformed the count dependent variable and estimated a Heckman model with a second-stage ordinary least squares (OLS) regression. As a robustness test, we employed Lee's (1983) generalization of the Heckman model and found results consistent with the ones reported here.

2 The use of secrecy in Venus (i.e., the sum of inventions classified as "D" and those shelved (category "B") due to difficulties of detecting infringement) is similar to that in other firms in the same industry, as we established, for example, by the reported frequency and importance of this protection mechanism in the 2005 UK Innovation Survey.
To identify strong exclusion restrictions for the selection equation, we draw on our qualitative insights to identify three candidate variables. First, each patent board has an annual budget to pay for new patent applications and renewal fees for existing ones, with priority given to annuity payments. Thus, invention disclosures evaluated near the end of the financial year may be less likely to be patented because of the lack of funding, regardless of quality (see also Balasubramanian, Lee, & Sivadasan, 2018). Indeed, some evaluation comments by the patent boards stated this reasoning explicitly (i.e., inventions not being selected for patenting "due to current low filing budget," "due to limited filing budget," "[due to] budget limitation," and "due to budget constraints").5 Second and third, we include patent engineers' workload and diligence, both of which should be independent of invention quality but affect the decision to file a patent. As stated above, invention disclosures were allocated to patent engineers independently of their quality or complexity, but solely based on patent engineers' technical expertise and current workload (a process that Galasso and Schankerman (2015) liken, in the context of US Supreme Court judges, to random assignment). Patent engineers with a very high workload or less diligence in performing their work should be more likely to recommend patent protection for low-quality inventions or fail to recommend that high-quality patentable inventions be filed (as shown by Frakes & Wasserman, 2017 for USPTO examiners).6

| Results: Testing for the presence of sample selection bias
We begin by reproducing AG's original estimations as closely as possible and compute the main independent and control variables following AG's description (see their Table 1). A key variable is individual inventors' past success in their creative endeavors, suggesting inventors compare their past performance with others specialized in the same technology area. We adapted AG's approach in deriving inventor performance. First, we extracted all USPTO patents granted to each inventor since Venus' inception and identified the main area of specialization of an inventor as the USPTO three-digit primary technology class with the highest share in that inventor's patent portfolio. This enables us to identify, for each inventor, a reference group represented by all other Venus inventors specialized in the same technology area. Second, we derived the relative success measure for each inventor by calculating the number of USPTO patents granted to each inventor in the 2 years preceding the focal invention and comparing this with the average of the reference group. AG's second key variable is the divergence of a current idea with respect to past innovative outcomes. We used their operationalization of this variable: for each inventor, we measured the number of USPTO technology subclasses in which a focal patent has been classified that have not appeared in previous patents granted to that inventor. We also include other control variables as described in Table 1.

Table 2 displays the results of our replication of their survival model predicting the probability of patenting (AG's first finding).7 Model 1 shows the exponential Weibull survival model estimations using the sample of USPTO granted patents, confirming what AG predict and find: inventors' past success has a positive impact on the likelihood of patenting. Using the minimum and maximum value of their success variable (min = −1.31, max = 0.99), we computed that the least successful inventors are 48.7% less likely (vs. 31% in AG) to patent than the most successful inventors.8 In Model 2, we report the estimates of the FIML Weibull model controlling for selection. It is interesting to observe that all of our exclusion restriction variables in Model 3 have the predicted sign and appear to influence the patenting decision. This model generates a positive correlation (ρ = 0.177, p < 0.001)9 between the error terms of the selection and the Weibull duration models, indicating selection bias.

This bias does not affect the sign of the inventor success coefficient, which remains positive (β = 0.097, p < 0.001). However, we see a change in its magnitude: less successful inventors are now 22.1% less likely (instead of 48.7%) to patent than the most successful inventors. Hence, contrary to AG's assumption, our findings suggest that sample selection may have affected their results: drawing on patent data leads to an overestimation of the impact of past success on the likelihood of generating new ideas such that the biased coefficient is more than double the size of the unbiased coefficient.10 This corresponds to a difference in expected time to patenting of 93 days when selection is accounted for, comparing inventors who are in the 10th and 90th percentiles by past success.11 Lower performers are actually more productive than the naïve model results suggest: instead of taking 275 days longer to produce a patent than high performers, the difference is only 182 days. Another important change is that the effect of membership in the latest cohort of inventors becomes equal in magnitude to the difference between low and high performers (10th and 90th percentile by past success, respectively). Thus, while in the naïve model the cohort effect made little substantive difference relative to the effect of past success, the estimates from Model 2 of Table 2 imply that, for poor performers, membership in the latest cohort of inventors decreases the expected time to patenting by 121 days. This is almost exactly the same estimated difference in time to patenting between low and high performers within the latest cohort of inventors (122 days).

5 To assess whether the time at which the invention was evaluated is independent of its quality, we compared the proportion of high-quality patents (i.e., those in the 95th percentile of the distribution of forward citations received by USPTO granted patents) in each of the 12 months of the financial years in our sample (with respect to the month in which the invention submission was evaluated). This proportion varies between 43.3% in month seven and 30.3% in month four and does not display a clearly decreasing or increasing trend. Furthermore, we compared the proportion of high-quality patents in the first and last quarter of the financial year developed by the same team of inventors and in the same technology class (a sample of 780 USPTO patents) in order to control for some unobserved characteristics that could affect both the timing of invention evaluation and the quality of the invention. We found that the proportion of patents in the 95th percentile of the distribution of forward citations generated by invention disclosures evaluated in the first quarter is not statistically different from the share in the last quarter (z = −1.51, p = 0.13).

6 Notably, when testing for equality of the patent engineers' fixed effects in a logit regression predicting the filing of a patent, we find that equality is strongly rejected (p < 0.001), supporting our assumption that patent engineers differ in their propensity to recommend inventions to be filed for patent protection.

7 Due to the high number of inventors in our sample, we could not estimate the inventor-specific fixed-effect model because the models do not converge.

8 Our estimates for the control variables are consistent with AG, who use a Cox survival model (Table 2 in their paper), apart from the cumulative inventor patent variable, which is negative (β = −0.021, p < 0.001) but also highly correlated with the inventor success variable (0.69). When we exclude that variable, the cumulative inventor patent coefficient switches sign (β = 0.028, p < 0.001).

9 In cases where the exact p-value of a coefficient is less than 0.001, here and elsewhere in the paper we report that p < 0.001 rather than using four or more decimal places to report the exact p-values.

10 To exclude the possibility that our results are driven by the presence of extreme outliers, we repeated our analyses while winsorizing the inventor success variable using the 1st and 99th percentiles as cut-offs. All estimates are consistent with the ones reported here.

11 Calculated as the difference in the differences in median time to patenting between the naïve Weibull and the FIML Weibull models, for high- and low-performing inventors (10th and 90th percentiles by past success), with other covariates held at their mean values.
Our results hence suggest that, once potential learning from failure is considered, differences in past performance are less meaningful as predictors of future success than patent-based studies suggest, with cohort effects becoming equally important. This result has implications for the allocation of resources among different cohorts of inventors and between more and less prolific inventors: managers should not simply favor their most successful inventors (e.g., those with the highest number of patents) but should be equally supportive of newly hired inventors who lack a strong track record of successful inventions.
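The time-to-patenting comparisons above can be illustrated with a small sketch. We assume a Weibull accelerated-failure-time parameterization (log T = x′β + σε, with ε standard extreme value), under which the median duration is exp(x′β)·(ln 2)^σ; the linear-predictor values and scale parameter below are hypothetical, chosen only to show how the difference-in-differences of median times in footnote 11 is computed, not to reproduce our estimates:

```python
import math

def weibull_median(xb: float, sigma: float) -> float:
    """Median survival time for a Weibull AFT model:
    log T = x'beta + sigma * eps, eps ~ standard extreme value,
    so median(T) = exp(x'beta) * (ln 2) ** sigma."""
    return math.exp(xb) * math.log(2) ** sigma

# Hypothetical linear predictors for inventors at the 10th and 90th
# percentiles of past success, other covariates held at their means.
sigma = 0.8                                   # hypothetical scale parameter
xb_low_naive, xb_high_naive = 6.60, 6.05      # naive model (hypothetical)
xb_low_fiml,  xb_high_fiml  = 6.45, 6.05      # selection-corrected (hypothetical)

# Gap in median time to patenting between low and high performers.
gap_naive = weibull_median(xb_low_naive, sigma) - weibull_median(xb_high_naive, sigma)
gap_fiml  = weibull_median(xb_low_fiml,  sigma) - weibull_median(xb_high_fiml,  sigma)

# Footnote 11's quantity: how much the low/high gap shrinks once
# selection is accounted for (difference in differences of medians).
shrinkage = gap_naive - gap_fiml
print(gap_naive, gap_fiml, shrinkage)
```

The same calculation with the paper's estimated coefficients yields the 275-day, 182-day, and 93-day figures reported in the text.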
Next, we turn to AG's analysis of inventing in areas new to the inventor (AG's second finding). The results of the OLS and Heckman models are shown in Models 1–3 of Table 3. In Model 1, we again successfully reproduce AG's key finding on our subsample of patented inventions: the innovative performance of the focal inventor is a negative predictor (β = −0.045, p < 0.001) of future divergent efforts. To compare the magnitude of our coefficient estimates with AG's, we calculate the difference in the probability of developing patents in new subclasses between the most and least successful inventors in their sample (inventor success: min = −1.31, max = 0.99). Following from Model 4, this difference is equal to −14.4%, which is almost half the effect found by AG (−30%).
In Model 2, we report the results of the OLS model controlling for selection; Model 3 shows the estimates of the corresponding first-stage probit model. The Heckman model confirms the presence of sample selection: the correlation between the error terms in the probit and OLS regressions (ρ) is negative and statistically different from zero. The estimated coefficients on the exclusion restriction variables in Model 3 are in line with our expectations, and the high pseudo-R² (0.35) suggests that these exclusion restrictions are relatively strong. The inventor success variable is again negative (β = −0.050, p < 0.001). To assess the extent of a potential bias, we again calculate the difference between the most and least successful inventors, which changes by less than two percentage points (to −16%). Thus, although there is sample selection, it does not seem to substantially affect the impact of the main independent variable. However, cohort effects again become substantively more important once selection is accounted for, with the estimated magnitude of the effect of being part of the newest inventor cohort increasing by over 65% (from β = 0.096 and p = 0.003 in Model 1 to β = 0.159 and p < 0.001 in Model 2).12 Overall, our quasi-replication of AG suggests two key insights. On the plus side, our theories seem to hold even when accounting for selection: the signs of all key variables remain unchanged. On the negative side, our interpretations of effects may need to be reconsidered fundamentally: the magnitude of the effect of past success on future success, when estimated using patent data alone, appears to be more than double the unbiased coefficient, which makes cohort effects equally important once selection is accounted for.
Also, while the effect of past success on the generation of divergent ideas seems less affected by sample selection, we cannot yet rule out that this is simply an averaging-out effect, in which sample selection would affect only certain groups of inventions or inventors but not others. To scrutinize this further, we next try to delineate more clearly the origins of sample selection in our quasi-replication.

12 Again, a robustness check using winsorizing (cut-offs: 1st and 99th percentiles) produces qualitatively identical results.
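For readers unfamiliar with the procedure, the two-step version of the selection correction used in Models 2–3 (a probit selection equation with exclusion restrictions, followed by an outcome regression augmented with the inverse Mills ratio) can be sketched on simulated data. All variable names, parameter values, and the simulation itself are our own illustration; the paper's actual specification and its FIML estimates may differ from this two-step sketch:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000

# Simulated data (ours): x drives the outcome; z is an exclusion
# restriction that affects selection (patenting) but not the outcome.
x = rng.normal(size=n)
z = rng.normal(size=n)
# Correlated errors (rho = -0.5) create the sample selection problem.
u, e = rng.multivariate_normal([0, 0], [[1, -0.5], [-0.5, 1]], size=n).T

select = (0.5 + 1.0 * z + 0.3 * x + u) > 0        # observed = patented
y = np.where(select, 1.0 - 0.5 * x + e, np.nan)   # outcome seen only if selected

# Stage 1: probit MLE of selection on [1, x, z].
X1 = np.column_stack([np.ones(n), x, z])

def nll(b):
    p = np.clip(norm.cdf(X1 @ b), 1e-10, 1 - 1e-10)
    return -(select * np.log(p) + (~select) * np.log(1 - p)).sum()

g = minimize(nll, np.zeros(3), method="BFGS").x
xb = X1 @ g
mills = norm.pdf(xb) / norm.cdf(xb)               # inverse Mills ratio

# Stage 2: OLS of y on [1, x, mills] over the selected sample only.
X2 = np.column_stack([np.ones(n), x, mills])[select]
beta, *_ = np.linalg.lstsq(X2, y[select], rcond=None)
print(beta)  # [const, coef on x, coef on Mills ratio (~ rho * sigma_e)]
```

A naive OLS on the selected sample alone would yield a biased coefficient on x; the negative Mills-ratio coefficient recovered here mirrors the negative ρ reported in Model 2.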

TABLE 3
Audia and Goncalo replication of predicting number of ideas in new subclasses (finding 2) and sample selection corrected estimates using two-stage Heckman model

| Post-hoc analyses: Identifying the origins of sample selection bias
We begin by testing for the presence of data truncation. Following our initial logic, data truncation would exist if non-novel (left truncation) and highly novel (right truncation) inventions had a lower chance of being selected for patenting compared with moderately novel inventions. Hence, we test whether invention novelty has a curvilinear effect on the likelihood of an invention's selection for patent filing. To do so, we use a linear probability model, which allows us to include fixed effects for leading inventors without having to exclude any observations. The definitions of the variables used in this analysis can be seen in Table 4, and their descriptive statistics and pairwise correlations in Table 5. Table 6 presents the results. Model 1 shows a positive coefficient for the main effect of novelty (β = 0.493, p < 0.001) and a negative coefficient for the squared term (β = −0.432, p < 0.001), with the maximum of the inverted U-shaped curve at a novelty value of 0.57. While only a minority of observations lie beyond this tipping point, the Fieller (1954) confidence interval around the maximum confirms that it is still within the range of our novelty variable (t = 6.09, p < 0.001). We also verified that the slope of the curve is positive when novelty is equal to its mean value (0.06) (F = 79.39, p < 0.001) and negative when it is equal to its 95th percentile (0.66) (F = 9.73, p = 0.002). Moreover, we divided the observations into two samples: one with values of the novelty variable below the turning point of the inverted U-shaped curve and one with values above it.
We then estimated two models with just the linear term of the novelty variable and found that the coefficient of this variable is positive for the first subsample, suggesting an upward slope for observations below the turning point (β = 0.317, p < 0.001), while the coefficient for the second subsample is negative (β = −0.338, p = 0.053), indicating a downward slope for observations above the turning point. This additional robustness check confirms the presence of an inverted U-shaped relationship between novelty and the probability of filing a patent. As a final check, we winsorized the novelty variable by replacing values above its 95th percentile with this value (0.667) to assess the robustness of our findings to the presence of outliers. Also in this case, our results are confirmed (β = 0.556, p < 0.001 for novelty and β = −0.645, p < 0.001 for the squared term). Overall, these results suggest that we find negative, rather than just decreasing, marginal effects of novelty on the probability of patenting, but this negative marginal effect sets in only for high values of novelty. To rule out that these effects are driven by unobserved features of inventions, in additional analyses we restricted our sample to inventions for which patent engineers consulted experts within the company to help them judge an invention's novelty and usefulness. This sample represents borderline cases that are neither clear-cut prior art nor clear-cut patenting cases. Indeed, we see that among these approximately 10,000 ideas, a higher percentage of inventions (16.5% vs. 5.7%; p < 0.001) are classified as containing an inventive step but not currently of use to the company, and these inventions also have a higher average novelty value. The results of these estimations, which buttress our above results, are presented in Model 2.

TABLE 4
Evaluation of prior patented inventions (log): Number of prior patented inventions by the team of inventors evaluated by the patent engineer
Evaluation of prior failed inventions (log): Number of prior failed inventions by the team of inventors evaluated by the patent engineer
Patent engineer workload (log): Number of invention disclosures awaiting decision assigned to the patent engineer
End of financial year evaluation: Equal to 1 if the decision on the focal invention was made in the last month of the financial year

TABLE 5
Descriptive statistics of variables used to predict patent filing and share of patent filings granted
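The curvilinearity checks described above (the turning point of the quadratic, slope tests at the mean and the 95th percentile, and the split-sample linear fits) can be sketched on simulated data. The novelty distribution and coefficient values below are our own illustration, not Venus data; real novelty scores are right-skewed, but a uniform draw keeps the sketch simple:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Simulated (ours): novelty with an inverted-U effect on the
# probability of patent filing, peaking at novelty = 0.7.
novelty = rng.uniform(0, 1, size=n)
p_file = 0.1 + 1.4 * novelty - 1.0 * novelty**2   # always within [0, 1]
filed = rng.binomial(1, p_file).astype(float)

def ols(X, y):
    """Least-squares coefficients via the normal equations."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Linear probability model with linear and squared novelty terms.
X = np.column_stack([np.ones(n), novelty, novelty**2])
b0, b1, b2 = ols(X, filed)
turning_point = -b1 / (2 * b2)              # maximum of the inverted U

# Marginal effect of novelty at a value v is b1 + 2*b2*v.
slope_at_mean = b1 + 2 * b2 * novelty.mean()
slope_at_p95 = b1 + 2 * b2 * np.quantile(novelty, 0.95)

# Split-sample check: linear term only, below vs. above the turning point.
lo = novelty < turning_point
slope_lo = ols(np.column_stack([np.ones(lo.sum()), novelty[lo]]), filed[lo])[1]
slope_hi = ols(np.column_stack([np.ones((~lo).sum()), novelty[~lo]]), filed[~lo])[1]
print(turning_point, slope_at_mean, slope_at_p95, slope_lo, slope_hi)
```

The recovered turning point sits near the true peak, the marginal effect is positive at the mean and negative at the 95th percentile, and the split-sample slopes are positive below and negative above the turning point, mirroring the pattern of tests reported in the text (the Fieller interval and inventor fixed effects of the actual analysis are omitted for brevity).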
The fact that we find evidence of data truncation leads to several possible interpretations of the outcomes of our quasi-replication of AG. For example, for AG's first study, where we show that the effect of past patenting experience is overestimated, the presence of data truncation would suggest that inventing experience per se (not merely past inventing success) has an important influence on future patent production. Notably, such experience need not even be a measure of an inventor's actual competence, but could merely be a proxy for inventors' better understanding of the firm's patenting efforts or for emerging personal ties to the evaluating agents.
We now proceed to examine potential drivers of incidental truncation. Drawing on our theorizing, we focus on Venus' ability to make accurate decisions on what inventions to file for patent protection and the risk propensity of individual inventors.
First, if the firm lacks the capability to assess the quality of an invention, for example in a technology area further from the firm's core competences, we expect its patent filings to be closer to a random draw from the pool of inventions. In contrast, we expect (stronger) sample selection bias if the firm or its selecting agents have (more) expertise in a given technological area and may hence execute a deliberate patent filing strategy. To see whether this difference in evaluation capability matters for selection, we focused on two subsamples of inventions, evaluated by 2 of the 12 Venus patent boards: PB1 and PB2. Invention submissions to PB1 are on average of lower quality and have less strategic importance than those submitted to PB2.13 If our arguments for the presence of sample selection bias hold, we would expect to find stronger evidence of selection bias in the sample of inventions evaluated by PB2 than in that evaluated by PB1. And indeed, for the sample of PB2 inventions, once we correct for sample selection, the coefficient for the inventor success variable is more than two-thirds smaller in the duration model predicting the probability of patenting (see Models 8 and 7 in Table 2) and almost one-and-a-half times larger in the OLS model predicting the number of diverging ideas (see Models 8 and 7 in Table 3). In contrast, the selection-corrected estimates for PB1 inventions do not differ much from the uncorrected estimates (see Models 6 and 5 in Tables 2 and 3).
In our second post-hoc analysis, we inquire into whether inventors' risk propensity may drive incidental truncation. Risk propensity should matter in particular regarding AG's second finding: more risk-loving inventors would not fear the risk of failure that comes with attempting to invent in technology areas they have never patented in before, leading to a higher propensity to engage in more explorative research, despite such research being less likely to succeed. Therefore, omitting risk propensity could introduce a negative correlation between the error terms in the selection and outcome regressions, because more risk-loving inventors will produce a higher number of divergent ideas. As the relationship between past success and the production of divergent ideas is expected to be negative, this omitted variable may result in an attenuating bias on the coefficient of inventor success, leading to an underestimation of the magnitude of this effect.14 To explore this conjecture, we build on existing research (e.g., D'Acunto, 2015; Powell & Ansic, 1997) indicating that women are less risk-seeking than men and rerun our analyses after splitting the sample into female and male inventors.15 Although we acknowledge that this is an imperfect way to measure this individual trait and that our sample of female inventors is relatively small, the findings are in line with our expectations for AG's second finding. The results reported in Table 3, Models 10 and 11, confirm that sample selection bias is present only for the sample of male inventors and that it leads to the magnitude of the effect of past success being underestimated. However, there does not appear to be a difference in the extent of sample selection bias between male and female inventors when considering the effect of past success on future success (Table 2, Models 10 and 11), which in both cases is overestimated if sample selection bias is not considered. This would suggest that for the "risky practice" of divergent invention inherent to AG's second finding, risk propensity is a crucial selection variable, rendering incidental truncation a real issue.

13 We find that PB1 has a higher percentage of prior art inventions (PB1: 76.3% vs. PB2: 36%), a smaller share of shelved inventions (2.7% vs. 10.6%), of inventions related to an industry standard (0.5% vs. 29.9%), and of inventions linked to specific R&D projects (47.7% vs. 69.2%). Patent filings put forward by PB1 are less likely to be granted by any patent office (11.5% vs. 33.1%) and by the USPTO in particular (52.9% vs. 71.7%). Also, comparing the average success rate of USPTO applications (2001–2010) in the four-digit IPC technology class in which most PB1 (PB2) filings appear, we see that Venus significantly underperforms in the IPC class most related to PB1 (Venus: 21% grant rate; overall: 64%) but almost matches the overall average for PB2 (61% vs. 70%).
In contrast, in the "learning" account behind AG's first finding, incidental truncation brought by firms' appropriability capabilities as well as data truncation from below seem to matter more.

| DISCUSSION
In this paper, we set out to shed some light on whether and how academic work drawing on patent data may suffer from patents being a nonrandom subsample of the population of inventions due to data and incidental truncation. To do so, we performed a quasi-replication of an exemplary study on the microfoundations of exploratory research, an area we considered particularly liable to issues of sample selection, to capture whether and how sample selection might affect past results.
With respect to this specific literature, the findings from our quasi-replication lead us to two insights. First, existing work has probably overestimated the effect of learning from past patenting success on future patenting success. Rather, it is the accumulation of positive and negative experiences of the inventive process that leads individuals to become more successful at patenting. We suggest that this finding may direct future work not only to a different and more detailed study of how individuals learn, but also of what they learn: are inventors really producing better inventions, or are they simply becoming better at beating the filters set up by their own organizations? Second, even though, at first look, results on the effect of individuals' past success on their likelihood to explore new technological areas seemed largely unaffected by selection bias, our post-hoc analyses clearly showed that this is not the case. Rather, as we had predicted, accounting for differences in inventors' risk propensity showed that this path dependency is even stronger among risk-loving male inventors. To us, the importance of such individual-level factors suggests a need for further in-depth, field-based (e.g., Hargadon, 2003) or experimental work (e.g., Gino, Argote, Miron-Spektor, & Todorova, 2010) to study the actual microfoundations of exploratory research, including the role of gender (see, e.g., Jeppesen & Lakhani, 2010).

14 The likely direction of bias due to unobserved inventor risk propensity is less clear for AG's first finding on the relationship between inventors' prior and future success. It would depend on whether risk propensity increases the rate at which an inventor produces inventions by an amount sufficient to offset the lower likelihood of any particular invention of hers being patented.

15 We used the R package genderizeR to predict gender based on inventors' first names. We consider an inventor male if the algorithm attributes a probability higher than 70% to a first name being associated with a man. We used native speakers to code the gender of the remaining inventors. However, even after manual coding we were left with 2,090 inventors whose gender we could not determine.
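The thresholding rule in footnote 15 amounts to a simple classification step. The sketch below uses a hypothetical name-to-probability table in place of genderizeR's output (the real package queries the genderize.io name database, and unresolved cases go to manual coding):

```python
# Hypothetical name -> P(male) lookup standing in for the output of
# the genderizeR package; names and probabilities are illustrative.
P_MALE = {"andreas": 0.99, "maria": 0.02, "kim": 0.55}

def classify(first_name: str, threshold: float = 0.70) -> str:
    """Return 'male'/'female' when the probability clears the threshold,
    else 'unknown' (i.e., sent to manual coding by native speakers)."""
    p = P_MALE.get(first_name.lower())
    if p is None:
        return "unknown"          # name not covered by the database
    if p >= threshold:
        return "male"
    if 1 - p >= threshold:
        return "female"
    return "unknown"              # ambiguous name, e.g., Kim

print([classify(n) for n in ["Andreas", "Maria", "Kim", "Ada"]])
# -> ['male', 'female', 'unknown', 'unknown']
```

We show a symmetric threshold for female names as one plausible reading of the footnote; the paper itself only states the male-side rule explicitly.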
For the strategy literature drawing on patent data more broadly, our first insight, admittedly to our own surprise, offers some reassurance regarding selection bias for the theories we have built using patent data: controlling for selection never led to major changes in the signs of the coefficients AG had hypothesized. As such, we provide some corroboration for the robustness and generalizability of patent-based studies, but also highlight the benefits of obtaining additional evidence to buttress these results. At the same time, our study must not be read as a carte blanche for using patent data, not least because we solely focused on problems of selection and not, for example, on potential measurement error, another issue plaguing many articles drawing on patent data (Tanimura, 2018).
Our second, and arguably most crucial, finding is that selection matters. The selection process we described in the introduction, in which firms decide against patenting to halt idea development or to opt explicitly for another appropriation mechanism, already showcased how issues of data and incidental truncation may pervade work drawing on patent data. Exacerbating this issue, we saw, as expected, both data and incidental truncation affecting our results in a number of different ways. Hence, we think that patent-based studies will find it difficult not only to theorize a priori which of these factors should dominate, but also to reasonably exclude any other factor from being at play, making prediction of the overall direction of sample selection bias (i.e., over- or underestimation) difficult. In addition, we found selection to cause significant changes in point estimates and marginal effects that lead to (a) new pathways for interpreting existing strategy research and (b) questions concerning whether existing results are precise enough to inform managerial decision-making.
Regarding extant theorizing in strategy, work on exploration and exploitation in particular (March, 1991) has built strong priors thanks to work on patent data. For example, actively fostering knowledge recombination is one of the prime recommendations for successful exploration (e.g., Arts & Veugelers, 2015; Fleming, 2001; Kaplan & Vakili, 2015). While we are not challenging that this approach is per se beneficial, we make clear how, for example, the cost of inventors branching out into areas new to them may be even higher than others before us, including AG, have indicated (e.g., Fleming, 2007; Teodoridis, Bikard, & Vakili, 2018). Here, given the underestimation of the value of recent hires and the concomitant overestimation of inventors' experience revealed in our results, we may speculate that successful branching out into new knowledge fields may be driven by the mechanism of firm-level hiring as well as by individual-level, team-level, or firm-level learning or experimenting. Such reflections further our above call for micro-studies of inventive activities and their selection, to scrutinize what team compositions lead to successful exploration (e.g., regarding the diversity of existing and new hires in knowledge or seniority) and whether the overall importance of teams for exploration (e.g., Singh & Fleming, 2010) may have been overestimated.
Similarly, work on exploitation, such as on firm-level learning in which firms hone, graft, and extend capabilities (Helfat & Peteraf, 2003), may also be affected. Here, our insights suggest that learning to be excellent at innovation may, even more strongly than suggested by previous work (Conti et al., 2014; Fleming, 2007), involve a greater number of attempts rather than relying on individuals who have succeeded more often in the past. Importantly, a precise explanation for this mechanism is still lacking; candidates include innovation simply being uncertain, individual knowledge having a certain half-life (Teodoridis et al., 2018), or unobserved firm-internal competition over what is being learned (Kaplan, 2008). In addition, we find evidence for how work using patents to study firm capabilities may have an inherent incidental truncation problem resulting from the correlation between firms' current capability level and their ability to select the right ideas for filing (see also Knudsen & Levinthal, 2007; Kruger & Dunning, 1999).
For practice, our insights call for greater modesty in the managerial recommendations we give. While our field more broadly may well need to focus less on significance levels and more on effect sizes to carry meaning beyond academe (Bettis et al., 2016), our results make clear that, at this point, we should be careful in pursuing such endeavors, at least when it comes to drawing on patent data to try to explain what leads inventors to produce more, more novel, or better inventions.
Still, we see numerous ways to tackle these issues in future strategy research. The first is to conduct more quasi-replications like ours, which do, however, require data access that is hard to come by. Yet ways exist to supplement or extend patent data. First, researchers may focus on limited sets of inventions for which they gather additional historical or contextual information, as in inventor surveys or studies of university inventors (Bercovitz & Feldman, 2011; Kotha, George, & Srikanth, 2013; Nelson, 2015). Second, the corporate spawning literature has begun to track the movements of individuals on platforms such as LinkedIn, rather than using patents (e.g., Avnimelech & Feldman, 2010; Ge, Huang, & Png, 2016). Third, researchers interested in high-variance outcomes may focus on settings where it is possible to observe the entire range of inventive efforts, like open source projects (von Hippel & von Krogh, 2003), idea submission sites (Bayus, 2012), suggestion boxes (Dahlander & Piezunka, 2014), or problem-based challenges (Jeppesen & Lakhani, 2010). Similar settings would further include crowdfunding campaigns (Mollick, 2014) or the study of orphan drugs and abandoned compounds (Chesbrough & Chen, 2013). All these sources may also be linked to patent records, allowing for a greater appreciation of the different pathways to creative output. Finally, approaches embedding patent data into rich case narratives to endogenize patenting as an outcome of firm-internal strategy or behavioral norms hold great potential in our view (e.g., Bhaskarabhatla & Hegde, 2014). Following this pathway, future work may well identify additional internal processes that are exogenous to invention quality but pivotal to the patenting outcome, beyond known ones such as time pressure (Balasubramanian et al., 2018), evaluator workload (Frakes & Wasserman, 2017; Galasso & Schankerman, 2015), or evaluators' ties to the inventor (Reitzig & Sorenson, 2013).
Such work will be instrumental to improving our understanding of what is really going on inside firms: similar to the search literature (Maggitti, Smith, & Katila, 2013), we know much more about the inputs to and outcomes of firms' patenting decision-making process, but very little about this decision-making process itself. Our insights reaffirm that this process is unlikely to be the machine-like, neutral selection algorithm we often assume it to be when drawing on patent data.

| Limitations and conclusion
It is clear that even if Venus' patent production process is similar to that of many other technology-oriented firms (Tanimura, 2018), working with internal data from just one company may limit the generalizability of our findings. While some may be quick to suggest that our study exaggerates the issue of sample selection bias, the opposite may well be the case. For example, given the lower prevalence of secrecy in Venus' industry, truncation from above may not have had the same effect here that it may have in many other industries in which this appropriability mechanism is used.
Still, we believe that our study represents an important step in highlighting the existence, magnitude, and drivers of sample selection problems, especially when one tries to model what makes inventors successful. Although this problem has been commented upon previously, our quasi-replication is the first study we know of that tries to indicate the scale of this issue and to present recommendations to tackle these shortcomings in the future. We expect that patent data will remain a key tool for strategy researchers, as patents provide a powerful lens into the nature of the people and technologies that underpin change in the economic system. At the same time, we need to better understand the social, economic, and organizational processes that give rise to patents and to complement this information with alternative sources of data that allow us to more fully understand the entire innovative process.