Search-based Crash Reproduction using Behavioral Model Seeding

Search-based crash reproduction approaches assist developers during debugging by generating a test case which reproduces a crash given its stack trace. One of the fundamental challenges of this approach is creating the objects needed to trigger the crash. One way to address this challenge is seeding: using information about the application during the search process. With seeding, the existing usages of classes can be used in the search process to produce realistic sequences of method calls which create the required objects. In this study, we introduce behavioral model seeding: a new seeding method which learns class usages from both the system under test and existing test cases. Learned usages are then synthesized in a behavioral model (state machine), which serves to guide the evolutionary process. To assess behavioral model seeding, we evaluate it against test seeding (the state-of-the-art technique for seeding realistic objects) and no seeding (without seeding any class usage). For this evaluation, we use a benchmark of 124 hard-to-reproduce crashes stemming from six open-source projects. Our results indicate that behavioral model seeding outperforms both test seeding and no seeding by a minimum of 6% without any notable negative impact on efficiency.


INTRODUCTION
The starting point of any debugging activity is to try to reproduce the problem reported by a user in the development environment [1,2]. In particular, for Java programs, when a crash occurs, an exception is thrown. A developer strives to reproduce it to understand its cause, then fix the bug, and finally add a (non-)regression test to avoid reintroducing the bug in future versions.
Manual crash reproduction can be a challenging and labor-intensive task for developers: it is often an iterative process that requires setting the debugging environment in a similar enough state as the [...]
P. DERAKHSHANFAR, X. DEVROEY, G. PERROUIN, A. ZAIDMAN, A. VAN DEURSEN
Previous studies [3,13] show that such test cases help developers debug the application.
For Java programs, the information reported from the operations environment ideally includes a stack trace. For instance, Listing 1 presents a stack trace coming from the crash XWIKI-13372. The stack trace indicates the exception thrown (NullPointerException here) and the frames, i.e., the stack of method calls at the time of the crash, indexed from 1 (at line 1) to 26 (not shown here).
Various approaches use a stack trace as input to automatically generate a test case reproducing the crash. CONCRASH [7] focuses on reproducing concurrency failures that violate the thread safety of a class by iteratively generating test code and looking for a thread interleaving that triggers a concurrency crash. JCHARMING [4,5] applies model checking and program slicing to generate crash-reproducing tests. MUCRASH [6] exploits existing test cases written by developers: it selects test cases covering classes involved in the stack trace and mutates them to reproduce the crash. STAR [3] applies optimized backward symbolic execution to identify preconditions of a target crash and uses this information to generate a crash-reproducing test that satisfies the computed preconditions. Finally, RECORE [14] applies a search-based approach to reproduce a crash using both a stack trace and a core dump, produced by the system when the crash happened, to guide the search.

Search-based crash reproduction
Search-based approaches have been widely used to solve complex, non-linear software engineering problems, which have multiple and sometimes conflicting optimization objectives [15]. Recently, Soltani et al. [8] proposed a search-based approach for crash reproduction called EvoCrash. EvoCrash is based on the EvoSuite approach [16,17] and applies a new guided genetic algorithm to generate a test case that reproduces a given crash using a distance metric, similar to the one described by Rossler et al. [14], to guide the search. For a given stack trace, the user specifies a target frame relevant to their debugging activities: i.e., the line with a class belonging to their system, from which the stack trace will be reproduced. For instance, applying EvoCrash to the stack trace from Listing 1 with a target frame 2 will produce a crash-reproducing test case for the class BaseStringProperty that produces a stack trace with the same first two frames.
An overview of the approach is shown in the right part of Figure 2 (box 5). The first step of this algorithm, called guided initialization, is to generate a random population. This random population is a set of random unit tests where a target method call (i.e., the method in the target frame) is injected in each test. During the search, guided crossover and guided mutation are applied to the tests in such a way that only the tests with a call to the target method are kept in the evolutionary loop. The overall process is guided by a weighted sum fitness function [9], applied to each test t. The terms of this function correspond to the following conditions when executing the test: (i) whether the execution distance from the target line ($d_l$) is equal to 0.0, in which case, (ii) whether the target exception type is thrown ($d_e$), in which case, (iii) whether all frames, from the beginning up until the selected frame, are included in the generated trace ($d_s$). The overall fitness value for a given test case ranges from 0.0 (crash is fully reproduced) to 6.0 (no test was generated), depending on the conditions it satisfies.
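As a rough illustration of how such a weighted sum can range from 0.0 to 6.0, the sketch below assumes weights of 3, 2, and 1 for the line, exception, and stack-trace distances, with each distance normalized to [0, 1] and the later terms held at their maximum until the earlier condition is satisfied. The weights, the normalization, and the function name are assumptions made for illustration; they are chosen only to be consistent with the range stated above.

```python
def crash_fitness(d_line: float, d_exception: float, d_stack: float) -> float:
    """Weighted-sum fitness sketch; 0.0 means the crash is fully reproduced.

    All three distances are assumed to be normalized to [0, 1].
    The weights 3, 2, 1 are an assumption chosen so that the
    worst case sums to 6.0.
    """
    for name, d in (("d_line", d_line), ("d_exception", d_exception), ("d_stack", d_stack)):
        if not 0.0 <= d <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {d}")
    if d_line > 0.0:
        # Target line not reached: the later terms stay at their maximum.
        return 3.0 * d_line + 2.0 * 1.0 + 1.0
    if d_exception > 0.0:
        # Line reached, but the target exception was not thrown.
        return 2.0 * d_exception + 1.0
    # Line reached and exception thrown: only the stack-trace distance remains.
    return d_stack
```

Under these assumptions, a test that reaches the target line but throws no exception scores 3.0, and any progress on a later condition only counts once the earlier ones are met.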

Seeding strategies for search-based testing
In addition to guided search, a promising technique is seeding. Seeding strategies use related knowledge to help the generation process and optimize the fitness of the population [18,19,20]. We focus here on the usage of the source code and the available tests as primary sources of information for search-based testing. Other approaches, for instance, search for string inputs on the internet [21], or use the existing test corpus [22] to mine relevant formatted string values (e.g., XML or SQL statements).

Seeding from the source code
Three main seeding strategies exploit the source code for search-based testing [18,10,23]: (i) constant seeding uses static analysis to collect and reuse constant values appearing in the source code (e.g., constant values appearing in boundary conditions); (ii) dynamic seeding complements constant seeding by using dynamic analysis to collect numerical and string values, observed only during the execution of the software, and reuse them for seeding; and (iii) type seeding is used to determine the object type that should be used as an input argument, based on a static analysis of the source code (e.g., by looking at instanceof conditions or generic types).

Seeding from the existing tests
Rojas et al. [10] suggest two test seeding strategies, using dynamic analysis on existing test cases: cloning and carving. Dynamic analysis uses code instrumentation to trace the different methods called during an execution, which, compared to static analysis, makes it easier to identify inter-procedural sequences of method calls (for instance, in the context of a class hierarchy). Cloning and carving have been implemented in EvoSuite and can be used for unit test generation.
For cloning, the execution of an existing test case is copied and used as a member of the initial population of a search process. Specifically, after its instrumentation and execution, the test case is reconstructed internally (without the assertions), based on the execution trace of the instrumented test. This internal representation is then used as-is in the initial population. Internal representations of the cloned test cases are stored in a test pool.
For carving, an object is reused during the initialization of the population and mutation of the individuals. In this case, only a subset of an execution trace, containing the creation of a new object and a sequence of methods called on that object, is used to internally build an object on which the methods are called. This object and the subsequent method calls are then inserted as part of a newly created test case (initialization) or in an existing test when a new object is required (mutation). Internal representations of the carved objects are stored in an object pool.
The integration of seeding strategies into crash reproduction is illustrated in Figure 2, box 5. As shown, the test cases (respectively objects) to be used by the algorithm are stored in a test case (respectively object) pool, from which they can be used according to user-defined probabilities. For instance, if a test case only contains the creation of a new LinkedList (using new) that is filled using two add method calls, the sequence, corresponding to the execution trace <new, add, add>, may be used as-is in the initial population (cloning) or inserted by a mutation into other test cases (carving).

Challenges in seeding strategies
The existing seeding techniques use only one resource to collect information for seeding. However, it is possible that the selected resource does not provide enough information about class usages. For instance, test seeding only uses the carved call sequences from the execution of the existing test cases. If the existing test cases do not cover the behavior of the crash in the interesting classes, this seeding strategy may even misguide the search process. Additionally, if the number of observed call sequences is large, the seeding strategy needs a procedure to prioritize the call sequences for seeding. Using random call sequences as seeds can sometimes misguide the search process. Existing seeding strategies do not currently address these issues.

Behavioral model-based testing
Model-based testing [11] relies on abstract specifications (models) of the system under test to support the generation of relevant (abstract) test cases. Transition systems [24] have been used as a fundamental formalism to reason about test case generation and support the definition of formal test selection criteria [25]. Each abstract test case corresponds to a sequence of method calls on one object: i.e., a path in the transition system starting from the initial state and ending in the initial state, a commonly used convention to deal with finite behaviours [26]. Once selected from the model, abstract test cases are concretized (by mapping the transition system's paths to concrete sequences of method calls) into executable test cases to be run on the system. In this paper, we derive abstract test cases (called abstract object behaviors hereafter) and concretize them, producing pieces of code creating objects and invoking methods on such objects. Those pieces of code serve as seeds for search-based crash reproduction. Figure 1 shows an example of a transition system representing the possible sequences of method calls on java.util.List objects, learned from the code and tests, from which sequences of method calls can be derived. The obtained transition system subsumes the behavior of the sequences used to learn it but also allows for new combinations of those sequences. These behaviors are relevant in the context of seeding as the diversity of the objects induced is useful for the search process. Also, generating invalid behaviors from the new combinations is not a problem here as they are detectable during the search process.

Abstract object behavior selection
The abstract object behaviors are selected from the transition system according to criteria defined by the tester. In the remainder of this paper, we use dissimilarity as the selection criterion [27,28]. Dissimilarity selection, which aims at maximizing the fault detection rate by increasing diversity among test cases, has been shown to be an interesting and scalable alternative to other classical selection criteria [28,29]. This diversity is measured using a dissimilarity distance (here, 1 − the Jaccard index [30]) between the actions of two abstract object behaviors.
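As a worked example of this distance, take two hypothetical abstract object behaviors over java.util.List, $p_1 = \langle new, add, add, size \rangle$ and $p_2 = \langle new, add, iterator \rangle$ (illustrative sequences, not taken from the evaluation), and treat each behavior as the set of its actions:

```latex
\begin{align*}
p_1 &= \{\mathit{new}, \mathit{add}, \mathit{size}\}, \qquad
p_2 = \{\mathit{new}, \mathit{add}, \mathit{iterator}\} \\
d(p_1, p_2) &= 1 - \frac{|p_1 \cap p_2|}{|p_1 \cup p_2|}
             = 1 - \frac{|\{\mathit{new}, \mathit{add}\}|}{|\{\mathit{new}, \mathit{add}, \mathit{size}, \mathit{iterator}\}|}
             = 1 - \frac{2}{4} = 0.5
\end{align*}
```

Two behaviors with identical action sets have distance 0, and two behaviors sharing no action have distance 1.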

Model Inference
The model may be manually specified (and in this case will generally focus on specific aspects of the system) [11], or automatically learned from observations of the system [31,32,33,34,35,36]. In the latter case, the model will be incomplete and only contain the observed behavior of the system [37]. For instance, the sequence <new, addAll> is valid for a java.util.List object but cannot be derived from the transition system in Figure 1 as the addAll method call has never been observed. The observed behavior can be obtained via static analysis [38] or dynamically [39]. Model inference may be used for visualization [32,36], system properties verification [40,41], or generation [31,33,34,42,43,38] and prioritization [26,44] of test cases.

BEHAVIORAL MODEL AND TEST SEEDING FOR CRASH REPRODUCTION
The goal of behavioral model seeding (denoted model seeding hereafter) is to abstract the behavior of the software under test using models and use that abstraction during the search. At the unit test level (which is the considered test generation level in this study), each model is a transition system, like in Figure 1, and represents possible usages of a class: i.e., possible sequences of method calls observed for objects of that class.
The main steps of our model seeding approach, presented in Figure 2, are: the inference of the individual models (3) (described in Section 3.1) from the call sequences collected through static analysis (1) performed on the application code (described in Section 3.1.1) and dynamic analysis (2) of the test cases (described in Section 3.1.2); and, for each model, the selection of abstract object behaviors (4) that are concretized into Java objects (described in Section 3.2) and stored in an object pool from which the guided genetic algorithm (5) (described in Section 3.3) can randomly pick objects to build test cases during the search process.

Figure 2. General overview of model seeding and test seeding for search-based crash reproduction

Model inference
Call sequences are obtained by using static analysis on the bytecode of the application (1) and by instrumenting and executing the existing test cases (2). We use n-gram inference to build the transition systems used for model seeding. N-gram inference takes a set of sequences of actions as input to produce a transition system where the n-th action depends on the n − 1 previously executed actions.
A large value of n for the n-gram inference would result in wider transition systems with more states and fewer incoming transitions, representing a more constrained behavior and producing less diverse test cases. In contrast, a small value of n enables better diversity in the behavior allowed by the model (ending up in more diverse abstract object behaviors), requires fewer observations to reach stability of the model, simplifies the inference, and results in a more compact model [33,34]. For these reasons, we use 2-gram inference to build our models.
For each class, the model (3) is obtained by applying 2-gram inference to the call sequences of that class.
For instance, in the transition system of Figure 1, the action size(), executed from state $s_3$ at step k, only depends on the fact that the action add(Object) has been executed at step k − 1, independently of the fact that there is a step k − 2 during which the action iterator() has been executed.
Calls to constructors are considered as method calls during model inference. However, constructors may not appear in any transition of the model if no constructor call was observed during the collection of the call sequences. This is usually the case when the call sequences used to infer the model have been captured from objects that are parameters or attributes of a class. If an abstract object behavior does not start with a call to a constructor, a constructor is randomly chosen to initialize the object during the concretization.
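The 2-gram idea can be sketched in a few lines. The code below is an illustrative simplification, not the inference tool used in this work: states are simply named after the last executed action, the initial state is called "s0", and the convention of ending paths in the initial state is ignored. The observed sequences are hypothetical.

```python
from collections import defaultdict


def infer_2gram_model(call_sequences):
    """Build a transition system where the next action depends only on the previous one.

    States are named after the last executed action; "s0" is the initial state.
    Returns a dict: state -> {action: next_state}.
    """
    transitions = defaultdict(dict)
    for sequence in call_sequences:
        state = "s0"
        for action in sequence:
            next_state = f"after:{action}"
            transitions[state][action] = next_state
            state = next_state
    return dict(transitions)


def accepts(model, sequence):
    """Return True if the sequence is a path from the initial state of the model."""
    state = "s0"
    for action in sequence:
        if action not in model.get(state, {}):
            return False
        state = model[state][action]
    return True


# Hypothetical call sequences observed on list objects.
observed = [
    ["new", "add", "size"],
    ["new", "iterator", "add"],
]
model = infer_2gram_model(observed)
```

With these two observations, the model accepts the unobserved combination <new, iterator, add, size>, because size only depends on add having just been executed, while <new, addAll> is rejected since addAll was never observed.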

For one version of the software under test, the model inference is a one time task. Models can then be directly reused for various crash reproductions.

Static analysis of the application
The static analysis is performed on the bytecode of the application. We apply this analysis to all of the available classes in the software under test. In each method of these classes, we build the control flow graph, and for each object of that method, we collect the sequences of method calls on that object. For each object, each path in the control flow graph corresponds to one sequence of method calls. For instance, if the code contains an if-then-else statement, the true and false branches will produce two call sequences. In the case of a loop statement, the true branch is considered only once. The static analysis is intraprocedural, meaning that only the calls in the current method are considered. If an object is passed as a parameter of a call to a method that (internally) calls other methods on that object, those internal calls will not appear in the call sequences. This analysis collects the relevant call sequences for every internal or external class used in the project.
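The path-based collection described above can be illustrated on a toy control flow graph. This is a minimal sketch under simplifying assumptions: the graph is given as a plain dictionary, it is acyclic (a loop body is assumed to have been unrolled once beforehand), and the node names and the example method body are hypothetical.

```python
def enumerate_paths(cfg, node, exit_node):
    """Yield every path from node to exit_node in an acyclic control flow graph."""
    if node == exit_node:
        yield [node]
        return
    for successor in cfg[node]:
        for tail in enumerate_paths(cfg, successor, exit_node):
            yield [node] + tail


def call_sequences_per_object(cfg, calls, entry, exit_node):
    """Collect, for each object, the distinct call sequences along all paths."""
    sequences = {}
    for path in enumerate_paths(cfg, entry, exit_node):
        per_object = {}
        for node in path:
            for obj, method in calls.get(node, []):
                per_object.setdefault(obj, []).append(method)
        for obj, seq in per_object.items():
            sequences.setdefault(obj, set()).add(tuple(seq))
    return sequences


# Hypothetical method body:
#   list = new ArrayList(); if (flag) list.add(x); else list.clear(); list.size();
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
calls = {
    "entry": [("list", "new")],
    "then": [("list", "add")],
    "else": [("list", "clear")],
    "exit": [("list", "size")],
}
result = call_sequences_per_object(cfg, calls, "entry", "exit")
```

The if-then-else yields two call sequences for the list object, one per branch, which matches the behavior described in the paragraph above.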

Dynamic analysis for the test cases
Since the existing manually developed test cases exemplify potential usage scenarios of the software under test, we apply dynamic analysis to collect all of the call sequences observed during the execution of these scenarios. In contrast to static analysis, for which an interprocedural analysis would require an expensive effort and produce imprecise call sequences, dynamic analysis is interprocedural: the sequences include the calls appearing in the test cases, but also the internal calls triggered by the execution of the test case (e.g., if the object is passed as a parameter to a method and methods are internally called on that object). Hence, through dynamic analysis, we gain a more accurate insight into the class usages in these scenarios.
Dynamic analysis of the existing tests is done in a similar way to the carving approach of Rojas et al. [10]: instrumentation adds log messages to indicate when a method is called, and the sequences of method calls are collected after execution. In similar fashion to static analysis, we collect call sequences of any observed object (even objects which are not defined in the software under test). The representativeness of the collected sequences depends on the coverage of the existing tests.
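To illustrate why dynamic collection is interprocedural, the sketch below wraps an object in a recording proxy that logs each method invoked on it, including calls made inside a helper the object is passed to. A Python proxy stands in here for the bytecode instrumentation; the class name, the helper, and the scenario are all hypothetical.

```python
class CallRecorder:
    """Wrap an object and log every method invoked on it, in call order."""

    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, name):
        # Only reached for attributes not defined on the recorder itself.
        attribute = getattr(self._target, name)
        if not callable(attribute):
            return attribute

        def logged(*args, **kwargs):
            self._log.append(name)
            return attribute(*args, **kwargs)

        return logged


def fill(items):
    """Helper standing in for application code that uses the object internally."""
    items.append(1)
    items.append(2)


log = ["new"]                      # the constructor call is recorded as an action
recorded = CallRecorder([], log)
fill(recorded)                     # internal calls are captured too
size = recorded.count(1)
```

After this run the log holds the sequence <new, append, append, count>, even though the two append calls never appear in the calling code itself.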

Abstract object behaviors selection
Abstract object behaviors are selected from the transition systems and concretized to populate the object pool used during the search. To limit the number of objects in the pool, we only select abstract object behaviors from two categories of models: models of internal classes (i.e., classes belonging to packages of the software under test) and models of dependency classes (i.e., classes belonging to packages of external dependencies) that are involved in the stack trace. Since we do not seek to validate the implementation of the application, the states are ignored during the selection process.

Selection
There exist various criteria to select abstract object behaviors from transition systems [11]. To successfully guide the search, we need to establish a good ratio between exploration (the ability to visit new regions of the search space) and exploitation (the ability to visit the neighborhood of previously visited regions) [45]. The guided genetic operators introduced in the EvoCrash approach [8] guarantee exploitation by focusing the search on the methods in the stack trace. However, depending on the stack trace, focusing on particular methods may reduce exploration. To improve the exploration ability of the search process, we use dissimilarity as the criterion to select the abstract object behaviors. Compared to classical structural coverage criteria that seek to cover as many parts of the transition system as possible, dissimilarity tries to increase diversity among the test cases by maximizing a distance $d$ (i.e., 1 − the Jaccard index [30]):
$$d(p_1, p_2) = 1 - \frac{|p_1 \cap p_2|}{|p_1 \cup p_2|}$$
where $p_1$ and $p_2$ are two abstract object behaviors, each treated as the set of its actions.
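One simple way to obtain a dissimilar set is a greedy farthest-first pass, sketched below. This heuristic is an assumption made for illustration; the selection tool used in this work may rely on a different search. The candidate behaviors are hypothetical.

```python
def jaccard_distance(p1, p2):
    """1 - Jaccard index over the sets of actions of two behaviors."""
    a, b = set(p1), set(p2)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def select_dissimilar(candidates, k):
    """Greedily pick k behaviors, each maximizing its minimum distance to those already picked."""
    if k <= 0 or not candidates:
        return []
    selected = [candidates[0]]
    remaining = list(candidates[1:])
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda c: min(jaccard_distance(c, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical abstract object behaviors for a list-like class.
candidates = [
    ("new", "add", "add"),
    ("new", "add", "size"),
    ("new", "iterator", "remove"),
    ("new", "add"),
]
picked = select_dissimilar(candidates, 2)
```

Starting from <new, add, add>, the second pick is <new, iterator, remove> (distance 0.75), rather than <new, add, size> (distance 1/3) or <new, add> (distance 0).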

Concretization
Each abstract object behavior has to be concretized to an object and method calls before being added to the object pool. In other words, for each abstract object behavior, if the constructor invocation is not the first action, one constructor is randomly called; and the methods are called on this object in the order specified by the abstract object behavior with randomly generated parameter values. Due to the randomness, each concretization may be different from the previous one. For each abstract object behavior, n concretizations are performed and saved in the object pool (the default value is n = 1, to balance scalability and diversity of the objects in the object pool). For instance, Listing 2 shows the concretized abstract object behavior <add(Object), add(Object)> derived from the transition system model of Figure 1. The type of the parameters (EuclideanIntegerPoint) is randomly selected during the concretization and created with required parameter values (an integer array here).

Guided Initialization and Guided Mutation
Classes are instantiated to create objects during two main steps of the guided genetic algorithm: guided initialization, where objects are needed to create the initial set of test cases; and guided mutation, where objects may be required as parameters when adding a method call. When no seeding is used, those objects are randomly created (as in the concretization step described in Section 3.2.2) by calling the constructor and random methods. Finally, to preserve exploration in model seeding, objects are picked from the object pool during guided initialization (resp. guided mutation) according to a user-defined probability Pr[pick init] (resp. Pr[pick mut]), and randomly generated otherwise. In our evaluation, we considered four different values for Pr[pick init] ∈ {0.2, 0.5, 0.8, 1.0}, to study the effect of model seeding on the initialization of the search process. Furthermore, we fixed the value of Pr[pick mut] = 0.3. As an example of object picking in action, test case generation with model seeding generated the test case in Listing 4 for the second frame of the stack trace from the crash MATH-79b from the Apache Commons Math project, reported in Listing 3. The target method is the last method called in the test (line 10) and throws a NullPointerException, reproducing the input stack trace. The first parameter of the method has to be a Collection<T> object. In this case, the guided genetic algorithm picked the list object from the object pool (from Listing 2) and inserted it in the test case (lines 2 to 7). The algorithm also modified that object (during guided mutation) by invoking an additional method on the object (line 9).
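The probabilistic choice between a pooled object and a randomly generated one can be sketched as follows. The function name, its parameters, and the string placeholders standing in for objects are hypothetical; only the pick-with-probability logic reflects the description above.

```python
import random


def pick_or_generate(object_pool, pick_probability, generate_random, rng):
    """Return a pooled object with the given probability, else a randomly generated one."""
    if object_pool and rng.random() < pick_probability:
        return rng.choice(object_pool)
    return generate_random(rng)


rng = random.Random(42)
pool = ["<new, add, add>", "<new, add, size>"]
# A probability of 1.0 always uses the pool; 0.0 never does.
always = [pick_or_generate(pool, 1.0, lambda r: "random-object", rng) for _ in range(20)]
never = [pick_or_generate(pool, 0.0, lambda r: "random-object", rng) for _ in range(20)]
```

Intermediate probabilities such as 0.2 or 0.5 mix seeded and random objects, which is how exploration is preserved.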

Test seeding
As described in Section 2.2.2, test seeding starts by executing the test cases (Figure 2, box A) for carving and cloning, and subsequently populating the test and object pools. As for model seeding, only internal classes and external classes appearing in the stack trace are considered.
For crash reproduction, the test pool is used only during guided initialization to clone test cases that contain the target class, according to a user-defined Pr[clone] probability. If the target method is not called in the cloned test case, the guided initialization also mutates the test case to add a call to the target method. The object pool is used during the guided initialization and guided mutation to pick objects. As described by Rojas et al. [10], the probabilities of using the object pool during initialization (Pr[pick init]) and mutation (Pr[pick mut]) are controlled by a single property called p_object_pool in test seeding.

IMPLEMENTATION
Relying on the EvoCrash experience [8,13,46], we developed Botsing, a framework for crash reproduction with extensibility in mind. Botsing also relies on EvoSuite [47] for the code instrumentation during test generation and execution by using evosuite-client as a dependency. Our open-source implementation is available at https://github.com/STAMP-project/botsing. The current version of Botsing includes both test seeding and model seeding as features.

Test seeding
Test seeding relies on the implementation defined by Rojas et al. [10] and available in EvoSuite. This implementation requires the user to provide a list of test cases to consider for cloning and carving. In Botsing, we automated this process using the dynamic analysis of the test cases to automatically detect those accessing classes involved in a given stack trace. We also modified the standard guided initialization and guided mutation to preserve the call to the target method during cloning and carving.

Model seeding
As mentioned in Section 3, Botsing uses a combination of static and dynamic analysis to infer models. The static analysis (1 in Figure 2) uses the reflection mechanisms of EvoSuite to inspect the compiled code of the classes involved in the stack traces and collect call sequences. The dynamic analysis (2 in Figure 2) relies on the test seeding mechanism used for cloning, which allows inspecting an internal representation of the test cases obtained after their execution and collecting call sequences. The resulting call sequences are then used to infer the transition system models of the classes using a 2-gram inference tool called YAMI [26] (3 in Figure 2). From the inferred models, we extract a set of dissimilar (based on the Jaccard distance [30]) abstract object behaviors (4 in Figure 2). For abstract object behavior extraction, we use the VIBeS [48] model-based testing tool. Abstract object behaviors are then concretized into real objects. For this concretization, we rely on the EvoSuite API.

EMPIRICAL EVALUATION
Our evaluation aims to assess the effectiveness of each of the mentioned seeding strategies (model and test seeding) on search-based crash reproduction. For this purpose, first, we evaluate the impact of each seeding strategy on the number of reproduced crashes. Second, we examine if using each of these strategies leads to a faster crash reproduction. Third, we see if each seeding strategy can help the search process to start more often. Finally, we characterize the impacting factors of test and model seeding.
Since the focus of this study is using seeding to enhance the guidance of the search initialization, we examine different probabilities of using the seeded information during the guided initialization in the evaluation of each strategy. Hence, we repeat each execution of test seeding with the following values for Pr[clone]: 0.2, 0.5, 0.8, and 1.0. Likewise, we repeat each execution of model seeding with the same values for Pr[pick init] (which is the only property that we can use for modifying the probability of the object seeding in the initialization of model seeding).

Research questions
In order to assess the usage of test seeding applied to crash reproduction and our new model seeding approach during the guided initialization, we performed an empirical evaluation to answer the two research questions defined in Section 1.
RQ1 What is the influence of test seeding used during initialization on search-based crash reproduction? To answer this research question, we compare Botsing executions with test seeding enabled to executions where no additional seeding strategy is used (denoted no seeding hereafter), in terms of their effectiveness at reproducing crashes and starting the search process, the factors influencing this effectiveness, and the impact of test seeding on efficiency. We divide RQ1 into four sub-research questions:
RQ1.1 Does test seeding help to reproduce more crashes?
RQ1.2 Does test seeding impact the efficiency of the search process?
RQ1.3 Can test seeding help to initialize the search process?
RQ1.4 Which factors in test seeding impact the search process?
RQ2 What is the influence of behavioral model seeding used during initialization on search-based crash reproduction? To answer this question, we compare Botsing executions with model seeding to executions with test seeding and no seeding. We also divide RQ2 into four sub-research questions:
RQ2.1 Does behavioral model seeding help to reproduce more crashes compared to no seeding?
RQ2.2 Does behavioral model seeding impact the efficiency of the search process compared to no seeding?
RQ2.3 Can behavioral model seeding help to initialize the search process compared to no seeding?
RQ2.4 Which factors in behavioral model seeding impact the search process?

Crash selection
In a recent study about the evaluation of search-based crash reproduction approaches, Soltani et al. [46] introduced a new benchmark, called JCrashPack, containing 200 real-world crashes from seven projects: JFreeChart, Commons-lang, Commons-math, Mockito, Joda-Time, XWiki, and ElasticSearch. We use the same benchmark for the empirical evaluation of model seeding and test seeding on crash reproduction.
To use test and model seeding for reproducing the crashes of JCrashPack, we first needed to apply static and dynamic analysis to the different versions of the projects in this benchmark. We successfully ran static analysis on all of the classes of JCrashPack. In contrast, dynamic analysis failed when executing the existing test suites of ElasticSearch, because of the technical difficulty of running ElasticSearch tests with the EvoSuite test executor. Since both seeding strategies need dynamic analysis, we excluded the ElasticSearch cases from JCrashPack for this experiment. JCrashPack contains 124 crashes after excluding the ElasticSearch cases. Table I provides more details about our dataset.
We used the selected crashes for the evaluation of no seeding and model seeding. Since test seeding needs existing test cases that use the target class, we filtered out the crashes whose stack traces contain only classes that are not used by any existing test. Hence, we used only 59 crashes for the evaluation of test seeding. More information about the average number of test classes used for test seeding and about the inferred models is available in Table II.

Configuration parameters
We used a budget of 62,328 fitness evaluations (corresponding on average to 15 minutes of executing Botsing with no seeding on our infrastructure, which is introduced in Section 5.2.4) to avoid side effects on execution time when executing Botsing on different frames in parallel. We also fixed the population size to 100 individuals as suggested by the latest study on search-based crash reproduction [9]. All other configuration parameters are set at their default value [10], and we used the default weighted sum scalarization fitness function (Equation 1) from Soltani et al. [9]. For test seeding executions, as described at the beginning of this section, we run each execution with four values for Pr[clone]: 0.2 (which is the default value), 0.5, 0.8, and 1.0. Also, we used the default value of 0.3 for p_object_pool.
We also use values 0.2, 0.5, 0.8, and 1.0 for Pr[pick init] for model seeding executions. The value of Pr[pick mut], which indicates the probability of using seeded information during the mutation, is fixed at 0.3. In addition to these model seeding configurations, we fix the number of selected abstract object behaviors to the size of the population in order to ensure that there are enough test cases to initiate the search.
For each frame (951 in total), we executed Botsing for no seeding (i.e., no additional seeding compared to the default parameters of Botsing) and each configuration of model seeding. Since test seeding needs existing test cases that use the target class, we filtered out the frames that do not have any such test when executing this seeding strategy. Therefore, we executed each configuration of test seeding on the resulting subset of frames (171 in total).

Infrastructure
We used 2 clusters (with 20 CPU-cores, 384 GB memory, and 482 GB hard drive) for our evaluation. For each stack trace, we executed an instance of Botsing for each frame which points to a class of the application. We discarded other frames to avoid generating test cases for external dependencies. We ran Botsing on 951 frames from 124 stack traces for no-seeding and each model-seeding strategy configuration. Also, we ran Botsing with test-seeding on 171 frames from 59 crashes. To address the random nature of the evaluated search approaches, we repeated each execution 30 times. In total, we ran 186,560 independent executions for this study. These executions took about 18 days overall.

Data analysis procedure
To check if the search process can reach a better state using seeding strategies, we analyze the status of the search process after executing each of the cases (each run on one frame of a stack trace). We define 5 states: (i) not started: the initial population could not be initialized, and the search did not start; (ii) line not reached: the target line could not be reached; (iii) line reached: the target line has been reached, but the target exception could not be thrown; (iv) ex. thrown: the target line has been reached, and an exception has been thrown but produced a different stack trace; and (v) reproduced: the stack trace could be reproduced. Since we repeat each execution 30 times, we use the majority of outcomes as the reproduction result of a frame. For instance, if Botsing reproduces a frame in the majority of the 30 runs, we count that frame as reproduced.
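The aggregation of the 30 repeated runs into one outcome per frame can be sketched as follows. This is a minimal illustration, not the analysis script of the study: the names `SearchState`, `majority_state`, and `is_reproduced` are hypothetical, "majority" is interpreted here as the most frequent end state, and breaking ties in favour of the worse state is an assumption of this sketch.

```python
from collections import Counter
from enum import IntEnum


class SearchState(IntEnum):
    """The five end states of a search, ordered from worst to best."""
    NOT_STARTED = 0
    LINE_NOT_REACHED = 1
    LINE_REACHED = 2
    EX_THROWN = 3
    REPRODUCED = 4


def majority_state(run_states):
    """Return the most frequent end state among the repeated runs of a frame.

    Assumption of this sketch: ties are broken in favour of the worse state.
    """
    counts = Counter(run_states)
    top = max(counts.values())
    return min(state for state, n in counts.items() if n == top)


def is_reproduced(run_states):
    """A frame counts as reproduced if REPRODUCED is its majority outcome."""
    return majority_state(run_states) == SearchState.REPRODUCED
```

For example, a frame with 18 reproduced runs and 12 runs that only threw a different exception is counted as reproduced, whereas a 15/15 split is not under the tie-breaking rule assumed here.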
To measure the impact of each strategy in the crash reproduction ratio (RQ1.1 and RQ2.1), we use the Odds Ratio (OR) because of the binary distribution of the related data: a search process either reproduces a crash (the generated test replicates the stack trace from the highest frame which is reproduced by at least one of the other searches) or not. Also, we apply Fisher's exact test, with α = 0.05 for the Type I error, to evaluate the significance of results.
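As an illustration of this analysis, the following sketch computes the odds ratio and a two-sided Fisher's exact test for a 2x2 table of reproduced and non-reproduced runs. It is written with the standard library only and is not the script used in the study. The function names are hypothetical, and the 0.5 continuity correction applied when a cell is zero (Haldane-Anscombe) is an assumption of this sketch; the study does not state how zero cells are handled.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio of the table [[a, b], [c, d]].

    a, b: reproduced / not reproduced runs with the seeding strategy
    c, d: reproduced / not reproduced runs with no seeding
    Assumption: if any cell is zero, 0.5 is added to every cell.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value: total probability of all tables with the same
    margins that are at most as likely as the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x successes in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    low, high = max(0, col1 - row2), min(row1, col1)
    observed = prob(a)
    p_value = sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

With hypothetical counts of 25 reproduced runs out of 30 with seeding and 10 out of 30 without, `odds_ratio(25, 5, 10, 20)` is 10.0 and the p-value is below α = 0.05, so the difference would be reported as significant in favour of seeding.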
Moreover, to answer RQ1.2 and RQ2.2, which investigate the efficiency of the different strategies, we compare the number of fitness function evaluations needed by the search to reach crash reproduction. This metric indicates whether seeding strategies lead to better initial populations that need fewer iterations to reach the crash-reproducing test. Since efficiency is only relevant for the reproduced cases, we only applied this comparison to the crashes which are reproduced at least once by no seeding or by the seeding strategy (test seeding for RQ1.2 and model seeding for RQ2.2). We use the Vargha-Delaney statistic [49] to appraise the effect size between strategies. In this statistic, a value lower than 0.5 for a pair of factors (A, B) indicates that A reduces the number of needed fitness function evaluations, and a value higher than 0.5 indicates the opposite. Also, we use the Vargha-Delaney magnitude measure to partition the results into three categories having large, medium, and small impact. In addition, to examine the significance of the calculated effect sizes, we use the non-parametric Wilcoxon Rank Sum test, with α = 0.05 for the Type I error. Moreover, we do note that, since the reproduction ratio of each strategy is not 30/30 for each crash, executions that could not reproduce the frame simply reached the maximum allowed budget (62,328).

Table III. Odds ratios of model/test seeding configurations vs. no seeding in crash reproduction ratio. This table only shows the crashes that reveal statistically significant differences (p-value < 0.05). An odds ratio higher than 1.0 indicates that the seeding strategy is better than no seeding, and a value lower than 1.0 indicates the opposite.
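The Vargha-Delaney effect size used for the efficiency comparison can be sketched as follows. The function names are hypothetical, and the thresholds 0.56, 0.64, and 0.71 for small, medium, and large magnitudes are the values conventionally used with this statistic; that the study uses exactly these thresholds is an assumption. Runs that did not reproduce the crash are represented by the full budget of 62,328 evaluations, as described above.

```python
def vargha_delaney_a12(a, b):
    """Probability that a value from `a` is larger than a value from `b`,
    with ties counted as one half."""
    greater = sum(1 for x in a for y in b if x > y)
    equal = sum(1 for x in a for y in b if x == y)
    return (greater + 0.5 * equal) / (len(a) * len(b))


def magnitude(a12):
    """Map an effect size to a category (conventional thresholds, assumed)."""
    strength = max(a12, 1.0 - a12)  # symmetric around 0.5
    if strength >= 0.71:
        return "large"
    if strength >= 0.64:
        return "medium"
    if strength >= 0.56:
        return "small"
    return "negligible"


# Hypothetical fitness evaluation counts for one crash.
BUDGET = 62328
seeding_runs = [1000, 2000, 3000]
no_seeding_runs = [BUDGET, BUDGET, BUDGET]  # never reproduced: full budget
effect = vargha_delaney_a12(seeding_runs, no_seeding_runs)
```

Here `effect` is 0.0, which is lower than 0.5 and therefore indicates that the seeding strategy needs fewer fitness evaluations, with a large magnitude.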
To measure the impact of each strategy in initializing the first population (RQ1.3 and RQ2.3), we use the same procedure as RQ1.1 and RQ2.1 because the distribution of related data in this aspect is binary too (i.e., whether the search process can start the search or not).
For all of the statistical tests in this study, we only use a level of significance α = 0.05. Since the model inference (in model seeding) and test carving (in test seeding) techniques can be applied as one time processes before running any search-based crash reproduction, we do not include them in the efficiency evaluation.
To answer RQ1.4 and RQ2.4, we performed a manual analysis of the logs and the crash-reproducing test cases (if any). We focused our manual analysis on the crash reproduction executions for which the search in one seeding configuration has a significant impact (according to the results of the previous sub-research questions) on (i) initializing the initial population, (ii) crash reproduction, or (iii) search process efficiency compared to no-seeding. Based on our manual analysis, we used a card sorting strategy by assigning keywords to each frame result and grouping those keywords to identify influencing factors.

EVALUATION RESULTS
We present the results of the evaluation and answer the two research questions by comparing each seeding strategy with no-seeding.

Figure 3 compares each seeding strategy (test seeding on the left-hand side of the figure, model seeding on the right-hand side) with the baseline (no seeding). Figures 3a and 3b show the overall comparison, while Figures 3c and 3d show the per-project comparison. In each of these figures, the yellow bar shows the number of crashes reproduced in the majority of the 30 executions, and the orange bar shows the non-reproduced crashes. According to Figure 3a, test s. 0.8 reproduced the same number of crashes as no seeding. However, the other configurations of test-seeding reproduced fewer crashes in the majority of runs. Moreover, according to Figure 3c, test seeding reproduces one more crash compared to no seeding. Also, some configurations of test seeding can reproduce one extra crash in the XWiki and commons-lang projects. In contrast, all of the configurations of test seeding missed one and two crashes in JFreeChart and commons-math, respectively. Finally, we cannot see any difference between test seeding and no seeding in the Joda-Time project.

Table IV shows the impact of test-seeding on the crash reproduction ratio compared to no-seeding. It indicates that test s. 0.2 & 0.5 have a better crash reproduction ratio for one of the crashes, while they perform significantly worse in 4 other crashes compared to no-seeding. The situation is almost the same for the other configurations of test seeding: test s. 0.8 & 1.0 are significantly better in 2 crashes compared to no-seeding. However, they are significantly worse than no-seeding in 5 other crashes. The other interesting point in this table is the standard deviation of the crash reproduction ratio. This value is slightly higher for all of the test seeding configurations compared to no seeding.
The odds ratios and p-values for crashes with a significant difference are available in Table III.
The underlying reasons for the observed results in this section are analyzed in RQ1.4.

Table V compares test-seeding and no-seeding in the number of fitness function evaluations needed for crash reproduction. The average number of fitness function evaluations increases when using test-seeding, which means that test-seeding is slower than no-seeding on average. test s. 0.8 has the highest average number of fitness function evaluations. Moreover, the standard deviations of both no seeding and test seeding are high (more than 20k evaluations). This notable variation can be explained by the nature of search-based approaches: in some executions, the initial population is closer to the objectives, and the search process can achieve reproduction faster. Similar variations are reported in the JCrashPack empirical evaluation as well [46]. According to the reported standard deviations, this value increases for all of the configurations of test seeding compared to no seeding. Also, the effect sizes indicate that the number of crashes that receive a (large or medium) positive impact from test s. 0.2 & 0.5 on their reproduction speed is higher than the number of crashes that exhibit a (large or medium) negative influence. However, this is not the case [...] Table V, test s. 1.0, which always clones test cases, is considerably and largely slower than no-seeding in 13 crashes. In these cases, cloning all of the test cases to form the initial population can misguide the search process in reaching the crash-reproducing test. As an example, Botsing needs to generate a simple test case, which calls the target method with an empty string and a null object, to reproduce crash LANG-12b. However, test s. 1.0 clones tests which use the software under test in different ways.
To summarize, the overall quality of results of our test seeding solution is highly dependent on the quality of the existing test cases in terms of factors like the distance of existing test cases to the scenario(s) in which the crash occurs and the variety of input data.

Crash reproduction efficiency (RQ1.2)
Crash-Object Proximity. For the second factor, we observe that (despite the fixed value of Pr[pick mut] for test seeding) the objects with call sequences carved from the existing tests and stored in the object pool can help during the search, depending on their diversity and their distance from the call sequences that we need for reproducing the given crash. For instance, for crash MATH-4b, Botsing needs to initialize a List object with at least two elements before calling the target method in order to reproduce the crash. In test-seeding, such an object had been carved from the existing tests and allowed test seeding to reproduce the crash faster. Also, test-seeding can replicate this crash more frequently: the number of successfully replicated executions, in 30 runs, is higher with test-seeding.
In contrast, the carved objects can misguide the search process for some crashes which need another kind of call sequence. For instance, in crash MOCKITO-9b, Botsing cannot inject the target method into the generated test because the carved objects do not have the proper state to instantiate the input parameters of the target method.
In summary, if the involved classes in a given crash are well-tested (the existing tests contain all of the usage scenarios of these classes), we have more chances to reproduce by utilizing test-seeding.

Test Execution Cost
The third factor points to the challenge of executing the existing test cases for seeding. The related tests for some crashes are either expensive (time/resource consuming) or challenging (due to the security issues) to execute. Hence, the EvoSuite test executor, which is used by Botsing, cannot carve all of them.
As an example of expensive execution, the EvoSuite test executor spends more than 1 hour executing the related test cases for replicating frame 2 of crash MATH-1b. As an example of security issues, the EvoSuite test executor fails to run some of the existing tests and throws an exception instead. For instance, this executor throws java.lang.SecurityException during the execution of the existing test cases for CHART-4b, and it cannot carve any object for seeding.
In some cases, test-seeding faces the mentioned problems during the execution of all of the existing test cases for a crash. If test seeding cannot carve any object from existing tests, there will be no useful call sequence in the object pool to seed during the search process. Hence, although the project contains some potentially valuable test scenarios for reproducing the given crash, there is no difference between no seeding and test seeding in these cases.
6.1.5. Summary (RQ1)
Test seeding (for any configuration) loses against no-seeding in the search initialization because some of the related test cases of crashes are expensive or even impossible to execute. Also, we observe in the manual analysis that the lack of generality in the existing test cases prevents the initialization of the crash reproduction search process. In these cases, the objects carved from the existing tests do not match what the search process needs to inject the target method. Moreover, this seeding strategy can outperform no seeding in crash reproduction and search efficiency for some cases (e.g., LANG-6b), thanks to the call sequences carved from the existing tests. However, these carved call sequences can be detrimental to the search process in some cases: if they do not contain beneficial knowledge about crash reproduction, overusing them can misguide the search process.
6.2. Behavioral model seeding (RQ2)

6.2.1. Crash reproduction effectiveness (RQ2.1)
Figure 3b draws a comparison between model-seeding and no-seeding in the crash reproduction ratio according to the results of the evaluation on all of the 124 crashes. As mentioned in Section 5.2.1, since model seeding collects call sequences both from source code and existing tests, it can be applied to all of the crashes (even the crashes that do not have any helpful test). As depicted in this figure, all of the configurations of model-seeding reproduce more crashes compared to no-seeding in the majority of runs. We observe that model s. 0.2 & 0.5 & 1.0 reproduce 3 more crashes than no-seeding. In addition, in the best performance of model-seeding, model s. 0.8 reproduces 70 out of 124 crashes (6% more than no-seeding). Figure 3d categorizes the results of Figure 3b per application. As we can see in this figure, model seeding replicates more crashes for XWiki, commons-lang, and Mockito. However, no-seeding reproduces one crash more than model-seeding for commons-math. For the other projects, the number of reproduced crashes does not change between no-seeding and different configurations of model-seeding.
We also check how many crashes can be reproduced at least once with model seeding, but not with no seeding. In total, model-seeding configurations reproduce nine new crashes that no-seeding cannot reproduce. Table IV indicates the impact of model-seeding on the crash reproduction ratio. As we can see in this table, model s. 0.2 has a significantly better crash reproduction ratio in 3 crashes. Also, the other configurations of model-seeding are significantly better than no seeding in 4 crashes. This improvement is achieved while 2 out of 4 configurations of model-seeding have a significant unfavorable impact on only one crash. The odds ratios and p-values for crashes with a significant difference are available in Table III.

Table VII compares the number of fitness function evaluations needed for crash reproduction in model-seeding and no-seeding. As we can see in this table, the average effort is reduced by using model-seeding. On average, model s. 1.0 achieves the fastest crash reproduction.

Crash reproduction efficiency (RQ2.2)
According to this table, and in contrast to test-seeding, the impact of model-seeding on efficiency is slightly positive. The number of crashes on which model-seeding has a large or medium positive influence (Vargha-Delaney measures lower than 0.5) varies between 3 and 5. Also, model-seeding has a large adverse effect size (Vargha-Delaney measure higher than 0.5) on one crash, while this number is higher for test-seeding (e.g., 13 for test s. 1.0). Table VII does not include the cost of model generation for seeding, as mentioned in our experimental setup. In our case, model generation was not a burden and is performed only once per case study. We will cover this point in more detail in Section 7.

Table VII. Evaluation results for comparing model-seeding and no-seeding in the number of fitness evaluations. Here, evaluations and σ designate the average number of fitness function evaluations needed for crash reproduction and the standard deviation, respectively. The numbers in the comparison only count the statistically significant cases. Column headers: Conf.; Fitness evaluations; Comparison to no s. (large, medium, small).

Table VI provides a comparison between model-seeding and no-seeding in the search initialization ratio. As shown in this table, [...]

Influencing factors (RQ2.4)
We have manually analyzed the crashes which lead to significant differences between different configurations of model seeding and no seeding. In doing so, we have identified 4 influencing factors in model-seeding on search-based crash reproduction, namely: (i) using Call sequence dissimilarity for guided initialization, (ii) having Information source diversity to infer the behavioral models, (iii) Sequence priority for seeding by focusing on the classes involved in the stack trace, and (iv) having Fixed size abstract object behavior selection from usage models.
Call sequence dissimilarity. Using dissimilar call sequences to populate the object pool in model seeding seems particularly useful for search efficiency compared to test seeding. In particular, if the number of test cases is large, model seeding makes it possible to (re)capture the behavior of those tests in the model and regenerate a smaller set of call sequences which maximizes diversity, increasing the probability of having more diverse objects used during the initialization. For instance, Botsing with model-seeding is statistically more efficient than other strategies for replicating crash XWIKI-13141. Through our manual analysis, we observed that model-seeding could replicate crash XWIKI-13141 in the initial population in 100% of cases, while the other seeding strategies replicate it after a couple of iterations. In this case, despite the large size of the target class behavioral model (35 transitions and 17 states), the diversity of the selected abstract object behaviors guarantees that Botsing seeds the reproducing test cases to the initial population.
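To make the idea of dissimilarity-based selection concrete, the sketch below greedily picks call sequences that are far apart from each other. It is a simplified stand-in and not the selection procedure implemented in Botsing: representing abstract object behaviors as tuples of method names, the Jaccard distance over the sets of called methods, and the greedy farthest-first heuristic are all assumptions made for illustration.

```python
def jaccard_distance(seq_a, seq_b):
    """Distance between two call sequences based on the methods they call."""
    set_a, set_b = set(seq_a), set(seq_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def select_dissimilar(sequences, k):
    """Greedily select up to k sequences, each time adding the candidate
    whose minimum distance to the already selected ones is largest."""
    remaining = list(sequences)
    if not remaining or k <= 0:
        return []
    # Start from the longest sequence (arbitrary but deterministic choice).
    selected = [max(remaining, key=len)]
    remaining.remove(selected[0])
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda s: min(jaccard_distance(s, t) for t in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical call sequences observed for a list-like class.
observed = [
    ("new", "add", "add", "size"),
    ("new", "add", "size"),
    ("new", "clear"),
    ("new", "remove", "isEmpty"),
]
chosen = select_dissimilar(observed, 3)
```

In this example, the near-duplicate sequence `("new", "add", "size")` is the one left out, because it calls the same methods as the first selected sequence and therefore adds no diversity to the object pool.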
Information source diversity. Having multiple sources from which to infer the model helps to select diversified call sequences compared to test seeding. For instance, the sixth frame of the crash XWIKI-14556 points to a class called HqlQueryExecutor. No seeding cannot replicate this crash because it does not have any guidance from existing solutions. Also, since the test carver could not detect any existing test which uses the related classes, test seeding does not have any knowledge to achieve reproduction. In contrast, the knowledge required for reproducing this crash is available in the source code, and model-seeding learned it from static analysis of this resource. Hence, model seeding is successful in accomplishing crash reproduction.
Sequence priority. By prioritizing classes involved in the stack trace for the abstract object behaviors selection, the object pool contains more objects likely to help to reproduce the crash. For instance, for the 10th frame of the crash LANG-9b, model seeding could achieve reproduction in the majority of runs, compared to 0 for test and no seeding, by using the class FastDateParser appearing in the stack trace.
Fixed size abstract object behavior selection. The last factor points to the fixed number of the generated abstract object behaviors from each model. In some cases, we observed that model-seeding was not successful in crash reproduction because the usage models of the related classes were large, and it was impossible to cover all of the paths with 100 abstract object behaviors. As such, this seeding strategy missed the useful dissimilar paths in the model. As an example, model-seeding was not successful in replicating crash XWIKI-8281 (which is replicated by no-seeding and test-seeding). In this crash, the unfavorable generated abstract object behaviors for the target class misguided the search process in model seeding.

Summary (RQ2)
Model seeding achieves a better search initialization ratio compared to no seeding. The best-performing configurations of model seeding (model s. 0.8 & 1.0) decrease the number of not started searches in 3 crashes. Moreover, compared to no seeding, model seeding increases the number of crashes that can be reproduced in the majority of runs by 6%. It also reproduces 9 (out of 124) extra crashes that are unreproducible with no-seeding. In addition, model seeding improves the efficiency of search-based crash reproduction compared to no seeding. It takes, on average, fewer fitness function evaluations. Also, model seeding delivers more positive significant impact on the efficiency of the search process compared to no seeding. In general, model seeding outperforms no seeding in all of the aspects of search-based crash reproduction. According to the manual analysis that we have performed in this study, model seeding achieves this performance thanks to multiple factors: Call sequence dissimilarity, Information source diversity, and Sequence priority. Nevertheless, we also observe one factor with a negative impact in model seeding: the fixed size abstract object behavior selection.

Practical implications
Model derivation costs. Generating seeds comes with a cost. For our worst case, XWIKI-13916, we collected 286K call sequences from static and dynamic analysis and generated 7,880 models from which we selected 6K abstract object behaviors. We repeated this process 10 times and found the average time for call sequence collection to be 14.2 seconds; model inference took 77.8 seconds; and abstract object behavior selection and concretization took 51.5 seconds. We do note, however, that the model inference is a one-time process that could be done offline (in a continuous integration environment). After the initial inference of models, any search process can utilize model seeding. To summarize, the total initial overhead is ∼2.5 minutes, and the total nominal overhead is ∼1.25 minutes. We argue that the overhead of model seeding is affordable given its increased effectiveness. The initial model inference can also be incremental, to avoid complete regeneration for each update of the code, or limited to subparts of the application (like in our evaluation, where we only applied static and dynamic analysis for classes involved in the stack trace). Similarly, abstract object behavior selection and concretization may be prioritized to use only a subset of the classes and their related models. In our current work, this prioritization is based on the content of the stack traces. Other prioritization heuristics, based for instance on the size of the model (reflecting the complexity of the behavior), are part of our future work.
Applicability and effectiveness. Generally, test seeding alone does not make crash reproduction more effective. Actually, test seeding has a more negative impact on the search-based crash reproduction. Test seeding only uses dynamic analysis, which entails that it collects more accurate information from the potential usage scenarios of the software under test; it also means that this strategy collects more limited information for seeding. If these limited amounts of call sequences differ from the call sequences needed to reproduce the crash scenario, test seeding can misguide the crash reproduction search process.
In contrast to test-seeding, we observe that model seeding always performs better than no seeding with different configurations. As such, we observe that model seeding can reproduce more crashes than other strategies. Also, since model seeding also exploits test cases, thereby subsuming test seeding regarding the observed behavior of the application that is reused during the search, greater performance can be attributed to the analysis of the source code translated in the model.
In our experiments, various configurations of model seeding reproduced 8 new crashes that neither test seeding nor no seeding strategies could reproduce. Additionally, only model seeding could reproduce stack traces with more than seven frames (e.g., LANG-9b). Still, model seeding missed the reproduction of one crash which is reproduced by no seeding. Despite the achieved improvements by model seeding, this seeding strategy could not outperform no-seeding dramatically (crash reproduction improved by 6%). To better understand the reasons for the results, we manually analyzed the logs of Botsing executions on the crashes for which model seeding could not show any improvements. Through this investigation, we noticed that the generated usage models in these cases are limited and they do not contain the beneficial call sequences for covering the particular path that we need for crash reproduction. The average size of the generated model in this study is 7 states and 14 transitions. We believe that by collecting more call sequences from different sources (e.g., log files), model seeding can increase the number of crash reproductions.
Also, we observe that the size of the generated abstract behaviors set should be commensurate with the size of the inferred model. If we have a small model and we choose too many abstract behaviors, we will get similar abstract behaviors that misguide the search process. In contrast, if we choose a small set of abstract behaviors from a large behavioral model, we will miss the chance of using the full potential of the model for increasing the chance of crash reproduction by the search process.
Extendability. The usage models can be inferred from any resource providing call sequences. In this study, we used the call sequences derived from the source code and existing test cases. However, we can extend the models with extra resources (e.g., execution logs). Also, the abstract object behavior selection approach can be adapted according to the problem. In this study, we used the dissimilarity strategy to increase the diversity of the generated tests. Moreover, model seeding makes a distinction between using the object pool during guided initialization and guided mutation (as shown in Figure 2). This distinction enables us to study the influence of seeding during the different steps of the algorithm independently.

Model seeding configuration
Model seeding can be configured with different Pr[pick init] and Pr[pick mut] probabilities. Like many other parameters in search-based test case generation [50], the values of those parameters could influence our results. Although a full investigation of the effect of Pr[pick init] and Pr[pick mut] on the search process is beyond the scope of this paper, we set up a small experiment on a subset of crashes (10 crashes in total) with 15 new configurations, each one run 10 times. Tables VIII and IX present the configurations used for Pr[pick init] and Pr[pick mut] with, for each one, the crash reproduction effectiveness (Table VIII) and the crash reproduction efficiency (Table IX). In general, we observe that changing the probability of picking an object during guided initialization (Pr[pick init]) has an impact on the search and leads to more reproduced crashes with a lower number of fitness evaluations. This confirms the results presented in Section 6. Changing the probability of picking an object during mutation (Pr[pick mut]) does not seem to have a large impact on the search. A full investigation of the effects of Pr[pick init] and Pr[pick mut] on the search process is part of our future work.

[...] [9] and we added additional crashes from XWiki and Defects4J (see Section 5). Since we focused on the effect of seeding during guided initialization, we fixed the Pr[pick mut] value (which, due to the current implementation of Botsing, is also used as the Pr[pick init] value in test seeding) to 0.3, the default value used in EvoSuite for unit test generation. The effect of this value for crash reproduction, as well as the usage of test and model seeding in guided initialization, is part of our future work. We cannot guarantee that our extension of Botsing is free of defects. We mitigated this threat by testing the extension and manually analyzing a sample of the results.
Finally, each frame has been run 30 times for each seeding configuration to take randomness into account and we derive our conclusions based on standard statistical tests [51,52].

External validity
We cannot guarantee that our results are generalizable to all crashes. However, we used JCrashPack, which is the most recent benchmark for Java crash reproduction. This benchmark is assembled carefully from seven Java projects and contains 200 real-life crashes.
Since the EvoSuite test executor is unable to run the existing test cases of one of the seven projects in JCrashPack (ElasticSearch), test seeding and the dynamic analysis of model-seeding are not applicable to the crashes of this project. We therefore excluded the ElasticSearch crashes from JCrashPack. The diversity of crashes in this benchmark also helps to mitigate this threat.

Verifiability
A replication package of our empirical evaluation is available at https://github.com/STAMP-project/ExRunner-bash/tree/master. The complete results and analysis scripts are also provided in this package. Our extension of Botsing is released under an LGPL 3.0 license and available at https://github.com/STAMP-project/botsing.

8. FUTURE WORK
We observed that one of the advantageous factors in model seeding, which helps the search process to reproduce more crashes, is the use of multiple resources for collecting the call sequences. Further diversification of sources is worth considering. In our future work, we will consider other sources of information, like logs of the running environment, to collect relevant call sequences and additional information about the actual usage of the application. Also, collecting additional information from the log files would enable using full-fledged behavioral usage models (i.e., a transition system with probabilities on its transitions quantifying the actual usage of the application) to select and prioritize abstract object behaviors according to that usage, as suggested by statistical testing approaches [26]. For instance, we can give a high priority to the most uncommon observed call sequences in the abstract object behavior selection. We observed that selecting the most dissimilar paths in model-seeding helps the search process towards crash reproduction. However, there is no guarantee that this approach is the best one. In future studies, we will compare this approach with the new abstract object behavior selection approaches enabled by the full-fledged behavioral usage models.
In this study, we focus on the impact of seeding during guided initialization by using different values for Pr[pick init] and Pr[clone] and setting Pr[pick mut] to the default value (0.3). However, our results show that even with the default value 0.3, using seeded objects during the search process helps to reproduce several crashes. Our future work includes a thorough assessment of that factor. Furthermore, in the current version of model seeding, we noticed that the fixed size for the selected abstract object behaviors from the usage models could negatively impact the crash reproduction process. This set's size affects Botsing's performance and must be chosen carefully. If too small, abstract object behaviors may not cover the transition system sufficiently, missing out on important usage information. Too few abstract object behaviors can misguide the search process. In contrast, too many of them will lead to a time-consuming test concretization process. In future investigations, we will study the integration of the search process with the abstract object behavior selection from the models. This integration can guide the seeding (e.g., the abstract object behavior selection) using the current status of the search process.
Finally, we hypothesize that this seeding strategy may be useful for other search-based software testing applications and we will evaluate this hypothesis in our future work.

CONCLUSION
Manual crash reproduction is labor-intensive for developers. A promising approach to relieve them of this challenging activity is to automate crash reproduction using search-based techniques. In this paper, we evaluate the relevance of using both test and behavioral model seeding to improve crash reproduction achieved by such techniques. We implement both test seeding and the novel model seeding in Botsing.
For practitioners, the implication is that more crashes can be automatically reproduced, at a small cost. In particular, our results show that behavioral model seeding outperforms test seeding and no seeding without a major impact on efficiency. The different behavioral model seeding configurations reproduce 6% more crashes compared to no seeding, while test seeding reduces the number of reproduced crashes. Also, behavioral model seeding can significantly increase the search initialization rate for 3 crashes compared to no seeding, while test seeding performs worse than no seeding in this aspect. We hypothesize that the improvements achieved by model seeding can be further extended by using more sources (e.g., execution logs) for collecting the call sequences from which the models are generated.
From the research perspective, by abstracting behavior through models and taking advantage of the advances made by the model-based testing community, we can enhance search-based crash reproduction. Our analysis reveals that (1) using collected call sequences, together with (2) the dissimilarity-based selection and (3) the prioritization of abstract object behaviors, as well as (4) the combined information from source code and test execution, enables more search processes to get started and, ultimately, more crashes to be reproduced.
In our future work, we will explore whether behavioral model seeding has further-reaching implications for the broader area of search-based software testing. Furthermore, we aim to study the effect of changing the seeding probabilities on the search process, explore other sources of data to generate the model, and try different abstract object behavior selection strategies.