Get access

Comparing text-based and dependence-based approaches for determining the origins of bugs



Identifying bug origins – the point where erroneous code was introduced – is crucial for many software engineering activities, from identifying process weaknesses to gathering data to support bug detection tools. Unfortunately, this information is not usually recorded when fixing bugs, and recovering it later is challenging. Recently, the text approach and the dependence approach have been developed to tackle this problem. Respectively, they examine textual and dependence-related changes that occurred prior to a bug fix. However, only limited evaluation has been carried out, partially because of a lack of available implementations and of datasets linking bugs to origins. To address this, origins of 174 bugs in three projects were manually identified and compared to a simulation of the approaches. Both approaches were partially successful across a variety of bugs – achieving 29–79% precision and 40–70% recall. Results suggested the precise definition of program dependence could affect performance, as could whether the approaches identified a single or multiple origins. Some potential improvements are explored in detail and identify pragmatic strategies for combining techniques along with simple modifications. Even after adopting these improvements, there remain many challenges: large commits, unrelated changes and long periods between origins and fixes all reduce effectiveness. Copyright © 2013 John Wiley & Sons, Ltd.