An empirical study of faults in late propagation clone genealogies



Two similar code segments, or clones, form a clone pair within a software system. The changes to the clones over time create a clone evolution history. In this work, we study late propagation, a specific pattern of clone evolution. In late propagation, one clone in a clone pair is modified, causing the clone pair to diverge. The code segments are then reconciled in a later commit. Existing work has established late propagation as a clone evolution pattern and suggested that the pattern is related to a high number of faults. In this study, we examine the characteristics of late propagation in three long-lived software systems using the Simian ( Simon Harris, Victoria, Australia,, CCFinder, and NiCad (Software Technology Laboratory, Queen's University, Kingston, ON, Canada) clone detection tools. We define eight types of late propagation and compare them to other forms of clone evolution. Our results not only verify that late propagation is more harmful to software systems but also establish that some specific types of late propagations are more harmful than others. Specifically, two types are most risky: (1) when a clone experiences diverging changes and then a reconciling change without any modification to the other clone in a clone pair; and (2) when two clones undergo a diverging modification followed by a reconciling change that modifies both the clones in a clone pair. We also observe that the reconciliation in the former case is more prone to faults than in the latter case. We determine that the size of the clones experiencing late propagation has an effect on the fault proneness of specific types of late propagation genealogies. Lastly, we cannot report a correlation between the delay of the propagation of changes and its faults, as the fault proneness of each delay period is system dependent. Copyright © 2013 John Wiley & Sons, Ltd.