This section describes the research procedure used to simulate the text and dependence-based approaches. Both are demonstrated through a detailed analysis of jEdit bug 1965114.
The bug was linked to the SVN revision that fixed it through the bug ID in the commit comment, as provided by the original sample. Additional scripts were written to parse the provided dataset into an HTML page with links to jEdit's Bugzilla and ViewVC servers to aid analysis. An example of the output from these scripts for bug 1965114 is shown in Figure 5.
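The bug-to-revision links used here came with the original sample. Purely as an illustration of how a bug ID in a commit comment can tie a bug to a revision, a minimal sketch follows. The commit comments, the 7-digit ID pattern and the function name are assumptions made for this example; only the bug ID 1965114 and the revision numbers come from the text.

```python
import re

# Hypothetical commit log as (revision, commit comment) pairs. The comment
# wording is invented; only the bug ID and revision numbers come from the text.
COMMITS = [
    (12671, "Code cleanup in the file system browser"),
    (12672, "Fixed bug 1965114: pressing 'n' in the file list opens a new file"),
]

# Assumption: bug IDs appear in commit comments as standalone 7-digit numbers.
BUG_ID = re.compile(r"\b(\d{7})\b")


def link_bugs_to_revisions(commits):
    """Map each bug ID mentioned in a commit comment to the revisions citing it."""
    links = {}
    for revision, comment in commits:
        for bug_id in BUG_ID.findall(comment):
            links.setdefault(int(bug_id), []).append(revision)
    return links


links = link_bugs_to_revisions(COMMITS)
```

A real commit history would need a stricter pattern, since any 7-digit number in a comment would otherwise be treated as a bug ID.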
The Bugzilla hyperlink in Figure 5 opens the bug description shown in Figure 6, and the ViewVC hyperlink opens the revision description shown in Figure 7. The bug is described as: “Pressing ‘n’ with the file system browser dockable open, and the file list focused, opens a new file.” The problem is that typing an n character in a filename opens a new file, making it impossible to open files with the letter n in their filename using the keyboard. The shortcut to open a new file is supposed to be Ctrl + n, not just n.
The SVN revision description (Figure 7) identifies the jEdit commit that ‘fixed’ the bug (12672) and all the files that were modified during the fix. Often, there will be multiple source code (.java) files modified for one fix. In this example, the text file CHANGES.txt (the jEdit release history) has been updated to include a textual description of the bug fix, and the single file VFSDirectoryEntryTable.java has been modified.
There are now three stages of analysis to be performed using the SCM system: identifying the jEdit revision where the bug was introduced, simulating the text approach and simulating the dependence approach.
To identify the origin of the bug, the BTS entry is first read for any information that could aid understanding of the nature of the bug and its potential source. For bug 1965114, the entry indicates that the origin code is likely to be concerned with file shortcut options, which helps to focus the examination of the modified code.
Next, the commit that fixed the bug, 12672, is examined. First, the CHANGES.txt file was checked in case it contained helpful information; no further information was gained in this case. Second, the source file involved was examined using the annotated view provided by ViewVC. This provides a side-by-side SVN diff comparison against the preceding version of this file highlighting each line added, removed or changed. Figure 8 shows a snapshot of the annotated view for revision 12672 compared to its predecessor 12671. The lines highlighted in yellow (326–328) have changed between revisions, and those in green (332) were added (any removed lines would be shown in red). Line 327 appears to be an added conditional to fix this bug.
Figure 8. ViewVC: side-by-side comparison of revisions 12671 and 12672 for jEdit bug 1965114 (the corresponding dependence graphs may be seen in Figure 13).
In revision 12672 of VFSDirectoryEntryTable.java, there were many other changes: 10 different lines were changed in total, distributed throughout the file from line 128 to line 624. All of these changes must be considered as potential contributions to the bug fix. In this example, these other changes appear to be concerned with opportunistic cleaning of the code – replacing a concrete LinkedList by the List interface and removing many unnecessary brackets. Such incidental changes can have unfortunate consequences for the two approaches.
Having determined that the fix is likely to be the changes at lines 326–332 in revision 12672 and that the erroneous code is therefore on line 326 of revision 12671, the task now is to identify the origin of this bug – in what revision was the bug introduced? By examining the SVN annotated view of revision 12671 (shown in Figure 9), the last revisions to change the context around line 326 are highlighted.
Figure 9. ViewVC: annotated revision view – the numbers in column 3 are hyperlinks to the last revisions that modified the lines in column 1.
Opening the more recent revision 10326 associated with line 325 shows that there was only a change of layout in this revision, with no changes to the code. Opening revision 7998 reveals that all the code associated with this case branch was added in this revision, as shown in Figure 10. Revision 7998 is therefore deemed to be the origin of bug 1965114.
The second stage is to simulate the text approach. This means going back to revision 12672, which was identified as containing the bug fix. The evaluation was carried out as if each version of the code had been run through a preprocessor to standardise whitespace and brace formatting, and remove comments. This meant changes in whitespace, formatting and comments were disregarded. Adding or removing comment markers around lines of code was therefore treated as adding or removing those lines. Changes to import statements were also ignored, as these were either accompanied by other changes or were unused and so unrelated to the bug. Otherwise, the text approach assumes that all other changes made in the revision that fixes the bug are part of the bug fix. Furthermore, as mentioned earlier, the text approach cannot deal with fixes that involve adding lines of code, and so the focus is only on removed and changed lines.
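The evaluation was carried out by hand, as if a preprocessor had been applied. The following is only a rough sketch of what such a normalisation step might look like, not the procedure actually used. The function name and the specific rules (regex-based comment stripping that ignores string literals, pulling a lone opening brace up to the previous line) are assumptions made for illustration.

```python
import re


def normalise(source):
    """Reduce Java-like source to comparable lines.

    Comments, import statements, blank lines and whitespace differences are
    dropped, and a lone opening brace is joined to the previous line.
    Naive: comment markers inside string literals are not handled.
    """
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                    # line comments
    lines = []
    for raw in source.splitlines():
        line = " ".join(raw.split())        # collapse runs of whitespace
        if not line or line.startswith("import "):
            continue                        # skip blank lines and imports
        if line == "{" and lines:
            lines[-1] += " {"               # standardise brace placement
            continue
        lines.append(line)
    return lines


old = "import java.util.List;\nif (x)\n{\n    foo();   // call\n}\n"
new = "if (x) {\n  /* note */ foo();\n}\n"
```

With these rules, `old` and `new` normalise to the same three lines, so a diff between them would report no change, whereas commenting out a statement removes it from the normalised output and therefore counts as a removed line.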
The text approach works by taking each line that has been changed or deleted in the revision that fixes the bug (subject to the exclusions just mentioned) and tracking back through the revision history to identify the revisions that previously altered these lines. The set of revisions identified is considered the potential origin of the bug in the text approach.
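The tracking-back step can be sketched as a simple line-level blame over a revision history. This is an illustrative approximation only: the study used ViewVC's annotated views rather than code, and the function names, the toy history and the revision numbers 1 to 3 below are all hypothetical.

```python
import difflib


def blame(history):
    """For each line of the newest version, return the revision that last touched it.

    history: list of (revision, lines) pairs, oldest first.
    """
    first_rev, prev_lines = history[0]
    owners = [first_rev] * len(prev_lines)
    for revision, lines in history[1:]:
        matcher = difflib.SequenceMatcher(None, prev_lines, lines, autojunk=False)
        new_owners = [revision] * len(lines)   # default: touched in this revision
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":                 # unchanged lines keep their owner
                new_owners[j1:j2] = owners[i1:i2]
        owners, prev_lines = new_owners, lines
    return owners


def text_approach(history, fixed_lines):
    """Candidate origins: revisions that last touched lines the fix changed or deleted."""
    buggy_lines = history[-1][1]
    owners = blame(history)
    matcher = difflib.SequenceMatcher(None, buggy_lines, fixed_lines, autojunk=False)
    origins = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):       # pure insertions have no old line
            origins.update(owners[i1:i2])
    return origins


# Hypothetical three-revision history of one file, followed by a fix.
history = [
    (1, ["a();", "b();", "c();"]),
    (2, ["a();", "b2();", "c();"]),
    (3, ["a();", "b2();", "c();", "d();"]),
]
fixed = ["a();", "b3();", "c();", "d2();", "e();"]
origins = text_approach(history, fixed)
```

Here the fix changes lines last touched in revisions 2 and 3, so both are returned as candidate origins. A fix consisting only of added lines yields an empty set, which mirrors the limitation noted above.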
Revision 12672 included 10 separate line changes and one line addition. The 10 line changes are therefore all traced back through the revision history as potential origins of bug 1965114. This is similar to the process described earlier to identify the origin of the bug. The problem is that the text approach cannot distinguish amongst the 10 changes, whereas the earlier manual analysis used an understanding of the code semantics to single out the changes around line 326 as the actual bug fix. Tracking back for each of the 10 changes leads to a set of possible bug origins, potentially one origin for each change. In this case, the set of revisions identified is 4640, 4648, 5179, 5217, 7170, 7998, 9596, 10275 and 11009. This is a clear example of one of the limitations of the text approach. Notice that the set does include the origin, 7998, as a result of finding the revision that introduced the change at line 326.
The third stage is to simulate the dependence approach. The first step is to map methods between versions based on their name and signature. Although it did not occur in this case, this allows the dependence approach to handle methods which are relocated within the class (not true of the text approach).
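A minimal sketch of this mapping step is given below, assuming each version of the class has already been parsed into a dictionary keyed by method name and parameter types. The dictionary layout, the `helper` and `oldOnly` methods, the line numbers and the `KeyEvent` parameter type are assumptions made for the example; only the method name processKeyEvent appears in this study.

```python
def map_methods(old_methods, new_methods):
    """Pair methods across two versions by (name, parameter types).

    Each argument maps a (name, parameter_types) key to the line at which the
    method starts. Position is not part of the key, so a method that has moved
    within the class still maps to its earlier version.
    """
    matched = sorted(old_methods.keys() & new_methods.keys())
    removed = sorted(old_methods.keys() - new_methods.keys())
    added = sorted(new_methods.keys() - old_methods.keys())
    return matched, removed, added


# Hypothetical method tables for two versions of a class.
old_version = {
    ("processKeyEvent", ("KeyEvent",)): 300,
    ("helper", ()): 50,
    ("oldOnly", ("int",)): 80,
}
new_version = {
    ("helper", ()): 410,                 # relocated, still matched
    ("processKeyEvent", ("KeyEvent",)): 300,
    ("helper", ("String",)): 60,         # new overload, not matched
}
matched, removed, added = map_methods(old_version, new_version)
```

Keying on the signature rather than on position is what lets a relocated method be matched, and it also keeps overloads with the same name apart.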
Next, each method of the bug fix version is compared to the preceding version, and any removed or added dependences are noted. In the case of removed dependences, each preceding version of an updated method is manually compared until the most recent version to add one of the dependences is found. If no dependences are removed, then the same process is performed searching for the most recent version to have altered either the source or target of any added data/control dependences. In contrast to the text approach, rather than identifying the full set of potential bug origins, the dependence approach identifies only the most recent revision associated with a removed/added dependence and assumes that revision is the single bug origin.
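The selection rule just described can be sketched as follows, assuming each version of a method has already been reduced to a set of dependence edges whose node labels are consistent across versions. In the study this comparison was done manually; the edge representation, the function names, the `node_last_changed` lookup and the revision numbers 10 to 30 are hypothetical.

```python
def introduced_in(edge, history):
    """Most recent revision in which `edge` is present but absent from its predecessor."""
    previous, found = set(), None
    for revision, edges in history:          # oldest first
        if edge in edges and edge not in previous:
            found = revision
        previous = edges
    return found


def dependence_origin(history, fixed_edges, node_last_changed):
    """Return the single revision blamed by the dependence approach (sketch).

    history: list of (revision, edge set) pairs, oldest first; the last entry
    is the buggy version. Edges are (dependent, depended_on, kind) tuples.
    node_last_changed: node label -> revision that last altered that node.
    """
    buggy_edges = history[-1][1]
    removed = buggy_edges - fixed_edges
    if removed:                              # removed dependences take priority
        return max(introduced_in(edge, history) for edge in removed)
    added = fixed_edges - buggy_edges
    candidates = [node_last_changed[node]
                  for (src, dst, _kind) in added
                  for node in (src, dst) if node in node_last_changed]
    return max(candidates) if candidates else None


# Hypothetical history of one method's dependence edges.
history = [
    (10, {("s1", "entry", "control")}),
    (20, {("s1", "entry", "control"), ("s2", "s1", "data")}),
    (30, {("s1", "entry", "control"), ("s2", "s1", "data"),
          ("s3", "entry", "control")}),
]
# The fix moves s3 under a new guard condition.
fixed = {("s1", "entry", "control"), ("s2", "s1", "data"),
         ("s3", "guard", "control"), ("guard", "entry", "control")}
origin = dependence_origin(history, fixed, {})
```

Because only the maximum is kept, any additional removed dependence that was introduced more recently than the faulty code would displace the true origin.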
The dependence approach therefore starts with the SVN diff between revision 12672 and its predecessor 12671. Of the 10 changes and one addition, eight are concerned with the removal of unnecessary brackets, and one is the addition of a closing brace matching the new conditional at line 326. This leaves only two changes that could affect the data or control dependences. Only dependences on other lines within the same method were considered; dependences on other methods and on any fields were ignored.
The first change is at line 128, which was altered from ‘LinkedList<VFSFile> returnValue = new LinkedList<VFSFile>();’ to ‘java.util.List<VFSFile> returnValue = new LinkedList<VFSFile>();’. The dependences within this method are shown in Figure 11(b) and the relevant sections of code in Figure 12. As shown, line 128 has a control dependence onto method entry, and lines 132 and 134 have data dependences on line 128.
When comparing two methods, the first step of the dependence approach is to examine the graph for each version and determine which nodes in the first graph correspond to which nodes in the second graph, if any. This mapping is based on the similarity of the two lines and the lines surrounding them. Generally, this was straightforward to do, but where there were ambiguities, these were recorded and re-examined at the end in order to ensure that similar cases were treated in the same manner. Specifically, if changes were made to an object's type, or a method was added to or removed from a chain of method calls, then the two nodes were considered not to map to one another. Where only the condition of an if-statement changed, the two nodes were regarded as mapping to one another. These decisions are somewhat arbitrary, but are based on suggestions in the original description, and ensure consistency in the evaluation.
Because this is a change of type, the two nodes representing line 128 are not considered to match. Therefore, both the control and data dependences have been removed (and new ones added), as shown by the highlighted edges in Figure 11. The dependence approach then searches the previous revisions to determine where any of those dependences were originally added. This process is performed in a similar manner to the text approach described earlier, using ViewVC to track back through the jEdit revisions. In this case, the generic parameter VFSFile was introduced in revision 9596, which would be seen as the introduction of the dependence, although prior to that, the original data dependence was introduced in the baseline revision of jEdit, 4631. Had the two nodes representing line 128 instead been considered to match one another, then the two graphs would have been seen as identical: no dependences would have been seen as added or removed. This change would therefore not have led to any origin. The effects of these decisions are explored in more detail in Section 5.
The only other change is the code associated with the bug fix shown in Figure 8. Here, the control dependences of the three assignments (starting with evt.consume();) onto line 320 are removed and replaced with control dependences onto the new if-statement, as shown in Figure 13. In addition, a new control dependence from the if-statement to the case-statement is introduced. As stated previously, where only a condition changes, the lines are regarded as mapping to each other in the PDG, so the change on line 320 can be disregarded. Therefore, the focus here is on the removed control dependences and the revision(s) in which they originated. As this is the code associated with the bug fix, tracking back to identify when these were introduced shows that they were all added in revision 7998.
Figure 13. Program dependence graph for VFSDirectoryEntryTable.processKeyEvent() which corresponds to the code in Figure 8 (many unchanged nodes and dependences have been omitted).
As mentioned earlier, the dependence approach only identifies one revision: the most recent based on removed dependences (or added dependences if none are removed). Here, the dependence approach will return revision 9596. Note though that if the two versions of line 128 had been considered to be the same node, by ignoring type changes, then the approach would have returned the actual origin, 7998. This example therefore highlights the major impact that subtle changes in the definition of PDG differences have on the dependence approach, particularly as it only returns a single revision. This is explored further in Section 5.
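The sensitivity to the node-matching decision can be restated in a few lines, using the two origins reported in this example. The dictionary keys and the function name are labels invented for this sketch; the revision numbers 9596 and 7998 are those given in the text.

```python
# Revisions in which the removed dependences were introduced, per change.
removed_dependence_origins = {
    "line 128 type change": 9596,
    "bug fix in processKeyEvent": 7998,
}


def single_origin(origins, ignored_changes=()):
    """Return the most recent origin among the changes that are not ignored."""
    kept = [rev for change, rev in origins.items() if change not in ignored_changes]
    return max(kept) if kept else None


# Type changes treated as a node mismatch (the rule used in this study).
reported = single_origin(removed_dependence_origins)
# Type changes ignored, so the two versions of line 128 map to one another.
alternative = single_origin(removed_dependence_origins,
                            ignored_changes={"line 128 type change"})
```

Under the first rule the approach reports 9596; under the second it reports the actual origin, 7998.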