• reverse engineering;
  • software evolution;
  • communicated information;
  • execution log analysis


Substantial research in software engineering focuses on understanding the dynamic nature of software systems in order to improve software maintenance and program comprehension. This research typically makes use of automated instrumentation and profiling techniques after the fact, that is, without considering domain knowledge. In this paper, we examine another source of dynamic information that is generated from statements that have been inserted into the code base during development to draw the system administrators' attention to important run-time phenomena. We call this source communicated information (CI). Examples of CI include execution logs and system events. The availability of CI has sparked the development of an ecosystem of Log Processing Apps (LPAs) that surround the software system under analysis to monitor and document various run-time constraints. The dependence of LPAs on the timeliness, accuracy and granularity of the CI means that it is important to understand the nature of CI and how it evolves over time, both qualitatively and quantitatively. Yet, to our knowledge, little empirical analysis has been performed on CI and its evolution. In a case study on two large open source and one industrial software systems, we explore the evolution of CI by mining the execution logs of these systems and the logging statements in the source code. Our study illustrates the need for better traceability between CI and the LPAs that analyze the CI. In particular, we find that the CI changes at a high rate across versions, which could lead to fragile LPAs. We found that up to 70% of these changes could have been avoided and the impact of 15% to 80% of the changes can be controlled through the use of robust analysis techniques by LPAs. We also found that LPAs that track implementation-level CI (e.g. performance analysis) and the LPAs that monitor error messages (system health monitoring) are more fragile than LPAs that track domain-level CI (e.g. workload modelling), because the latter CI tends to be long-lived. Copyright © 2013 John Wiley & Sons, Ltd.