Identifying composite crosscutting concerns through semi-supervised learning

Aspect mining improves the modularity of legacy software systems by identifying their underlying crosscutting concerns (CCs). However, a realistic CC is a composite one that consists of CC seeds and related program elements, which makes identifying composite CCs a great challenge. In this paper, inspired by state-of-the-art information retrieval techniques, we model this problem as a semi-supervised learning problem. First, a link analysis technique is adopted to generate CC seeds. Second, we construct a coupling graph that captures the relationships between CC seeds. Then, we adopt a community detection technique to group the CC seeds; these groups serve as constraints for semi-supervised learning and guide the clustering process. Furthermore, we propose a semi-supervised graph clustering approach, named constrained authority-shift clustering, to identify composite CCs. Two measurements, namely similarity and connectivity, are defined, and a seeded graph is generated for clustering program elements. We evaluate constrained authority-shift clustering on numerous software systems, including a large-scale distributed software system. The experimental results demonstrate that our semi-supervised learning approach is more effective in detecting composite CCs. Copyright © 2013 John Wiley & Sons, Ltd.
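The abstract does not name the link analysis technique used to generate CC seeds. As a minimal sketch, assuming HITS (hub/authority scoring) over a method-level call graph, methods with high authority scores (those called by many hub-like callers) can be ranked as candidate CC seeds. The call graph, the method names, and the functions `hits` and `candidate_seeds` below are hypothetical illustrations, not the authors' implementation. The sketch covers only the seed-generation step; it does not model the coupling graph, community detection, or constrained authority-shift clustering.

```python
import math

# ASSUMPTION: HITS stands in for the unnamed link analysis technique.
# Edges are hypothetical (caller, callee) pairs from a method-level call graph.
CALL_EDGES = [
    ("Order.place", "Logger.log"),
    ("Order.place", "Tx.begin"),
    ("Order.cancel", "Logger.log"),
    ("Order.cancel", "Tx.begin"),
    ("Account.debit", "Logger.log"),
    ("Account.debit", "Tx.begin"),
    ("Account.credit", "Logger.log"),
    ("Report.render", "Format.date"),
]


def _normalize(scores):
    """Scale a score dict to unit L2 norm (leave all-zero vectors unchanged)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {n: v / norm for n, v in scores.items()}


def hits(edges, iterations=50):
    """Power-iteration HITS: return (hub, authority) score dicts."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a callee = sum of hub scores of its callers.
        new_auth = {n: 0.0 for n in nodes}
        for caller, callee in edges:
            new_auth[callee] += hub[caller]
        auth = _normalize(new_auth)
        # Hub score of a caller = sum of authority scores of its callees.
        new_hub = {n: 0.0 for n in nodes}
        for caller, callee in edges:
            new_hub[caller] += auth[callee]
        hub = _normalize(new_hub)
    return hub, auth


def candidate_seeds(edges, top_k=2):
    """Hypothetical helper: the top_k methods by authority score."""
    _, auth = hits(edges)
    return sorted(auth, key=lambda n: (-auth[n], n))[:top_k]


if __name__ == "__main__":
    print(candidate_seeds(CALL_EDGES))  # prints ['Logger.log', 'Tx.begin']
```

In this toy graph, `Logger.log` and `Tx.begin` are invoked from many unrelated business methods, so they receive the highest authority scores, which matches the intuition that scattered, widely called methods are good CC seed candidates. `Format.date` has a single caller and drops out.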