In recent years, a large amount of information has become available online in the form of web documents, social networks, or blogs. Such networks are large, heterogeneous, and often contain a huge number of links. This linkage structure encodes rich structural information about the topical behavior of the network. Such networks are often dynamic and evolve rapidly over time. Much of the work in the literature has focused on classification either with purely text behavior or with purely linkage behavior. Furthermore, the work in the literature is mostly designed for static networks. However, a given network may be quite diverse, and the use of either content or structure could be more or less effective in different parts of the network. In this paper, we examine the problem of node classification in dynamic information networks with both text content and links. Our techniques use a random walk approach in conjunction with the content of the network to facilitate an effective classification process. Our approach is dynamic, and can be applied to networks which are updated incrementally. Our results suggest that an approach based on both content and links is extremely robust and effective. We also present methods to perform supervised keyword-based clustering of nodes using this approach. We present experimental results illustrating the effectiveness and efficiency of our classification approach. We also show that the approach is able to find effective and coherent clusters. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 5: 16–34, 2012
If you can't find a tool you're looking for, please click the link at the top of the page to "Go to old article view". Alternatively, view our Knowledge Base articles for additional help. Your feedback is important to us, so please let us know if you have comments or ideas for improvement.