Iterative cross-training: An algorithm for learning from unlabeled Web pages
Article first published online: 12 JAN 2004
DOI: 10.1002/int.10157
Copyright © 2004 Wiley Periodicals, Inc.
Issue
International Journal of Intelligent Systems
Special Issue: Intelligent Technologies
Volume 19, Issue 1-2, pages 131–147, January - February 2004
Additional Information
How to Cite
Soonthornphisaj, N. and Kijsirikul, B. (2004), Iterative cross-training: An algorithm for learning from unlabeled Web pages. Int. J. Intell. Syst., 19: 131–147. doi: 10.1002/int.10157
Publication History
- Issue published online: 12 JAN 2004
- Article first published online: 12 JAN 2004
Funded by
- Thailand Research Fund
- National Electronics and Computer Technology Center
Abstract
The article presents a new learning method, called iterative cross-training (ICT), for classifying Web pages in three classification problems: (1) classification of Thai/non-Thai Web pages, (2) classification of course/non-course home pages, and (3) classification of university-related Web pages. Given domain knowledge or a small set of labeled data, our method combines two classifiers that iteratively train each other, making effective use of unlabeled examples. We compare ICT against three other learning methods: a supervised word segmentation classifier, a supervised naïve Bayes classifier, and a co-training-style classifier. The experimental results on the three classification problems show that ICT performs better than the other classifiers. One advantage of ICT is that it needs only a small set of prelabeled data, or no prelabeled data at all when domain knowledge is available.
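
The abstract does not spell out the training procedure, so the following is only a minimal sketch of the general idea of two classifiers labeling unlabeled examples for each other, not the authors' ICT algorithm. It assumes each page is represented by two hypothetical token views (body words and link words), uses a toy naive Bayes classifier, and runs for a fixed number of rounds; the names `TinyNaiveBayes` and `iterative_cross_training` and the toy data are all invented for illustration.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Toy multinomial naive Bayes over token lists (add-one smoothing)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.class_counts[c] / total)
            denom = sum(self.token_counts[c].values()) + len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.token_counts[c][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def iterative_cross_training(seed_docs, seed_labels, unlabeled, rounds=3):
    """Hypothetical sketch: docs are (view_a_tokens, view_b_tokens) pairs."""
    seed_a = [a for a, _ in seed_docs]
    pool_a = [a for a, _ in unlabeled]
    pool_b = [b for _, b in unlabeled]
    # Classifier A starts from the small labeled seed (assumption).
    clf_a = TinyNaiveBayes().fit(seed_a, seed_labels)
    clf_b = TinyNaiveBayes()
    for _ in range(rounds):
        # A labels the unlabeled pool; B trains on those labels using view B.
        clf_b.fit(pool_b, [clf_a.predict(a) for a in pool_a])
        # B labels the pool in turn; A retrains on the seed plus B's labels.
        labels_from_b = [clf_b.predict(b) for b in pool_b]
        clf_a.fit(seed_a + pool_a, list(seed_labels) + labels_from_b)
    return clf_a, clf_b


# Toy data (invented): course vs. other pages, as (body words, link words).
seed_docs = [(["syllabus", "homework"], ["course"]),
             (["research", "publications"], ["faculty"])]
seed_labels = ["course", "other"]
unlabeled = [(["homework", "exam"], ["course", "lecture"]),
             (["syllabus", "exam"], ["lecture", "schedule"]),
             (["publications", "award"], ["faculty", "lab"]),
             (["research", "award"], ["lab", "people"])]

clf_a, clf_b = iterative_cross_training(seed_docs, seed_labels, unlabeled)
print(clf_a.predict(["exam"]))     # "exam" never appears in the labeled seed
print(clf_b.predict(["lecture"]))  # B never saw any hand-labeled data
```

In this toy run, classifier A picks up the word "exam" only through labels produced by classifier B on the unlabeled pool, which is the kind of mutual training the abstract describes. A real system would need a stopping criterion and some handling of noisy labels, neither of which is shown here.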
