Research Article
On the development of name search techniques for Arabic
Article first published online: 21 FEB 2006
DOI: 10.1002/asi.20323
Copyright © 2006 Wiley Periodicals, Inc.
Issue

Journal of the American Society for Information Science and Technology
Volume 57, Issue 6, pages 728–739, April 2006
Additional Information
How to Cite
Aqeel, S. U., Beitzel, S., Jensen, E., Grossman, D. and Frieder, O. (2006), On the development of name search techniques for Arabic. J. Am. Soc. Inf. Sci., 57: 728–739. doi: 10.1002/asi.20323
Publication History
- Issue published online: 24 MAR 2006
- Article first published online: 21 FEB 2006
- Manuscript Accepted: 9 MAR 2005
- Manuscript Revised: 21 FEB 2005
- Manuscript Received: 28 JUL 2003
Abstract
The need for effective identity matching systems has led to extensive research in the area of name search. For the most part, such work has been limited to English and other Latin-based languages. As a result, algorithms such as Soundex and n-gram matching are of limited utility for languages such as Arabic, whose morphological features differ vastly and rely heavily on phonetic information. The dearth of work in this field is partly caused by the lack of standardized test data. To address this, we have built a collection of 7,939 Arabic names, along with 50 training queries and 111 test queries. We use this collection to evaluate a variety of algorithms, including a derivative of Soundex tailored to Arabic (ASOUNDEX), measuring effectiveness with standard information retrieval measures. Our results show an improvement of 70% over existing approaches.
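For readers unfamiliar with the two baseline techniques named in the abstract, the following is a minimal sketch of classic English Soundex and of bigram (n-gram) matching scored with the Dice coefficient. It is not the ASOUNDEX algorithm, whose rules are not given on this page; the function names `soundex`, `ngrams`, and `dice_similarity` are illustrative choices, and the Dice coefficient is only one common way to score n-gram overlap, assumed here for the example.

```python
# Classic (American) Soundex letter groups for English; not the paper's ASOUNDEX.
SOUNDEX_CODES = {}
for _letters, _digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")]:
    for _ch in _letters:
        SOUNDEX_CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a Latin-alphabet name."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel sets prev to "", so a repeated code counts again
    return (code + "000")[:4]  # pad with zeros, truncate to 4 characters


def ngrams(text: str, n: int = 2) -> list:
    """Return the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the sets of character n-grams of a and b."""
    grams_a, grams_b = set(ngrams(a, n)), set(ngrams(b, n))
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))
```

In this sketch, two names match phonetically when their Soundex codes are equal (for example, `soundex("Robert") == soundex("Rupert")`), while `dice_similarity` returns a score between 0 and 1 that can be thresholded or used for ranking. Both functions operate on Latin letters, which illustrates why the abstract describes such techniques as of limited utility for Arabic.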
