Research Article
Proper nouns in English–Arabic cross language information retrieval
Article first published online: 9 JUL 2008
DOI: 10.1002/asi.20913
© 2008 ASIS&T
Issue

Journal of the American Society for Information Science and Technology
Volume 59, Issue 12, pages 1925–1932, October 2008
Additional Information
How to Cite
Bellaachia, A. and Amor-Tijani, G. (2008), Proper nouns in English–Arabic cross language information retrieval. J. Am. Soc. Inf. Sci., 59: 1925–1932. doi: 10.1002/asi.20913
Publication History
- Issue published online: 12 SEP 2008
- Article first published online: 9 JUL 2008
- Manuscript Accepted: 22 MAY 2008
- Manuscript Revised: 21 MAY 2008
- Manuscript Received: 30 JAN 2008
Abstract
Out-of-vocabulary words, that is, words not found in the dictionary, are a main source of performance degradation in Cross Language Information Retrieval (CLIR) systems. Most of them are proper nouns and technical terms. Bilingual dictionaries generally do not cover most proper nouns, which are usually primary keys in the query. Because proper nouns are spelling variants of each other in most languages, the common approach to finding the target-language correspondents of an original query key is to apply an approximate string matching technique against the target database index. The n-gram technique has proved to be the most effective of the string matching techniques. The issue arises when the languages involved have different alphabets. Transliteration is then applied, based on phonetic similarities between the languages. In this study, transliteration and the n-gram technique are combined to generate possible transliterations in an English–Arabic CLIR system. We refer to this technique as Transliteration N-Gram (TNG). We further enhance TNG by applying part-of-speech disambiguation to the set of transliterations, so that words with a similar spelling but a different meaning are excluded. Experimental results show that TNG gives promising results and that enhanced TNG further improves performance.
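As a rough illustration of the approximate string matching step mentioned in the abstract, the following minimal sketch scores spelling variants by the overlap of their character n-grams. It is not the TNG method itself: the function names, the use of the Dice coefficient, the bigram size, and the 0.4 threshold are all assumptions chosen for the example, and the index terms are invented. It also works within a single alphabet, so the transliteration step between English and Arabic is left out.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a lowercased word."""
    w = word.lower()
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over the two words' n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        # A word shorter than n has no n-grams, so nothing can match.
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_matches(query, index_terms, threshold=0.4, n=2):
    """Return (term, score) pairs at or above the threshold, best first.

    The threshold is a hypothetical value for illustration only.
    """
    scored = [(term, dice_similarity(query, term, n)) for term in index_terms]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented index terms; a real system would use the target database index.
    for term, score in best_matches("mohammed", ["muhammad", "table"]):
        print(term, round(score, 3))
```

In this sketch a query key such as "mohammed" still retrieves the variant "muhammad" because the two share several bigrams, while an unrelated term shares none and is dropped.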
