Editor's Note: Each year that the ASIS&T Research Award is given we invite the recipient to share his or her research goals and discoveries with Bulletin readers. This year's recipient is Kalervo Järvelin, professor and vice chair at the School of Information Sciences, University of Tampere, Finland. He can be reached at Kalervo.Jarvelin@uta.fi.
Originally aiming to become a librarian, I was first introduced to information science and information retrieval (IR) by my first professor, Sinikka Koskiala, at the University of Tampere in 1972. She guided me to read F.W. Lancaster's Information Retrieval Systems: Characteristics, Testing and Evaluation (1968), Manfred Kochen's The Growth of Knowledge (1967), Gerard Salton's Automatic Information Organization and Retrieval (1968) and several other excellent texts. The ideas gained from them remained in my mind while I studied computer science, database management in particular, and I nearly stayed in that area as a researcher. By chance, I returned to information science in the early 1980s and assumed the responsibility of developing the curriculum for classification, indexing and information retrieval at one of the predecessors of my current school, the School of Information Sciences, University of Tampere, Finland.
My initial research efforts were split between information seeking and knowledge-work augmentation on the one hand, and relational database management on the other. Both interests have continued to date, but IR has formed my main research area since the early 1990s. For the curious reader, my publications are listed at http://people.uta.fi/~kalervo.jarvelin/KalPubl.html, but most of them may be found through Google Scholar.
The initial driving aim in my research in IR was that all information should be available to anyone desiring it and in an accessible form, no matter in which form or language it is stored or where it is located. Today, much of this availability has been realized in the form of the web, its search engines and the resources accessible through them. With my colleagues in the research group FIRE, I have been happy to contribute to IR in the areas of natural language processing (NLP) method evaluation for IR, ontology-based query expansion and relevance feedback, cross-language IR (CLIR) methods/evaluation and IR evaluation metrics. This work has been great fun.
Originally, IR methods were developed for English, which is a morphologically simple language. This characteristic means that very simple stemming methods are sufficient to make documents accessible as far as language is concerned. My native language is Finnish, which is highly inflectional: a Finnish noun may have some 2,000 inflectional forms, in contrast to four forms for an English noun. This complexity makes high recall difficult to achieve with simple methods in Finnish – which has given Finnish benchmark-language status in NLP and CLIR experiments. Lemmatizers (see http://en.wikipedia.org/wiki/Lemmatisation) were therefore seen as necessary instead of stemmers for document representation. We also noted that many other languages, while morphologically simpler than Finnish, are clearly more complicated than English. We have not created stemmers or lemmatizers for any language ourselves but have evaluated their effectiveness for document representation in a range of languages. However, such tools cannot always be applied – for example, when one has no control over database production – and for many languages they are not available at all. We have therefore created lightweight statistical lemmatizers for indexing and morphologically smart query-time tools that expand the original query words to their most frequent inflectional forms in each language, and we have shown such methods to be effective. These findings are good news for global information access, where many languages are not nearly as well equipped as English.
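The query-time expansion idea can be sketched roughly as follows. This is a toy illustration, not our actual tool: the corpus statistics, the Finnish surface forms and the query syntax are all invented for the example. Each query lemma is replaced by a disjunction of its most frequent inflectional forms, so that a plain keyword index can be searched without lemmatizing the documents.

```python
from collections import Counter

# Hypothetical corpus statistics: surface-form frequencies per lemma
# (here a few forms of Finnish "talo", house).
FORM_FREQS = {
    "talo": Counter({"talo": 500, "talon": 420, "talossa": 310,
                     "taloon": 180, "talot": 150, "taloja": 90}),
}

def expand_query_word(lemma: str, top_n: int = 4) -> list:
    """Return the top_n most frequent inflectional forms of a lemma."""
    forms = FORM_FREQS.get(lemma)
    if forms is None:
        return [lemma]  # word not in the statistics: use it as-is
    return [form for form, _ in forms.most_common(top_n)]

def expand_query(lemmas: list) -> str:
    """Express each expansion group as a disjunction (a synonym set)."""
    groups = ["(" + " OR ".join(expand_query_word(w)) + ")" for w in lemmas]
    return " AND ".join(groups)
```

In this sketch the expansion depth (`top_n`) trades index-free simplicity against query length; the frequency cutoff stands in for the statistical modeling a real tool would do.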
One of the basic tough problems in IR is vocabulary mismatch: the searcher's query words do not match the words in relevant documents. Ontology-based query expansion and relevance feedback are two approaches to solving the problem through query reformulation. Both expand the query with new words that are semantically, syntagmatically or (at least) statistically associated with the original ones and hopefully better match the relevant document texts. We were among the first to analyze the effectiveness of various query structures in semantic query expansion in best-match IR in the late 1990s, and we identified the synonym structure as the effective one for expansion. Interactive relevance feedback, while not really popular in practice among searchers, has long been an appealing idea for query modification. Here the searcher examines the result of an initial query and identifies the relevant results for the search engine. We have shown through a number of simulation studies that an effective approach is to provide feedback on only the first few results. This finding holds even if the first results are of marginal relevance and one aims to retrieve only highly relevant documents.
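The shallow-feedback strategy can be illustrated with a minimal term-selection sketch; the function, data and selection rule are invented for illustration, and our actual simulations were considerably more elaborate. Only the documents judged relevant among the first few results contribute expansion terms.

```python
from collections import Counter

def expand_from_feedback(query_terms, ranked_docs, judgments,
                         depth=3, n_terms=5):
    """Expand a query using shallow relevance feedback.

    Only the top `depth` results are judged; terms from the documents
    marked relevant are pooled and the most frequent new ones are
    appended to the query (a crude frequency-based term selection).
    """
    pool = Counter()
    for doc, relevant in zip(ranked_docs[:depth], judgments[:depth]):
        if relevant:
            pool.update(t for t in doc if t not in query_terms)
    return list(query_terms) + [t for t, _ in pool.most_common(n_terms)]
```

For example, judging only the first three results of a one-word query adds the terms that recur in the relevant ones, leaving the rest of the ranking unexamined.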
Cross-language IR methods gained in importance along with the global development of web IR. In the late 1990s we developed a dictionary-translation method for CLIR based on the synonym structure. In a bilingual translation setting, the target-language translation equivalents (for example, in English) of a single source-language word (for example, in Spanish) are all placed in a synonym set in the target query, without attempting to disambiguate word senses. This simple method proved very effective and served as a challenging baseline in CLIR for a number of years. However, dictionary translation in CLIR may be bogged down by OOV (out-of-vocabulary) words. These may be proper names spelled differently in different languages or technical terminology not covered by a machine-readable dictionary. During 2000–2010 we developed novel and effective approximate string-matching methods and statistical transliteration-based methods to overcome the OOV problems.
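The synonym-set translation idea is simple enough to sketch directly; the toy Spanish–English dictionary and the query syntax below are invented for the example. Every target-language equivalent of a source word goes into one disjunctive group, and no word-sense disambiguation is attempted.

```python
# Toy bilingual dictionary (hypothetical entries); ambiguous words
# keep all of their senses.
DICT_ES_EN = {
    "banco": ["bank", "bench"],
    "planta": ["plant", "floor", "sole"],
}

def translate_query(source_words):
    """Translate a source-language query into a synonym-structured
    target-language query: one OR-group per source word."""
    groups = []
    for word in source_words:
        equivalents = DICT_ES_EN.get(word, [word])  # OOV words pass through
        groups.append("(" + " OR ".join(equivalents) + ")")
    return " AND ".join(groups)
```

The pass-through branch for OOV words is exactly where the approximate string-matching and transliteration methods mentioned above would take over in a real system.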
My search engine is better than yours. – Statements like this one are often sought after in IR research and are based on IR evaluation, which is sometimes referred to as a hallmark and distinctive feature of IR research. In the early years of the U.S. National Institute of Standards and Technology's Text REtrieval Conference (TREC), in the 1990s, test-collection-based evaluation used binary relevance assessments with a very liberal relevance criterion. In addition, the evaluation itself was dominated by a scenario where the (simulated) searcher was exhaustively searching for relevant documents. We asked the questions: What if most documents are of marginal value, others being highly relevant? What if early retrieval of a relevant document, of any degree of relevance, is far more valuable than late retrieval? These questions led to the development of evaluation methods based on highly relevant documents and, in particular, to a family of evaluation metrics based on cumulated gain. Among the latter, normalized discounted cumulated gain (nDCG) became very popular in IR evaluation and also in operational development within search engine companies.
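The cumulated-gain idea can be written down compactly. The sketch below uses the log2(rank + 1) discount that has since become the common variant, rather than the original formulation's tunable discount base; the graded gain values are whatever the relevance scale assigns (for example 0–3).

```python
import math

def dcg(gains):
    """Discounted cumulated gain of a ranked list of graded gains.

    Each gain is discounted by the log of its rank, so a relevant
    document found late contributes less than one found early.
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """Normalize by the DCG of the ideal (descending-gain) ordering,
    so results are comparable across topics; 1.0 is a perfect ranking."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

Because the ideal ordering sorts the same gains in descending order, a perfectly ranked result list scores exactly 1.0 and any misordering of graded documents scores less.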
While the progress in the field of IR is astonishing and impossible for anyone to follow in all its detail, I currently believe that we can do a much better job of supporting information access in people's tasks and focused everyday-life information need situations. My current work focuses on task-based IR and interactive IR, including the simulation of multiple-query IR sessions.
Regarding task-based IR, we have collected comprehensive qualitative data in two task settings: research tasks in molecular medicine and administrative tasks in city administration. We are planning to continue these efforts in public administration and commercial companies. The data collection methods include interviewing, task performance shadowing, client-side interaction logging, photo logging through SenseCam and questionnaires. We have found that information needs in simple tasks are satisfied through one or a few organizational information systems, while complex tasks require a range of sources and traversal through several types of systems, not just one search engine. We also classified barriers to information access by their character (conceptual, syntactic and technological) and by their context of appearance (work task, system integration or system) and analyzed how these depend on task complexity.
Taking the human searcher as an actor (and thus a variable) in IR research design poses many challenges. Humans learn, get tired and are expensive to hire for experiments. At any step in an interaction they may take a range of decisions that may lead to the termination of their search session in either success or frustration. Such decisions depend on many factors, such as their personal traits, work task and search task, the current situation in the search, the search strategy, and the quality of document representation or the search platform, among others. Human information access behavior can be modeled, to some degree, through behavioral probabilities observed in real life. This ability provides an opportunity to simulate interactive sessions in the computer economically and without (unprogrammed) learning effects or fatigue. In fact, one may run, in reasonable time (hours), experiments involving many millions of interactive sessions and identify which kinds of decisions or behaviors are likely to lead to successful results. We have recently shown that expected human fallibility in providing relevance feedback does not degrade search results, and how important it is to consider time factors, as opposed to plain ranking quality, in IR evaluation.
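A toy Monte Carlo sketch conveys the flavor of such simulations; the stopping model and all probabilities here are invented for illustration, and real simulations condition on far richer behavioral data. At each rank the simulated searcher either examines a relevant result (success), continues scanning with some probability, or abandons the session.

```python
import random

def simulate_session(relevant_at, p_continue=0.8, max_rank=10, rng=random):
    """Simulate one search session over a ranked list.

    relevant_at: set of ranks (1-based) holding relevant documents.
    Returns True if the searcher reaches a relevant result before
    giving up or running out of ranks.
    """
    for rank in range(1, max_rank + 1):
        if rank in relevant_at:
            return True          # relevant document examined: success
        if rng.random() > p_continue:
            return False         # searcher abandons the session
    return False

def success_rate(relevant_at, sessions=100_000, **kw):
    """Estimate session success probability over many simulated runs."""
    rng = random.Random(0)       # fixed seed for reproducibility
    return sum(simulate_session(relevant_at, rng=rng, **kw)
               for _ in range(sessions)) / sessions
```

Even this crude model reproduces the intuition behind the findings above: with a per-rank continuation probability of 0.8, a first relevant document at rank 3 is reached in only about 64% of sessions, which is why early ranks, and the time they cost the searcher, matter so much.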