In this issue, Chen and Fu examine Rocchio's similarity-based relevance feedback algorithm, one of the most widely used query reformation methods in information retrieval. Despite its popularity in various applications, there has been little rigorous analysis of its learning complexity. The authors show that the learning complexity of Rocchio's algorithm is $O(d + d^2(\log d + \log n))$ over the discretized vector space $\{0, \ldots, n-1\}^d$ when the inner product similarity measure is used. The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier $(q, 0)$ over $\{0, \ldots, n-1\}^d$ can be improved to, at most, $1 + 2k(n-1)(\log d + \log(n-1))$, where $k$ is the number of nonzero components in $q$. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm.
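
For readers unfamiliar with the algorithm, the following is a minimal sketch of one round of the classic Rocchio query update with inner product similarity. It is not the exact formulation analyzed by Chen and Fu; the function names and the weights `alpha`, `beta`, and `gamma` are illustrative assumptions.

```python
def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reformulated query vector after one round of relevance feedback.

    The weights alpha, beta, and gamma are assumed defaults for illustration.
    """
    dim = len(query)

    def centroid(docs):
        # Mean vector of a list of documents; zero vector if the list is empty.
        if not docs:
            return [0.0] * dim
        return [sum(doc[i] for doc in docs) / len(docs) for i in range(dim)]

    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    # Move the query toward judged-relevant documents and away from the others.
    return [alpha * query[i] + beta * rel_c[i] - gamma * non_c[i] for i in range(dim)]


def inner_product(u, v):
    """Inner product similarity between a query vector and a document vector."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    q = rocchio_update(
        [1, 0, 0],
        relevant=[[2, 2, 0], [0, 2, 0]],
        nonrelevant=[[0, 0, 4]],
        alpha=1.0, beta=1.0, gamma=0.5,
    )
    print(q, inner_product(q, [1, 1, 0]))
```

The learning complexity discussed above would correspond, roughly, to how many such feedback rounds are needed before the query vector classifies every document correctly.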

Rorissa reports on a study that tests Tversky's contrast model, which equates the degree of similarity of two stimuli to a linear combination of their common and distinctive features, in the context of image representation and retrieval. Data were collected from 150 participants who performed an image description and a similarity judgment task. Structural equation modeling, correlation, and regression analyses confirmed the relationships between perceived features and similarity of objects hypothesized by Tversky. The results hold implications for future research that will attempt to further test the contrast model and assist designers of image organization and retrieval systems by pointing toward alternative document representation and similarity measures that more closely match human similarity judgments.
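
As an illustration of the contrast model described above, the sketch below scores similarity as a weighted count of common features minus weighted counts of the features distinctive to each stimulus. Using set cardinality as the salience measure, and the particular weights `theta`, `alpha`, and `beta`, are assumptions made for this example and are not taken from Rorissa's study.

```python
def contrast_similarity(features_a, features_b, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast-model similarity of two stimuli described by feature sets.

    Salience is approximated by counting features (an assumption);
    theta, alpha, and beta are illustrative weights.
    """
    a, b = set(features_a), set(features_b)
    common = len(a & b)   # features shared by both stimuli
    only_a = len(a - b)   # features distinctive to the first stimulus
    only_b = len(b - a)   # features distinctive to the second stimulus
    return theta * common - alpha * only_a - beta * only_b


if __name__ == "__main__":
    image_1 = {"dog", "grass", "ball"}
    image_2 = {"dog", "grass", "frisbee", "sky"}
    print(contrast_similarity(image_1, image_2))
```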

Ou, Khoo, and Goh describe the development of a method for automatic construction of multidocument summaries of sets of research abstracts that may be retrieved by a digital library or search engine in response to a user query. Sociology dissertation abstracts were selected as the sample domain. A variable-based framework was proposed for integrating and organizing research concepts and relationships as well as research methods and contextual relations extracted from four different dissertation abstracts. Based on the framework, a new summarization method was developed, which parses the discourse structure of abstracts, extracts research concepts and relationships, integrates the information across different abstracts, and organizes and presents them in a Web-based interface. A user evaluation was performed to assess the overall quality and usefulness of the summaries. Two types of variable-based summaries generated using the summarization method (with or without the use of a taxonomy) were compared against a sentence-based summary that lists only the research-objective sentences extracted from each abstract and another sentence-based summary generated using the MEAD system that extracts important sentences. The evaluation results indicate that the majority of sociological researchers (70%) and general users (64%) preferred the variable-based summaries generated with the use of the taxonomy.

Sawyer and Huang examine the discernibly different patterns among conceptualizations of information, technology, and people across information systems and information science literature. The intent of the study is to clarify the differences in these two areas of scholarship, and to further encourage the substantial overlap possible, but not yet engaged, in the research pursued in these areas. The authors analyze published literature in the areas to frame the discussion of the challenges and opportunities for scholars in information science and information systems disciplines to engage in collaborative work.

Hjørland contrasts Bates' understanding of information as an observer-independent phenomenon with an understanding of information as situational, put forward by, among others, Bateson, Yovits, Spang-Hanssen, Brier, Buckland, Goguen, and Hjørland. The conflict between objective and subjective ways of understanding information corresponds to the conflict between an understanding of information as a thing or a substance versus an understanding of it as a sign. It is a fundamental distinction that involves a whole theory of knowledge, and it has roots in the different metaphors applied to Shannon's information theory. It is argued that a subject-dependent, situation-specific understanding of information is best suited to fulfill the needs of information science and that it is urgent to base information science on this alternative theoretical frame.

Shankar reports on a study in which, using ethnographic methods, the author studied record-keeping as it is practiced in a basic research science laboratory. The process by which the record is created to reflect both personal need and professional norms is framed as a series of acts of selection, synthesis, and standardization. The article concludes with reflections on the role of deep understanding of scientific record-keeping for other disciplines and the design of digital laboratory techniques.

Hong, Thong, and Tam describe a controlled laboratory experiment involving 230 participants (business undergraduate students from a public university) to study the effects of animation on searching and browsing tasks in the context of online shopping, where flash animation is applied to product lists (i.e., non-banner-ads animation). The search task situation involved users searching for a specific brand of product from among a list of items presented on the Web page before making their purchase decisions. The browsing task situation involved users with no specific brand specified, who were browsing through a list of different brands of a type of product before making their purchase decisions. Participants were randomly assigned to one of four experimental conditions: search task with animation, search task without animation, browsing without animation, and browsing with animation. Users' clicking behavior was tracked through a Web log. Task performance was recorded in terms of the average time per shopping trip and the average number of clicks made in each shopping trip. After completing six shopping trips, each participant completed an online survey that measures perceived focused attention while completing the shopping tasks and attitudes toward using the Web site. The results indicate that non-banner-ads animation does attract Web users' attention, with the animated item more likely to be clicked first and also more likely to be purchased when users are performing browsing tasks. Web users' task performance and perceptions are negatively affected in the presence of animation. Moreover, the negative effects of animation on task performance are greater in browsing tasks than searching tasks. Finally, experience can help Web users to reduce the distraction from animation and is more effective when users are engaged in searching tasks than when they are engaged in browsing tasks.

Buschman asserts that, despite quantities of popular rhetoric, democratic theory holds an aposiopetic place within library and information science (LIS) in both senses of that word. It is both in a stasis, holding to basic ideas outlined 200 years ago, and in a largely maintained silence. A review of a number of state-of-the-literature reviews makes the case that it has not been systematically explored or applied, and most LIS work elides the questions democratic theory raises. The author concludes that it is time to emend this and account for a relevant intellectual source which can more firmly ground LIS practice and research in normative terms. Toward that end, three productive wellsprings of democratic theory are reviewed: Jürgen Habermas, Sheldon Wolin, and those working on democratic education (Amy Gutmann, Richard Brosio, and Maxine Greene). The article concludes with an outline of some possible LIS questions and approaches drawn from these democratic theorists.

Lorigo and Pellacini present results from a real-world study depicting remote collaboration trends of a community of more than 87,000 scientists over 30 years. Journal publication data were obtained from the SPIRES-HEP database, a professionally maintained database of high-energy physics that has been run by the Stanford Linear Accelerator Center since the late 1960s. The authors utilized publication records of more than 200,000 scholarly journal articles, together with affiliations of the authors, to infer distance collaborations. The longevity of the study is of interest because it covers several years before and after the birth of the Internet and computer-supported collaborative work (CSCW). The results indicate that there has been a steady and constant growth in the frequency of both interinstitute and cross-country collaborations in a particular physics domain, regardless of the introduction of these technologies. This suggests an evolution, rather than a revolution, with respect to long-distance collaborative behavior.

Chung and Neuman describe a study that examined the activities and strategies that 11th grade students with high academic abilities used during their information seeking and use to complete class projects in a Persuasive Speech class. The study took place in a suburban high school in Maryland, and participants included 21 junior honors students, their teacher, and their library media specialist. Each student produced a 5–7 minute speech on a self-chosen topic. Conducted in the framework of qualitative research in a constructivist paradigm, the study used data collected from observations, individual interviews, and documents students produced for their projects—concept maps, paragraphs, outlines, and research journals. Interview and observation data were analyzed using the constant comparative method with the help of QSR NVivo. Students' documents were analyzed manually. The findings show that students' understanding, strategies, and activities during information seeking and use were interactive and serendipitous, and that students learned about their topics as they searched. The research suggests that high school honors students in an information-rich environment are especially confident with learning tasks requiring an exploratory mode of learning.

Chua compares the preparation and response efforts to Hurricanes Katrina and Rita through a knowledge management (KM) perspective. To achieve this objective, a theoretical KM framework is developed to examine the KM processes that underpin disaster management activities. The framework is then used to identify different dimensions along which the two disasters can be compared. Data were drawn from a variety of sources such as newspapers, newswires, press releases, transcripts of television interviews/presentations, and congressional hearings via LexisNexis. Additionally, blogs and Wikipedia sites that discussed Katrina and Rita were searched using technorati.com and the meta-search engine Copernic. The search covered materials dating from August 26, 2005, 4 days before Katrina struck, to December 13, 2005. The search yielded 784 documents. The data collected were analyzed using a systematic approach of textual analysis. First, all 784 documents were coded by six graduate students familiar with textual coding. Intercoder reliability was established using Cohen's kappa for all variables. Six variables were specified: the prediction of Katrina and the prediction of Rita, the implementation of disaster plans for Katrina and for Rita, and rescue operations in Katrina and in Rita. Of the 784 documents analyzed, 546 were identified as addressing at least one of the six variables. The author further analyzed these 546 documents. The findings indicate that nonchalance toward Katrina's imminence, grossly inadequate preparations, and chaotic responses contrast with the precautionary measures and response operations primed for Rita. The author concludes with three KM implications for managing future large-scale disasters relating to the process of knowledge creation that underpins disaster prediction, the knowledge reuse process, and the process of knowledge transfer among agencies involved in the management of relief and rescue operations.
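
Cohen's kappa, the intercoder reliability statistic mentioned above, compares the observed agreement between two coders with the agreement expected by chance. The sketch below computes it for a single pair of coders on one variable; the function name and the example labels are hypothetical, and how the study combined results across its six coders is not stated here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same nonempty set of items.")
    n = len(labels_a)
    # Proportion of items on which the two coders agree.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    coder_1 = [1, 1, 0, 0]  # hypothetical codes: 1 = document addresses the variable
    coder_2 = [1, 0, 0, 0]
    print(cohens_kappa(coder_1, coder_2))
```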

Moore, Erdelez, and He describe a study that examined ways in which search experience is defined and measured when used as a research variable. The qualitative design began by identifying a body of LIS articles that reported empirical research results with search experience as a research variable. Articles were identified using a “pearl-growing” approach that starts with articles known to be relevant and uses various bibliographic and online search techniques to identify other relevant articles. The full text of 120 articles was reviewed. Articles that were repetitive (i.e., two or more articles based on the same study) or did not provide a sufficient level of detail about how search experience was measured in the study were eliminated. The final set consisted of 32 articles published during the period 1981 to 2004. A content analysis of the articles identified 19 unique concepts used to describe the search experience variable. Based on semantic meanings, the authors grouped these concepts into six major categories: search experience, online database search experience, information seeking experience, Web/Internet experience, computer experience, and other. The content analysis also identified 18 unique measures of search experience, which the authors grouped into three broad categories: professional demographics, self-reported experience, and objective assessment. The authors found that a majority of the studies used the generic label “search experience” and relied on the reader to grasp, from the description of the overall research design, the specific context of the information retrieval environment to which the variable applies. In addition, there was a strong preference for measures that represented subjective self-reporting about a level of exposure to some information retrieval system. The authors conclude that there is a need for detailed definitions of search experience variables for readers to truly understand the research findings.

Vanclay, in a brief communication, asserts that the h-index is robust, remaining relatively unaffected by errors in the long tails of the citations-rank distribution, such as typographic errors that short-change frequently cited articles and create bogus additional records. This robustness, and the ease with which h-indices can be verified, support the use of a Hirsch-type index over alternatives such as the journal impact factor. These merits of the h-index apply both to individuals and to journals.
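
The robustness claim can be illustrated with a small sketch that computes an h-index and then recomputes it after introducing the kinds of errors described above. The citation counts are invented for illustration and are not data from Vanclay's communication.

```python
def h_index(citations):
    """Largest h such that at least h items have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    clean = [10, 8, 5, 4, 3]            # hypothetical citation counts
    # Assumed errors: the top article loses one citation to a typographic
    # error, which also creates two bogus single-citation records.
    noisy = [9, 8, 5, 4, 3, 1, 1]
    print(h_index(clean), h_index(noisy))
```

In this example both lists yield the same index, since the errors fall in parts of the distribution that do not move the point where rank and citation count cross.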

Rousseau, in a brief communication, addresses Egghe's construction of Lorenz curves. Egghe proposed a new approach to the concept of a Lorenz curve (2005). In reaction, Burrell (2006) cautioned potential users of Egghe's approach that this new construction would not coincide with the continuous approach to the classical Lorenz curve. Rousseau concludes that Burrell performed an interesting analysis, but that Egghe's theory of continuous concentration does include the construction of a standard Lorenz curve.