Knowledge is power, wrote Bacon, because understanding cause and effect enables one to make things happen. More generally, knowing the context and relationships involved makes the difference between seeing and understanding. Researchers, students, the general reader, editors, … everyone is necessarily concerned with the context and relationships of whatever is studied: What other documents relate to this topic? Where and when did this happen? What else was going on around that time and place? Who were the people and institutions mentioned? How were they related? What else did they do?
Over time genres of auxiliary resources have evolved: dictionaries and encyclopedias; bibliographies and library catalogs; place name gazetteers and maps; timelines and chronologies; biographical dictionaries; and so on. In a print environment the reference collection of the library provides a carefully constructed environment of auxiliary resources well designed for exploring context. In a well-stocked library one can consult a diversity of documents and reference works to build up an understanding of any topic. Bibliographies and library catalogs provide documentary context by listing documents associated with any author or topic. Place name gazetteers and maps provide geographical context by identifying places, indicating where they are, what kind of place they are, and how they are spatially related to other places. Chronologies and timelines place events in temporal context by listing them in calendar time and showing what happened in any period and what else was going on at, before, or after any given event. Biographical dictionaries and Who's Whos provide personal context by explaining who people were and something about what they did, where and when they did it, and who else they were associated with. Encyclopedias provide a wide range of information.
In the digital environment, both the need for the functionality of a well-stocked, well-organized reference collection and the organizational genius of well-selected, easy-to-use reference collections have been neglected. In the potentially better-stocked Internet environment, one should be able to enjoy an even better and more supportive self-service “reference” environment.
We present a progress report on a project to create, demonstrate, and evaluate techniques to enable anyone to easily, inexpensively, and rapidly search out the background for topics, places, events, institutions, and persons encountered in online reading of academic texts of the kind provided by JSTOR. The project team includes specialists in bibliographical access, online search support, and markup; a professional editor of historical texts; a professor of Celtic Studies; and students.
Three technical deliverables are involved, two tools and a technique:
- i. A Context Finder: To find contextualizing information about any word or phrase of interest encountered when reading online. The reader determines whether to search for a place, a time period, a person, an institution, or a topic, then chooses from a menu of appropriate searchable resources. Searches are automatically generated and results displayed. When the text already contains TEI or other XML-compliant mark-up, the interface could use the mark-up to pre-set default facet and resource options.
- ii. A Context Builder: An enhanced Context Finder with the option of adding XML-compliant mark-up into the text, such that the facet identification (as topic, place, period, person, or institution) and the choice of resource could be remembered as mark-up, or used to enrich or replace existing mark-up. Either way, the next reader could have pre-prepared live search prompts for the same auxiliary sources, which may by then be more up to date. The editorial tool could serve everybody who wants to retain search annotations, including professional editors. Annotations and marked-up texts can be shared in a contemporary peer production mode. Named entity recognition software can be used to pre-process the text by identifying proper nouns (including persons, places, and other recognizable names) by a combination of natural language processing and reference to lists.
- iii. A Context Provider: Suppose that some historical and literary texts mention a hundred different place names and that these have been marked up with links to a suitable place name gazetteer. The mark-up and the positions in the text could be extracted, rearranged by place name, and used to augment and enrich the gazetteer. The gazetteer becomes a geographical index to the texts; a map of the geographic aspects of the texts can be produced; and the geography of different texts can be compared. Time periods, persons, and institutions could be treated similarly.
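The first two deliverables in the list above can be illustrated with a minimal Python sketch. It is not the project's implementation: the resource menu, the `example.org` endpoints, the `q` query parameter, and the function names `resource_menu`, `build_search_url`, and `annotate` are all assumptions made for illustration. The element names follow TEI conventions, but the mapping from facet to element is likewise assumed.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape, quoteattr

# Hypothetical menu of searchable resources per facet. The endpoints are
# placeholders on example.org, not the resources used by the project.
RESOURCES = {
    "place": {"gazetteer": "https://gazetteer.example.org/search"},
    "period": {"chronology": "https://chronology.example.org/search"},
    "person": {"biography": "https://biography.example.org/search"},
    "institution": {"directory": "https://directory.example.org/search"},
    "topic": {
        "catalog": "https://catalog.example.org/search",
        "encyclopedia": "https://encyclopedia.example.org/search",
    },
}

# Assumed mapping from facet to a TEI-style element name.
TEI_ELEMENTS = {
    "place": "placeName",
    "period": "date",
    "person": "persName",
    "institution": "orgName",
    "topic": "term",
}


def resource_menu(facet):
    """Context Finder, step 1: list the resources offered for a facet."""
    if facet not in RESOURCES:
        raise ValueError(f"unknown facet: {facet!r}")
    return sorted(RESOURCES[facet])


def build_search_url(phrase, facet, resource):
    """Context Finder, step 2: generate the search for the chosen resource."""
    base = RESOURCES[facet][resource]
    # The "q" parameter name is an assumption about the target service.
    return base + "?" + urlencode({"q": phrase})


def annotate(phrase, facet, resource):
    """Context Builder: remember facet and resource as XML mark-up."""
    tag = TEI_ELEMENTS[facet]
    url = build_search_url(phrase, facet, resource)
    return f"<{tag} ref={quoteattr(url)}>{escape(phrase)}</{tag}>"


if __name__ == "__main__":
    print(resource_menu("topic"))
    print(build_search_url("Brian Boru", "person", "biography"))
    print(annotate("Kildare", "place", "gazetteer"))
```

Under these assumptions, a reader who selects "Kildare" as a place would get a generated gazetteer search, and the Context Builder variant would store the same choice in the text so that the next reader finds a ready-made search prompt.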
As of May 2008, initial versions of the Context Finder are functioning and are being used on biographical articles in Citizendium and Wikipedia and on pages of academic articles on Irish culture and history that have been scanned, OCR'd, and marked up for future inclusion in JSTOR. Initially, the interface draws on a limited range of bibliographical and encyclopedic resources. By the time of the ASIST Annual Meeting we expect to be able to describe and demonstrate (1) a more refined Context Finder drawing on a wide range of explanatory resources; (2) a Context Builder which adds links to explanatory resources ready for the next reading or the next reader; and (3) at least one example of Context Providing whereby links from texts to a place name gazetteer (or other explanatory resource) have been copied, reversed, and pasted into the explanatory resource as mark-up, such that the gazetteer now offers links to where places are mentioned in texts.
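The reversal of links involved in Context Providing amounts to building an inverted index from gazetteer entries to text positions. The Python sketch below shows the idea under stated assumptions: the text identifiers, character offsets, and entry identifiers are invented sample data, and the function names `invert_links` and `shared_entries` are hypothetical rather than part of the project.

```python
from collections import defaultdict

# Invented sample data: (text id, character offset, gazetteer entry id).
# In practice these would be extracted from the mark-up in the texts.
MENTIONS = [
    ("text-a", 120, "place:tara"),
    ("text-a", 845, "place:kildare"),
    ("text-b", 17, "place:tara"),
]


def invert_links(mentions):
    """Rearrange text-to-gazetteer links by gazetteer entry."""
    index = defaultdict(list)
    for text_id, offset, entry_id in mentions:
        index[entry_id].append((text_id, offset))
    # Sort locations so each entry lists its mentions in a stable order.
    return {entry: sorted(locations) for entry, locations in index.items()}


def shared_entries(index, text_x, text_y):
    """Entries mentioned in both texts, for comparing their geography."""
    wanted = {text_x, text_y}
    return sorted(
        entry
        for entry, locations in index.items()
        if wanted <= {text_id for text_id, _ in locations}
    )


if __name__ == "__main__":
    index = invert_links(MENTIONS)
    for entry, locations in sorted(index.items()):
        print(entry, locations)
    print(shared_entries(index, "text-a", "text-b"))
```

Each gazetteer entry then carries a list of the places in the texts where it is mentioned, which is what lets the gazetteer act as a geographical index. The same structure would apply to time periods, persons, and institutions.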
This work was motivated by the resource needs of undergraduates seeking to complete assignments using their laptops in their dorms late at night. The huge increase in texts made available online through JSTOR and other mass digitization projects means that the potential application of these tools is very extensive.