An architecture for scaling federated search



Federated search has tremendous potential to make a wide range of diverse information and viewpoints available to scientists, researchers and the public. Traditional federated search engines provide access to a relatively small number of content sources, typically several dozen or fewer. But, depending on the discipline, there may be hundreds of databases with relevant content. And, given the value of searching content sources in seemingly unrelated fields, a researcher will benefit from simultaneously searching hundreds or even thousands of sources. The greater the number of relevant, diverse, high-quality sources a researcher can access, the faster he or she will make discoveries that advance science and improve the quality of our lives.

The current paradigm for federated search suffers from a number of problems that hinder the development of large, scalable federated search engines. Search speed, relevance ranking, and source selection all degrade as the number of sources increases. Deep Web Technologies, a federated search technology company based in Santa Fe, New Mexico, is pioneering the effort to build applications that overcome these obstacles. Deep Web Technologies has created a hierarchical “divide-and-conquer” architecture that distributes the federated search workflow to eliminate the traditional bottlenecks and allow for massive scalability.

Deep Web Technologies built the search engine behind a global gateway to government-produced and government-supported science research information. That gateway employs Deep Web Technologies' hierarchical approach to search sources that are themselves federated search engines, and it searches 140 sources in this way. Deep Web Technologies is also building a 500-source science research portal, due by mid-2009. In the ASIS 2009 poster session, Deep Web Technologies will describe its architecture and how it allows federated search applications to scale to large numbers of sources.
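
The hierarchical idea described above, in which a federated search engine treats other federated search engines as ordinary sources, can be illustrated with a minimal Python sketch. This is not Deep Web Technologies' implementation; the names Hit, LeafSource and FederatedNode, the in-memory title lists, and the toy term-overlap score are all assumptions made for illustration. The sketch shows only the structural point: each node fans a query out to its children in parallel, merges their partial results, and returns the top hits, so a node can itself be a child of a higher-level node.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class Hit:
    title: str
    score: float
    source: str  # name of the leaf source that produced the hit


class Searchable(Protocol):
    """Anything that can answer a query: a single source or a federated node."""

    name: str

    def search(self, query: str, limit: int) -> List[Hit]: ...


def rank(hits: List[Hit], limit: int) -> List[Hit]:
    # Highest score first; ties are broken by title so the order is deterministic.
    return sorted(hits, key=lambda h: (-h.score, h.title))[:limit]


class LeafSource:
    """Hypothetical stand-in for one content source (a single database)."""

    def __init__(self, name: str, titles: List[str]) -> None:
        self.name = name
        self.titles = titles

    def search(self, query: str, limit: int) -> List[Hit]:
        terms = query.lower().split()
        hits = []
        for title in self.titles:
            words = title.lower().split()
            # Toy relevance: fraction of query terms that appear in the title.
            matched = sum(1 for term in terms if term in words)
            if matched:
                hits.append(Hit(title, matched / len(terms), self.name))
        return rank(hits, limit)


class FederatedNode:
    """Hypothetical federated engine whose children may be leaves or other nodes."""

    def __init__(self, name: str, children: List[Searchable]) -> None:
        self.name = name
        self.children = children

    def search(self, query: str, limit: int) -> List[Hit]:
        if not self.children:
            return []
        # Divide: query every child concurrently. Each child returns at most
        # `limit` hits, so this node merges small partial lists rather than
        # every raw result from every underlying source.
        with ThreadPoolExecutor(max_workers=len(self.children)) as pool:
            partials = list(pool.map(lambda c: c.search(query, limit), self.children))
        # Conquer: merge the partial lists and keep the best `limit` hits.
        merged = [hit for partial in partials for hit in partial]
        return rank(merged, limit)
```

As a usage example, a root node built as FederatedNode("root", [physics, biology]), where physics and biology are themselves FederatedNode instances wrapping several LeafSource objects, answers root.search("web search", 3) without knowing how many sources sit beneath each child. The sketch leaves out source selection, timeouts, and cross-source score normalization, all of which a real system would need to address.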