This research investigates the overlap in the results returned by the three major search engines: Google, Yahoo, and MSN Live (GYM). Overlap was measured on a per-query basis and also by pooling all the indexed URLs retrieved by each engine across all the queries searched. The relevance of the returned results was not evaluated.
Using a random sample of 65,000 queries from the AOL query log data set, searches were conducted in the three major search engines (Google, Yahoo, and MSN Live) through their search APIs. Each query was passed to each search engine, and the first 10 results were stored along with a search engine identifier. Before comparing the result sets, we developed processes to reliably compare individual pairs of URLs across the sets. We considered three approaches to this issue: domain matches, exact matches, and relative matches. Each result set was evaluated in two ways: (a) as a ranked (ordered) list and (b) as an unordered list. To evaluate the similarity of the result sets, we first compared across all three engines and then conducted pairwise comparisons of the three search engines.
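The comparison procedure can be illustrated with a minimal Python sketch. It is not the implementation used in this study. The function names are hypothetical, and the normalization rules are assumptions: the text does not define a "relative match", so here it is taken to mean that two URLs agree once the scheme, a leading "www.", and a trailing slash are ignored. The ranked comparison is likewise assumed to be position-by-position agreement.

```python
from urllib.parse import urlsplit


def exact_key(url):
    """Exact match: the URL string is compared as stored."""
    return url


def domain_key(url):
    """Domain match: only the host name is compared (assumed definition)."""
    return urlsplit(url).hostname or ""


def relative_key(url):
    """Relative match (assumed definition): ignore the scheme, a leading
    'www.', and a trailing slash, but keep the path and query string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def unordered_overlap(results_a, results_b, key):
    """Number of distinct keys shared by two result sets, ignoring rank."""
    return len({key(u) for u in results_a} & {key(u) for u in results_b})


def ranked_overlap(results_a, results_b, key):
    """Number of rank positions at which both lists hold a matching URL
    (one possible ordered comparison, assumed here)."""
    return sum(1 for a, b in zip(results_a, results_b) if key(a) == key(b))


def three_way_overlap(results_g, results_y, results_m, key):
    """Number of distinct keys returned by all three engines."""
    sets = [{key(u) for u in r} for r in (results_g, results_y, results_m)]
    return len(sets[0] & sets[1] & sets[2])
```

Passing a different key function to the same overlap routine switches between the three matching approaches, so a single stored top-10 list per engine can be scored under each definition without repeating the search.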