In this paper, we tackle the private information retrieval (PIR) problem associated with the use of Internet search engines: a user wishes to retrieve information from the Web without the search provider learning about it. Traditional PIR protocols have two main shortcomings in this setting: (i) they assume cooperation by the database, which is not realistic for a real-world search engine like Google, and (ii) their computational complexity is linear in the size of the database, which is infeasible in the case of the Web. More recent approaches relax the PIR conditions to overcome these limitations while still offering some level of privacy. Most of them aim to distort server logs, regardless of the loss of information involved. Search engines use server logs for profiling and, thereby, for providing personalized results. Personalization becomes a necessity for users as the Web grows, and the same profiles can also be used for targeted advertising. This study focuses on a noncooperative agent for private search that treats profiling as valuable data for both sides of the search process. It is based on the assumption that the user's identity is formed by the union of various areas of interest, or facets. By managing HTTP connections appropriately, the agent maps submitted queries to different server logs according to these facets. The rationale is that these logs cannot be used to trace the user, yet they remain helpful for profiling. We present a personalized query classification approach based on the user's browsing history. To provide empirical results, we developed an attacking algorithm against the agent, which shows that the disclosure risk is reduced.
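The facet-based mapping described above can be illustrated with a minimal sketch. It is not the agent's actual implementation: the `FacetRouter` class, the word-overlap scoring, and the use of one cookie jar per facet as a stand-in for "managing HTTP connections" are all assumptions made for illustration. The sketch sends no requests. It only shows how a query could be classified against terms from the browsing history and bound to a facet-specific HTTP session.

```python
from collections import Counter
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


class FacetRouter:
    """Hypothetical router: picks a facet for each query and returns the
    HTTP opener (with its own cookie jar) reserved for that facet."""

    def __init__(self, facet_history):
        # facet_history maps a facet name to page titles from the user's
        # browsing history (assumed input format for this sketch).
        self.profiles = {
            facet: Counter(
                word for title in titles for word in title.lower().split()
            )
            for facet, titles in facet_history.items()
        }
        # One cookie jar per facet, so that cookies set while searching in
        # one facet are never sent with queries from another facet.
        self.openers = {
            facet: build_opener(HTTPCookieProcessor(CookieJar()))
            for facet in facet_history
        }

    def classify(self, query):
        # Score each facet by how often the query words occur in its
        # history terms; ties go to the facet listed first.
        words = query.lower().split()
        scores = {
            facet: sum(profile[w] for w in words)
            for facet, profile in self.profiles.items()
        }
        return max(scores, key=scores.get)

    def opener_for(self, query):
        # The caller would submit the query through this opener.
        return self.openers[self.classify(query)]


router = FacetRouter({
    "sports": ["football league results", "tennis open final"],
    "health": ["flu symptoms treatment", "healthy diet plan"],
})
print(router.classify("tennis results"))
print(router.classify("flu treatment"))
```

In this sketch, queries about the same facet always share one session, so the resulting log stays coherent enough for profiling, while queries from different facets never share cookies. A real agent would also need to separate other identifying signals, which the sketch does not attempt.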