Efficient Query Processing Techniques for Next-Page Retrieval


In top-$k$ ranked retrieval the goal is to efficiently compute an ordered list of the highest scoring $k$ documents according to some stipulated similarity function such as the well-known BM25 approach. In most implementation techniques a min-heap of size $k$ is used to track the top scoring candidates. In this work we consider the question of how best to retrieve the second page of search results, given that a first page has already been computed; that is, identification of the documents at ranks $k + 1$ to $2k$ for some query. Our goal is to understand what information is available as a by-product of the first-page scoring, and how it can be employed to accelerate the second-page computation, assuming that the second-page of results is required for only a fraction of the query load. We propose a range of simple, yet efficient, next-page retrieval techniques which are suitable for accelerating Document-at-a-Time mechanisms, and demonstrate their performance on three large text collections.

Information Retrieval Journal