Index-Based Batch Query Processing Revisited


Large scale web search engines provide sub-second response times to interactive user queries. However, not all search traffic arises interactively - cache updates, internal testing and prototyping, generation of training data, and web mining tasks all contribute to the workload of a typical search service. If these non-interactive query components are collected together and processed as a batch, the overall execution cost of query processing can be significantly reduced. In this reproducibility study, we revisit query batching in the context of large-scale conjunctive processing over inverted indexes, considering both on-disk and in-memory index arrangements. Our exploration first verifies the results reported in the reference work [Ding et al., WSDM 2011], and then provides novel approaches for batch processing which give rise to better time-space trade-offs than have been previously achieved.

Proceedings of the 45th European Conference on Information Retrieval (ECIR 2023)