A Common Framework for Exploring Document-at-a-Time and Score-at-a-Time Retrieval Methods


Document-at-a-time (DaaT) and score-at-a-time (SaaT) query evaluation techniques represent different approaches to top-$k$ retrieval with inverted indexes. While modern deployed systems are dominated by DaaT methods, the academic literature has seen decades of debate about the relative merits of each. Recently, there has been renewed interest in SaaT methods for learned sparse lexical models, where studies have shown that transformers generate ‘wacky weights’ that appear to reduce opportunities for optimizations in DaaT methods. However, researchers currently lack an easy-to-use SaaT system to support further exploration. Our work aims to fill this gap. Starting with JASS, a modern SaaT system, we built Python bindings to create PyJASS, and then integrated PyJASS into the Pyserini IR toolkit. The result is a common frontend to both a DaaT system (Lucene) and a SaaT system (JASS). We demonstrate how recent experiments with a wide range of learned sparse lexical models can be easily reproduced. Our contribution is a framework that enables future research comparing DaaT and SaaT methods in the context of modern neural retrieval models.

Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)