Fast, Compact, Immediate-Access Indexing for Learned Sparse Retrieval Systems

Abstract

Learned sparse retrieval (LSR) is an emerging paradigm that uses pretrained language models to assign learned weights to the terms of a document, enabling practitioners to deploy next-generation rankers within their existing lexical retrieval pipelines. Although LSR systems have been found to provide strong increases in effectiveness over traditional statistical approaches, this boost comes at the cost of both indexing and retrieval efficiency. In this work, we explore the application of LSR to a practical online setting where new documents must be indexed and searchable as soon as they arrive. In particular, we create a clean-room re-implementation of the current state-of-the-art linked block dynamic indexing approach, and propose a set of important augmentations that enable efficient online indexing and query processing to generalize to the learned sparse regime. Our results over two traditional and two LSR models, and a multitude of experimental settings, demonstrate the practicality of our approach, allowing new documents to be queried at a reasonable latency while maintaining fast insertion ability.

Publication
Proceedings of the 48th European Conference on Information Retrieval (ECIR 2026)