Cost-Effective Updating of Distributed Reordered Indexes


Index reordering techniques allow document collections to be renumbered, with the goal of developing a permutation of the initial document ordinal identifiers that places documents that are (somehow) like each other into positions near each other in the permuted ordering. The clustering that results allows inverted index size to be reduced, since each term’s posting list is more likely to contain a non-uniform set of inter-document integer gaps. Reordering is normally performed once, at the time the index is created. Here we consider the role of index reordering in collections that grow over time, noting that simply appending new documents to the collection may erode the effectiveness of an earlier reordering. In particular, we discuss methods for maintaining and reinstating reorderings as document collections grow, and measure the effectiveness of those techniques on a large corpus of English news articles. We also provide experimental results that illustrate the benefits of reordering in terms of query execution time.

Proceedings of the 25th Australasian Document Computing Symposium (ADCS 2021)