Filter by type:

Rank fusion is a powerful technique that allows multiple sources of information to be combined into a single result set. However, to date fusion has not been regarded as being cost-effective in cases where strict per-query efficiency guarantees are required, such as in web search. In this work we propose a novel solution to rank fusion by splitting the computation into two parts – one phase that is carried out offline to generate pre-computed centroid answers for queries with broadly similar information needs, and then a second online phase that uses the corresponding topic centroid to compute a result page for each query. We explore efficiency improvements to classic fusion algorithms whose costs can be amortized as a pre-processing step, and can then be combined with re-ranking approaches to dramatically improve effectiveness in multi-stage retrieval systems with little efficiency overhead at query time. Experimental results using the ClueWeb12B collection and the UQV100 query variations demonstrate that centroid-based approaches allow improved retrieval effectiveness at little or no loss in query throughput or latency, and with reasonable pre-processing requirements. We additionally show that queries that do not match any of the pre-computed clusters can be accurately identified and efficiently processed in our proposed ranking pipeline.
TOIS, 2019

Performant Indexes and Search for Academia (PISA) is an experimental search engine that focuses on efficient implementations of state-of-the-art representations and algorithms for text retrieval. In this work, we outline our effort in creating a replicable search run from PISA for the 2019 Open Source Information Retrieval Replicability Challenge, which encourages the information retrieval community to produce replicable systems through the use of a containerized, Docker-based infrastructure. We also discuss the origins, current functionality, and future direction and challenges for the PISA system.
OSIRRC, 2019

Processing top-$k$ bag-of-words queries is critical to many information retrieval applications, including web-scale search. In this work, we consider algorithmic properties associated with dynamic pruning mechanisms. Such algorithms maintain a score threshold (the $k$th highest similarity score identified so far) so that low-scoring documents can be bypassed, allowing fast top-$k$ retrieval with no loss in effectiveness. In standard pruning algorithms the score threshold is initialized to the lowest possible value. To accelerate processing, we make use of term- and query-dependent features to predict the final value of that threshold, and then employ the predicted value right from the commencement of processing. Because of the asymmetry associated with prediction errors (if the estimated threshold is too high the query will need to be re-executed in order to assure the correct answer), the prediction process must be risk-sensitive. We explore techniques for balancing those factors, and provide detailed experimental results that show the practical usefulness of the new approach.
SIGIR, 2019

Email continues to be one of the most commonly used forms of online communication. As inboxes grow larger, users rely more heavily on email search to effectively find what they are looking for. However, previous studies on email have been exclusive to enterprises with access to large user logs, or limited to small-scale qualitative surveys and analyses on outdated public datasets such as Enron and Avocado. In this work, we propose a novel framework that allows for experimentation with real email data. In particular, our approach provides a realistic way of simulating email re-finding tasks in a crowdsourcing environment using the workers’ personal email data. We use our approach to experiment with various ranking functions and quality degradation to measure how users behave under different conditions, and conduct analysis across various email types and attributes. Our results show that user behavior can be significantly impacted as a result of the quality of the search ranker, but only when differences in quality are very pronounced. Our analysis confirms that time-based ranking begins to fail as email age increases, suggesting that hybrid approaches may help bridge the gap between relevance-based rankers and the traditional time-based ranking approach. Finally, we also found that users typically reformulate search queries by either entirely re-writing the query, or simply appending terms to the query, which may have implications for email query suggestion facilities
WWW, 2019

Document reordering is an important but often overlooked preprocessing stage in index construction. Reordering document identifiers in graphs and inverted indexes has been shown to reduce storage costs and improve processing efficiency in the resulting indexes. However, surprisingly few document reordering algorithms are publicly available despite their importance. A new reordering algorithm derived from recursive graph bisection was recently proposed by Dhulipala (KDD 2016), and shown to be highly effective and efficient when compared against other state-of-the-art reordering strategies. In this work, we present a reproducibility study of this new algorithm. We describe the implementation challenges encountered, and explore the performance characteristics of our clean-room reimplementation. We show that we are able to successfully reproduce the core results of the original paper, and show that the algorithm generalizes to other collections and indexing frameworks. Furthermore, we make our implementation publicly available to help promote further research in this space.
ECIR, 2019

The Waterloo spam scores are now a commonly used static document feature in web collections such as ClueWeb. This feature can be used as a post-retrieval filter, as a document prior, or as one of many features in a Learning-to-Rank system. In this work, we highlight the risks associated with using spam scores as a post-retrieval filter, which is now common practice in experiments with the ClueWeb test collection. While it increases the average evaluation score and boosts the performance of some topics, it can significantly harm the performance of others. Through a detailed failure analysis, we show that simple spam filtering is a high risk practice that should be avoided in future work, particularly when working with the ClueWeb12 test collection.
ADCS, 2018

Ad-hoc retrieval is an important problem with many practical applications. It forms the basis of web search, question-answering, and a new generation of virtual assistants being developed by several of the largest software companies in the world. In this report, we continue our exploration of the importance of multiple expressions of information needs. Our thesis is that over-reliance on a single query can lead to suboptimal performance, and that by creating multiple query representations for an information need and combining the relevance signals through fusion and relevance modeling, highly effective systems can be produced. This approach may form the basis for more complex multi-stage retrieval systems in a variety of applications.
TREC, 2018

Relevance modeling and data fusion are powerful yet simple approaches to improving the effectiveness of Information Retrieval Systems. For many of the classic TREC test collections, these approaches were used in many of the top performing retrieval systems. However, these approaches are often inefficient and are therefore rarely applied in production systems which must adhere to strict performance guarantees. Inspired by our recent work with human-derived query variations, we propose a new sampling-based system which provides significantly better efficiency-effectiveness trade-offs while leveraging both relevance modeling and data fusion. We show that our new end-to-end search system approaches the state-of-the-art in effectiveness while still being efficient in practice. Orthogonally, we also show how to leverage query expansion and data fusion to achieve significantly better risk-reward trade-offs than plain relevance modeling approaches.

Negated query terms allow documents containing such terms to be filtered out of a search results list, supporting disambiguation. In this work, the effect of negation on the efficiency of disjunctive, top-$k$ retrieval is examined. First, we show how negation can be integrated efficiently into two popular dynamic pruning algorithms. Then, we explore the efficiency of our approach, and show that while often efficient, negation can negatively impact the dynamic pruning effectiveness for certain queries.
ECIR, 2018

Large scale retrieval systems often employ cascaded ranking architectures, in which an initial set of candidate documents are iteratively refined and re-ranked by increasingly sophisticated and expensive ranking models. In this paper, we propose a unified framework for predicting a range of performance-sensitive parameters based on minimizing end-to-end effectiveness loss. The framework does not require relevance judgments for training, is amenable to predicting a wide range of parameters, allows for fine tuned efficiency-effectiveness trade-offs, and can be easily deployed in large scale search systems with minimal overhead. As a proof of concept, we show that the framework can accurately predict a number of performance parameters on a query-by-query basis, allowing efficient and effective retrieval, while simultaneously minimizing the tail latency of an early-stage candidate generation system. On the $50$ million document ClueWeb09B collection, and across $25{,}000$ queries, our hybrid system can achieve superior early-stage efficiency to fixed parameter systems without loss of effectiveness, and allows more finely-grained efficiency-effectiveness trade-offs across the multiple stages of the retrieval system.
WSDM, 2018

Score-at-a-Time index traversal is a query processing approach which supports early termination in order to balance efficiency and effectiveness trade-offs. In this work, we explore new techniques which extend a modern Score-at-a-Time traversal algorithm to allow for parallel postings traversal. We show that careful integration of parallel traversal can improve both efficiency and effectiveness when compared with current single threaded early termination approaches. In addition, we explore the various trade-offs for differing early termination heuristics, and propose hybrid systems which parallelize long running queries, while processing short running queries with only a single thread.
ADCS, 2017

Furthering the state-of-the-art in adhoc web search is one of the underlying goals for the NTCIR We Want Web (WWW) task. Adhoc search can be viewed as a bridge connecting many of the specialized sub-fields that are a result of the way people connect to and use information access systems. Since this is the first year of the WWW task, and no training data was provided for the English subtask, we focused on classic techniques for improving effectiveness in lieu of modern techniques based on supervised learning. In particular, we explored the use of Markov Random Field Models (MRFs), static document features, field-based weighting, and query expansion. This round we made extensive use of the Indri search system and the flexible query language it provides to produce effective results.
NTCIR, 2017

The 2017 TREC CORE Track is a re-run of the classic TREC ad hoc search evaluation campaign, with the vision of establishing new methodologies for creating IR test collections. The previous TREC newswire ad hoc task was the 2004 Robust Track, where the emphasis was on improving the effectiveness of poorly performing topics in previous tracks. The TREC CORE 2017 track reuses the Robust 2004 topic set, for the development of relevance judgments over a new New York Times corpus, composed of newswire articles published between 1987 and 2007. For this task, RMIT contributed three different runs, including two manual, and one automatic run.
TREC, 2017

With the growing popularity of the world-wide-web and the increasing accessibility of smart devices, data is being generated at a faster rate than ever before. This presents scalability challenges to web-scale search systems – how can we efficiently index, store and retrieve such a vast amount of data? A large amount of prior research has attempted to address many facets of this question, with the invention of a range of efficient index storage and retrieval frameworks that are able to efficiently answer most queries. However, the current literature generally focuses on improving the mean or median query processing time in a given system. In the proposed PhD project, we focus on improving the efficiency of high percentile tail latencies in large scale IR systems while minimising end-to-end effectiveness loss. Although there is a wealth of prior research involving improving the efficiency of large scale IR systems, the most relevant prior work involves predicting long-running queries and processing them in various ways to avoid large query processing times. Prediction is often done through pre-trained models based on both static and dynamic features from queries and documents. Many different approaches to reducing the processing time of long running queries have been proposed, including parallelising queries that are predicted to run slowly, scheduling queries based on their predicted run time, and selecting or modifying the query processing algorithm depending on the load of the system. Considering the specific focus on tail latencies in large-scale IR systems, the proposed research aims to: (i) study what causes large tail latencies to occur in large-scale web search systems, (ii) propose a framework to mitigate tail latencies in multi-stage retrieval through the prediction of a vast range of query-specific efficiency parameters, (iii) experiment with mixed-mode query semantics to provide efficient and effective querying to reduce tail latencies, and (iv) propose a time-bounded solution for Document-at-a-Time (DaaT) query processing which is suitable for current web search systems. As a preliminary study, Crane et al. compared some state-of-the-art query processing strategies across many modern collections. They found that although modern DaaT dynamic pruning strategies are very efficient for ranked disjunctive processing, they have a much larger variance in processing times than Score-at-a-Time (SaaT) strategies which have a similar efficiency profile regardless of query length or the size of the required result set. Furthermore, Mackenzie et al. explored the efficiency trade-offs for paragraph retrieval in a multi-stage question answering system. They found that DaaT dynamic pruning strategies could efficiently retrieve the top-1,000 candidate paragraphs for very long queries. Extending on prior work, Mackenzie et al. showed how a range of per-query efficiency settings can be accurately predicted such that $99.99$ percent of queries are serviced in less than $200$ ms without noticeable effectiveness loss. In addition, a reference list framework was used for training models such that no relevance judgements or annotations were required. Future work will focus on improving the candidate generation stage in large-scale multi-stage retrieval systems. This will include further exploration of index layouts, traversal strategies, and query rewriting, with the aim of improving early stage efficiency to reduce the system tail latency, while potentially improving end-to-end effectiveness.
SIGIR, 2017

We present an empirical comparison between document-at-a-time (DaaT) and score-at-a-time (SaaT) document ranking strategies within a common framework. Although both strategies have been extensively explored, the literature lacks a fair, direct comparison: such a study has been difficult due to vastly different query evaluation mechanics and index organizations. Our work controls for score quantization, document processing, compression, implementation language, implementation effort, and a number of details, arriving at an empirical evaluation that fairly characterizes the performance of three specific techniques: WAND (DaaT), BMW (DaaT), and JASS (SaaT). Experiments reveal a number of interesting findings. The performance gap between WAND and BMW is not as clear as the literature suggests, and both methods are susceptible to tail queries that may take orders of magnitude longer than the median query to execute. Surprisingly, approximate query evaluation in WAND and BMW does not significantly reduce the risk of these tail queries. Overall, JASS is slightly slower than either WAND or BMW, but exhibits much lower variance in query latencies and is much less susceptible to tail query effects. Furthermore, JASS query latency is not particularly sensitive to the retrieval depth, making it an appealing solution for performance-sensitive applications where bounds on query latencies are desirable.
WSDM, 2017

This paper describes the four systems RMIT fielded for the TREC 2016 LiveQA task and the associated experiments. Similar to last year, the results show that simple solutions tend to work best, and that our improved baseline systems achieved an above-average performance. We use a commercial search engine as a first stage retrieval mechanism and compare it with our internal system which uses a carefully curated document collection. Somewhat surprisingly, we found that on average the small curated collection performed better within our current framework, warranting further studies on when and when not to use an external resource, such as a publicly available search engine API. Finally, we show that small improvements to performance can substantially reduce failure rates.
TREC, 2016

Mobile search is quickly becoming the most common mode of search on the internet. This shift is driving changes in user behaviour, and search engine behaviour. Just over half of all search queries from mobile devices have local intent, making location-aware search an increasingly important problem. In this work, we compare the efficiency and effectiveness of two general types of geographical search queries, range queries and $k$-nearest-neighbor queries, for common web search tasks. We test state-of-the-art spatial-textual indexing and search algorithms for both query types on two large datasets. Finally, we present a rank-safe dynamic pruning algorithm that is simple to implement and use with current inverted indexing techniques. Our algorithm is more efficient than the tightly coupled best-in-breed hybrid indexing algorithms that are commonly used for top-$k$ spatial textual queries, and more likely to find relevant documents than techniques derived from range queries.
ADCS, 2015