Efficient dynamic pruning on largest scores first (LSF) retrieval

2016, Volume 17, Issue 1

Frontiers of Information Technology & Electronic Engineering, 2016, Volume 17, Issue 1. doi: 10.1631/FITEE.1500190

College of Computer, National University of Defense Technology, Changsha 410073, China

Received: 2015-06-06 Accepted: 2016-01-05 Available online: 2016-01-11

Abstract

Inverted index traversal techniques have been studied to address the query processing performance challenges of web search engines, but they still leave much room for improvement. In this paper, we focus on inverted index traversal over document-sorted indexes and on the optimization technique called dynamic pruning, which can efficiently reduce the computational resources required. We propose a novel exhaustive index traversal scheme called largest scores first (LSF) retrieval, in which candidates are first selected from the posting lists of the important query terms, those with the largest upper bound scores, and then fully scored with the contributions of the remaining query terms. The scheme can effectively reduce the memory consumption of existing term-at-a-time (TAAT) retrieval and the candidate selection cost of existing document-at-a-time (DAAT) retrieval, at the expense of revisiting the posting lists of the remaining query terms. Preliminary analysis and implementation show that LSF performs comparably to these two well-known baselines. To further reduce the number of postings that need to be revisited, we present efficient rank-safe dynamic pruning techniques based on LSF, including two important optimizations called list omitting (LSF_LO) and partial scoring (LSF_PS) that make full use of query term importance. Finally, experimental results on the TREC GOV2 collection show that our new index traversal approaches reduce query latency by almost 27% relative to the WAND baseline and perform slightly better than the MaxScore baseline, while returning the same results as exhaustive evaluation.
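
The exhaustive LSF idea described above can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' implementation. Several things are assumptions made for illustration: the function name `lsf_top_k`, the representation of each posting list as a dictionary from document ID to a precomputed score contribution, the use of the largest contribution in a list as that term's upper bound, and the rule that a document already scored as a candidate of a more important term is skipped when it reappears in a later list. A real index would use sorted, compressed posting lists with skipping instead of dictionary lookups.

```python
import heapq


def lsf_top_k(postings, k):
    """Return the top-k (doc_id, score) pairs, highest score first.

    postings: dict mapping term -> {doc_id: score contribution}.
    This layout is a simplification assumed for the sketch.
    """
    if k <= 0:
        return []

    # Order query terms by upper-bound score (largest contribution in the
    # list), most important term first.
    terms = sorted(
        postings,
        key=lambda t: max(postings[t].values(), default=0.0),
        reverse=True,
    )

    heap = []  # min-heap of (score, -doc_id); ties favor the smaller doc_id
    for i, term in enumerate(terms):
        earlier, remaining = terms[:i], terms[i + 1:]
        for doc_id, score in postings[term].items():
            # Assumed rule: a document found in a more important list was
            # already fully scored as a candidate of that list.
            if any(doc_id in postings[t] for t in earlier):
                continue
            # Full scoring: revisit the remaining (less important) lists.
            for t in remaining:
                score += postings[t].get(doc_id, 0.0)
            entry = (score, -doc_id)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)

    return [(-neg_id, score) for score, neg_id in sorted(heap, reverse=True)]


# Hypothetical three-term query over a toy index.
example_index = {
    "a": {1: 1.0, 2: 0.5, 4: 0.25},
    "b": {2: 3.0, 3: 2.0},
    "c": {1: 0.5, 3: 0.5, 5: 0.75},
}
print(lsf_top_k(example_index, 2))  # [(2, 3.5), (3, 2.5)]
```

In this sketch only a heap of size k is kept in memory, rather than one accumulator per document as in TAAT, and candidates come from one list at a time rather than from a merge across all lists as in DAAT. The price is the repeated lookups into the remaining lists, which is the revisiting cost the abstract mentions and which the pruning variants aim to reduce.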

Keywords

Inverted index; Index traversal; Query latency; Largest scores first (LSF) retrieval; Dynamic pruning
