  • University of Thessaly Institutional Repository
  • Scientific Publications of University of Thessaly Members (ΕΔΠΘ)
  • Publications in journals, conferences, book chapters, etc.
Improved retrieval effectiveness by efficient combination of term proximity and zone scoring: A simulation-based evaluation

Author
Akritidis, L.; Katsaros, D.; Bozanis, P.
Date
2012
DOI
10.1016/j.simpat.2011.12.002
Keyword
Web
Search engines
Inverted index
Simulation
Evaluation
VECTOR-SPACE MODEL
SEARCH
COMPRESSION
RANKING
Computer Science, Interdisciplinary Applications
Computer Science, Software Engineering
Abstract
During the past few years, commercial Web search engines have augmented their underlying index structures by significantly enriching the information that describes the appearance of a word within a document (Dean, 2009 [7]). This enriched information is now used in complex and effective functions that rank documents with respect to a user query by taking hundreds of features into consideration. Despite this evolution, past research has mainly concentrated on improving plain Web indexes that store only typical data. In this work we study the problem of organizing an inverted index that stores additional information. In particular, we examine how the physical locations within a document, called zones, can be efficiently integrated into such an index structure. We introduce TZP, an encoder that compresses these zones in combination with the positions of a word in a document by employing a fixed number of bits for each portion of a word's inverted list. We demonstrate that our method allows direct access to the compressed zones and positions without expensive look-ups and avoids decoding any unnecessary information, while its overall index size is comparable to, or even better than, that of state-of-the-art schemes. Moreover, we examine how word positions can be combined with zones to improve retrieval effectiveness. We introduce BM25TOPF, a scheme that incorporates term proximity and zone weighting into a single ranking formula. Unlike other term proximity approaches, BM25TOPF also takes into account the ordering of the query terms by rewarding documents that contain them in the correct order. Our experiments with the Web Adhoc Task of TREC 2009 and a set of our own queries show that BM25TOPF outperforms the current state-of-the-art approaches by a margin between 6% and 11%. © 2011 Elsevier B.V. All rights reserved.
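To illustrate why a fixed number of bits per entry permits direct access without decoding neighbouring entries, here is a minimal sketch of generic fixed-width bit packing for (zone, position) pairs. It is not the TZP encoder itself, whose layout the abstract does not specify; the bit widths `ZONE_BITS` and `POS_BITS` and the helper names `pack` and `get` are assumptions chosen only for this example.

```python
# Illustrative only: generic fixed-width packing, NOT the TZP format.
ZONE_BITS = 3                  # assumption: at most 8 distinct zones
POS_BITS = 13                  # assumption: word positions below 8192
WIDTH = ZONE_BITS + POS_BITS   # every entry occupies exactly WIDTH bits


def pack(postings):
    """Pack (zone, position) pairs into one integer, WIDTH bits per pair."""
    packed = 0
    for i, (zone, pos) in enumerate(postings):
        if not (0 <= zone < (1 << ZONE_BITS)):
            raise ValueError(f"zone {zone} does not fit in {ZONE_BITS} bits")
        if not (0 <= pos < (1 << POS_BITS)):
            raise ValueError(f"position {pos} does not fit in {POS_BITS} bits")
        # Zone goes in the high bits of the slot, position in the low bits.
        packed |= ((zone << POS_BITS) | pos) << (i * WIDTH)
    return packed


def get(packed, i):
    """Read the i-th pair directly; no other entry is decoded."""
    slot = (packed >> (i * WIDTH)) & ((1 << WIDTH) - 1)
    return slot >> POS_BITS, slot & ((1 << POS_BITS) - 1)


postings = [(0, 5), (2, 130), (7, 8191)]
packed = pack(postings)
second = get(packed, 1)        # jumps straight to the second entry
```

Because every entry has the same width, the offset of entry `i` is simply `i * WIDTH`, which is what makes random access possible; a variable-length code would require scanning the preceding entries first.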
URI
http://hdl.handle.net/11615/25428
Collections
  • Publications in journals, conferences, book chapters, etc. [19735]