
Improved retrieval effectiveness by efficient combination of term proximity and zone scoring: A simulation-based evaluation

Author
Akritidis, L.; Katsaros, D.; Bozanis, P.
Date
2012
DOI
10.1016/j.simpat.2011.12.002
Subject
Web
Search engines
Inverted index
Simulation
Evaluation
VECTOR-SPACE MODEL
SEARCH
COMPRESSION
RANKING
Computer Science, Interdisciplinary Applications
Computer Science, Software Engineering
Abstract
During the past few years, commercial Web search engines have augmented their underlying index structures by significantly enriching the information that describes the appearance of a word within a document, Dean (2009) [7]. This enriched information is now used in complex and effective functions that rank documents with respect to a user query by taking hundreds of features into consideration. Despite this evolution of search engines, past research has mainly concentrated on improving plain Web indexes that store typical data only. In this work we study the problem of organizing an inverted index that stores additional information. In particular, we examine how the physical locations within a document, called zones, can be efficiently integrated with such an index structure. We introduce TZP, an encoder which compresses these zones in combination with the positions of a word in a document, by employing a fixed number of bits for each portion of a word's inverted list. We demonstrate that our method allows direct access to the compressed zones and positions without expensive look-ups and avoids decoding any unnecessary information, while its overall index size is comparable to or even smaller than that of state-of-the-art schemes. Moreover, we examine how the word positions can be combined with the zones to improve retrieval effectiveness. We introduce BM25TOPF, a scheme which incorporates term proximity and zone weighting into a single ranking formula. Unlike other term proximity approaches, BM25TOPF also takes into account the ordering of the query terms by rewarding the documents that contain them in the correct order. Our experiments with the Web Adhoc Task of TREC 2009 and a set of our own queries show that BM25TOPF outperforms the current state-of-the-art approaches by a margin between 6% and 11%. © 2011 Elsevier B.V. All rights reserved.
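To make the idea of fixed-width encoding with direct access concrete, here is a minimal Python sketch. It is not the TZP format from the paper, whose layout the abstract does not specify. The function names `pack_postings` and `read_posting`, the bit widths, and the choice to store each (zone, position) pair in a single fixed-width field are all assumptions made for illustration. The sketch only shows why a fixed number of bits per entry lets a reader jump to the i-th entry without decoding the ones before it.

```python
from typing import List, Tuple


def pack_postings(pairs: List[Tuple[int, int]], zone_bits: int, pos_bits: int) -> int:
    """Pack (zone, position) pairs into one integer, one fixed-width field per pair.

    Hypothetical layout (an assumption, not the paper's): entry i occupies bits
    [i * width, (i + 1) * width), with the zone stored above the position.
    """
    width = zone_bits + pos_bits
    packed = 0
    for i, (zone, pos) in enumerate(pairs):
        if not 0 <= zone < (1 << zone_bits) or not 0 <= pos < (1 << pos_bits):
            raise ValueError("value does not fit in the chosen bit width")
        packed |= ((zone << pos_bits) | pos) << (i * width)
    return packed


def read_posting(packed: int, index: int, zone_bits: int, pos_bits: int) -> Tuple[int, int]:
    """Read entry `index` directly by shifting and masking; no other entry is decoded."""
    width = zone_bits + pos_bits
    field = (packed >> (index * width)) & ((1 << width) - 1)
    return field >> pos_bits, field & ((1 << pos_bits) - 1)


# Example: 2 bits for the zone id, 8 bits for the word position.
pairs = [(0, 3), (2, 17), (1, 250)]
packed = pack_postings(pairs, zone_bits=2, pos_bits=8)
second = read_posting(packed, 1, zone_bits=2, pos_bits=8)
```

Because every field has the same width, the offset of an entry is a multiplication rather than a scan, which is the property the abstract describes as direct access without expensive look-ups. A variable-length code would typically need to decode the preceding entries first.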
URI
http://hdl.handle.net/11615/25428
Collections
  • Publications in journals, conferences, book chapters, etc. [19735]