Show simple item record

dc.creator | Akritidis, L. | en
dc.creator | Katsaros, D. | en
dc.creator | Bozanis, P. | en
dc.date.accessioned | 2015-11-23T10:21:55Z
dc.date.available | 2015-11-23T10:21:55Z
dc.date.issued | 2012
dc.identifier | 10.1016/j.simpat.2011.12.002
dc.identifier.issn | 1569-190X
dc.identifier.uri | http://hdl.handle.net/11615/25428
dc.description.abstract | During the past few years, commercial Web search engines have augmented their underlying index structures by significantly enriching the information that describes the appearance of a word within a document (Dean, 2009) [7]. This enriched information is now used in complex and effective functions that rank documents with respect to a user query by taking hundreds of features into consideration. Despite this evolution, past research has mainly concentrated on improving plain Web indexes that store only typical data. In this work we study the problem of organizing an inverted index that stores additional information. In particular, we examine how the physical locations within a document, called zones, can be efficiently integrated with such an index structure. We introduce TZP, an encoder which compresses these zones in combination with the positions of a word in a document, by employing a fixed number of bits for each portion of a word's inverted list. We demonstrate that our method allows direct access to the compressed zones and positions without expensive look-ups and avoids decoding any unnecessary information, while its overall index size is comparable to, or even smaller than, that of state-of-the-art schemes. Moreover, we examine how the word positions can be combined with the zones to improve retrieval effectiveness. We introduce BM25TOPF, a scheme which incorporates term proximity and zone weighting into a single ranking formula. Unlike other term proximity approaches, BM25TOPF also takes into account the ordering of the query terms by rewarding the documents that contain them in the correct order. Our experiments with the Web Adhoc Task of TREC 2009 and a set of our own queries show that BM25TOPF outperforms the current state-of-the-art approaches by a margin between 6% and 11%. (C) 2011 Elsevier B.V. All rights reserved. | en
dc.source | Simulation Modelling Practice and Theory | en
dc.source.uri | <Go to ISI>://WOS:000301307700006
dc.subject | Web | en
dc.subject | Search engines | en
dc.subject | Inverted index | en
dc.subject | Simulation | en
dc.subject | Evaluation | en
dc.subject | VECTOR-SPACE MODEL | en
dc.subject | SEARCH | en
dc.subject | COMPRESSION | en
dc.subject | RANKING | en
dc.subject | Computer Science, Interdisciplinary Applications | en
dc.subject | Computer Science, | en
dc.subject | Software Engineering | en
dc.title | Improved retrieval effectiveness by efficient combination of term proximity and zone scoring: A simulation-based evaluation | en
dc.type | journalArticle | en


Files in this item

Files | Size | Format | View

There are no files associated with this item.

This item appears in the following collection(s)
