Algorithms for processing the group K nearest-neighbor query on distributed frameworks

Moutafis P., García-García F., Mavrommatis G., Vassilakopoulos M., Corral A., Iribarne L.

dc.creator	Moutafis P., García-García F., Mavrommatis G., Vassilakopoulos M., Corral A., Iribarne L.	en
dc.date.accessioned	2023-01-31T09:02:13Z
dc.date.available	2023-01-31T09:02:13Z
dc.date.issued	2021
dc.identifier	10.1007/s10619-020-07317-8
dc.identifier.issn	09268782
dc.identifier.uri	http://hdl.handle.net/11615/76815
dc.description.abstract	Given two datasets of points (called Query and Training), the Group K Nearest-Neighbor (GKNN) query retrieves the K points of the Training dataset with the smallest sum of distances to every point of the Query dataset. This spatial query has been studied in recent years, and several performance-improving techniques and pruning heuristics have been proposed. In previous work, we presented the first MapReduce algorithm, consisting of alternating local and parallel phases, which can be used to effectively process the GKNN query when the Query dataset fits in memory while the Training dataset belongs to the Big Data category. In this paper, we present a significantly improved algorithm that incorporates a new high-performance refining method, a fast way to calculate distance sums for pruning purposes, and several other minor coding and algorithmic improvements. Moreover, we transform this algorithm (which has been implemented in the Hadoop framework) to SpatialHadoop (a popular distributed framework dedicated to spatial processing), using a novel two-level partitioning method. Using real-world and synthetic datasets, we also present a thorough experimental study of the Hadoop and SpatialHadoop versions of the algorithm, including a behind-the-scenes analysis of the algorithm's performance, using metrics that highlight its internal functioning. Finally, we present an experimental comparison of the Hadoop version, the SpatialHadoop version, and the version from our previous work, showing that the improved versions clearly outperform the earlier one, with the SpatialHadoop version being faster than its Hadoop counterpart. © 2020, Springer Science+Business Media, LLC, part of Springer Nature.	en
dc.language.iso	en	en
dc.source	Distributed and Parallel Databases	en
dc.source.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85095711107&doi=10.1007%2fs10619-020-07317-8&partnerID=40&md5=8bb94a248b662bdd254622330d7975fc
dc.subject	Information systems	en
dc.subject	Software engineering	en
dc.subject	Distributed framework	en
dc.subject	Experimental comparison	en
dc.subject	Improving techniques	en
dc.subject	Internal functioning	en
dc.subject	K nearest neighbor queries	en
dc.subject	Partitioning methods	en
dc.subject	Spatial processing	en
dc.subject	Synthetic datasets	en
dc.subject	Nearest neighbor search	en
dc.subject	Springer	en
dc.title	Algorithms for processing the group K nearest-neighbor query on distributed frameworks	en
dc.type	journalArticle	en
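
The GKNN query defined in the abstract can be illustrated with a minimal single-machine sketch. This is a brute-force baseline only, not the MapReduce or SpatialHadoop algorithm the article presents; the function name `gknn_brute_force`, the use of Euclidean distance, and the sample points are assumptions made for illustration.

```python
import heapq
import math


def gknn_brute_force(query, training, k):
    """Return the k Training points with the smallest sum of distances
    to every Query point (Euclidean distance is assumed here)."""

    def sum_of_distances(point):
        # Sum of distances from one Training point to all Query points.
        return sum(math.dist(point, q) for q in query)

    # nsmallest returns the k best candidates in ascending order of the key.
    return heapq.nsmallest(k, training, key=sum_of_distances)


# Hypothetical sample data: a small Query set and a small Training set.
query = [(0.0, 0.0), (2.0, 0.0)]
training = [(1.0, 0.0), (5.0, 5.0), (1.0, 1.0), (-3.0, 0.0)]
result = gknn_brute_force(query, training, 2)
print(result)
```

This baseline evaluates the distance sum for every Training point, which is what makes pruning and partitioning attractive once the Training dataset no longer fits on one machine.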

This item appears in the following collections

Publications in journals, conferences, book chapters, etc. [19705]
