Enhancing Sedona (formerly GeoSpark) with Efficient k Nearest Neighbor Join Processing

García-García F., Corral A., Iribarne L., Vassilakopoulos M.

dc.creator	García-García F., Corral A., Iribarne L., Vassilakopoulos M.	en
dc.date.accessioned	2023-01-31T07:39:42Z
dc.date.available	2023-01-31T07:39:42Z
dc.date.issued	2021
dc.identifier	10.1007/978-3-030-78428-7_24
dc.identifier.isbn	9783030784270
dc.identifier.issn	03029743
dc.identifier.uri	http://hdl.handle.net/11615/71958
dc.description.abstract	Sedona (formerly GeoSpark) is an in-memory cluster computing system for processing large-scale spatial data, which extends the core of Apache Spark to support spatial datatypes, partitioning techniques, indexes, and operations (e.g., spatial range, k Nearest Neighbor (kNN) and spatial join queries). The k Nearest Neighbor Join Query (kNNJQ) finds, for each object in one dataset P, the k nearest neighbors of that object in another dataset Q. It is a common operation used in numerous spatial applications (e.g., GISs, location-based systems, continuous monitoring, etc.). kNNJQ is a time-consuming spatial operation, since it can be considered a hybrid of spatial join and nearest neighbor search. Given that Sedona outperforms other Spark-based spatial analytics systems in most cases but does not support kNN joins, adding kNNJQ is a worthwhile challenge. Therefore, in this paper, we investigate how to design and implement an efficient kNNJQ algorithm in Sedona, using the most appropriate spatial partitioning technique and other improvements. Finally, the results of an extensive set of experiments with real-world datasets are presented, demonstrating that the proposed kNNJQ algorithm is efficient, scalable and robust in Sedona. © 2021, Springer Nature Switzerland AG.	en
dc.language.iso	en	en
dc.source	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en
dc.source.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85111362900&doi=10.1007%2f978-3-030-78428-7_24&partnerID=40&md5=a45faf2a3403826c0114587344bb2d36
dc.subject	Cluster computing	en
dc.subject	Large scale systems	en
dc.subject	Learning algorithms	en
dc.subject	Motion compensation	en
dc.subject	Text processing	en
dc.subject	Continuous monitoring	en
dc.subject	Design and implements	en
dc.subject	K nearest neighbor (KNN)	en
dc.subject	K-nearest neighbors	en
dc.subject	Location-based systems	en
dc.subject	Partitioning techniques	en
dc.subject	Spatial applications	en
dc.subject	Spatial partitioning	en
dc.subject	Nearest neighbor search	en
dc.subject	Springer Science and Business Media Deutschland GmbH	en
dc.title	Enhancing Sedona (formerly GeoSpark) with Efficient k Nearest Neighbor Join Processing	en
dc.type	conferenceItem	en
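
The kNNJQ definition in the abstract (for each object in P, the k nearest neighbors in Q) can be illustrated with a minimal brute-force sketch. This is not the algorithm proposed in the paper and it does not use any Sedona API; it assumes 2D points stored as tuples, Euclidean distance, and a hypothetical helper named knn_join, and it ignores the partitioning and indexing that a distributed implementation would need.

```python
import heapq
import math


def knn_join(P, Q, k):
    """Brute-force kNN join: map each point p in P to its k nearest points in Q.

    Hypothetical helper for illustration only; cost is O(|P| * |Q|).
    """
    result = {}
    for p in P:
        # nsmallest returns the k points of Q with the smallest Euclidean
        # distance to p, ordered from nearest to farthest.
        result[p] = heapq.nsmallest(k, Q, key=lambda q: math.dist(p, q))
    return result


# Small illustrative datasets (made up for this sketch).
P = [(0.0, 0.0), (5.0, 5.0)]
Q = [(1.0, 0.0), (0.0, 2.0), (4.0, 5.0), (9.0, 9.0)]

joined = knn_join(P, Q, k=2)
for p, neighbors in joined.items():
    print(p, "->", neighbors)
```

Every point of P is compared against every point of Q, which is what makes the operation expensive and why the abstract describes it as a hybrid of spatial join and nearest neighbor search.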

This item appears in:

Publications in journals, conferences, book chapters, etc. [19705]