Efficient distance join query processing in distributed spatial data management systems

García-García F., Corral A., Iribarne L., Vassilakopoulos M., Manolopoulos Y.

dc.creator	García-García F., Corral A., Iribarne L., Vassilakopoulos M., Manolopoulos Y.	en
dc.date.accessioned	2023-01-31T07:39:45Z
dc.date.available	2023-01-31T07:39:45Z
dc.date.issued	2020
dc.identifier	10.1016/j.ins.2019.10.030
dc.identifier.issn	00200255
dc.identifier.uri	http://hdl.handle.net/11615/71967
dc.description.abstract	Due to the ubiquity of spatial data applications and the large volumes of data they use, processing large-scale distance joins in distributed systems is becoming increasingly popular. Distance Join Queries (DJQs) are important and frequently used operations in numerous applications, including data mining, multimedia and spatial databases. DJQs (e.g., k Nearest Neighbor Join Query, k Closest Pair Query, ε Distance Join Query, etc.) are costly operations, since they involve both a join and a distance-based search, and performing them efficiently is a challenging task. Recent Big Data developments have motivated the emergence of novel technologies for distributed processing of large-scale spatial data in clusters of computers, leading to Distributed Spatial Data Management Systems (DSDMSs). Distributed cluster-based computing systems can be classified as Hadoop-based or Spark-based systems. Based on this classification, in this paper we compare two of the most recent and leading DSDMSs, SpatialHadoop and LocationSpark, by evaluating the performance of several existing and newly proposed parallel and distributed DJQ algorithms under various settings with large real-world spatial datasets. A general conclusion arising from the execution of the distributed DJQ algorithms studied is that, while SpatialHadoop is a robust and efficient system when large spatial datasets are joined (since it is built on top of the mature Hadoop platform), LocationSpark is the clear winner in total execution time when medium-sized spatial datasets are joined (due to the in-memory processing provided by Spark). However, LocationSpark requires more memory when large spatial datasets are involved in DJQs (even more so when k and ε are large). 
Finally, this detailed performance study has demonstrated that the new distributed DJQ algorithms we have proposed are efficient, robust and scalable with respect to different parameters, such as dataset sizes, k, ε and number of computing nodes. © 2019 Elsevier Inc.	en
dc.language.iso	en	en
dc.source	Information Sciences	en
dc.source.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85074421662&doi=10.1016%2fj.ins.2019.10.030&partnerID=40&md5=5e74b16c3cde3b741f1ac0b4ec638ada
dc.subject	Classification (of information)	en
dc.subject	Cluster computing	en
dc.subject	Data handling	en
dc.subject	Data mining	en
dc.subject	Database systems	en
dc.subject	Information management	en
dc.subject	Large dataset	en
dc.subject	Location	en
dc.subject	Nearest neighbor search	en
dc.subject	Search engines	en
dc.subject	LocationSpark	en
dc.subject	Space partitioning	en
dc.subject	Spatial data processing	en
dc.subject	Spatial queries	en
dc.subject	SpatialHadoop	en
dc.subject	Spatial distribution	en
dc.subject	Elsevier Inc.	en
dc.title	Efficient distance join query processing in distributed spatial data management systems	en
dc.type	journalArticle	en
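The three DJQ types named in the abstract can be pinned down with brute-force reference definitions. The Python sketch below is illustrative only and is not the authors' parallel or distributed algorithms: it assumes 2-D points, Euclidean distance, and a closed bound (distance ≤ ε) for the ε Distance Join Query, and the function names (`knn_join`, `k_closest_pairs`, `eps_distance_join`) are hypothetical. Every function compares each point of P with each point of Q, so it performs O(|P|·|Q|) distance computations. That quadratic cost is the reason the abstract calls DJQs costly operations on large datasets.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dist(p: Point, q: Point) -> float:
    """Euclidean distance between two 2-D points (assumed metric)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def knn_join(P: List[Point], Q: List[Point], k: int) -> List[Tuple[Point, List[Point]]]:
    """k Nearest Neighbor Join: pair every p in P with its k nearest points in Q."""
    return [(p, heapq.nsmallest(k, Q, key=lambda q: dist(p, q))) for p in P]


def k_closest_pairs(P: List[Point], Q: List[Point], k: int) -> List[Tuple[float, Point, Point]]:
    """k Closest Pair Query: the k pairs (p, q) in P x Q with the smallest distances.

    Returns (distance, p, q) triples; ties keep the order in which pairs are generated.
    """
    pairs = ((dist(p, q), p, q) for p in P for q in Q)
    return heapq.nsmallest(k, pairs, key=lambda t: t[0])


def eps_distance_join(P: List[Point], Q: List[Point], eps: float) -> List[Tuple[Point, Point]]:
    """ε Distance Join Query: all pairs (p, q) in P x Q with dist(p, q) <= eps."""
    return [(p, q) for p in P for q in Q if dist(p, q) <= eps]
```

For example, with `P = [(0.0, 0.0), (5.0, 5.0)]` and `Q = [(1.0, 0.0), (0.0, 2.0), (6.0, 5.0)]`, `eps_distance_join(P, Q, 2.0)` returns the three pairs within distance 2, and `k_closest_pairs(P, Q, 2)` returns the two pairs at distance 1.0. Note that the kNN Join produces k results per point of P, whereas the k Closest Pair Query produces k results in total.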

Files in this item

No files are associated with this item.

This item appears in the following collections

Publications in journals, conferences, book chapters, etc. [19705]

Related items

Use of Wild Bird Surveillance, Human Case Data and GIS Spatial Analysis for Predicting Spatial Distributions of West Nile Virus in Greece

Effects of sub-anesthetic doses of ketamine on rats' spatial and non-spatial recognition memory

Pre-training administration of anesthetic ketamine differentially affects rats' spatial and non-spatial recognition memory