A comparison of distributed spatial data management systems for processing distance join queries
Date
2017
Language
en
Abstract
Due to the ubiquitous use of spatial data applications and the large amounts of spatial data that these applications generate, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Two of the most studied distance join queries are the K Closest Pair Query (KCPQ) and the ε Distance Join Query (εDJQ). The KCPQ finds the K closest pairs of points from two datasets, and the εDJQ finds all pairs of points from two datasets that are within a distance threshold ε of each other. Distributed cluster-based computing systems can be classified into Hadoop-based and Spark-based systems. Based on this classification, in this paper we compare two of the most recent and leading distributed spatial data management systems, namely SpatialHadoop and LocationSpark, by evaluating the performance of existing and newly proposed parallel and distributed distance join query algorithms in different situations with big real-world datasets. As a general conclusion, while SpatialHadoop is a more mature and robust system, LocationSpark is the winner with respect to total execution time. © 2017, Springer International Publishing AG.
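To make the two query definitions concrete, the following is a minimal single-machine sketch in Python. It is a brute-force illustration only, not the parallel and distributed algorithms evaluated in the paper, and it does not use SpatialHadoop or LocationSpark. The function names `kcpq` and `eps_djq` are hypothetical, Euclidean distance is assumed, and "within ε" is assumed to mean a distance less than or equal to ε.

```python
import heapq
import math
from itertools import product


def kcpq(P, Q, k):
    """K Closest Pair Query: the k pairs (p, q), p in P and q in Q,
    with the smallest distance. Brute force over all |P| * |Q| pairs.
    Returns a list of (distance, p, q) tuples in ascending order."""
    pairs = ((math.dist(p, q), p, q) for p, q in product(P, Q))
    return heapq.nsmallest(k, pairs)


def eps_djq(P, Q, eps):
    """Epsilon Distance Join Query: every pair (p, q) whose distance
    is at most eps (assumption: the threshold is inclusive)."""
    return [(p, q) for p, q in product(P, Q) if math.dist(p, q) <= eps]


# Two tiny illustrative datasets (made up for this sketch).
P = [(0, 0), (5, 5)]
Q = [(1, 0), (5, 6), (10, 10)]

closest = kcpq(P, Q, k=2)
within = eps_djq(P, Q, eps=1.0)
```

Both functions examine every pair of points, so their cost grows with the product of the dataset sizes. That quadratic cost is what makes partitioned, distributed processing attractive for large datasets.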
Related items
- Big spatial and spatio-temporal data analytics systems
  Velentzas P., Corral A., Vassilakopoulos M. (2021)
  We are living in the era of Big Data, and Spatial and Spatio-temporal Data are not an exception. Mobile apps, cars, GPS devices, ships, airplanes, medical devices, IoT devices, etc. are generating explosive amounts of data ...
- Efficient distance join query processing in distributed spatial data management systems
  García-García F., Corral A., Iribarne L., Vassilakopoulos M., Manolopoulos Y. (2020)
  Due to the ubiquitous use of spatial data applications and the large amounts of such data these applications use, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Distance ...
- RkNN query processing in distributed spatial infrastructures: A performance study
  García-García F., Corral A., Iribarne L., Vassilakopoulos M. (2017)
  The Reverse k-Nearest Neighbor (RkNN) problem, i.e. finding all objects in a dataset that have a given query point among their corresponding k-nearest neighbors, has received increasing attention in the past years. RkNN ...