Enhancing the SliceNBound Algorithm for the Closest-Pairs Query with Binary Space Partitioning
Data
2021Language
en
Soggetto
Abstract
Given two datasets P and Q, the (K) Closest-Pairs Query, KCPQ, finds the (K) pairs of objects between the datasets with the least distance. In a previous work, we presented SliceNBound, a fast distributed algorithm for the KCPQ on Apache Spark, consisting of four phases. The algorithm partitions the datasets in slices across an axis. Since it is well known that proper partitioning of the datasets directly affects the running time of every algorithm, in a subsequent work we tested and evaluated variations of the Binary Space Partitioning (BSP) for solving the KCPQ and found that this technique achieves better performance. In this paper we present an improvement of our distributed algorithm SliceNBound which consists in using a BSP scheme to create the partitions of data and reducing the samplings to one. The experiments show that this technique significantly reduces the total running time of the algorithm, thus improving an already fast and efficient algorithm. © 2021 ACM.