Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework

Author
Moutafis P., Mavrommatis G., Vassilakopoulos M., Sioutas S.
Date
2019
Language
en
DOI
10.1016/j.datak.2019.04.003
Subject
Balancing
Computer software
Large dataset
Motion compensation
Nearest neighbor search
Text processing
Apache hadoop
Map-reduce
Nearest neighbor queries
Plane sweep
Quad trees
Spatial query processing
Distributed computer systems
Elsevier B.V.
Abstract
Numerous modern applications, from social networking to astronomy, need to answer queries on spatial data efficiently. One such query is the All k Nearest-Neighbor Query, or k Nearest-Neighbor Join, which takes two datasets as input and, for each object of the first, returns its k nearest neighbors from the second. It combines the k nearest-neighbor and join queries and is computationally demanding. In particular, when the datasets involved fall in the category of Big Data, a single machine cannot process it efficiently. Papers proposing solutions for distributed computing environments have appeared in the literature only in the last few years. In this paper, we focus on parallel and distributed algorithms using the Apache Hadoop framework. More specifically, we focus on an algorithm that was recently presented in the literature and propose improvements that tackle three major challenges of distributed processing: improving load balancing (we implement an adaptive partitioning scheme based on Quadtrees), accelerating local processing (we prune points during calculations by using plane-sweep processing), and reducing network traffic (we restructure and reduce the output size of the most demanding phase of computation). Moreover, using real 2D and 3D datasets, we experimentally study the effect of each improvement, and of their combinations, on the performance of this algorithm. Experiments show that by carefully addressing these three issues, one can achieve significantly better performance. We thereby arrive at a new scalable algorithm that adapts to the data distribution and significantly outperforms its predecessor. Finally, we present an experimental comparison of our algorithm against other well-known MapReduce algorithms for the same query and show that it significantly outperforms them as well. © 2019 Elsevier B.V.
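To make the query concrete, the following is a minimal single-machine sketch of an all-k-nearest-neighbor join in Python, with a simple sweep along the x-axis used to prune candidates. It is only an illustration of the general idea, not the paper's Hadoop algorithm: the function names (`all_knn`, `knn_plane_sweep`, `knn_brute`) are hypothetical, points are assumed to be 2D tuples, and Quadtree partitioning and MapReduce phases are not modeled.

```python
import heapq
from bisect import bisect_left
from math import dist


def knn_brute(q, refs, k):
    """Reference answer: distances to every point, keep the k smallest."""
    return sorted((dist(q, p), p) for p in refs)[:k]


def _offer(heap, q, p, k):
    """Keep the k closest points seen so far in a max-heap of (-distance, point)."""
    d = dist(q, p)
    if len(heap) < k:
        heapq.heappush(heap, (-d, p))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, p))


def knn_plane_sweep(q, refs_sorted, xs, k):
    """k nearest neighbors of q among refs_sorted (sorted by x; xs holds the x values).

    Sweep right and then left from q's x position. Once k candidates are held,
    a point whose x-distance alone exceeds the current k-th best distance
    cannot qualify, and neither can any point farther along that direction.
    """
    heap = []
    start = bisect_left(xs, q[0])
    for i in range(start, len(refs_sorted)):        # sweep to the right
        p = refs_sorted[i]
        if len(heap) == k and p[0] - q[0] > -heap[0][0]:
            break
        _offer(heap, q, p, k)
    for i in range(start - 1, -1, -1):              # sweep to the left
        p = refs_sorted[i]
        if len(heap) == k and q[0] - p[0] > -heap[0][0]:
            break
        _offer(heap, q, p, k)
    return sorted((-neg_d, p) for neg_d, p in heap)


def all_knn(queries, refs, k):
    """For each query point, return its k nearest reference points as (distance, point)."""
    refs_sorted = sorted(refs)                      # sort once by x (then y)
    xs = [p[0] for p in refs_sorted]
    return {q: knn_plane_sweep(q, refs_sorted, xs, k) for q in queries}


if __name__ == "__main__":
    queries = [(0, 0), (5, 5)]
    refs = [(1, 0), (0, 2), (4, 4), (6, 5), (10, 10)]
    for q, neighbors in all_knn(queries, refs, k=2).items():
        print(q, neighbors)
```

In this sketch the reference dataset is sorted once and reused for every query point, so the pruning benefit grows with the size of the second dataset. In a distributed setting, a step like this would presumably run locally inside each partition.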
URI
http://hdl.handle.net/11615/76818
Collections
  • Publications in journals, conferences, book chapters, etc. [19735]