Browsing by Subject "Map-reduce"

Cherry: A Distributed Task-Aware Shuffle Service for Serverless Analytics

Nikitas N., Konstantinou I., Kalogeraki V., Koziris N. (2021)

While there has been a lot of effort in recent years in optimising Big Data systems like Apache Spark and Hadoop, the all-to-all transfer of data between a MapReduce computation step, i.e., the shuffle data mechanism between ...

Computing scientometrics in large-scale academic search engines with MapReduce

Akritidis, L.; Bozanis, P. (2012)

Apart from the well-established facility of searching for research articles, the modern academic search engines also provide information regarding the scientists themselves. Until recently, this information was limited to ...

A Covering Classification Rule Induction Approach for Big Datasets

Kolias V., Anagnostopoulos I., Kayafas E. (2015)

With the ever increasing production of data from various heterogeneous sources in modern information societies, the need for scalable data-intensive processing is increasing. MapReduce quickly became the de facto framework ...

Distance range queries in SpatialHadoop

García-García F., Corral A., Iribarne L., Vassilakopoulos M. (2016)

Efficient processing of Distance Range Queries (DRQs) is of great importance in spatial databases due to the wide area of applications. This type of spatial query is characterized by a distance range over one or two datasets. ...

Efficient large-scale distance-based join queries in SpatialHadoop

García-García F., Corral A., Iribarne L., Vassilakopoulos M., Manolopoulos Y. (2018)

Efficient processing of Distance-Based Join Queries (DBJQs) in spatial databases is of paramount importance in many application domains. The most representative and known DBJQs are the K Closest Pairs Query (KCPQ) and the ...

Efficient processing of all-k-nearest-neighbor queries in the MapReduce programming framework

Moutafis P., Mavrommatis G., Vassilakopoulos M., Sioutas S. (2019)

Numerous modern applications, from social networking to astronomy, need efficient answering of queries on spatial data. One such query is the All k Nearest-Neighbor Query, or k Nearest-Neighbor Join, that takes as input ...

Enhancing SpatialHadoop with closest pair queries

García-García F., Corral A., Iribarne L., Vassilakopoulos M., Manolopoulos Y. (2016)

Given two datasets P and Q, the K Closest Pair Query (KCPQ) finds the K closest pairs of objects from P×Q. It is an operation widely adopted by many spatial and GIS applications. As a combination of the K Nearest Neighbor ...

Exploratory analysis of a terabyte scale web corpus

Kolias, V.; Anagnostopoulos, I.; Kayafas, E. (2014)

In this paper we present a preliminary analysis over the largest publicly accessible web dataset: The Common Crawl Corpus. We measure nine web characteristics from two levels of granularity using MapReduce and we comment ...

Hadoop MapReduce Performance on SSDs for Analyzing Social Networks

Bakratsas M., Basaras P., Katsaros D., Tassiulas L. (2018)

The advent of Solid State Drives (SSDs) stimulated a lot of research to investigate and exploit, to the extent possible, the potential of the new drives. The focus of this work is on the investigation of the relative performance ...

Hadoop MapReduce performance on SSDs: The case of complex network analysis tasks

Bakratsas M., Basaras P., Katsaros D., Tassiulas L. (2017)

This article investigates the relative performance of SSDs versus hard disk drives (HDDs) when they are used as underlying storage for Hadoop’s MapReduce. We examine MapReduce tasks and data suitable for performing analysis ...

Hadoop-based distributed k-shell decomposition for social networks

Pechlivanidou K., Katsaros D., Tassiulas L. (2017)

Complex network analysis comprises a popular set of tools for the analysis of online social networks. Among these techniques, k-shell decomposition of a network is a technique that has been used for centrality analysis, ...

Improving Distance-Join Query processing with Voronoi-Diagram based partitioning in SpatialHadoop

García-García F., Corral A., Iribarne L., Vassilakopoulos M. (2020)

SpatialHadoop is an extended MapReduce framework supporting global indexing techniques that partition spatial datasets across several machines and improve spatial query processing performance compared to traditional Hadoop ...

Investigating the efficiency of machine learning algorithms on MapReduce clusters with SSDs

Akritidis L., Fevgas A., Tsompanopoulou P., Bozanis P. (2018)

In the big data era, the efficient processing of large volumes of data has become a standard requirement for both organizations and enterprises. Since single workstations cannot sustain such tremendous workloads, MapReduce ...

MapReduce algorithms for the k group nearest-neighbor query

Moutafis P., Vassilakopoulos M., García-García F., Corral A., Mavrommatis G., Iribarne L. (2019)

Given two datasets of points (called Query and Training), the Group K Nearest Neighbor (GNN) query retrieves K points of the Training dataset with the smallest sum of distances to every point of the Query one. This ...

MRSLICE: Efficient RkNN Query Processing in SpatialHadoop

García-García F., Corral A., Iribarne L., Vassilakopoulos M. (2019)

Nowadays, with the continuously increasing volume of spatial data, it is difficult to execute spatial queries efficiently in spatial data-intensive applications, because of the limited computational capability and storage ...

RuleMR: Classification rule discovery with MapReduce

Kolias, V.; Kolias, C.; Anagnostopoulos, I.; Kayafas, E. (2014)

The vast amounts of data generated, exchanged and consumed on a daily basis by contemporary networks and devices render their analysis a cumbersome procedure with inherent difficulties. On the one hand, the need for ...

A Scalable Framework for Customer Sentiment Analysis in the Telecommunication Industry

Skoularikis K., Savvas I.K., Garani G., Kakarontzas G. (2021)

Big Data explosion is a phenomenon of the 21st century. Nowadays, more and more people are using the internet and creating new data regarding ideas, opinions, feelings or their views on a variety of topics and products. ...

Voronoi-diagram based partitioning for distance join query processing in SpatialHadoop

García-García F., Corral A., Iribarne L., Vassilakopoulos M. (2018)

SpatialHadoop is an extended MapReduce framework supporting global indexing techniques that partition spatial data across several machines and improve query processing performance compared to traditional Hadoop systems. ...