
dc.creator: Akritidis L., Alamaniotis M., Fevgas A., Bozanis P. [en]
dc.date.accessioned: 2023-01-31T07:30:36Z
dc.date.available: 2023-01-31T07:30:36Z
dc.date.issued: 2021
dc.identifier: 10.1109/ICTAI52525.2021.00149
dc.identifier.isbn: 9781665408981
dc.identifier.issn: 10823409
dc.identifier.uri: http://hdl.handle.net/11615/70349
dc.description.abstract: Short text clustering deals with the problem of grouping together semantically similar documents of small length. Nowadays, huge amounts of text data are being generated by numerous applications such as microblogs, messengers, and services that generate or aggregate entitled entities. This large volume of high-dimensional and sparse information may easily overwhelm the current serial approaches and render them inefficient, or even inapplicable. Although many traditional clustering algorithms have been successfully parallelized in the past, the parallelization of short text clustering algorithms is a rather overlooked problem. In this paper we introduce pVEPHC, a short text clustering method that can be executed in parallel on large computer clusters. The algorithm draws inspiration from VEPHC, a recent two-stage approach with decent performance in several diverse tasks. More specifically, in this work we employ the Apache Spark framework to design parallel implementations of both stages of VEPHC. During the first stage, pVEPHC generates an initial clustering by identifying and modelling common low-dimensional vector representations of the original documents. Subsequently, in the second stage, the initial clustering is improved by applying cluster split and merge operations in a hierarchical fashion. We have tested our implementation on an experimental Spark cluster and we report an almost linear improvement in the execution times of the algorithm. © 2021 IEEE. [en]
dc.language.iso: en [en]
dc.source: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI [en]
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85123944483&doi=10.1109%2fICTAI52525.2021.00149&partnerID=40&md5=ecd0b90a5627520b8d576ff9186da31b
dc.subject: Big data [en]
dc.subject: Clustering algorithms [en]
dc.subject: Machine learning [en]
dc.subject: Parallel algorithms [en]
dc.subject: 'current [en]
dc.subject: Clusterings [en]
dc.subject: Large volumes [en]
dc.subject: Micro-blog [en]
dc.subject: Short text clustering [en]
dc.subject: Short texts [en]
dc.subject: Text Clustering [en]
dc.subject: Text data [en]
dc.subject: Text-clustering algorithm [en]
dc.subject: Traditional clustering [en]
dc.subject: Cluster analysis [en]
dc.subject: IEEE Computer Society [en]
dc.title: A Scalable Short-Text Clustering Algorithm Using Apache Spark [en]
dc.type: conferenceItem [en]


Files in this item

Files    Size    Format    View

There are no files associated with this item.

This item appears in the following collections
