
dc.creator: Akritidis L., Alamaniotis M., Fevgas A., Bozanis P. [en]
dc.date.accessioned: 2023-01-31T07:30:36Z
dc.date.available: 2023-01-31T07:30:36Z
dc.date.issued: 2020
dc.identifier: 10.1109/ICTAI50040.2020.00129
dc.identifier.isbn: 9781728192284
dc.identifier.issn: 10823409
dc.identifier.uri: http://hdl.handle.net/11615/70350
dc.description.abstract: Short text clustering is a popular problem that focuses on the unsupervised grouping of similar short text documents, or entitled entities. Since short texts are currently being utilized in a vast number of applications, the problem in question has become increasingly significant in the past few years. High cluster homogeneity and completeness are two of the most important goals of all data clustering algorithms. However, in the context of short texts, their fulfilment is particularly difficult, because this type of data is typically represented by sparse vectors that collectively comprise a very high dimensional space. In this article we introduce VEPHC, a two-stage clustering algorithm designed to confront the sparseness and high dimensionality traits of short texts. During the first stage (the VEP part), the initial feature vectors are projected onto a lower dimensional space by constructing and scoring variable-sized combinations of features (that is, terms). In the second stage (the HC part), VEPHC improves the homogeneity and completeness of the generated clusters through split and merge operations that are based on the similarities of all inter-cluster elements. The experimental evaluation of VEPHC on two real-world datasets demonstrates its superior performance over numerous state-of-the-art clustering algorithms in terms of F1 scores and Normalized Mutual Information. © 2020 IEEE. [en]
dc.language.iso: en [en]
dc.source: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI [en]
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85098793675&doi=10.1109%2fICTAI50040.2020.00129&partnerID=40&md5=9ed8f68236b20ec369b31ecce52127a4
dc.subject: Artificial intelligence [en]
dc.subject: Cluster analysis [en]
dc.subject: Vector spaces [en]
dc.subject: Data clustering algorithm [en]
dc.subject: Experimental evaluation [en]
dc.subject: High dimensional spaces [en]
dc.subject: Normalized mutual information [en]
dc.subject: Real-world datasets [en]
dc.subject: Short-text documents [en]
dc.subject: Split-and-merge operations [en]
dc.subject: Two-stage clustering [en]
dc.subject: Clustering algorithms [en]
dc.subject: IEEE Computer Society [en]
dc.title: Confronting Sparseness and High Dimensionality in Short Text Clustering via Feature Vector Projections [en]
dc.type: conferenceItem [en]


Files in this item

There are no files in this item.

This item appears in the following collections
