
dc.creator: Akritidis L., Alamaniotis M., Fevgas A., Tsompanopoulou P., Bozanis P. [en]
dc.date.accessioned: 2023-01-31T07:30:36Z
dc.date.available: 2023-01-31T07:30:36Z
dc.date.issued: 2022
dc.identifier: 10.1142/S0218213022500348
dc.identifier.issn: 02182130
dc.identifier.uri: http://hdl.handle.net/11615/70351
dc.description.abstract: This paper focuses on the popular problem of short text clustering. Since the short text documents typically exhibit high degrees of data sparseness and dimensionality, the problem in question is generally considered more challenging than the traditional clustering scenarios. Our proposed solution, named VEPH, is based on a novel algorithm that was published recently with the aim of optimally clustering short text documents. VEPH includes two stages: During the first stage, the original text vectors are projected on a lower dimensional space and the documents with projection vectors lying on the same dimensional space are grouped in the same cluster. The second stage is a refinement process which attempts to improve the quality of the clusters that were generated during the previous stage. The quality of a cluster is determined by its homogeneity and completeness and these are the two primary design criteria of this stage. Initially VEPH cleanses the clusters by removing all dissimilar elements, and then, it iteratively merges the similar clusters in a hierarchical agglomerative manner. The proposed algorithm has been experimentally evaluated in terms of F1 and NMI, by employing three datasets with diverse attributes. The results demonstrated its superiority over other state-of-the-art works of the relevant literature. © 2022 World Scientific Publishing Company. [en]
dc.language.iso: en [en]
dc.source: International Journal on Artificial Intelligence Tools [en]
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85136201674&doi=10.1142%2fS0218213022500348&partnerID=40&md5=925e43ceb1900432afdde91235ebb3ba
dc.subject: Cluster analysis [en]
dc.subject: Iterative methods [en]
dc.subject: Clusterings [en]
dc.subject: Data dimensionality [en]
dc.subject: Data sparseness [en]
dc.subject: Feature learning [en]
dc.subject: Machine-learning [en]
dc.subject: Short text clustering [en]
dc.subject: Short texts [en]
dc.subject: Short-text documents [en]
dc.subject: Text Clustering [en]
dc.subject: Traditional clustering [en]
dc.subject: Vector spaces [en]
dc.subject: World Scientific [en]
dc.title: Improving Hierarchical Short Text Clustering through Dominant Feature Learning [en]
dc.type: journalArticle [en]


Files in this item

There are no files associated with this item.

