
dc.creator: Akritidis L., Bozanis P., Fevgas A. [en]
dc.date.accessioned: 2023-01-31T07:30:37Z
dc.date.available: 2023-01-31T07:30:37Z
dc.date.issued: 2018
dc.identifier: 10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00140
dc.identifier.isbn: 9781538675182
dc.identifier.uri: http://hdl.handle.net/11615/70353
dc.description.abstract: The problem of classifying a research article into one or more fields of science is of particular importance for academic search engines and digital libraries. A robust classification algorithm offers users a wide variety of useful tools, such as the refinement of their search results, the browsing of articles by category, and the recommendation of other similar articles. In the current literature we encounter approaches which attempt to address this problem without taking into consideration important parameters such as the previous history of the authors and the categorization of the scientific journals which publish the articles. In addition, the existing works overlook the huge volume of the involved academic data. In this paper, we expand an existing effective algorithm for research article classification, and we parallelize it on Apache Spark, a parallelization framework capable of sharing large amounts of data in the main memory of the nodes of a cluster, to enable the processing of large academic datasets. Furthermore, we present data manipulation methodologies which are useful not only for this particular problem, but also for most parallel machine learning approaches. In our experimental evaluation, we demonstrate that our proposed algorithm is considerably more accurate than the supervised learning approaches implemented within the machine learning library of Spark, while also outperforming them in terms of execution speed by a significant margin. © 2018 IEEE. [en]
dc.language.iso: en [en]
dc.source: Proceedings - IEEE 16th International Conference on Dependable, Autonomic and Secure Computing, IEEE 16th International Conference on Pervasive Intelligence and Computing, IEEE 4th International Conference on Big Data Intelligence and Computing and IEEE 3rd Cyber Science and Technology Congress, DASC-PICom-DataCom-CyberSciTec 2018 [en]
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85056835173&doi=10.1109%2fDASC%2fPiCom%2fDataCom%2fCyberSciTec.2018.00140&partnerID=40&md5=004eb42f35ee78003707bfaa707505e0
dc.subject: Artificial intelligence [en]
dc.subject: Classification (of information) [en]
dc.subject: Clustering algorithms [en]
dc.subject: Data mining [en]
dc.subject: Digital libraries [en]
dc.subject: Learning systems [en]
dc.subject: Search engines [en]
dc.subject: Dimensionality reduction [en]
dc.subject: Effective algorithms [en]
dc.subject: Experimental evaluation [en]
dc.subject: High dimensional data [en]
dc.subject: Large amounts of data [en]
dc.subject: Robust classification [en]
dc.subject: Sparse random projections [en]
dc.subject: Supervised learning approaches [en]
dc.subject: Big data [en]
dc.subject: Institute of Electrical and Electronics Engineers Inc. [en]
dc.title: Supervised papers classification on large-scale high-dimensional data with Apache Spark [en]
dc.type: conferenceItem [en]


Files in this item

There are no files associated with this item.
