Applying unsupervised and supervised machine learning methodologies in social media textual traffic data
Επιτομή
Traffic increasingly shapes the trajectory of city growth and impacts on the climate change in modern cities. Traffic patterns’ monitoring can provide with innovative practices in understanding city traffic dynamics, especially via utilizing sensory and textual data analytics. State-of-the-art research recently has focused on processing voluminous real time data in vast quantities by capturing real time sensory observations and/or social network (textual) data regarding city traffic. In this paper, we investigate the feasibility of using Big Data produced by Twitter textual streams for extracting traffic related events. After describing a generic yet innovative application used for data capturing, we preprocess this data so they fit into the structuring of the machine learning models for clustering (unsupervised learning) and classification (supervised learning). For the case of clustering we use Apache Spark on a MapR sandbox with the use of KMeans algorithm. For the classification case we compare various machine learning methodologies including Multi-Layer Perceptron Neural Networks, (MLP-NN), Support Vector Machines, (SVM) and a Deep Convolutional Learning, (DCL) approach to contextualize citizen observations and responses via tweets. The criteria of precision, accuracy, recall and F-score are used as statistical metrics to determine the accuracy and performance of each model. Our experiments include clustering, a 2-class and a 3-class classification, where, MLP-NN gave accuracy of 89.6%, SVM 92.73% and DCL was inferior performing at 81.76%. © Springer Nature Switzerland AG 2019.