Abstract:
Sentiment analysis has demonstrated that the automated, computational recognition of sentiment is possible and continues to evolve, driven by factors such as the emergence of new technological trends and the constantly changing nature of human language. Sentiment analysis is an information extraction task that aims to obtain private sentiments, expressed as either ‘positive’ or ‘negative’, toward a specific object or subject. However, social media platforms are marred by unstructured text that makes extracting and parsing relevant information difficult for most systems and models. This poses a challenge to companies, individuals and organizations seeking to make strategic decisions based on the available data. To address this, the first phase of experimentation in this study implemented two classifier models, Naïve Bayes and Support Vector Machine (SVM), using feature selection and sentiment extraction on Twitter product reviews. The aim was to evaluate the classifiers' performance on ‘positive’ and ‘negative’ sentiment classification of product reviews. These classifiers are commonly used as benchmarks against which state-of-the-art (SOTA) approaches and techniques can be compared. The second phase of the experiments implemented an ensemble model of the two classifiers, and the two supervised classifiers and the ensemble model were compared and evaluated on accuracy, precision and robustness. The comparison of the two classifiers showed competitive performance between Naïve Bayes and SVM. In addition, initial experiments measuring accuracy and error rate showed that the Naïve Bayes classifier performed better as the total number of documents analyzed increased.
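The two base classifiers and the evaluation measures can be illustrated with a minimal plain-Python sketch. This is not the study's implementation, and several assumptions apply. Whitespace tokenisation with bag-of-words counts stands in for the paper's feature selection and extraction. The SVM is a linear model without a bias term, trained with Pegasos-style hinge-loss subgradient descent rather than a library solver. The six "reviews" are invented toy data. The names `NaiveBayes`, `LinearSVM`, `accuracy`, `error_rate` and `precision` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation; real tweets need more cleaning.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        tokens = [w for w in tokenize(doc) if w in self.vocab]  # ignore unseen words

        def log_posterior(c):
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / (self.totals[c] + v)) for w in tokens
            )

        return max(self.classes, key=log_posterior)


class LinearSVM:
    """Linear SVM (no bias term) trained by Pegasos hinge-loss subgradient descent."""

    def __init__(self, lam=0.01, epochs=50):
        self.lam, self.epochs = lam, epochs

    def fit(self, docs, labels):
        xs = [Counter(tokenize(d)) for d in docs]
        ys = [1 if y == "positive" else -1 for y in labels]
        self.w = defaultdict(float)
        t = 0
        for _ in range(self.epochs):
            for x, y in zip(xs, ys):  # fixed order instead of random sampling, for reproducibility
                t += 1
                eta = 1.0 / (self.lam * t)
                margin = y * self._score(x)
                for f in self.w:  # L2 regularisation: shrink every weight
                    self.w[f] *= 1 - eta * self.lam
                if margin < 1:  # hinge loss is active: step towards the correct side
                    for f, v in x.items():
                        self.w[f] += eta * y * v
        return self

    def _score(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, doc):
        return "positive" if self._score(Counter(tokenize(doc))) >= 0 else "negative"


def accuracy(model, docs, labels):
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(labels)


def error_rate(model, docs, labels):
    return 1.0 - accuracy(model, docs, labels)


def precision(model, docs, labels, positive="positive"):
    # Fraction of documents predicted positive that really are positive.
    truths = [y for d, y in zip(docs, labels) if model.predict(d) == positive]
    return truths.count(positive) / len(truths) if truths else 0.0


# Invented toy product reviews (not the study's Twitter data).
train_docs = [
    "great phone love it", "excellent battery great screen", "love this camera",
    "terrible battery hate it", "awful screen broke fast", "hate this phone terrible",
]
train_labels = ["positive"] * 3 + ["negative"] * 3
test_docs = ["love the great camera", "terrible awful"]
test_labels = ["positive", "negative"]

nb = NaiveBayes().fit(train_docs, train_labels)
svm = LinearSVM().fit(train_docs, train_labels)
for name, model in [("Naive Bayes", nb), ("SVM", svm)]:
    print(name, accuracy(model, test_docs, test_labels), error_rate(model, test_docs, test_labels))
```

Both models share the same `predict` interface, so the same metric helpers can compare them. This mirrors the side-by-side evaluation described above, only at toy scale.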
The SVM classifier, on the other hand, also performed well in the final phase of the ensemble model experiments. The results indicate that both SVM and Naïve Bayes are good classifiers for text classification, achieving relatively good accuracy, with the best-performing classifiers reaching 99.40%. In terms of error rate, SVM and the Stacked Ensemble model each recorded 0.60%. Based on the classifiers' overall performance, Naïve Bayes cannot reasonably be regarded as an unstable classifier. In the final phase, the Stacked Ensemble model also demonstrated a good ability to cope with errors, resulting in a robust model.
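In a stacked ensemble, a meta-classifier learns how to combine the base classifiers' outputs. The abstract does not state which meta-learner was used. The sketch below therefore assumes a logistic-regression meta-learner trained by stochastic gradient descent. It takes each base model's score for a document as input, for example a Naïve Bayes log-odds and an SVM margin. Following standard stacking practice, those scores should be out-of-fold predictions, so the meta-learner never sees scores a base model produced on its own training data. The class name `StackingMetaLearner` and all score values are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class StackingMetaLearner:
    """Logistic-regression meta-classifier for a stacked ensemble.

    Each input row holds the base classifiers' scores for one document,
    e.g. [naive_bayes_log_odds, svm_margin], taken from out-of-fold predictions.
    """

    def __init__(self, lr=0.1, epochs=500):
        self.lr, self.epochs = lr, epochs

    def fit(self, score_rows, labels):
        self.w = [0.0] * len(score_rows[0])
        self.b = 0.0
        targets = [1.0 if y == "positive" else 0.0 for y in labels]
        for _ in range(self.epochs):
            for x, t in zip(score_rows, targets):
                err = self.predict_proba(x) - t  # gradient of log-loss w.r.t. the logit
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err
        return self

    def predict_proba(self, scores):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, scores)))

    def predict(self, scores):
        return "positive" if self.predict_proba(scores) >= 0.5 else "negative"


# Hypothetical out-of-fold scores: [Naive Bayes log-odds, SVM margin] per review.
oof_scores = [[2.1, 1.3], [1.5, 0.8], [0.9, 1.1], [-1.7, -0.9], [-0.6, -1.4], [-2.2, -1.0]]
oof_labels = ["positive"] * 3 + ["negative"] * 3

meta = StackingMetaLearner().fit(oof_scores, oof_labels)
print([meta.predict(s) for s in [[1.8, 1.0], [-1.5, -1.2]]])
```

The learned weights indicate how much the ensemble trusts each base classifier. This is one way a stacked model can compensate for errors made by a single classifier. When reading the reported figures, note that error rate is 1 − accuracy, so a 0.60% error rate corresponds to 99.40% accuracy.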