dc.description.abstract |
Twitter has grown to several hundred million users and presents a rich information source for detecting and classifying hate speech instigators and their targets on the platform. Microblogging sites are well known to be suitable for conveying hate speech. Hateful wording involves communications that unlawfully demean a group or person based on characteristics such as color, race, gender, ethnicity, sexual orientation, religion, or nationality. Such content can frighten, intimidate, or silence platform users, and some of it can incite other users to commit crimes. The continuing rise of social media platforms, especially Twitter, has created a need for more immediate analysis of hatred and related antagonistic responses to various trigger events. Twitter users usually air their views on topics that interest them. One problem is that each tweet is limited in characters and is hence very short, and it may contain slang and misspelled words. It is therefore difficult to apply traditional NLP techniques, which were designed for formal language, to the Twitter domain. Another problem is that the total volume of tweets is extremely high and takes a long time to process, which motivates further analysis in this field. This study aims to detect and classify hate speech in the Kenyan context on the Twitter platform, using Natural Language Processing (NLP) techniques, various machine learning methods, and a novel approach to sentiment analysis of Twitter data. This thesis developed an approach for collecting, preprocessing, and classifying hateful speech that uses content created by self-identifying hateful communities on Twitter. We performed a comparative analysis of several sentiment analysis and machine learning tools across a range of feature values and model hyperparameters. The tweets were extracted through the Twitter API and stored in JSON format in MongoDB.
A Naive Bayes machine learning classifier was built to classify hate tweets into positive and negative sentiment classes. Experimental evaluations show that the proposed machine learning classifiers are efficient and perform better in accuracy and running time. The implementation used Python with the NLTK and python-twitter libraries. Various performance metrics were used to validate the results of the offensive-tweet identification and classification techniques. The experimental results show that Naive Bayes offered the best performance among the classifiers compared on the Twitter data set, with an accuracy of 83.1%. |
en_US |