Sentiment Classification for Hate Tweet Detection in Kenya on Twitter Data Using Naïve Bayes Algorithm

Kiilu, Kelvin Kiema

dc.contributor.author	Kiilu, Kelvin Kiema
dc.date.accessioned	2021-03-04T09:45:51Z
dc.date.available	2021-03-04T09:45:51Z
dc.date.issued	2021-03-04
dc.identifier.uri	http://localhost/xmlui/handle/123456789/5521
dc.description	Master of Science in Computer Systems	en_US
dc.description.abstract	Twitter has grown to several hundred million users and could be a rich information source for detecting and classifying hate speech instigators and hate targets on the platform. Microblogging sites are well known as channels for conveying hate speech. Hateful wording involves communications that unlawfully demean any group or person based on certain characteristics, including color, race, gender, ethnicity, sexual orientation, religion, or nationality. Such content can frighten, intimidate, or silence platform users, and some of it may incite other users to commit crimes. The continuing rise of social media platforms, especially Twitter, has created a need for more immediate analysis of hatred and related antagonistic responses to various trigger events. Twitter users usually air their views on topics that interest them. One problem is that each tweet is limited in characters and is therefore very short, and it may contain slang and misspelled words. Thus, it is not easy to apply traditional NLP techniques, designed for formal language, to the Twitter domain. Another problem is that the total volume of tweets is extremely high and takes a long time to process, which motivates analysis in this field. We performed a comparative analysis of various sentiment analysis and machine learning tools using various feature values and model hyperparameters. This thesis developed an approach for collecting, preprocessing, and classifying hateful speech that uses content created by self-identifying hateful communities on Twitter. This study therefore aims to detect and classify hate speech in the Kenyan context on the Twitter platform, using Natural Language Processing (NLP) techniques, various machine learning methods, and a novel approach for sentiment analysis on Twitter data. The tweets were extracted through the Twitter API and stored in JSON format in MongoDB.
A Naïve Bayes machine learning classifier was developed to classify hate tweets into positive and negative sentiments. Experimental evaluations show that the proposed machine learning classifiers are efficient and perform better in accuracy and time. The implementation used Python with the NLTK and python-twitter APIs. To validate the results of the applied offensive-tweet identification and classification techniques, various performance metrics were used in the study. The experimental results show that Naïve Bayes offered the best performance among the classifiers compared on the Twitter data set, with an accuracy of 83.1%.	en_US
dc.description.sponsorship	Dr. George Okeyo, PhD, JKUAT, Kenya; Dr. Richard Rimiru, PhD, JKUAT, Kenya	en_US
dc.language.iso	en	en_US
dc.publisher	JKUAT-COETEC	en_US
dc.subject	Naïve Bayes Algorithm	en_US
dc.subject	Hate Tweet Detection	en_US
dc.subject	Sentiment Classification	en_US
dc.title	Sentiment Classification for Hate Tweet Detection in Kenya on Twitter Data Using Naïve Bayes Algorithm	en_US
dc.type	Thesis	en_US
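
The abstract describes classifying tweets into positive and negative sentiments with Naïve Bayes. The following is a minimal, dependency-free sketch of that general technique (multinomial Naïve Bayes with add-one smoothing), not the thesis's actual implementation, which the abstract says used Python with NLTK and python-twitter. The `tokenize` helper, the `NaiveBayes` class, and the four training tweets are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and return its word tokens."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", tweet)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, tweets, labels):
        self.class_counts = Counter(labels)       # tweets per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for tweet, label in zip(tweets, labels):
            self.word_counts[label].update(tokenize(tweet))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tweet):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            # Smoothed denominator: tokens seen in this class plus vocabulary size.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)           # log prior
            for word in tokenize(tweet):
                if word in self.vocab:            # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy data; the thesis's data set is not reproduced here.
TRAIN = [
    ("i hate those people they should leave", "negative"),
    ("they are worthless and disgusting", "negative"),
    ("what a lovely day in nairobi", "positive"),
    ("great match, proud of our team", "positive"),
]
model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("those people are disgusting"))  # negative
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and add-one smoothing keeps a word that was seen in only one class from driving the other class's probability to zero. With NLTK, the comparable entry point would be `nltk.NaiveBayesClassifier.train`, which takes feature dictionaries rather than raw text.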