Sentiment Classification for Hate Tweet Detection in Kenya on Twitter Data Using Naïve Bayes Algorithm


dc.contributor.author Kiilu, Kelvin Kiema
dc.date.accessioned 2021-03-04T09:45:51Z
dc.date.available 2021-03-04T09:45:51Z
dc.date.issued 2021-03-04
dc.identifier.uri http://localhost/xmlui/handle/123456789/5521
dc.description Master of Science in Computer Systems en_US
dc.description.abstract Twitter has grown to several hundred million users and could be a rich information source for detecting and classifying hate speech instigators and their targets on the platform. Microblogging sites are well known to be suitable for conveying hate speech. Hate speech involves communications that unlawfully demean any group or person based on certain characteristics, including color, race, gender, ethnicity, sexual orientation, religion, or nationality. Such content can frighten, intimidate, or silence platform users, and some of it can incite other users to commit crimes. The continuing rise of social media platforms, especially Twitter, has created a need for more immediate analysis of hatred and other related antagonistic responses to various trigger events. Twitter users usually air their views on topics of interest to them. One problem is that each tweet is limited in characters and is hence very short, and it may contain slang and misspelled words. Thus, it is not easy to apply traditional NLP techniques, designed for formal language, to the Twitter domain. Another problem is that the total volume of tweets is extremely high and takes a long time to process, which motivates further analysis in this field. We performed a comparative analysis of various sentiment analysis and machine learning tools using various feature values and model hyperparameters. This thesis developed an approach for collecting, preprocessing, and classifying hateful speech that uses content created by self-identifying hateful communities on Twitter. The study therefore aims to detect and classify hate speech in the Kenyan context on the Twitter platform, using Natural Language Processing (NLP) techniques, various machine learning methods, and a novel approach for sentiment analysis on Twitter data. The tweets were extracted through the Twitter API and stored in JSON format in MongoDB.
A Naïve Bayes machine-learning classifier was developed to classify hate tweets into positive and negative sentiments. Experimental evaluations show that the proposed machine learning classifiers are efficient and perform better in terms of accuracy and time. The implementation used Python with NLTK and the python-twitter API. To validate the results of the applied offensive-tweet identification and classification techniques, various performance metrics were used in the study. The experimental results show that Naïve Bayes offered the best performance among the classifiers compared on the Twitter data set, with an accuracy of 83.1%. en_US
dc.description.sponsorship Dr. George Okeyo, PhD JKUAT, Kenya; Dr. Richard Rimiru, PhD JKUAT, Kenya en_US
dc.language.iso en en_US
dc.publisher JKUAT-COETEC en_US
dc.subject Naïve Bayes Algorithm en_US
dc.subject Hate Tweet Detection en_US
dc.subject Sentiment Classification en_US
dc.title Sentiment Classification for Hate Tweet Detection in Kenya on Twitter Data Using Naïve Bayes Algorithm en_US
dc.type Thesis en_US
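
The classification step described in the abstract can be illustrated with a minimal sketch of a multinomial Naïve Bayes text classifier. This is not the thesis code: the abstract says the implementation used Python with NLTK, whereas this sketch uses only the standard library so that it runs without dependencies. The `tokenize` function, the `NaiveBayes` class, the add-one smoothing, and the four training sentences are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet, drop URLs and @mentions, and return word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:              # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy, made-up training data (not from the thesis data set).
train_texts = [
    "we love our neighbours and friends",
    "great day with good people",
    "i hate those people they are awful",
    "awful disgusting group i hate them",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("i hate this awful group"))
```

The positive/negative labels follow the abstract's description of the output classes. Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing keeps a word that is absent from one class from zeroing out that class's score.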

