N-grams for Text Classification Using Supervised machine learning algorithms

Ogada, Kennedy Odhiambo

dc.contributor.author	Ogada, Kennedy Odhiambo
dc.date.accessioned	2016-07-11T14:40:24Z
dc.date.available	2016-07-11T14:40:24Z
dc.date.issued	2016-06-21
dc.identifier.uri	http://hdl.handle.net/123456789/2173
dc.description	Doctor of Philosophy (Information Technology)	en_US
dc.description.abstract	Sentiment Analysis or Text Classification aims at determining the overall sentiment orientation of a given input text. Most data mining methods assume that the data to be mined is represented in a structured relational database. However, in many applications, the available electronic information is in the form of unstructured natural language documents. The bag-of-words (BoW) model has been widely used to represent documents in text classification and many other applications. Because BoW ignores the relationships between terms, it offers a rather poor document representation. The N-gram model is a statistical technique for automatic document classification that involves determining certain probability relationships between individual content-bearing words and the subject categories, and using these relationships to predict the category to which a document containing the words belongs. To capture this discriminative power of words as phrases, there is a need for a model that can include such N-grams in the vector space model without any additional changes to classifiers based on vector space models. Furthermore, determining the correct value of n, i.e. the size of the sliding window to be used in word-based n-gram analysis, is an area of experimentation in each particular domain of knowledge. This PhD research analyzed the performance of five text classifiers with N-grams of varying sentiment length. The research methodology was experimental. The major contribution of this research is a proposed hybrid framework for sentiment classification that incorporates bag of words, N-grams, Skip-grams, and contextual knowledge into the learning and prediction phases of sentiment classification. Keywords: Text Classification, Natural Language Modeling, Bag of Words, N-grams, Supervised Machine Learning.	en_US
dc.description.sponsorship	Prof. Waweru Mwangi, JKUAT, Kenya; Dr. Wilson Cheruiyot, JKUAT, Kenya	en_US
dc.language.iso	en	en_US
dc.publisher	Jomo Kenyatta University of Agriculture and Technology	en_US
dc.subject	N-grams for Text Classification Using Supervised Machine Learning	en_US
dc.subject	Information Technology	en_US
dc.subject	Natural Language Modeling	en_US
dc.subject	Text Classification	en_US
dc.subject	Bag of Words	en_US
dc.subject	Ngrams	en_US
dc.subject	Supervised Machine Learning	en_US
dc.title	N-grams for Text Classification Using Supervised machine learning algorithms	en_US
dc.type	Thesis	en_US
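
The abstract describes word-based N-grams as a sliding window of size n over the text, with bag of words as the baseline representation and Skip-grams as a further feature type. The following is a minimal Python sketch of those three representations, not the thesis's implementation. The function names (word_ngrams, skip_bigrams, ngram_features), the whitespace tokenizer, and the restriction of skip-grams to word pairs are all assumptions made for illustration.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Slide a window of size n over the tokens; each window is one n-gram."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, k):
    """Return ordered word pairs with at most k tokens skipped between them.

    Assumption: only pairs are produced; k = 0 gives ordinary bigrams.
    """
    pairs = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs


def ngram_features(text, max_n=2):
    """Count all word n-grams for n = 1..max_n.

    With max_n = 1 this is a plain bag of words; larger values add phrases
    to the same feature space, so a vector-space classifier needs no change.
    """
    tokens = text.lower().split()  # assumed tokenizer: lowercase + whitespace
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(tokens, n))
    return counts


features = ngram_features("the film was not good")
```

Here the bigram ("not", "good") becomes its own feature alongside the unigrams ("not",) and ("good",), which is the kind of phrase-level information a pure bag of words discards. Choosing max_n (the window size) remains an empirical question for each domain, as the abstract notes.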