N-grams for Text Classification Using Supervised Machine Learning


dc.contributor.author Ogada, Kennedy Odhiambo
dc.date.accessioned 2016-07-11T14:40:39Z
dc.date.available 2016-07-11T14:40:39Z
dc.date.issued 2016-06-21
dc.identifier.uri http://hdl.handle.net/123456789/2174
dc.description Doctor of Philosophy (Information Technology) en_US
dc.description.abstract Sentiment analysis, or text classification, aims to determine the overall sentiment orientation of a given input text. Most data mining methods assume that the data to be mined is represented in a structured relational database. However, in many applications, the available electronic information is in the form of unstructured natural language documents. The bag-of-words (BoW) model has been widely used to represent documents in text classification and many other applications. Because BoW ignores the relationships between terms, it offers a rather poor document representation. The N-gram model is a statistical technique for automatic document classification that involves determining certain probability relationships between individual content-bearing words and the subject categories, and using these relationships to predict the category to which a document containing the words belongs. To capture the discriminative power of words as phrases, there is a need for a model that can include such N-grams in the vector space model without any additional changes to classifiers based on vector space models. Furthermore, determining the correct value of n (the size of the sliding window to be used) in word-based N-gram analysis is a matter of experimentation in each particular domain of knowledge. This PhD research analyzed the performance of five text classifiers with N-grams of varying sentiment length. The research methodology was experimental. The major contribution of this research is a proposed hybrid framework for sentiment classification that incorporates bag of words, N-grams, skip-grams, and contextual knowledge into the learning and prediction phases of sentiment classification. Keywords: Text Classification, Natural Language Modeling, Bag of Words, N-grams, Supervised Machine Learning. en_US
dc.description.sponsorship Prof. Waweru Mwangi, JKUAT, Kenya; Dr Wilson Cheruiyot, JKUAT, Kenya en_US
dc.language.iso en en_US
dc.publisher Jomo Kenyatta University of Agriculture and Technology en_US
dc.subject N-grams for Text Classification Using Supervised Machine Learning en_US
dc.subject Text Classification en_US
dc.subject Natural Language Modeling en_US
dc.subject Bag of Words en_US
dc.subject Supervised Machine Learning en_US
dc.subject Ngrams en_US
dc.title N-grams for Text Classification Using Supervised Machine Learning en_US
dc.type Thesis en_US
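
The abstract describes word-based N-grams as a sliding window of size n over the text, alongside bag of words and skip-grams. The following is a minimal Python sketch of those three feature types, offered only as an illustration. It is not the hybrid framework proposed in the thesis, and the function names (`word_ngrams`, `skip_bigrams`, `feature_counts`), the whitespace tokenization, and the `"<skip>"` marker are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return contiguous word n-grams from a sliding window of size n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, max_skip):
    """Return ordered word pairs with at most max_skip words skipped between them."""
    pairs = []
    for i, left in enumerate(tokens):
        # j ranges over the next (max_skip + 1) positions, clipped to the text length.
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


def feature_counts(text, max_n=2, max_skip=1):
    """Count unigrams (bag of words), n-grams up to max_n, and skip-bigrams.

    Tokenization is a plain lowercase whitespace split (an assumption).
    Skip-bigram features carry a "<skip>" marker so they stay distinct
    from contiguous bigrams in the same feature space.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(tokens, n))
    counts.update(("<skip>",) + pair for pair in skip_bigrams(tokens, max_skip))
    return counts


features = feature_counts("this film is not good", max_n=2, max_skip=1)
```

With n = 1 the features reduce to a plain bag of words. Raising `max_n` adds phrases such as ("not", "good") that a bag of words cannot represent, and the resulting counts can be fed to any classifier that works on a vector space model.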

