| dc.contributor.author | Ogada, Kennedy Odhiambo | |
| dc.date.accessioned | 2016-07-11T14:40:39Z | |
| dc.date.available | 2016-07-11T14:40:39Z | |
| dc.date.issued | 2016-06-21 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/2174 | |
| dc.description | Doctor of Philosophy (Information Technology) | en_US |
| dc.description.abstract | Sentiment Analysis or Text Classification aims at determining the overall sentiment orientation of a given input text. Most data mining methods assume that the data to be mined is represented in a structured relational database. However, in many applications, available electronic information is in the form of unstructured natural language documents. The bag-of-words (BoW) model has been widely used to represent documents in text classification and many other applications. BoW ignores the relationships between terms and therefore offers a rather poor document representation. The N-gram model is a statistical technique for automatic document classification which involves determining certain probability relationships between individual content-bearing words and the subject categories, and using these relationships to predict the category to which a document containing the words belongs. To capture this discriminative power of words as phrases, there is a need for a model which can include such N-grams in the vector space model without any additional changes to classifiers based on vector space models. Furthermore, determining the correct value of n, i.e. the size of the sliding window to be used in word-based n-gram analysis, is an area of experimentation in each particular domain of knowledge. This PhD research analyzed the performance of five text classifiers with N-grams of varying sentiment length. The research methodology employed was experimental. The major contribution of this research is a proposed hybrid framework for sentiment classification that incorporates bag of words, N-grams, Skip-grams, and contextual knowledge into the learning and prediction phases of sentiment classification. Keywords: Text Classification, Natural Language Modeling, Bag of Words, Ngrams, Supervised Machine Learning. | en_US |
| dc.description.sponsorship | Prof. Waweru Mwangi, JKUAT, Kenya; Dr. Wilson Cheruiyot, JKUAT, Kenya | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Jomo Kenyatta University of Agriculture and Technology | en_US |
| dc.subject | N-grams for Text Classification Using Supervised Machine Learning | en_US |
| dc.subject | Text Classification | en_US |
| dc.subject | Natural Language Modeling | en_US |
| dc.subject | Bag of Words | en_US |
| dc.subject | Supervised Machine Learning | en_US |
| dc.subject | Ngrams | en_US |
| dc.title | N-grams for Text Classification Using Supervised Machine Learning | en_US |
| dc.type | Thesis | en_US |