dc.contributor.author |
Ogada, Kennedy Odhiambo |
|
dc.date.accessioned |
2016-07-11T14:40:39Z |
|
dc.date.available |
2016-07-11T14:40:39Z |
|
dc.date.issued |
2016-06-21 |
|
dc.identifier.uri |
http://hdl.handle.net/123456789/2174 |
|
dc.description |
Doctor of Philosophy
(Information Technology) |
en_US |
dc.description.abstract |
Sentiment Analysis or Text Classification aims at determining the overall
sentiment orientation of a given input text. Most data mining methods assume
that the data to be mined is represented in a structured relational database.
However, in many applications, available electronic information is in the form of
unstructured natural language documents. The bag-of-words (BoW) model has
been widely used to represent documents in text classification and many other
applications. Because BoW ignores the relationships between terms, it offers a
rather poor document representation. The N-gram model is a statistical technique
for automatic document classification which involves determining certain probability
relationships between individual content-bearing words and the subject categories,
and using these relationships to predict the category to which a document
containing the words belongs. To capture this discriminative power of words as
phrases, there is a need for a model which can include such N-grams in the vector
space model without any additional changes to classifiers based on the vector
space model. Furthermore, determining the correct value of n, i.e. the size of
the sliding window to be used in word-based n-gram analysis, is
an area of experimentation for each particular domain of knowledge. This PhD
research analyzed the performance of five text classifiers with N-grams of varying
sentiment length. The research methodology employed was experimental. The
major contribution of this research is a proposed hybrid framework for sentiment
classification that incorporates bag of words, N-grams, Skip-grams, and contextual
knowledge into the learning and prediction phases of sentiment classification.
Keywords: Text Classification, Natural Language Modeling, Bag of Words, Ngrams,
Supervised Machine Learning. |
en_US |
dc.description.sponsorship |
Prof. Waweru Mwangi
JKUAT, Kenya
Dr Wilson Cheruiyot
JKUAT, Kenya |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
Jomo Kenyatta University of Agriculture and Technology |
en_US |
dc.subject |
N-grams for Text Classification Using Supervised Machine Learning |
en_US |
dc.subject |
Text Classification |
en_US |
dc.subject |
Natural Language Modeling |
en_US |
dc.subject |
Bag of Words |
en_US |
dc.subject |
Supervised Machine Learning. |
en_US |
dc.subject |
Ngrams |
en_US |
dc.title |
N-grams for Text Classification Using Supervised Machine Learning |
en_US |
dc.type |
Thesis |
en_US |
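
The abstract describes enriching a bag-of-words representation with word N-grams taken from a sliding window of size n, and with Skip-grams, so that an unchanged vector-space classifier can use them. The following is only a minimal sketch of that feature extraction, assuming lowercase whitespace tokenization; the helper names (`word_ngrams`, `skip_bigrams`, `featurize`) and the `ng:`/`sk:` feature prefixes are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return contiguous word n-grams using a sliding window of size n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, max_skip):
    """Return ordered word pairs with at most max_skip words between them."""
    pairs = []
    for i, left in enumerate(tokens):
        # j ranges over the next (max_skip + 1) positions, clipped to the end.
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


def featurize(text, max_n=2, max_skip=1):
    """Count unigrams (bag of words), n-grams up to max_n, and skip-bigrams.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    Prefixes keep n-gram and skip-gram features in separate namespaces.
    """
    tokens = text.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        features.update("ng:" + " ".join(g) for g in word_ngrams(tokens, n))
    features.update("sk:" + " ".join(p) for p in skip_bigrams(tokens, max_skip))
    return features


if __name__ == "__main__":
    for name, count in sorted(featurize("Not very good").items()):
        print(name, count)
```

The resulting counts can be mapped to a sparse vector and passed to any vector-space classifier. Here `max_n` plays the role of the window size n that, according to the abstract, has to be chosen experimentally for each domain.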