Abstract:
Traditional audience measurement systems are increasingly inadequate in capturing
fragmented, cross-platform television viewership, especially as unstructured social
media data becomes central to understanding audience behavior. However, the lack
of standardized, scalable, and computationally efficient machine learning models for
processing this data presents a major computing challenge for broadcasters seeking
accurate and inclusive audience insights. The main objective of the research was to
develop and evaluate the performance of a hybrid of Naïve Bayes and Support
Vector Machines (SVM) machine learning algorithms in predicting television
viewership using sentiment analysis of Twitter data. Focusing on two Kenyan TV
programs, NTV’s AM Live and KTN’s Morning Express, the study applied a five
step experimental design encompassing data collection, text preprocessing, sentiment
classification, and predictive modeling. Twitter data from 2018 to 2022 was collected
using Python-based tools (Tweepy and snscrape). Text mining techniques including
preprocessing, categorization, and feature extraction were used to prepare the data
for sentiment analysis. Using TextBlob and Scikit-learn, tweets were classified into
positive, neutral, and negative sentiments. These sentiment scores served as features
to train and evaluate three machine learning models: Naïve Bayes, SVM, and the
hybrid ensemble model. Results indicated that Naïve Bayes outperformed SVM with
a test accuracy of 65% compared to 62%. However, the hybrid model, which
combined both classifiers through a soft voting ensemble, achieved a superior
predictive accuracy of 77%. This finding confirms that hybrid approaches are more
effective in modeling viewership behavior from unstructured social media data. The
study concludes that hybrid machine learning models offer a robust and scalable
approach to predicting audience engagement, providing actionable insights for
broadcasters and advertisers. The integration of real-time sentiment analysis into
media analytics frameworks represents a promising innovation for the future of
television audience measurement.