An Enhanced K-Means Clustering Information Mining Model for Selective Dissemination of Information for Library Users

Too, Titus Kiprugut Rotich

URI: http://localhost/xmlui/handle/123456789/6912

Date: 2026-03-05

Abstract:

The volume of information materials held by academic libraries is enormous and growing rapidly, as new information now appears not only in physical form but also in digital sources. This large number of information resources challenges librarians to provide relevant resources to users effectively and efficiently. The abundance of information has created a twofold challenge. First, users have to navigate through a huge amount of information to locate what they need and what is relevant. Second, there are computational problems in which most human errors go unidentified and uncorrected. Mining of information is essential, as looking for data is in itself a troublesome process. Clustering is concerned with grouping unlabeled feature vectors into clusters such that samples within a cluster are more similar to each other than to samples belonging to different clusters. Usually, it is assumed that the number of clusters is known in advance, but otherwise no prior information is given about the data. Clustering can be used for information mining in the library. K-means is an algorithm for clustering a set of unlabeled feature vectors X = {x1, …, xn} that are drawn independently from the mixture density p(X|θ) with a parameter set θ. The main objective of this study was to implement and evaluate a clustering mining model for selective dissemination of information at an academic library. The research methodology was a quantitative experimental research design. The dataset population was obtained from an online open-source repository containing datasets acquired by scraping goodreads.com. The dataset was 48 MB in size. The research was based on purposive sampling, a form of non-probability sampling.
The K-means clustering model used a set of unlabeled feature vectors X = {x1, …, xn} that were drawn independently from the mixture density p(X|θ) with a parameter set θ. The dataset was arranged as two-dimensional data with P = 12 data points naturally clustered into K = 3 clusters. The dataset was imported from an online CSV file using the Python library NumPy. The dataset contained both training and test data, which provided an enhanced validation of the model. Once the training dataset was created, the model was trained. AutoML on Google Colab was used to train the model, which achieved an RMSE of 0.198. AutoML used a number of sophisticated techniques such as neural architecture search, which builds learning networks one layer at a time. The comparison results from the experiments show that the implemented K-means clustering information mining model for enhancing selective dissemination of information for library users has the lowest percentage of mean vector and covariance matrix, which resulted in a higher accuracy of 71%. When the mean vector and covariance matrix are low, this gives an impression of an efficient approach to the clustering information mining model. In future, the work may be extended by adding suitable pre-processing approaches to improve the datasets, as well as a feature selection approach to improve the classification accuracy. Future work should also extend to dynamic time series data in real time, thereby developing new techniques to compare against improved hybrid approaches.
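
The K-means step described in the abstract (P = 12 two-dimensional points grouped into K = 3 clusters) can be illustrated with a minimal sketch of the standard Lloyd's algorithm in NumPy. This is not the thesis code: the function name kmeans, the random initialisation, the iteration cap and the synthetic data are all assumptions made for illustration, and the Goodreads dataset is replaced by generated points.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (illustrative helper, not from the thesis).

    Returns (centroids, labels) for an (n, d) array of points.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, labels


# Synthetic stand-in data (assumption): 3 groups of 4 points each, 12 in total.
rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((4, 2)) for c in centres])

centroids, labels = kmeans(X, k=3)
print("cluster labels:", labels)
print("centroids:\n", centroids)
```

Because the initial centroids are chosen at random, a single run can settle in a poor local optimum. A common remedy is to repeat the run with several seeds and keep the result with the lowest within-cluster sum of squared distances.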