Data Cleansing Framework to Enhance Quality of Multimedia Data Using Convolutional Neural Networks


dc.contributor.author Kioko, Alphonse Muthusi
dc.date.accessioned 2025-10-28T08:29:43Z
dc.date.available 2025-10-28T08:29:43Z
dc.date.issued 2025-10-28
dc.identifier.citation KiokoAM2025 en_US
dc.identifier.uri http://localhost/xmlui/handle/123456789/6810
dc.description Master of Science in Computer Systems en_US
dc.description.abstract In the era of data-driven decision-making, the fitness of data for its intended purpose is of paramount importance. The success of machine learning (ML) models hinges on the quality of the datasets used during training. Real-world datasets, however, are often riddled with imperfections such as label noise, outliers, missing values, and inconsistencies across features, all of which degrade model performance and generalization. Traditional data-cleaning frameworks, while effective in specific scenarios, struggle to adapt to dynamic data patterns, multi-modal formats, and resource-constrained environments because of their domain-specific design. Most frameworks were developed for structured numeric or textual data, which renders them inadequate for the unique challenges of multimedia formats. Even existing multimedia data-cleaning models remain domain-specific, highlighting the critical need for an enhanced, adaptive solution to improve data quality in image-centric applications. This study introduces the Intelligent Image Forensic Analyzer Layer (iFAL), integrated into a CNN framework, as a novel approach that enables adaptive, efficient, and robust data cleaning through iFAL-learned features. The study systematically evaluates iFAL-CNN’s capacity to address challenges across diverse datasets by integrating multi-modal features, extracting detailed dataset features that most of the prior frameworks miss, and generalizing both within and across domains. This redefines automated data purification paradigms, offering a scalable algorithm for modern ML pipelines. Conventional rule-based models such as AutoClean (2019), CleanNet (2020), DCN-Clean (2021), and PurifiCNN (2022) are constrained by static heuristics, a single-modality focus, and reliance on noise distribution assumptions. 
In contrast, the iFAL-CNN architecture overcomes these limitations by leveraging metadata analysis, error-level analysis, and authentication accuracy metrics. Designed to generalize across multimedia datasets, iFAL-CNN achieves state-of-the-art performance in accuracy, efficiency, and adaptability. Experimental results on a static dataset across the compared data cleansing frameworks gave the following accuracies, with the gain over raw data in percentage points shown in parentheses: Raw Data: 85.2%, AutoClean: 86.5% (+1.3), CleanNet: 87.8% (+2.6), Noise2Self: 88.1% (+2.9), DCN-Clean: 88.4% (+3.2), PurifiCNN: 89.3% (+4.1), and iFAL-CNN (Proposed): 90.7% (+5.5), making it the best performer. The results of this study demonstrate that incorporating hybrid feature extraction, metadata integrity verification, and adaptive error detection mechanisms enables superior cleansing, validation, and preparation of noisy large-scale image datasets for downstream machine learning tasks. The proposed iFAL-CNN offers a highly scalable, forensic-aware, and data-efficient purification framework that significantly enhances both dataset integrity and model learning stability. Future research should prioritize Advanced Deep Learning Architectures (ADLA), such as Transformer-CNN hybrids with self-attention mechanisms, to extend these advancements into dynamic and virtual reality contexts. en_US
dc.description.sponsorship Prof. Cheruiyot W.K, PhD, JKUAT, Kenya; Dr. Richard Rimiru, PhD, JKUAT, Kenya en_US
dc.language.iso en en_US
dc.publisher JKUAT-COPAS en_US
dc.subject Data Cleansing en_US
dc.subject Multimedia Data en_US
dc.subject Convolutional Neural Networks en_US
dc.subject Networks en_US
dc.title Data Cleansing Framework to Enhance Quality of Multimedia Data Using Convolutional Neural Networks en_US
dc.type Thesis en_US
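
The accuracy gains quoted in the abstract can be re-derived from the reported percentages. The short Python sketch below is only an arithmetic check on those published figures; the variable names are illustrative and it does not reproduce the iFAL-CNN experiments themselves.

```python
# Accuracy figures (in percent) exactly as quoted in the abstract.
reported = {
    "Raw Data": 85.2,
    "AutoClean": 86.5,
    "CleanNet": 87.8,
    "Noise2Self": 88.1,
    "DCN-Clean": 88.4,
    "PurifiCNN": 89.3,
    "iFAL-CNN": 90.7,
}

baseline = reported["Raw Data"]

# Gain over the uncleaned baseline, in percentage points.
gains = {
    name: round(acc - baseline, 1)
    for name, acc in reported.items()
    if name != "Raw Data"
}

# Framework with the highest reported accuracy.
best = max(reported, key=reported.get)

for name, gain in gains.items():
    print(f"{name}: {reported[name]:.1f}% (+{gain:.1f})")
print(f"Best performer: {best}")
```

The computed gains agree with the values given in parentheses in the abstract, and iFAL-CNN has the highest reported accuracy.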

