Improved Adaptive Boosting in Heterogeneous Ensembles for Outlier Detection: Prioritizing Minimization of Bias, Variance and Order of Base Learners

dc.contributor.author Bii, Joash Kiprotich
dc.date.accessioned 2023-05-29T07:51:42Z
dc.date.available 2023-05-29T07:51:42Z
dc.date.issued 2023-05
dc.identifier.uri http://localhost/xmlui/handle/123456789/6107
dc.description Doctor of Philosophy in Computer Science en_US
dc.description.abstract Real-world data are corrupted by human error (for instance, rounding errors, wrong measurements, and biases), by faults, and by rare events, including malicious activities such as credit card fraud or cyber activity, all of which produce unusual patterns, or outliers, in the data. The detection of outliers is a difficult task that requires complex ensemble models. The ideal outlier detection ensemble should assess the strengths and optimize the results of its base detectors while carefully combining their outputs to create a robust overall model that achieves unbiased accuracy with minimal variance. Existing outlier detection ensembles fuse numerous detectors (weak learners) in either parallel or sequential order and increase detection accuracy by combining their results through a majority vote. However, trusting the results of all weak learners may degrade overall ensemble performance, as some learners may produce erroneous results depending on the types of data and their underlying rules. The general objective was to develop an outlier detection model that integrates multiple different (heterogeneous) base detectors into one model (ensemble). Highly accurate base detectors are first selected by training every candidate model and evaluating it by its error rate. The adaptive boosting technique is then applied, with the samples misclassified by one detector fed back to the next detector (to minimize bias). Finally, the decisions of all detectors are strategically merged by a combination function (to minimize variance) to obtain a strong detector. The specific objectives of the research were: identifying weak learners by analyzing their initial biases and variances; analyzing fusion strategies; and developing and evaluating an outlier detection model with a focus on minimizing bias, variance, and the order of base learners. The CRISP-DM methodology was employed. Outlier datasets were drawn from the ODDS library.
The model was validated against four baselines, and test results were compared using performance measures such as recall, precision, ROC, and AUC values. The experiments showed improved average ROC AUC on at least 8 out of 10 datasets, even when outliers were scarce (from single cases up to 10%). en_US
dc.description.sponsorship Dr. Richard Rimiru, PhD, JKUAT, Kenya; Prof. Waweru Ronald Mwangi, PhD, JKUAT, Kenya en_US
dc.language.iso en en_US
dc.publisher JKUAT-COPAS en_US
dc.subject Outliers en_US
dc.subject Weak learners en_US
dc.subject Ensembles en_US
dc.subject Bias en_US
dc.subject Variance en_US
dc.title Improved Adaptive Boosting in Heterogeneous Ensembles for Outlier Detection: Prioritizing Minimization of Bias, Variance and Order of Base Learners en_US
dc.type Thesis en_US

