Abstract:
Real-world data suffer from corruption caused by human error, such as rounding errors, wrong measurements, and biases, as well as faults and rare events, including malicious activities like credit card fraud and cyber attacks, all of which produce unusual patterns, or outliers, in the data. Detecting outliers is a difficult task that requires complex ensemble models. The ideal outlier detection ensemble should assess the strengths of its base detectors and optimize their results, carefully combining their outputs to form a robust overall model that achieves unbiased accuracy with minimal variance. Existing outlier detection ensembles fuse numerous detectors (weak learners) in either parallel or sequential order and increase detection accuracy by obtaining a combined result through a majority vote. However, trusting the results of all weak learners may degrade overall ensemble performance, as some learners produce erroneous results depending on the type of data and its underlying rules. The general objective of this work was to develop an outlier detection model that integrates multiple, different (heterogeneous) base detectors into a single ensemble: first, highly accurate base detectors are selected by training every candidate and evaluating its error rate; then an adaptive boosting technique is applied in which misclassified samples are fed back to the next detector (to minimize bias); finally, the detectors' decisions are strategically combined through a combination function (to minimize variance), yielding a strong overall detector. The specific objectives of the research were to identify weak learners by analyzing their initial biases and variances, to analyze fusion strategies, and to develop and evaluate an outlier detection model with a focus on minimizing bias and variance and on the ordering of the base learners. The CRISP-DM methodology was employed, and outlier datasets were drawn from the ODDS library. The model was validated against four baselines, and test results were compared using performance measures such as recall, precision, ROC curves, and AUC values. The experiments showed improved results on at least eight out of ten datasets in terms of average AUC-ROC, even when outliers were scarce (from single cases up to 10% of the data).
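
For illustration, the sketch below outlines the general idea of a heterogeneous, adaptively boosted outlier ensemble evaluated with ROC-AUC. It is a minimal sketch under stated assumptions, not the thesis's implementation: the particular base detectors (IsolationForest, LocalOutlierFactor, OneClassSVM from scikit-learn), the AdaBoost-style sample reweighting, and the weighted-vote combination function are illustrative choices, since the abstract does not name the detectors or combination function actually used, and the sketch assumes labeled data are available for computing error rates.

```python
# Minimal sketch: heterogeneous base detectors combined by adaptive boosting.
# Detector choices, reweighting scheme, and combination function are assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score


def boosted_heterogeneous_ensemble(X, y, contamination=0.1, seed=0):
    """Train heterogeneous detectors sequentially, emphasizing samples the
    previous detector misclassified, then combine their decisions by a
    weighted vote. Assumes labels y (1 = outlier) for error-rate evaluation."""
    detectors = [
        IsolationForest(contamination=contamination, random_state=seed),
        LocalOutlierFactor(novelty=True, contamination=contamination),
        OneClassSVM(nu=contamination, gamma="scale"),
    ]
    rng = np.random.default_rng(seed)
    n = len(X)
    weights = np.full(n, 1.0 / n)          # uniform sample weights to start
    alphas, kept = [], []

    for det in detectors:
        # Fit on a weighted resample so previously misclassified points count more.
        idx = rng.choice(n, size=n, p=weights)
        det.fit(X[idx])
        pred = (det.predict(X) == -1).astype(int)   # 1 = flagged as outlier
        err = np.sum(weights * (pred != y)) / np.sum(weights)
        if err >= 0.5:                      # drop learners no better than chance
            continue
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
        # Feed back misclassifications: upweight them for the next detector.
        weights *= np.exp(alpha * (pred != y))
        weights /= weights.sum()
        alphas.append(alpha)
        kept.append(det)

    # Combination function: weighted majority vote over the retained detectors.
    votes = np.zeros(n)
    for alpha, det in zip(alphas, kept):
        votes += alpha * (det.predict(X) == -1).astype(int)
    scores = votes / (np.sum(alphas) or 1.0)
    return scores, roc_auc_score(y, scores)
```

A dataset from the ODDS library (features X, binary outlier labels y, both present as classes) could be passed to this function to obtain combined outlier scores and the corresponding AUC value for comparison against baseline detectors.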