Abstract:
Most organizations rely on data generated by their daily transactions and operations. This data is retrieved from different source systems across a distributed network, so it arrives in varying data types and formats. Before it is transferred to the target systems, the source data is prepared and cleaned by applying algorithms and functions, which adds processing time. At the same time, data warehouse users press for data to be made available quickly so that they can make appropriate decisions and forecasts. Delivery of data to business users has been considerably delayed by the data explosion arising from millions of transactions running concurrently. Current legacy systems cannot handle such large data volumes because of limits in their processing capabilities and customizations. This performance degradation has raised concerns, since organizations invest substantial resources in establishing functioning data warehouses. This study targets data staging, a technological innovation within data warehouses, because most data manipulations are carried out there. Data staging determines which data is to be integrated, harmonized by the staging functions, cleansed, verified, and archived for future use. The study population was drawn from large organizational databases available online for research purposes, and stratified random sampling was used to determine the sample frame for the study. Several tools, including MS Excel, SQL Server Analysis Services, and SQL Server Integration Services, were used for data analysis and experimentation. A deterministic prioritization algorithm was developed and tested with a focus on data staging performance and efficiency. The proposed solution highlights the need to predetermine the expected data loads, prioritize them, and optimize the execution plans.
The experimental test runs for the different scenarios demonstrated in the study show that data staging processing time improved by 2.66% and, consequently, loading process time improved by 93.44%. It is therefore recommended that data warehouse practitioners and business intelligence designers implement the deterministic prioritization algorithm to enhance the future design of Extraction, Transformation and Loading (ETL) processes in data warehouse development.