Enhancing Speech Coding Quality by Embedding the GRU Predictor with the Interactive Multimedia Association Adaptive Differential Pulse Code Modulation (IMA-ADPCM) Codec System


dc.contributor.author Gebremichael, Sheferaw Kibret
dc.date.accessioned 2026-03-05T11:50:12Z
dc.date.available 2026-03-05T11:50:12Z
dc.date.issued 2026-03-05
dc.identifier.citation GebremichaelKS2026 en_US
dc.identifier.uri http://localhost/xmlui/handle/123456789/6909
dc.description PhD in Information Technology en_US
dc.description.abstract Speech coding is essential for the efficient transmission and storage of audio signals. However, the conventional fixed linear prediction model of the Interactive Multimedia Association Adaptive Differential Pulse Code Modulation (IMA-ADPCM) codec struggles with the dynamic, non-stationary nature of human speech, which limits coding quality. This thesis addresses this limitation by proposing and implementing a system that embeds a Gated Recurrent Unit (GRU)-based neural network predictor directly within the standard IMA-ADPCM codec architecture. The core methodological contribution is the GRU-IMA-ADPCM codec system together with a training approach that uses paired Pulse-Code Modulation (PCM) speech samples and ADPCM predictor outputs to optimize the GRU's ability to capture complex, non-linear temporal dependencies for better signal reconstruction. Using the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus, the system was evaluated in three experimental configurations: the baseline fixed-predictor IMA-ADPCM, an online-learning GRU predictor, and a batch-learning GRU predictor embedded in the IMA-ADPCM speech decoder. The batch-learning GRU predictor with IMA-ADPCM consistently outperformed traditional methods in both objective and subjective evaluations. Specifically, Signal-to-Noise Ratio (SNR) values reached as high as 45 dB, and subjective evaluations confirmed enhanced perceived quality, with Mean Opinion Scores (MOS) between 3.8 and 4.3. The key contribution of this work is the development and validation of this computationally efficient, integrated GRU predictor algorithm, which significantly enhances speech coding accuracy and provides a robust solution for real-time speech signal processing. en_US
dc.description.sponsorship Prof. Waweru Mwangi, PhD, JKUAT, Kenya; Dr. Michael Kimwele, PhD, JKUAT, Kenya; Dr. Adane Mamuye, PhD, Addis Ababa University Institute of Technology, Ethiopia en_US
dc.language.iso en en_US
dc.publisher COPAS-JKUAT en_US
dc.subject Speech Coding en_US
dc.subject GRU Predictor en_US
dc.subject Multimedia Association en_US
dc.subject Pulse Code Modulation (IMA-ADPCM) Codec System en_US
dc.title Enhancing Speech Coding Quality by Embedding the GRU Predictor with the Interactive Multimedia Association Adaptive Differential Pulse Code Modulation (IMA-ADPCM) Codec System en_US
dc.type Thesis en_US
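
The abstract reports objective quality as Signal-to-Noise Ratio (SNR) in dB. This record does not state the exact formula used in the thesis, so the following is only a minimal sketch that assumes the common definition: the ratio of original-signal power to reconstruction-error power, expressed in decibels. The function name `snr_db` and the sample values are hypothetical and chosen for illustration.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], decoded: Sequence[float]) -> float:
    """Return the SNR in dB between a reference PCM signal and its decoded version.

    Assumed definition (not confirmed by the thesis record):
        SNR = 10 * log10( sum(x^2) / sum((x - y)^2) )
    """
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same number of samples")
    if len(reference) == 0:
        raise ValueError("signals must not be empty")

    # Energy of the original signal and of the reconstruction error.
    signal_energy = sum(x * x for x in reference)
    noise_energy = sum((x - y) ** 2 for x, y in zip(reference, decoded))

    if noise_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference with a non-zero error
    return 10.0 * math.log10(signal_energy / noise_energy)


# Hypothetical PCM samples and a slightly distorted reconstruction.
original = [10.0, 0.0, -10.0, 0.0]
reconstructed = [9.0, 0.0, -9.0, 0.0]
example_snr = snr_db(original, reconstructed)
```

Under this definition, a higher value means the decoder output tracks the original PCM samples more closely, so a better predictor (fixed or GRU-based) should raise the figure for the same bit rate.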

