Enhancing Speech Coding Quality by Embedding the GRU Predictor with the Interactive Multimedia Association Adaptive Differential Pulse Code Modulation (IMA-ADPCM) Codec System

Gebremichael, Sheferaw Kibret

dc.contributor.author	Gebremichael, Sheferaw Kibret
dc.date.accessioned	2026-03-05T11:50:12Z
dc.date.available	2026-03-05T11:50:12Z
dc.date.issued	2026-03-05
dc.identifier.citation	GebremichaelKS2026	en_US
dc.identifier.uri	http://localhost/xmlui/handle/123456789/6909
dc.description	PhD in Information Technology	en_US
dc.description.abstract	Speech coding is essential for the effective transmission and storage of audio signals, but the conventional fixed linear prediction model of the Interactive Multimedia Association Adaptive Differential Pulse Code Modulation (IMA-ADPCM) codec struggles with the dynamic, non-stationary nature of human speech, which limits coding quality. This thesis addresses this limitation by proposing and implementing a novel system that embeds a Gated Recurrent Unit (GRU)-based neural network predictor directly within the standard IMA-ADPCM codec architecture. The core methodological contribution is the development of this GRU-IMA-ADPCM codec system and a training approach that uses paired Pulse-Code Modulation (PCM) speech samples and ADPCM predictor outputs to optimize the GRU's ability to capture complex, non-linear temporal dependencies for better signal reconstruction. Using the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus, the system was evaluated in three experimental configurations: the baseline fixed-predictor IMA-ADPCM, an online-learning GRU predictor, and a batch-learning GRU predictive model embedded in the IMA-ADPCM speech decoder. The batch-learning GRU-IMA-ADPCM system consistently outperformed traditional methods in both objective and subjective evaluations: Signal-to-Noise Ratio (SNR) values reached as high as 45 dB, and Mean Opinion Scores (MOS) between 3.8 and 4.3 confirmed enhanced perceived quality. The key contribution of this work is the development and validation of this computationally efficient, integrated GRU predictor algorithm, which significantly improves speech coding accuracy and provides a robust solution for real-time speech signal processing.	en_US
dc.description.sponsorship	Prof. Waweru Mwangi, PhD, JKUAT, Kenya; Dr. Michael Kimwele, PhD, JKUAT, Kenya; Dr. Adane Mamuye, PhD, Addis Ababa University Institute of Technology, Ethiopia	en_US
dc.language.iso	en	en_US
dc.publisher	College of Pure and Applied Sciences (COPAS), JKUAT	en_US
dc.subject	Speech Coding	en_US
dc.subject	GRU Predictor	en_US
dc.subject	Interactive Multimedia Association	en_US
dc.subject	Adaptive Differential Pulse Code Modulation (IMA-ADPCM) Codec System	en_US
dc.title	Enhancing Speech Coding Quality by Embedding the GRU Predictor with the Interactive Multimedia Association Adaptive Differential Pulse Code Modulation (IMA-ADPCM) Codec System	en_US
dc.type	Thesis	en_US
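The abstract contrasts the fixed linear predictor of IMA-ADPCM with an embedded GRU predictor. The Python sketch below illustrates the baseline side of that comparison: the standard IMA-ADPCM encoder and decoder (4-bit codes, the usual 89-entry step-size table, step-index adaptation) plus a global SNR metric. It is not the thesis's implementation. The names `encode`, `decode`, `snr_db`, `last_sample_predictor` and the `predict` callback are hypothetical, and this record does not describe how the GRU is actually wired into the codec. The callback only marks where a learned predictor could replace the fixed one, under the assumption that encoder and decoder run the same predictor on the same reconstructed history. The SNR formula is the plain global definition, assumed rather than taken from the thesis; per-block headers and nibble packing are omitted, and the predictor and step index both start at zero.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Standard IMA-ADPCM step-index adjustment, indexed by the full 4-bit code.
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8]

# Standard 89-entry IMA-ADPCM quantizer step-size table.
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173,
    190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484,
    7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289, 16818,
    18500, 20350, 22385, 24623, 27086, 29794, 32767,
]

# Hypothetical hook: maps the reconstructed history to the next prediction.
Predictor = Callable[[List[int]], int]

def last_sample_predictor(history: List[int]) -> int:
    """Fixed IMA-ADPCM predictor: the previous reconstructed sample (0 at start)."""
    return history[-1] if history else 0

def _clamp16(x: int) -> int:
    return max(-32768, min(32767, x))

def _reconstruct(pred: int, code: int, step: int) -> int:
    """Dequantize a 4-bit code and add it to the prediction (bit 3 is the sign)."""
    vpdiff = step >> 3
    if code & 4:
        vpdiff += step
    if code & 2:
        vpdiff += step >> 1
    if code & 1:
        vpdiff += step >> 2
    return _clamp16(pred - vpdiff if code & 8 else pred + vpdiff)

def _next_index(index: int, code: int) -> int:
    return max(0, min(88, index + INDEX_TABLE[code]))

def encode(samples: Sequence[int],
           predict: Predictor = last_sample_predictor) -> Tuple[List[int], List[int]]:
    """Encode 16-bit PCM to 4-bit codes; also return the decoder-side reconstruction."""
    history: List[int] = []  # reconstructed samples, identical to the decoder's
    codes: List[int] = []
    index = 0
    for s in samples:
        step = STEP_TABLE[index]
        pred = _clamp16(int(predict(history)))
        diff = s - pred
        code = 8 if diff < 0 else 0
        diff = abs(diff)
        # Successive approximation of |diff| against step, step/2, step/4.
        if diff >= step:
            code |= 4
            diff -= step
        if diff >= step >> 1:
            code |= 2
            diff -= step >> 1
        if diff >= step >> 2:
            code |= 1
        history.append(_reconstruct(pred, code, step))
        codes.append(code)
        index = _next_index(index, code)
    return codes, history

def decode(codes: Sequence[int],
           predict: Predictor = last_sample_predictor) -> List[int]:
    """Decode 4-bit codes to 16-bit PCM using the same predictor as the encoder."""
    history: List[int] = []
    index = 0
    for code in codes:
        step = STEP_TABLE[index]
        pred = _clamp16(int(predict(history)))
        history.append(_reconstruct(pred, code, step))
        index = _next_index(index, code)
    return history

def snr_db(original: Sequence[int], reconstructed: Sequence[int]) -> float:
    """Global SNR in dB: 10*log10(sum x^2 / sum (x - y)^2)."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)
```

With the default `last_sample_predictor` this reduces to plain IMA-ADPCM. A learned model would be passed as `predict`, for instance a hypothetical wrapper that feeds the last N reconstructed samples to a trained GRU and returns an integer prediction. The predictor sees only reconstructed samples, never the original PCM, because the decoder has nothing else available; this keeps encoder and decoder in lockstep.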