
Speech Processing Using Dynamic Micro-Block Optimization Based on Deep Learning

By: Jiajun Hao and Chaoyang Geng
Open Access
Dec 2025

Abstract

Driven by advances in deep learning, speech processing systems such as automatic speech recognition (ASR), source segregation, and noise suppression have achieved significant performance improvements. However, traditional training strategies, particularly static mini-batch selection, often overlook dynamic variations in data complexity and model convergence behavior, resulting in reduced training efficiency and limited model accuracy. To address this limitation, we introduce a novel training paradigm called Dynamic Micro-block Optimization (DMBO). DMBO employs a fine-grained sampling mechanism that partitions the training set into smaller units called "micro-blocks," which are dynamically updated during training based on real-time characteristics such as sample loss, gradient diversity, and utterance complexity. Four sampling strategies (loss-weighted, gradient-diversity, gender-based, and accent-based) are designed to self-adjust the composition of the training data. The DMBO framework is implemented with Connectionist Temporal Classification (CTC) and Long Short-Term Memory (LSTM) networks for end-to-end speech recognition. Experimental evaluations on the VCTK dataset demonstrate that the proposed method significantly accelerates convergence and improves model accuracy. Specifically, the gender-homogeneous strategy reduces the Label Error Rate (LER) by 9.0% compared to standard mini-batch training, while the accent-heterogeneous strategy achieves a 9.2% absolute LER reduction. These results confirm that dynamic optimization at the micro-block level enhances the efficacy of deep learning models in speech processing tasks and is consistent with theoretical expectations, validating the effectiveness of the proposed approach.
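
The abstract describes micro-blocks being reweighted during training according to their recent loss. The sketch below illustrates that general idea only; the block size, softmax-style weighting, temperature, and number of blocks drawn per step are assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch of loss-weighted micro-block sampling (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

def partition_into_micro_blocks(indices, block_size):
    """Split the training-set indices into contiguous micro-blocks."""
    return [indices[i:i + block_size] for i in range(0, len(indices), block_size)]

def loss_weighted_sample(micro_blocks, per_sample_loss, n_blocks, temperature=1.0):
    """Pick micro-blocks with probability proportional to their mean recent loss.

    per_sample_loss holds the latest loss recorded for each training sample
    (assumed to be refreshed as training proceeds), so harder blocks are
    revisited more often while the model converges.
    """
    block_losses = np.array([per_sample_loss[b].mean() for b in micro_blocks])
    weights = np.exp(block_losses / temperature)
    probs = weights / weights.sum()
    chosen = rng.choice(len(micro_blocks), size=n_blocks, replace=False, p=probs)
    return [micro_blocks[i] for i in chosen]

# Toy usage: 1,000 samples, micro-blocks of 20, draw 4 blocks per training step.
all_indices = np.arange(1000)
losses = rng.random(1000)            # stand-in for per-sample CTC losses
blocks = partition_into_micro_blocks(all_indices, block_size=20)
batch_blocks = loss_weighted_sample(blocks, losses, n_blocks=4)
mini_batch = np.concatenate(batch_blocks)
print(mini_batch.shape)              # (80,) samples forming the dynamic mini-batch
```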

Language: English
Page range: 46 - 58
Published on: Dec 31, 2025
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Jiajun Hao, Chaoyang Geng, published by Xi’an Technological University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.