Speech disorders have a significant impact on quality of life: they diminish a person's ability to express their character and exercise autonomy, and they frequently harm relationships and self-esteem, particularly in young children. Dysarthria is a neurological disorder that impairs the motor control of speech. Children with this disorder typically have no difficulty with comprehension but struggle to express themselves verbally; poorly pronounced phonemes and weak articulation make precise, fluent communication with family and friends difficult. To address this condition, numerous speech assistive technologies have been developed for individuals with dysarthria, tailored to the severity level. Deep learning (DL) systems now offer the potential for objective evaluation, thereby improving diagnostic accuracy, which motivates a systematic analysis of current approaches for detecting dysarthria by severity level. In this manuscript, a novel pediatric dysarthria disorder detection framework using a residual recurrent neural network and transformer (PD3F-RRNNT) technique is proposed. The PD3F-RRNNT technique aims to provide a real-time recognition method for accurately detecting dysarthric speech in children, supporting early diagnosis and intervention. Initially, an audio processing phase comprising voice activity detection (VAD), noise removal, pre-emphasis, framing, windowing, and normalization transforms the audio signals and extracts significant information from them. The PD3F-RRNNT method then utilizes the transformer-attention-based U-Net (TransAttUnet) technique for feature extraction. Finally, a residual bidirectional gated recurrent unit (RBG) method is employed to detect and classify the speech disorder accurately.
Experimental validation of the PD3F-RRNNT model is performed on the dysarthria and non-dysarthria speech dataset. Comparative analysis shows that the PD3F-RRNNT model attains a superior accuracy of 99.50% relative to existing techniques.
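The pre-emphasis, framing, windowing, and normalization steps described above are standard audio-preprocessing operations. The following is a minimal illustrative sketch of these steps in NumPy; the function name, frame length, hop length, and pre-emphasis coefficient (0.97) are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def preprocess_audio(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Illustrative pre-emphasis -> framing -> windowing -> normalization pipeline.

    frame_len=400 and hop_len=160 correspond to 25 ms frames with a
    10 ms hop at a 16 kHz sampling rate (assumed values).
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis: boost high frequencies, y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: split the signal into overlapping frames
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop_len)
    frames = np.stack([emphasized[i * hop_len: i * hop_len + frame_len]
                       for i in range(n_frames)])
    # Windowing: taper each frame with a Hamming window to reduce edge effects
    frames *= np.hamming(frame_len)
    # Normalization: zero mean, unit variance per frame
    frames = (frames - frames.mean(axis=1, keepdims=True)) \
             / (frames.std(axis=1, keepdims=True) + 1e-8)
    return frames
```

A one-second 16 kHz signal then yields `(1 + (16000 - 400) // 160) = 98` frames of 400 samples each, ready for spectral feature extraction.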
Citation: Ala Saleh Alluhaidan, Eman M Alanazi, Nasser Aljohani, Amani A Alneil. A real-time pediatric dysarthria speech disorder detection using residual recurrent neural network with attention U-net based transformer encoder model[J]. AIMS Mathematics, 2025, 10(12): 28787-28814. doi: 10.3934/math.20251267