Genre classification involves identifying the distinctive stylistic elements and musical characteristics that define a particular genre. It supports a comprehensive understanding of a genre's historical context, cultural influences, and musical evolution. This study addressed the challenge of classifying Ethiopian music genres according to their melodic structures using deep learning techniques. The main objective was to develop a deep learning model that classifies audio into six genre classes of Ethiopian music: Ancihoye Lene, Ambassel Major, Ambassel Minor, Bati, Tizita Major, and Tizita Minor. To achieve this, we first prepared a dataset of 3952 audio recordings, comprising 533 tracks of Ethiopian Orthodox church music and 3419 samples of secular Ethiopian music. A total of 46 distinct features, namely chroma short-time Fourier transform (STFT), root mean square energy (RMSE), spectral centroid, spectral bandwidth, spectral roll-off, zero crossing rate, and mel-frequency cepstral coefficients MFCC1 through MFCC40, were extracted from each sample at both the mid and low levels, focusing on aspects suggested by Ethiopian music experts and on preliminary experiments that highlighted the importance of tonality features. A 30-second segment of each recording was used for feature extraction, and the resulting datasets were stored in both CSV and JSON formats for further processing. We developed classification models based on four deep learning architectures: convolutional neural networks (CNN), recurrent neural networks (RNN), a parallel RNN–CNN architecture, and long short-term memory (LSTM) networks. Our experiments showed that the LSTM model achieved the best performance, reaching a classification accuracy of 97% using the 40 MFCC features extracted from the audio dataset.
Citation: Eshete Derib Emiru, Estifanos Tadele Bogale. Ethiopian music genre classification using deep learning[J]. Applied Computing and Intelligence, 2025, 5(1): 94-111. doi: 10.3934/aci.2025007
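To make the feature-extraction step concrete, the following is a minimal sketch of how the 46 features described above could be computed with the librosa library. The file paths, the 22050 Hz sample rate, and the averaging of frame-wise values into per-clip scalars are illustrative assumptions, not details reported in the paper.

```python
import json

import librosa
import numpy as np
import pandas as pd

def extract_features(path, duration=30.0, sr=22050, n_mfcc=40):
    """Extract 46 features (6 spectral descriptors + 40 MFCCs) from one clip."""
    # Load only the first 30 seconds of the recording (assumed segment choice).
    y, sr = librosa.load(path, sr=sr, duration=duration)

    # Frame-wise descriptors, each averaged into a single scalar per clip.
    features = {
        "chroma_stft": float(np.mean(librosa.feature.chroma_stft(y=y, sr=sr))),
        "rmse": float(np.mean(librosa.feature.rms(y=y))),
        "spectral_centroid": float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr))),
        "spectral_bandwidth": float(np.mean(librosa.feature.spectral_bandwidth(y=y, sr=sr))),
        "rolloff": float(np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr))),
        "zero_crossing_rate": float(np.mean(librosa.feature.zero_crossing_rate(y))),
    }

    # 40 MFCCs, each averaged over frames -> mfcc1 ... mfcc40.
    for i, coeff in enumerate(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc), 1):
        features[f"mfcc{i}"] = float(np.mean(coeff))

    return features  # 6 + 40 = 46 values

# Hypothetical usage: write the labeled dataset in both CSV and JSON formats.
rows = [dict(extract_features(path), label=genre)
        for path, genre in [("some_clip.wav", "Bati")]]
pd.DataFrame(rows).to_csv("ethiopian_music_features.csv", index=False)
with open("ethiopian_music_features.json", "w") as f:
    json.dump(rows, f, indent=2)
```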
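For the best-performing model, below is a minimal sketch of an LSTM classifier over sequences of the 40 MFCC frames, assuming a Keras/TensorFlow implementation. The layer sizes, dropout rate, optimizer, and frame count are illustrative assumptions, not the authors' reported configuration.

```python
from tensorflow.keras import layers, models

N_MFCC = 40      # MFCC1..MFCC40, as in the paper
N_FRAMES = 1292  # ~30 s at 22050 Hz with a 512-sample hop (assumption)
N_GENRES = 6     # Ancihoye Lene, Ambassel Major/Minor, Bati, Tizita Major/Minor

# Each input is a sequence of MFCC frames with shape (time_steps, n_mfcc).
model = models.Sequential([
    layers.Input(shape=(N_FRAMES, N_MFCC)),
    layers.LSTM(128, return_sequences=True),  # first recurrent layer keeps the sequence
    layers.LSTM(64),                          # second layer summarizes it into one vector
    layers.Dropout(0.3),                      # regularization (illustrative choice)
    layers.Dense(64, activation="relu"),
    layers.Dense(N_GENRES, activation="softmax"),  # one probability per genre class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical training call on pre-extracted MFCC sequences:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=50, batch_size=32)
```

Recurrent layers are a natural fit here because the melodic structure that distinguishes these genres unfolds over time, and frame-wise MFCC sequences preserve that temporal contour.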