Early detection of vocal disorders such as laryngeal cancer and dysphonia using voice analysis and machine learning
Keywords:
Voice Disorder, MFCC, Voice analysis, Machine learning, Dysphonia Detection, Laryngeal Cancer Detection
Abstract
Many serious throat disorders, such as laryngeal cancer, laryngitis, muscle tension dysphonia, and vocal cord paralysis, are detected only after the patient has become critically ill. These disorders can be life-threatening, as I witnessed with my uncle. Seeing the effects of laryngeal cancer, one of the most painful cancers, directed my attention to this field of study and motivated me to work on a remedy. Most of these disorders can be discovered early, because deformations of the vocal cords begin to change the voice at an early stage. Smoking, drinking, poor eating habits, occupation, and other factors are key contributors to these problems. A change in voice is typically the first sign of all of these disorders. People, however, tend to disregard this first symptom, which allows the problem to progress. Voice irregularities, such as variations in frequency, may also be too subtle for the human ear to notice and take seriously. Artificial intelligence and machine learning can detect voice disorders such as dysphonia and laryngeal cancer early. I worked with Santosh Hospital to collect data and carry out background research on vocal problems and irregularities. In the process, I collected more than 100 minutes of audio from individuals with laryngeal cancer and also studied existing approaches for detecting voice problems, such as laryngoscopy. The project's goal is to distinguish the voice of a healthy person from that of a patient with a vocal cord disorder by comparing voice analyses of the two groups. Forty voice features, including frequency, pitch, and zero-crossing rate, were extracted using Mel-frequency cepstral coefficients (MFCCs) and related methods such as the mel filter bank and the discrete cosine transform. A wrapper feature-selection method was then used to pick the features most important in determining whether a patient has a vocal problem.
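As a rough illustration of the feature-extraction step described above, the sketch below computes MFCCs for a single audio frame using only the Python standard library: a windowed power spectrum, a triangular mel filter bank, a logarithm, and a discrete cosine transform, plus the zero-crossing rate. It is not the study's code. The function names, the 256-sample frame, the 8 kHz sampling rate, the 26 filters, and the 13 coefficients are all assumptions chosen for readability, and the synthetic 200 Hz tone merely stands in for a real recording.

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Apply a Hamming window, then return |DFT|^2 / N for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Build triangular filters whose centres are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bin_points = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
                  for m in mel_points]
    bank = [[0.0] * n_bins for _ in range(n_filters)]
    for j in range(n_filters):
        left, center, right = bin_points[j], bin_points[j + 1], bin_points[j + 2]
        for k in range(left, center):      # rising edge of the triangle
            bank[j][k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            bank[j][k] = (right - k) / (right - center)
    return bank

def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II, keeping only the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT."""
    spectrum = power_spectrum(frame)
    bank = mel_filter_bank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(filt, spectrum)) for filt in bank]
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    return dct_ii(log_energies, n_coeffs)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Hypothetical input: one 256-sample frame of a 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(256)]
coeffs = mfcc_frame(tone, sample_rate)
print("MFCCs:", [round(c, 2) for c in coeffs])
print("zero-crossing rate:", round(zero_crossing_rate(tone), 3))
```

In practice an audio library would split the recording into overlapping frames and use a fast Fourier transform. The naive DFT here only keeps each step visible.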
A logistic regression model was then trained to classify each audio sample as disordered or healthy. The tool achieved an accuracy of 88%. It is intended to help patients at an early stage, when therapy is simpler and cancer and other critical conditions can be treated sooner. It may also reduce the workload on doctors, lower medical costs, and reduce patients' exposure to sedatives. The technique is simple to use and can be made available in places that lack competent doctors and the proper equipment to detect such major voice problems.
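The feature-selection and classification steps described above can be sketched as follows. The abstract does not say which wrapper method was used, so this example assumes greedy forward selection scored on a held-out validation set, and it trains logistic regression with plain batch gradient descent. All function names, hyperparameters, and the synthetic data (six features, of which columns 1 and 4 are made informative) are hypothetical and are not meant to reproduce the reported 88% accuracy.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=200, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def accuracy(model, X, y):
    w, b = model
    preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
             for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def select_columns(X, cols):
    return [[row[c] for c in cols] for row in X]

def forward_wrapper(X_train, y_train, X_val, y_val, max_features):
    """Greedy forward selection: repeatedly add the feature that most improves
    validation accuracy, and stop when no candidate improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X_train[0])))
    while remaining and len(selected) < max_features:
        best_col, best_col_score = None, best_score
        for c in remaining:
            cols = selected + [c]
            model = train_logistic(select_columns(X_train, cols), y_train)
            score = accuracy(model, select_columns(X_val, cols), y_val)
            if score > best_col_score:
                best_col, best_col_score = c, score
        if best_col is None:
            break
        selected.append(best_col)
        remaining.remove(best_col)
        best_score = best_col_score
    return selected, best_score

def make_synthetic(n, n_features=6, informative=(1, 4), seed=0):
    """Hypothetical data: label 1 = 'disordered', label 0 = 'healthy'."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for c in informative:
            row[c] += 2.0 if label else -2.0  # shift informative columns by class
        X.append(row)
        y.append(label)
    return X, y

X_train, y_train = make_synthetic(80, seed=0)
X_val, y_val = make_synthetic(40, seed=1)
selected, best_score = forward_wrapper(X_train, y_train, X_val, y_val, max_features=3)
print("selected feature columns:", selected)
print("validation accuracy:", best_score)
```

A wrapper method evaluates feature subsets by actually training the classifier on them, which is why the selection loop calls `train_logistic` for every candidate. With real recordings, each row would hold the features extracted from one audio sample.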