Speech Emotion Recognition using Convolutional Neural Networks with Attention Mechanisms

Abstract

Speech Emotion Recognition (SER) is a crucial component of human-computer interaction, enabling machines to recognize and respond to human emotions effectively. This study proposes a novel SER framework using Convolutional Neural Networks (CNNs) augmented with attention mechanisms. The CNNs capture hierarchical and spatial features from spectrogram representations of speech signals, while the attention mechanisms focus on emotionally salient regions, improving both interpretability and accuracy. The proposed model is evaluated on benchmark datasets and demonstrates superior performance compared to traditional methods. By prioritizing critical emotional features, the model gains practical utility and reliability for real-world SER applications such as virtual assistants, customer support systems, and mental health monitoring. This work underlines the importance of deep learning techniques in developing SER technologies, paving the way for more intuitive and effective human-computer interactions.
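The architecture described above (a CNN feature extractor over spectrograms followed by an attention layer that weights emotionally salient regions) can be illustrated with a minimal PyTorch sketch. The layer counts, channel sizes, and the use of temporal soft attention here are illustrative assumptions, not the paper's exact configuration:

```python
# Illustrative sketch (not the authors' exact architecture): a small CNN
# extracts features from a log-mel spectrogram, then a temporal attention
# layer weights the frames before classification.
import torch
import torch.nn as nn


class AttentiveCNN(nn.Module):
    def __init__(self, n_mels=64, n_classes=4):
        super().__init__()
        # Convolutional feature extractor over the (freq, time) spectrogram
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_dim = 32 * (n_mels // 4)          # channels x pooled mel bins
        self.attn = nn.Linear(feat_dim, 1)     # one attention score per frame
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x):                      # x: (batch, 1, n_mels, time)
        h = self.conv(x)                       # (batch, 32, n_mels/4, time/4)
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, frames, feat)
        w = torch.softmax(self.attn(h), dim=1)          # weights over frames
        pooled = (w * h).sum(dim=1)            # attention-weighted utterance vector
        return self.classifier(pooled)


model = AttentiveCNN()
dummy = torch.randn(2, 1, 64, 128)             # two synthetic spectrograms
print(model(dummy).shape)                      # -> torch.Size([2, 4])
```

The attention weights are a softmax over time frames, so the pooled vector emphasizes frames the network scores as emotionally informative rather than averaging all frames equally.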

A. Poongodai¹, Y. Nandini², T. Mounika³, A. Karishma⁴, N. Kevalya Kumar⁵

  1. Assistant Professor, Department of CSE (AI), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, India
  2. Student, Department of CSE (AI), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, India
  3. Student, Department of CSE (AI), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, India
  4. Student, Department of CSE (AI), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, India
  5. Student, Department of CSE (AI), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, India

IRJIET, Volume 9, Special Issue of ICCIS-2025, May 2025, pp. 162-167

https://doi.org/10.47001/IRJIET/2025.ICCIS-202526

References

  1. Khalil, R. A., Jones, E., et al. Speech Emotion Recognition using Deep Learning Techniques. https://ieeeaccess.ieee.org/
  2. Aouani, H., & Ben Ayed, Y. (2020). Speech Emotion Recognition with Deep Learning. https://www.sciencedirect.com/search?qs=speech%20emotion%20recognition
  3. Kaur, J., Kumar, A., & Shwethashri, K. (2021). Speech Emotion Recognition using Machine Learning. https://www.irjet.net/archives/V7/i9/IRJETV7I9154
  4. Pentari, A., Kafentzis, G., & Tsiknakis, M. (2024). Speech Emotion Recognition via Graph-Based Representations. https://www.nature.com/articles/s4159024-52989-2
  5. Sharma, A., Nawani, H., & Verma, S. (2023). Speech Emotion Recognition using Deep Learning.
  6. Pavithra et al., Ledella, S., & Devi, S. (2023). Deep Learning Based Speech Emotion Recognition: An Investigation into a Sustainable Emotion-Speech Relationship. https://doi.org/10.1051/e3sconf/2023430010
  7. Sun, C., Li, H., & Ma, L. (2023). Speech Emotion Recognition Based on Improved Masking EMD and Convolutional Recurrent Neural Network. https://doi.org/10.3389/fpsyg.2022.1075624
  8. Lakshmi, D., Kakuba, S., et al. (2023). Speech Emotion Recognition using Librosa and Hybrid Models.
  9. Zhao, Y., et al. (2023). Speech Emotion Recognition using Convolutional Neural Networks (CNN) and Gamma Classifier-Based Error-Correcting Output Codes (ECOC). http://www.nature.com/scientificreports
  10. Adkitte, S., Lomte, V., Fale, M., & Kudale, V. K. (2023). Speech Emotion Recognition using Deep Learning. https://ijcrt.org/papers/IJCRT2105446.pdf
  11. Kim, T.-W., & Kwak, K.-C. (2024). Speech Emotion Recognition using Deep Learning Transfer Models and Explainable Techniques.
  12. Dal Ri, F. A., Ciardi, F. C., & Conci, N. (2023). Speech Emotion Recognition and Deep Learning: An Extensive Validation Using Convolutional Neural Networks.