This model is a fine-tuned Wav2Vec2 model that classifies English speech into six emotional states (sadness, anger, disgust, fear, happiness, and neutral), with a reported accuracy of 92.42%.
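The sketch below shows, in plain Python, how a six-way classification head of this kind typically turns frame-level Wav2Vec2 features into a single emotion label. It assumes a head similar to Hugging Face's `Wav2Vec2ForSequenceClassification`, which mean-pools frame features over time and applies a linear classifier; the projection layer is omitted for brevity. The label order, toy dimensions, and weights are hypothetical, and the actual checkpoint's head and `id2label` mapping may differ.

```python
import math

# Hypothetical label order; the real checkpoint's id2label mapping may differ.
EMOTIONS = ["sadness", "anger", "disgust", "fear", "happiness", "neutral"]


def mean_pool(frames):
    """Average frame-level feature vectors into one utterance-level vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def linear(x, weights, bias):
    """Compute one logit per class; `weights` has one row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frames, weights, bias):
    """Return (label, class probabilities) for one utterance."""
    pooled = mean_pool(frames)
    probs = softmax(linear(pooled, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


# Toy usage: two 2-dimensional frames and hand-set weights that favour "happiness".
frames = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.0, 0.0]] * 4 + [[2.0, 2.0]] + [[0.0, 0.0]]
bias = [0.0] * 6
label, probs = classify(frames, weights, bias)
print(label)  # happiness
```

In a real checkpoint the frame features come from the Wav2Vec2 encoder and the weights are learned during fine-tuning; only the pool, score, and argmax structure carries over from this sketch.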
Audio Processing · Safetensors · English