AI-Based Emotion Recognition from Speech and Text for Emotion-Aware Educational Systems

By: Mr. M. Hmad ZENNOU

Saturday, July 11, 2026 at 09:00

The Dean of the Faculté des Sciences et Techniques de Béni Mellal informs the public that Mr. M. Hmad ZENNOU will defend a doctoral thesis entitled: "AI-Based Emotion Recognition from Speech and Text for Emotion-Aware Educational Systems".

The public defense will take place on Saturday, July 11, 2026 at 10:00 in Salle 1 of the Pôle des Etudes Doctorales, Université Sultan Moulay Slimane, Béni Mellal, before a jury composed of:

Mr. Rachid EL AYACHI: Professor, Faculté des Sciences et Techniques, Université Sultan Moulay Slimane, Béni Mellal, Chair;
Mr. Lahcen OUGHDIR: Professor, École Nationale des Sciences Appliquées, Université Sidi Mohamed Ben Abdellah, Fès, Reviewer;
Mr. Nabil ABABOU: Associate Professor (Maître de Conférences Habilité), Faculté d'Économie et Gestion, Université Sultan Moulay Slimane, Béni Mellal, Reviewer;
Mr. Hamid OUANAN: Associate Professor (Maître de Conférences Habilité), École Nationale des Sciences Appliquées, Université Sultan Moulay Slimane, Béni Mellal, Reviewer;
Mr. Moulay Driss EL OUADGHIRI: Professor, Faculté des Sciences, Université Moulay Ismail, Meknès, Examiner;
Mr. Abderrahim SALHI: Associate Professor (Maître de Conférences Habilité), Faculté des Sciences et Techniques, Université Sultan Moulay Slimane, Béni Mellal, Examiner;
Mr. Mohamed OUHDA: Associate Professor (Maître de Conférences Habilité), École Supérieure de Technologie, Université Sultan Moulay Slimane, Khénifra, Thesis co-supervisor;
Mr. Mohamed BASLAM: Professor, Faculté des Sciences et Techniques, Université Sultan Moulay Slimane, Béni Mellal, Thesis supervisor.

Abstract:

This thesis addresses the automatic recognition of human emotions from speech signals and learner-generated text, within the broader goal of building a multimodal AI system that also incorporates facial expression recognition. Three complementary contributions are proposed, each targeting a distinct challenge in affective computing, the field dedicated to systems that can recognize, interpret, and simulate human emotions.

The first contribution (Chapter 2) introduces a lightweight CNN-LSTM architecture for Speech Emotion Recognition (SER) that fuses numerical features – Mel-Frequency Cepstral Coefficients (MFCCs) and Root Mean Square Energy (RMSE) – to jointly capture the spectral and energy dynamics of speech. Evaluated on the RAVDESS benchmark, the model achieves 89.79% accuracy with approximately 240K parameters, outperforming state-of-the-art baselines by 9%, while remaining suitable for real-time deployment.
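
The feature fusion mentioned above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names `frame_rms` and `fuse_features`, the frame and hop lengths, the 13-coefficient MFCC shape, and the zero-filled MFCC placeholder are all assumptions chosen for illustration. In practice the MFCC matrix would come from an audio library.

```python
import numpy as np

def frame_rms(signal, frame_length=2048, hop_length=512):
    """Root mean square energy of each frame of a 1-D signal (no padding)."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    rms = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop_length
        frame = signal[start:start + frame_length]
        rms[i] = np.sqrt(np.mean(frame ** 2))
    return rms

def fuse_features(mfcc, rms):
    """Stack MFCCs (n_mfcc, n_frames) and RMS energy (n_frames,)
    into one (n_frames, n_mfcc + 1) matrix, one row per time step."""
    n_frames = min(mfcc.shape[1], rms.shape[0])
    return np.concatenate([mfcc[:, :n_frames].T, rms[:n_frames, None]], axis=1)

# Synthetic one-second tone standing in for a speech recording (assumption).
sr = 16000
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 220 * t)

rms = frame_rms(signal)
mfcc = np.zeros((13, rms.shape[0]))  # placeholder, not real MFCCs
fused = fuse_features(mfcc, rms)     # sequence input for a CNN-LSTM style model
```

Each row of `fused` holds the spectral coefficients and the energy value of one frame, which is the kind of time-ordered input a convolutional and recurrent stack would consume.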

The second contribution (Chapter 3) extends this architecture by replacing the unidirectional LSTM with a Bidirectional LSTM and adding a soft attention mechanism. This CNN-BiLSTM-Attention model captures forward and backward temporal dependencies and selectively weights emotionally salient speech segments. It achieves 91.74% accuracy on RAVDESS and 98.10% on the combined RavTess dataset, the highest reported results among models of comparable complexity. A systematic ablation study validates the monotonic contribution of each component.
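
As a rough illustration of how soft attention weights time steps, the sketch below pools a sequence of hidden states into a single context vector. The additive scoring form, the dimensions, and the randomly initialized parameters `W`, `b`, and `v` are assumptions; the abstract does not specify the exact attention formulation used in the thesis.

```python
import numpy as np

def soft_attention_pool(H, W, b, v):
    """Pool hidden states H of shape (T, d) into one context vector.

    scores_t = v . tanh(H_t W + b), weights = softmax(scores),
    context = sum_t weights_t * H_t.
    """
    scores = np.tanh(H @ W + b) @ v        # one scalar score per time step, (T,)
    scores = scores - scores.max()         # stabilize the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ H                  # weighted sum of hidden states, (d,)
    return context, weights

rng = np.random.default_rng(0)
T, d, a = 50, 128, 32                      # time steps, hidden size, attention size
H = rng.standard_normal((T, d))            # stand-in for BiLSTM outputs
W = rng.standard_normal((d, a)) * 0.1
b = np.zeros(a)
v = rng.standard_normal(a)

context, weights = soft_attention_pool(H, W, b, v)
```

The weights are non-negative and sum to one, so frames with higher scores contribute more to `context`, which would then feed the classification layer.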

The third contribution (Chapter 4) extends emotion intelligence to MOOC recommendation. A hybrid sentiment-aware system is developed that combines TF-IDF-based sentiment classification (Extra Trees, F1 = 90.2%) with collaborative filtering and hard sentiment filtering. Tested on Coursera Reviews, the system achieves an 11% improvement in F1-score over standard CF and a user satisfaction score of 4.45/5 versus 3.65 for the baseline, confirming the value of affective signals for personalized learning recommendations.
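
One possible reading of "hard sentiment filtering" is sketched below: courses whose reviews are mostly negative are removed before the collaborative filtering scores are ranked. The function name `recommend`, the threshold `min_positive_ratio`, and the toy scores are hypothetical; the abstract does not describe the actual filtering rule or data.

```python
def recommend(cf_scores, positive_ratio, top_k=3, min_positive_ratio=0.5):
    """Rank courses by predicted rating after dropping those whose share of
    positive reviews is below the threshold (the hard filter)."""
    kept = {
        course: score
        for course, score in cf_scores.items()
        if positive_ratio.get(course, 0.0) >= min_positive_ratio
    }
    return sorted(kept, key=kept.get, reverse=True)[:top_k]

# Toy values for illustration only (not results from the thesis).
cf_scores = {"course_a": 4.6, "course_b": 4.8, "course_c": 3.9, "course_d": 4.2}
positive_ratio = {"course_a": 0.9, "course_b": 0.3, "course_c": 0.7, "course_d": 0.8}

top = recommend(cf_scores, positive_ratio, top_k=2)
print(top)
```

Here `course_b` has the highest predicted rating but is excluded because its reviews are mostly negative, so the sentiment signal overrides the collaborative filtering score rather than merely reweighting it.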

Together, these contributions advance lightweight, interpretable, and emotionally aware AI systems, laying the foundations for a future multimodal framework integrating speech, text, and facial expression modalities.

Keywords: Speech Emotion Recognition, CNN-LSTM, BiLSTM, Attention Mechanism, Feature Fusion, Sentiment Analysis, MOOC Recommendation, Affective Computing, Deep Learning.