Towards Efficient Sentiment Analysis in Moroccan Darija with Modern Deep Learning Models and Advanced Text Representations
By: Ms. AMNAY Meriem
Monday, 15 June 2026 at 12:00
The Dean of the Faculté des Sciences et Techniques de Béni Mellal informs the public that Ms. AMNAY Meriem will defend a doctoral thesis entitled: "Towards Efficient Sentiment Analysis in Moroccan Darija with Modern Deep Learning Models and Advanced Text Representations".
The public defense will take place on Monday, 15 June 2026 at 13:00 at the École Nationale des Sciences Appliquées de Khouribga, before a jury composed of:
Mr. Noureddine ABOUTABIT: Professor, École Nationale des Sciences Appliquées, Université Sultan Moulay Slimane, Khouribga, Chair;
Mr. Abdelghafour ATLAS: Professor, École Nationale des Sciences Appliquées, Université Cadi Ayyad, Marrakech, Reviewer;
Mr. Nidal LAMGHARI: Associate Professor (Maître de Conférences Habilité), École Nationale des Sciences Appliquées, Université Sultan Moulay Slimane, Khouribga, Reviewer;
Ms. Nassima SOUSSI: Associate Professor (Maître de Conférences Habilité), École Nationale des Sciences Appliquées, Université Sultan Moulay Slimane, Khouribga, Reviewer;
Mr. Youssef EL HADFI: Associate Professor (Maître de Conférences Habilité), École Nationale des Sciences Appliquées, Université Sultan Moulay Slimane, Khouribga, Examiner;
Mr. Imad HAFIDI: Professor, École Nationale des Sciences Appliquées, Université Sultan Moulay Slimane, Khouribga, Thesis supervisor.
Abstract:
Sentiment analysis in Moroccan Arabic (Darija) presents significant challenges due to the scarcity of annotated datasets, the strong linguistic variability of the dialect, and the informal characteristics of user-generated content. This work explores how advanced representation learning techniques and adaptive data annotation strategies can enhance classification performance while minimizing labeling costs. The study begins with an overview of the evolution of Natural Language Processing, emphasizing the fundamental role of text representation methods, from traditional vector-based approaches to modern contextual embedding models. A comparative experimental analysis is then conducted between conventional representations, such as TF-IDF and static word embeddings, and transformer-based contextual embeddings. These representations are evaluated using multiple classifiers and standard performance metrics. The findings demonstrate the clear advantage of contextual embeddings in modeling the semantic richness and variability of dialectal and noisy texts. To address the limited availability of labeled data, an active learning framework integrated with transformer architectures is proposed. By iteratively selecting the most informative samples for annotation, the approach significantly reduces labeling effort while maintaining strong predictive performance. Furthermore, the framework is extended to incorporate reinforcement learning, enabling an adaptive optimization of the sample selection strategy across iterations. Experimental results on Moroccan Darija sentiment analysis show that this hybrid active–reinforcement learning approach achieves balanced improvements in precision, recall, F1-score, and accuracy compared to both traditional supervised learning and standard active learning strategies, while requiring substantially fewer annotated examples.
Overall, this research delivers a structured evaluation and a robust methodological pipeline for sentiment analysis in Moroccan Darija, demonstrating the combined effectiveness of contextual representations, active learning, and reinforcement learning in low-resource dialectal NLP settings.
Keywords: Sentiment Analysis; Moroccan Darija; Transformer Models; Contextual Embeddings; Active Learning; Reinforcement Learning; Natural Language Processing.
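Illustrative note: the abstract describes active learning as iteratively selecting the most informative samples for annotation, but it does not state which selection criterion the thesis uses. The sketch below assumes entropy-based uncertainty sampling, a common choice, purely as an illustration. The function names and the probability values are hypothetical and do not come from the thesis.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, batch_size):
    """Return the indices of the `batch_size` unlabeled items whose
    predicted distributions have the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical classifier outputs (negative, neutral, positive) for four
# unlabeled comments; in practice these would come from the current model.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, little to gain from a label
    [0.34, 0.33, 0.33],  # near-uniform prediction, highly informative
    [0.10, 0.10, 0.80],
    [0.50, 0.45, 0.05],
]

# Indices of the two items to send to a human annotator in this round.
to_annotate = select_most_uncertain(pool_probs, batch_size=2)
print(to_annotate)
```

In a full loop, the selected items would be labeled, added to the training set, and the model retrained before the next selection round. The reinforcement learning extension mentioned in the abstract would replace this fixed criterion with a learned selection policy, which is not shown here.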