Student: KM Mokhethi
About the Student
I am a postgraduate Computer Science student specializing in Artificial Intelligence, with research interests in generative modelling, data communications, and the ethical aspects of IT. Alongside my studies, I have professional experience as a full-stack integration developer, and I apply the problem-solving and system design skills from that work to both academic research and real-world industry projects.
About the Project
This project develops a Gender-Based Violence (GBV) Early-Warning System that uses Natural Language Processing (NLP) and machine learning to classify forms of GBV expressed in text. The primary objective is to evaluate and compare how accurately several models assign GBV-related content to five categories: sexual violence, physical violence, emotional violence, economic violence, and harmful traditional practices.
Three approaches were implemented: Latent Dirichlet Allocation (LDA) for topic modelling, Multinomial Naive Bayes (MNB) as a traditional supervised classifier, and DistilBERT, a transformer-based deep learning model that represents each word in the context of the surrounding text. The workflow covered data preprocessing, feature engineering, and splitting the data into training and test sets so that every model was evaluated on text it had not seen during training. The models were assessed with standard classification metrics: accuracy, macro-F1, and weighted-F1. Macro-F1 averages the per-category F1 scores equally, so it exposes weak performance on rare categories, whereas weighted-F1 weights each category by its number of examples; reporting both allows a fair comparison on balanced and imbalanced datasets. Grouped bar charts and radar plots were used to compare performance across the five GBV categories.
The study shows that LDA and MNB are lightweight and interpretable but often miss contextual nuances, which leads to misclassification. In contrast, DistilBERT consistently outperformed both, capturing semantic meaning well enough to classify subtler forms of GBV. The comparison highlights the trade-offs between classical machine learning methods and modern transformer-based architectures: although transformer models require more computational resources, the results suggest they are considerably better suited to real-world deployment in GBV detection systems.
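To make the difference between macro-F1 and weighted-F1 concrete, the following is a minimal pure-Python sketch, not the project's own evaluation code. It assumes single-label predictions over the five categories; the short category names and the toy labels are hypothetical and only mimic an imbalanced test set. A naive baseline that always predicts physical violence scores noticeably higher on weighted-F1 than on macro-F1, because the majority category dominates the weighted average while the four missed categories each pull the macro average down.

```python
from collections import Counter

# The five GBV categories from the project description (short names are hypothetical).
CATEGORIES = ["sexual", "physical", "emotional", "economic", "harmful_traditional"]


def per_class_f1(y_true, y_pred, label):
    """One-vs-rest F1 for a single category; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of texts whose predicted category matches the true category."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_and_weighted_f1(y_true, y_pred, labels):
    """Macro-F1 averages per-category F1 equally; weighted-F1 weights it by support."""
    scores = {lab: per_class_f1(y_true, y_pred, lab) for lab in labels}
    support = Counter(y_true)
    macro = sum(scores.values()) / len(labels)
    total = sum(support[lab] for lab in labels)
    weighted = sum(scores[lab] * support[lab] for lab in labels) / total
    return macro, weighted


# Hypothetical, imbalanced toy test set (not project data): physical violence dominates.
y_true = ["physical"] * 5 + ["sexual"] * 2 + ["emotional", "economic", "harmful_traditional"]
# A naive majority-class baseline that always predicts "physical".
y_pred = ["physical"] * len(y_true)

acc = accuracy(y_true, y_pred)
macro, weighted = macro_and_weighted_f1(y_true, y_pred, CATEGORIES)
print(f"accuracy={acc:.3f} macro-F1={macro:.3f} weighted-F1={weighted:.3f}")
# prints: accuracy=0.500 macro-F1=0.133 weighted-F1=0.333
```

In a real evaluation pipeline, scikit-learn's `sklearn.metrics.f1_score` with `average="macro"` or `average="weighted"` computes the same two quantities; the hand-written version is only meant to show where the difference comes from.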
This research contributes to the advancement of AI-driven tools for social good, providing a foundation for scalable early-warning mechanisms that can support intervention strategies and help address one of society’s most pressing challenges.
