Multilingual Hope Speech Detection for Low-Resource Languages

This project addresses the growing volume of offensive content on social media platforms, a by-product of the rapid growth in user-generated content. Although social media was originally intended as a creative outlet, the prevalence of harmful material poses a significant challenge. In response, the project introduces HopeCap, a multilingual hope speech detection framework designed to identify and promote positive, inspiring content within the large volume of online data.

HopeCap combines a custom capsule network architecture with transformer encoders. Its components include word-level attention, a bidirectional LSTM (BiLSTM), and a capsule layer. The framework is adaptable to multiple languages and extends its scope by incorporating translated and transliterated versions of the data, enabling cross-language analysis.
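Two of the components named above, word-level attention and the capsule layer, can be illustrated in a few lines. The following is a minimal sketch, not HopeCap's implementation: it assumes dot-product attention against a context vector and the standard capsule "squash" nonlinearity, and it leaves out the transformer encoder and the BiLSTM entirely. The function names (`attention_pool`, `squash`) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors, context):
    """Word-level attention (assumed form): score each word vector by its
    dot product with a context vector, then return the weighted sum."""
    scores = [sum(w * c for w, c in zip(vec, context)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(word_vectors[0])
    pooled = [
        sum(wt * vec[i] for wt, vec in zip(weights, word_vectors))
        for i in range(dim)
    ]
    return pooled, weights


def squash(vector, eps=1e-9):
    """Capsule squash: keep the direction, map the length into [0, 1)."""
    sq_norm = sum(x * x for x in vector)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in vector]


# Toy stand-ins for per-word hidden states (hypothetical values).
word_vectors = [[0.1, 0.3], [2.0, 1.0], [0.0, -0.5]]
context = [1.0, 0.5]

pooled, weights = attention_pool(word_vectors, context)
capsule = squash(pooled)
capsule_length = math.sqrt(sum(x * x for x in capsule))
```

In a capsule network the length of the squashed vector is usually read as the confidence that the corresponding feature or class is present, which is why the squash keeps it below 1.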

The project emphasizes the importance of linguistic diversity in fostering positive communication across cultural contexts. By fusing the outputs of the original, translated, and transliterated data branches, HopeCap bases each hope speech prediction on all three views of the same text rather than on any single one.
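The fusion step is not specified in detail here, so the sketch below shows only one plausible option: late fusion by weighted averaging of per-branch class probabilities. The actual model may instead fuse intermediate features (for example by concatenation). The function name `fuse_branches`, the branch keys, and the probability values are all hypothetical.

```python
def fuse_branches(branch_probs, weights=None):
    """Late fusion (assumed): weighted average of class probabilities
    produced by each data branch."""
    names = list(branch_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weighting by default
    total = sum(weights[name] for name in names)
    num_classes = len(branch_probs[names[0]])
    return [
        sum(weights[name] * branch_probs[name][k] for name in names) / total
        for k in range(num_classes)
    ]


# Hypothetical [hope, not-hope] probabilities from each branch.
probs = {
    "original": [0.70, 0.30],
    "translated": [0.55, 0.45],
    "transliterated": [0.40, 0.60],
}

fused = fuse_branches(probs)
label = max(range(len(fused)), key=fused.__getitem__)  # index of the top class
```

Passing a `weights` dictionary lets one branch count for more than the others, for example when the translated text is known to be noisy.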

The study acknowledges the challenges posed by low-resource languages such as Tamil and Malayalam, which call for effective training methodologies and comprehensive datasets. The scarcity of resources for Indic languages is a significant hurdle that requires collaborative efforts to build linguistic resources. Three considerations are highlighted as essential for accurate hope speech detection models: preserving contextual richness when text is converted to numerical representations, accounting for spelling variations, and handling code-mixing.
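As a rough illustration of the code-mixing problem, the sketch below flags text that mixes writing systems by reading the script name from each character's Unicode name. This is a simple heuristic assumed for illustration, not the project's method: it cannot detect Tamil or Malayalam written in Latin letters (romanized text), which is exactly the case the transliterated branch is meant to cover. The function names and the 20% threshold are hypothetical.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters and combining marks per script, using the first word
    of each character's Unicode name (e.g. 'TAMIL', 'LATIN')."""
    counts = Counter()
    for ch in text:
        # 'L*' = letters, 'M*' = marks such as Indic vowel signs.
        if unicodedata.category(ch)[0] in ("L", "M"):
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def is_code_mixed(text, min_share=0.2):
    """Flag text in which at least two scripts each account for
    `min_share` or more of the counted characters."""
    counts = script_counts(text)
    total = sum(counts.values())
    if total == 0:
        return False
    significant = [s for s, n in counts.items() if n / total >= min_share]
    return len(significant) >= 2


sample = "நன்றி bro, super video"   # Tamil script mixed with English
mixed = is_code_mixed(sample)
latin_only = is_code_mixed("thank you so much")
```

A check like this could route a comment to the appropriate preprocessing path, although spelling variation within a single script still needs separate handling.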