An Analysis of Transformer-based Models for Code-mixed Conversational Hate-speech Identification
Document Type
Conference Article
Publication Title
CEUR Workshop Proceedings
Abstract
The current surge in social media usage has resulted in the widespread availability of harmful and hateful content. Identifying such inflammatory content on social media is a crucial NLP problem. Recent research has repeatedly demonstrated that context-level semantics matter more than word-level semantics for assessing the existence of hate content. This paper investigates several state-of-the-art transformer-based models for hate content detection in code-mixed datasets. We emphasize transformer-based models since they capture context-level semantics. In particular, we concentrate on Google-MuRIL, XLM-Roberta-base, and Indic-BERT. Additionally, we have experimented with an ensemble of the three mentioned models. Based on substantial empirical evidence, we observe that Google-MuRIL emerges as the top model, with macro F1-scores of 0.708 and 0.445 for HASOC shared tasks 1 and 2, placing us 1st and 6th on the overall leaderboard standings, respectively.
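The abstract does not specify how the three models were combined; a common ensembling scheme for this kind of setup is soft voting, i.e. averaging each model's class probabilities and taking the argmax. A minimal sketch of that idea, assuming per-model output logits are already in hand (the logit arrays below are illustrative placeholders, not outputs from the actual models):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw logits to class probabilities, row-wise and numerically stable."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_vote(per_model_logits: list[np.ndarray]) -> np.ndarray:
    """Average class probabilities across models and return predicted labels."""
    probs = np.mean([softmax(l) for l in per_model_logits], axis=0)
    return probs.argmax(axis=-1)

# Placeholder logits for 2 examples x 2 classes (hate / not-hate) from three
# hypothetical models -- NOT real MuRIL/XLM-R/Indic-BERT outputs.
muril_logits = np.array([[2.0, -1.0], [0.1, 0.3]])
xlmr_logits  = np.array([[1.5, -0.5], [-0.2, 0.8]])
indic_logits = np.array([[1.0,  0.0], [0.4, 0.2]])

preds = soft_vote([muril_logits, xlmr_logits, indic_logits])
```

Soft voting lets a confident model outvote two weakly opposed ones, which hard (majority) voting cannot; whether the paper's ensemble used soft voting, hard voting, or another scheme is not stated here.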
First Page
513
Last Page
521
Publication Date
1-1-2022
Recommended Citation
Singh, Neeraj Kumar and Garain, Utpal, "An Analysis of Transformer-based Models for Code-mixed Conversational Hate-speech Identification" (2022). Conference Articles. 424.
https://digitalcommons.isical.ac.in/conf-articles/424