An Analysis of Transformer-based Models for Code-mixed Conversational Hate-speech Identification

Document Type

Conference Article

Publication Title

CEUR Workshop Proceedings

Abstract

The current surge in social media usage has resulted in the widespread availability of harmful and hateful content. Identifying such inflammatory content on social media is a crucial NLP problem. Recent research has repeatedly demonstrated that context-level semantics matter more than word-level semantics for assessing the existence of hate content. This paper investigates several state-of-the-art transformer-based models for hate content detection in code-mixed datasets. We emphasize transformer-based models since they capture context-level semantics. In particular, we concentrate on Google-MuRIL, XLM-Roberta-base, and Indic-BERT. Additionally, we experimented with an ensemble of these three models. Based on substantial empirical evidence, we observe that Google-MuRIL emerges as the top model, with macro F1-scores of 0.708 and 0.445 for HASOC shared tasks 1 and 2, placing us 1st and 6th, respectively, on the overall leaderboard.

First Page

513

Last Page

521

Publication Date

1-1-2022

