Annihilate Hates (Task 4, HASOC 2023): Hate Speech Detection in Assamese, Bengali, and Bodo languages
Document Type
Conference Article
Publication Title
CEUR Workshop Proceedings
Abstract
In today’s world, social media can act as a tool for spreading hate towards a person or group based on their color, caste, sex, sexual orientation, political differences, etc. As social media continues to expand, hate speech is proliferating at an alarming rate. Research on identifying hate speech in social media has recently gained significant prominence, with a specific need for investigations focused on languages other than English. The HASOC (Hate Speech and Offensive Content Identification) track has provided a platform for hate speech detection at FIRE (Forum for Information Retrieval Evaluation) since 2019. HASOC 2023 coordinates four tasks, one of which is AH (Annihilate Hates, Task 4). The AH task aims to develop and assess supervised machine learning systems on three hate speech datasets in three Indian languages (Assamese, Bengali, and Bodo), collected from YouTube and Facebook comments. Each dataset is annotated with binary labels (hate or non-hate). For Assamese, 20 teams made 180 submissions; for Bengali, 21 teams made 214 submissions; and for Bodo, 19 teams made 175 submissions. The best classifiers for Assamese, Bengali, and Bodo achieved Macro F1 scores of 0.73, 0.77, and 0.85, respectively. This article briefly summarizes the tasks, data development, and results. A variant of the BERT architecture achieved the best performance in the task, although other systems were also applied successfully.
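For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how a Macro F1 score can be computed for binary hate / non-hate labels. It is not the organizers' evaluation script: the function name `macro_f1`, the label strings, and the toy data are assumptions made purely for illustration.

```python
def macro_f1(y_true, y_pred, labels=("hate", "non-hate")):
    """Return the unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels (hypothetical; not drawn from the task datasets).
gold = ["hate", "hate", "non-hate", "non-hate", "non-hate", "non-hate"]
pred = ["hate", "non-hate", "non-hate", "non-hate", "non-hate", "hate"]

print(macro_f1(gold, pred))  # 0.625 (F1 is 0.5 for hate, 0.75 for non-hate)
```

Because the two class scores are averaged without weighting, the minority class counts as much as the majority class, which is one common reason to report Macro F1 on imbalanced hate speech data.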
First Page
368
Last Page
382
Publication Date
1-1-2023
Recommended Citation
Ghosh, Koyel; Senapati, Apurbalal; and Pal, Aditya Shankar, "Annihilate Hates (Task 4, HASOC 2023): Hate Speech Detection in Assamese, Bengali, and Bodo languages" (2023). Conference Articles. 541.
https://digitalcommons.isical.ac.in/conf-articles/541