Baseline BERT models for Conversational Hate Speech Detection in Code-mixed tweets utilizing Data Augmentation and Offensive Language Identification in Marathi
Document Type
Conference Article
Publication Title
CEUR Workshop Proceedings
Abstract
In today’s world, social media plays a vital role in spreading hate towards a person or group based on their color, caste, sex, sexual orientation, political differences, etc. Most existing work classifies a single tweet or comment in isolation, which misses the conversation’s context. A tweet, its corresponding comments, and the replies often help us understand the context of the entire discussion. This paper describes the system used by team CITK_ISI and its performance on the first available code-mixed dataset of Hindi-English and German conversations scraped from Twitter. Data augmentation combined with a baseline transfer-based BERT model achieves a macro F1 score of 0.6653 on the ICHCL Hinglish and German code-mixed binary classification task. The system also identifies hate speech and offensive language in Marathi, a binary classification task on which it secures a macro F1 score of 0.9019.
First Page
563
Last Page
574
Publication Date
1-1-2022
Recommended Citation
Ghosh, Koyel; Senapati, Apurbalal; and Garain, Utpal, "Baseline BERT models for Conversational Hate Speech Detection in Code-mixed tweets utilizing Data Augmentation and Offensive Language Identification in Marathi" (2022). Conference Articles. 425.
https://digitalcommons.isical.ac.in/conf-articles/425