Conference Articles

Baseline BERT models for Conversational Hate Speech Detection in Code-mixed tweets utilizing Data Augmentation and Offensive Language Identification in Marathi

Koyel Ghosh, Central Institute of Technology
Apurbalal Senapati, Central Institute of Technology
Utpal Garain, Indian Statistical Institute, Kolkata

Document Type

Conference Article

Publication Title

CEUR Workshop Proceedings

Abstract

In today’s world, social media plays a vital role in spreading hate towards a person or group based on their color, caste, sex, sexual orientation, political differences, etc. Most existing work classifies a single tweet or comment in isolation and therefore lacks the conversation’s context. The tweet, its corresponding comments, and their replies often help us understand the context of the entire discussion. This paper describes the system used by team CITK_ISI and its performance on the first available code-mixed dataset of Hindi-English and German conversations scraped from Twitter. Data augmentation is used with a baseline transfer-based BERT model, which achieves a macro F1 score of 0.6653 on the ICHCL Hinglish and German code-mixed binary classification task. The system also identifies hate speech and offensive language in Marathi, a binary classification task on which it secures a macro F1 score of 0.9019.

First Page

563

Last Page

574

Publication Date

1-1-2022

Recommended Citation

Ghosh, Koyel; Senapati, Apurbalal; and Garain, Utpal, "Baseline BERT models for Conversational Hate Speech Detection in Code-mixed tweets utilizing Data Augmentation and Offensive Language Identification in Marathi" (2022). Conference Articles. 425.
https://digitalcommons.isical.ac.in/conf-articles/425

This document is currently not available here.
