A New Contrastive Learning-Based Vision Transformer for Sentiment Analysis Using Scene Text Images

Article Type

Research Article

Publication Title

International Journal of Pattern Recognition and Artificial Intelligence

Abstract

Sentiment analysis of scene text images is challenging because such images have arbitrary backgrounds and the method must rely on visual features alone. Unlike most existing methods, which use text, images, or both, this study uses only scene text images for sentiment analysis. The intuition is that users sometimes express feelings and emotions, or convey messages, by writing text in different shapes against diverse background designs. Existing methods ignore these vital cues. This work explores a vision transformer to extract visual features that represent contextual information about the appearance of the text image. To strengthen these features, the proposed work introduces contrastive learning, which maximizes the separation between classes and minimizes the variation within each of the positive, negative, and neutral classes. To demonstrate its effectiveness, the proposed method is tested on a dataset constructed by the authors and on a benchmark dataset. A comparative study with existing methods shows that the proposed method is superior in classifying positive, negative, and neutral scene text images.
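
The contrastive objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss, so the code below assumes a generic supervised contrastive loss over L2-normalized feature vectors; the function name, the temperature value, and the toy 2-D features are hypothetical and stand in for the vision transformer's outputs. It is not the authors' implementation.

```python
import math

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Assumed supervised contrastive loss (illustrative, not the paper's).

    Same-label samples are pulled together and different-label samples
    are pushed apart. Embeddings must be non-zero vectors.
    """
    # L2-normalise so that dot products are cosine similarities.
    unit = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        unit.append([x / norm for x in vec])

    n = len(unit)
    per_anchor = []
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(unit[i], unit[j])) / temperature
            for j in range(n) if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Numerically stable log-sum-exp over all other samples.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        # Average negative log-probability of picking a same-class sample.
        per_anchor.append(
            -sum(sims[j] - log_denom for j in positives) / len(positives)
        )
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0

# Toy features: three tight pairs standing in for positive, negative, neutral.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0], [-0.9, -0.1]]
loss_grouped = supervised_contrastive_loss(features, [0, 0, 1, 1, 2, 2])
loss_mixed = supervised_contrastive_loss(features, [0, 1, 2, 0, 1, 2])
print(loss_grouped < loss_mixed)
```

When the labels match the geometric clusters, the loss is lower than when the same points carry mismatched labels, which is the behavior the abstract describes as a small intra-class gap and a large inter-class gap.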

DOI

10.1142/S0218001424520293

Publication Date

12-30-2024
