A new multimodal sentiment analysis for images containing textual information

Article Type

Research Article

Publication Title

Multimedia Tools and Applications

Abstract

Multimodal sentiment analysis on images with textual content is a research area that aims to understand the sentiment conveyed by both the visual and the textual elements of an image. While multimodal sentiment analysis on separate images and text (e.g., reviews) has its own challenges, combining textual and visual content within a single image presents new challenges as well as new opportunities. In this work, we propose a multimodal sentiment analysis method for images that incorporate textual elements. For the textual sentiment analysis model, we first employ a text recognition system to extract the text embedded in the input images. The proposed method is based on transfer learning and uses two pre-trained deep learning models, Xception for the visual content and RoBERTa for the textual content, to extract features from multimedia images. We then apply a fusion strategy that combines the two modalities, Visual Sentiment Analysis (VSA) and Textual Sentiment Analysis (TSA), to improve the accuracy of the method and to provide a more comprehensive understanding of sentiment in multimedia content. In addition, we curated a custom dataset of images with associated text labels and sentiment labels. To ensure accurate labels, we conducted human evaluations involving thirty annotators. The dataset includes images labeled with negative, neutral, and positive sentiment. Experimental results demonstrate the effectiveness of combining visual and textual features for sentiment analysis. These findings have promising implications for real-world applications such as sentiment analysis of social media posts, product reviews, and marketing campaigns, where both images and text play a significant role in conveying emotional context.
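The abstract does not specify which fusion strategy combines VSA and TSA. The following is only a minimal sketch of one common option, decision-level (late) fusion, and is not presented as the authors' method. It assumes that each branch (an Xception-based image classifier and a RoBERTa-based classifier applied to the recognized text) outputs three raw scores (logits) for the negative, neutral, and positive classes. Those scores are stubbed here as plain numbers. The function `late_fusion` and its `visual_weight` parameter are hypothetical names introduced for illustration.

```python
import math
from typing import Sequence

# Class order assumed to match the dataset's three sentiment labels.
LABELS = ("negative", "neutral", "positive")


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    visual_logits: Sequence[float],
    textual_logits: Sequence[float],
    visual_weight: float = 0.5,  # hypothetical weight given to the VSA branch
) -> tuple[str, list[float]]:
    """Hypothetical decision-level fusion: weighted average of per-modality
    class probabilities, followed by an argmax over the fused distribution."""
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must be in [0, 1]")
    if len(visual_logits) != len(LABELS) or len(textual_logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores per modality")

    p_visual = softmax(visual_logits)    # stand-in for the Xception branch output
    p_textual = softmax(textual_logits)  # stand-in for the RoBERTa branch output
    fused = [
        visual_weight * v + (1.0 - visual_weight) * t
        for v, t in zip(p_visual, p_textual)
    ]
    best = max(range(len(LABELS)), key=fused.__getitem__)
    return LABELS[best], fused


if __name__ == "__main__":
    # Image looks negative, but its embedded text reads positive.
    label, probs = late_fusion([2.0, 0.5, -1.0], [-0.5, 0.0, 1.5], visual_weight=0.4)
    print(label, [round(p, 3) for p in probs])
```

With these illustrative scores and a weight of 0.4 on the visual branch, the text branch dominates and the fused prediction is "positive". Setting `visual_weight` to 1.0 reduces the sketch to the visual classifier alone. Feature-level (early) fusion, which concatenates the Xception and RoBERTa feature vectors before a shared classifier, is an equally plausible reading of the abstract.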

DOI

10.1007/s11042-024-19999-8

Publication Date

1-1-2024