A Novel Infogain and Multi-Axial Wavelet-Based Transformer for Personality Trait Question Answering

Article Type

Research Article

Publication Title

International Journal of Pattern Recognition and Artificial Intelligence

Abstract

Visual Question Answering (VQA) is a topic of growing interest in multimedia, affective, and empathic computing. Unlike existing models, which address VQA for scene images, this work develops a new model for Personality Trait Question Answering (PQA). The model uses Twitter account information, including shared images, profile pictures, banners, text in the images, and descriptions of the images. Motivated by the success of transformers, a new InfoGain Multi-Axial Wavelet Vision Transformer (IgMaWaViT) is explored for encoding the visual features of the images. For encoding the textual features of the images and descriptions, a new Information Gain BERT (InfoBert) method is introduced, which handles variable-length text encoding by choosing the optimal discriminator. The model then fuses the image and text encodings according to questions about different personality traits to produce answers. The full model is called InfoGain Multi-Axial Wavelet Vision Transformer for Personality Traits Question Answering (IgMaWaViT-PQA). To validate the proposed model, a new dataset was constructed and used alongside standard datasets for experimentation. Comprehensive experiments show that the proposed model outperforms state-of-the-art models. The code is available at https://github.com/biswaskunal29/InfoGain_MultiAxial_PQA.

DOI

10.1142/S0218001424510236

Publication Date

1-1-2025
