Isolating Features of Object and Its State for Compositional Zero-Shot Learning
Article Type
Research Article
Publication Title
IEEE Transactions on Emerging Topics in Computational Intelligence
Abstract
The purpose of Compositional Zero-Shot Learning (CZSL) is to recognize previously unseen compositions of known objects (e.g., apple, banana) and their states (e.g., ripe, unripe) in an image. CZSL is a challenging problem because it is difficult to isolate the visual features of an object and its state from their composition in an image. The features of a state may vary widely across different compositions. For example, the state "sliced" has different visual features in the compositions "sliced apple" and "sliced tomato". In this paper, we attempt to solve the CZSL problem using a two-stage recognition approach. The stages perform the recognition task sequentially, utilising two distinct modalities of compositions. The modalities are image features and textual features, representing features of objects and states respectively. We propose a novel gradient-regularized loss term for better disentanglement of object and state features from the visual features of the composition. An appropriate disentanglement of the features of visual primitives (states and objects) leads to correct identification of images of unseen state-object compositions. The proposed approach and the competing methods are evaluated on three benchmark datasets: MIT States, UT-Zappos50K and CGQA. Our extensive experiments establish the efficacy of the proposed algorithm, which outperforms other state-of-the-art approaches.
First Page
1571
Last Page
1583
DOI
https://doi.org/10.1109/TETCI.2022.3232816
Publication Date
10-1-2023
Recommended Citation
Panda, Aditya; Santra, Bikash; and Mukherjee, Dipti Prasad, "Isolating Features of Object and Its State for Compositional Zero-Shot Learning" (2023). Journal Articles. 3568.
https://digitalcommons.isical.ac.in/journal-articles/3568