Isolating Features of Object and Its State for Compositional Zero-Shot Learning

Article Type

Research Article

Publication Title

IEEE Transactions on Emerging Topics in Computational Intelligence

Abstract

The purpose of Compositional Zero-Shot Learning (CZSL) is to recognize previously unseen compositions of known objects (e.g., apple, banana) and their states (e.g., ripe, unripe) in images. CZSL is challenging because it is difficult to isolate the visual features of an object and its state from their composition in an image. The features of a state may also vary widely across compositions; for example, the state sliced has different visual features in the compositions sliced apple and sliced tomato. In this paper, we attempt to solve the CZSL problem using a two-stage recognition approach. The stages perform recognition sequentially, utilizing two distinct modalities of compositions: image features and textual features, representing features of objects and states respectively. We propose a novel gradient-regularized loss term for better disentanglement of object and state features from the visual features of the composition. Appropriate disentanglement of the features of the visual primitives (states and objects) leads to correct identification of images of unseen state-object compositions. The proposed approach and the competing methods are evaluated on three benchmark datasets: MIT States, UT-Zappos50K and CGQA. Our extensive experiments establish the efficacy of the proposed algorithm, which outperforms other state-of-the-art approaches.

First Page

1571

Last Page

1583

DOI

https://doi.org/10.1109/TETCI.2022.3232816

Publication Date

10-1-2023

