Fine-Grained Attribute-Object Feature Representation in Compositional Zero-Shot Learning

Document Type

Conference Article

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize unobserved (unseen) compositions of known objects (guava, orange, pear, etc.) and their states (sliced, peeled, ripe, etc.). CZSL is challenging because it is sometimes difficult to separate the visual aspects of objects and states from their context in images. In addition, the fine-grained visual features of a state may vary considerably depending on the object it is composed with. For instance, the state peeled displays distinct visual characteristics in the peeled apple and peeled guava compositions. Existing research uses linguistic supervision and word embeddings to better segment and compose attribute-object relationships for recognition. We emphasize the visual embedding space and propose a Fine-grained Compositional Learning (FgCL) method capable of separating attribute features from object features. To train our model more effectively, we integrate visual fine-grained and Siamese-based features with word embeddings into a shared embedding space that is representative of unseen compositions. Extensive experiments demonstrate a significant improvement over existing work (SymNet, TMN) on two benchmark datasets: MIT-States and UT-Zappos50K.
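
As a rough illustration of what recognizing unseen compositions in a shared embedding space can look like, the toy sketch below scores an image embedding against every attribute-object pair and picks the closest one. It is not the FgCL method: the embeddings, the concatenation-based `compose` function, the cosine scoring, and all names are assumptions chosen for illustration, and the image embedding is assumed to have already been projected into the shared space.

```python
import math
from itertools import product

# Hypothetical toy word embeddings (illustrative values, not from the paper).
ATTR_EMB = {"sliced": [1.0, 0.0], "peeled": [0.0, 1.0]}
OBJ_EMB = {"apple": [1.0, 0.0], "guava": [0.0, 1.0]}


def compose(attr, obj):
    """Assumed composition rule: concatenate attribute and object embeddings."""
    return ATTR_EMB[attr] + OBJ_EMB[obj]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict(image_embedding, candidates):
    """Return the (attribute, object) pair whose composed embedding is closest."""
    return max(candidates, key=lambda pair: cosine(image_embedding, compose(*pair)))


# Every attribute-object pair is a candidate, including pairs with no training images.
candidates = list(product(ATTR_EMB, OBJ_EMB))

# Hypothetical image embedding, assumed to lie in the same 4-dimensional shared space.
image_embedding = [0.1, 0.9, 0.2, 0.8]

prediction = predict(image_embedding, candidates)
print(prediction)
```

Because candidate compositions are built from attribute and object embeddings rather than from labeled images of each pair, a composition such as peeled guava can be scored even if it never appeared during training.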

First Page

157

Last Page

165

DOI

10.1007/978-3-031-45170-6_17

Publication Date

1-1-2023

This document is currently not available here.
