Fine-Grained Attribute-Object Feature Representation in Compositional Zero-Shot Learning
Document Type
Conference Article
Publication Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
Compositional Zero-Shot Learning (CZSL) aims to recognize unobserved (unseen) compositions of known objects (guava, orange, pear, etc.) and their states (sliced, peeled, ripe, etc.). CZSL is challenging because it is often difficult to separate the visual aspects of objects and states from their context in the images. In addition, the detailed features of a state may vary considerably depending on the composition in which it appears. For instance, the state peeled displays distinct visual characteristics in the compositions peeled apple and peeled guava. Existing research uses linguistic supervision and word embeddings to better segment and compose attribute-object relationships for recognition. We instead emphasize the visual embedding space and propose a Fine-grained Compositional Learning (FgCL) method capable of separating attribute features from object features. To train our model more effectively, we integrate fine-grained visual features and Siamese-based features with word embeddings into a shared embedding space that is representative of unseen compositions. Extensive experiments demonstrate a significant improvement over existing work (SymNet, TMN) on two benchmark datasets: MIT-States and UT-Zappos50K.
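As a rough illustration of the general idea of scoring attribute-object compositions in a shared embedding space, the sketch below ranks candidate compositions by cosine similarity against an image feature. It is not the FgCL method: the fine-grained and Siamese visual branches are omitted, the image feature is assumed to be already projected into the shared space, and the toy vectors, the concatenation-based `compose` function, and the names `ATTR`, `OBJ`, and `predict` are all hypothetical.

```python
import math

# Toy word embeddings (hypothetical values, for illustration only).
ATTR = {"sliced": [1.0, 0.0], "peeled": [0.0, 1.0]}
OBJ = {"apple": [1.0, 0.0], "guava": [0.0, 1.0]}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compose(attr_vec, obj_vec):
    """Assumed composition rule: concatenate attribute and object embeddings."""
    return attr_vec + obj_vec  # list concatenation, not element-wise addition


def predict(image_feature, candidates):
    """Return the (attribute, object) pair whose composed embedding is
    most similar to the image feature in the shared space."""
    scores = {
        (a, o): cosine(image_feature, compose(ATTR[a], OBJ[o]))
        for a, o in candidates
    }
    return max(scores, key=scores.get)


# Candidate compositions; in CZSL some of these would be unseen during training.
candidates = [
    ("sliced", "apple"),
    ("peeled", "apple"),
    ("sliced", "guava"),
    ("peeled", "guava"),
]

# Toy image feature, assumed to be already mapped into the shared space.
image_feature = [0.1, 0.9, 0.2, 0.8]

print(predict(image_feature, candidates))
```

Because every composition is scored from its attribute and object embeddings rather than from a per-composition classifier, a pair never seen together during training can still be ranked at test time.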
First Page
157
Last Page
165
DOI
10.1007/978-3-031-45170-6_17
Publication Date
1-1-2023
Recommended Citation
Shabbir, Nazir; Rout, Ranjeet Kr; Umer, Saiyed; and Mohanta, Partha Pratim, "Fine-Grained Attribute-Object Feature Representation in Compositional Zero-Shot Learning" (2023). Conference Articles. 549.
https://digitalcommons.isical.ac.in/conf-articles/549