Using word embeddings for information retrieval: How collection and term normalization choices affect performance
Document Type
Conference Article
Publication Title
International Conference on Information and Knowledge Management, Proceedings
Abstract
Neural word embedding approaches, owing to their ability to capture the semantic meanings of vocabulary terms, have recently gained the attention of the information retrieval (IR) community and have shown promising results in improving ad hoc retrieval performance. It has been observed that these approaches are sensitive to various choices made while learning and using word embeddings, which often leads to poor reproducibility. We study the effect of varying two parameters, namely (i) term normalization and (ii) the choice of training collection, on ad hoc retrieval performance with word2vec and fastText embeddings. We present quantitative estimates of the similarity of word vectors obtained under different settings, and use an embedding-based query expansion task to understand the effects of these parameters on IR effectiveness.
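The abstract does not specify how vector similarity is measured or how expansion terms are selected. The following is a minimal sketch under assumptions. It expands a query by adding each term's top-k cosine-similarity neighbours. It compares two embedding models, for example models trained on different collections or with different term normalization, by the Jaccard overlap of a term's top-k neighbour sets. The function names, the toy two-dimensional vectors, and the overlap measure are hypothetical illustrations, not the authors' method.

```python
import math
from typing import Dict, List

# Hypothetical embedding table: term -> vector. In practice the vectors would come
# from word2vec or fastText models trained on a chosen collection and normalization.
Embeddings = Dict[str, List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbours(term: str, emb: Embeddings, k: int) -> List[str]:
    """Return the k terms (excluding `term`) most cosine-similar to `term`."""
    if term not in emb:
        return []
    scored = [(cosine(emb[term], vec), other) for other, vec in emb.items() if other != term]
    # Sort by descending similarity; break ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def expand_query(query_terms: List[str], emb: Embeddings, k: int) -> List[str]:
    """Append up to k embedding neighbours of each query term (hypothetical scheme)."""
    expanded = list(query_terms)
    for term in query_terms:
        for nb in nearest_neighbours(term, emb, k):
            if nb not in expanded:
                expanded.append(nb)
    return expanded


def neighbour_overlap(term: str, emb_a: Embeddings, emb_b: Embeddings, k: int) -> float:
    """Jaccard overlap of `term`'s top-k neighbour sets under two embedding models."""
    a = set(nearest_neighbours(term, emb_a, k))
    b = set(nearest_neighbours(term, emb_b, k))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two toy models standing in for embeddings learned under different settings.
emb_a = {"car": [1.0, 0.0], "auto": [0.9, 0.1], "bank": [0.0, 1.0], "river": [0.1, 0.9]}
emb_b = {"car": [1.0, 0.0], "auto": [0.0, 1.0], "bank": [0.95, 0.05], "river": [0.1, 0.9]}

print(expand_query(["car"], emb_a, 1))              # ['car', 'auto']
print(neighbour_overlap("car", emb_a, emb_b, 2))    # 0.3333333333333333
```

A low neighbour overlap for the same term signals that the two settings produce different expansion terms, which is one way such choices could change retrieval effectiveness.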
First Page
1835
Last Page
1838
DOI
10.1145/3269206.3269277
Publication Date
10-17-2018
Recommended Citation
Roy, Dwaipayan; Ganguly, Debasis; Bhatia, Sumit; Bedathur, Srikanta; and Mitra, Mandar, "Using word embeddings for information retrieval: How collection and term normalization choices affect performance" (2018). Conference Articles. 47.
https://digitalcommons.isical.ac.in/conf-articles/47