Using word embeddings for information retrieval: How collection and term normalization choices affect performance

Document Type

Conference Article

Publication Title

International Conference on Information and Knowledge Management, Proceedings

Abstract

Neural word embedding approaches, owing to their ability to capture the semantic meanings of vocabulary terms, have recently gained the attention of the information retrieval (IR) community and have shown promising results in improving ad hoc retrieval performance. It has been observed that these approaches are sensitive to various choices made while learning and using word embeddings, which often leads to poor reproducibility. We study the effect of varying the following two parameters, viz., i) the term normalization and ii) the choice of training collection, on ad hoc retrieval performance with word2vec and fastText embeddings. We present quantitative estimates of the similarity of word vectors obtained under different settings, and use an embedding-based query expansion task to understand the effects of these parameters on IR effectiveness.
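As a rough illustration of the embedding-based query expansion task mentioned in the abstract, the sketch below adds the nearest neighbours (by cosine similarity) of the query centroid to the query. It is not the authors' method: the function names, the toy three-dimensional vectors, the lowercasing normalizer, and the centroid-based ranking are all assumptions chosen for brevity. In practice the vectors would come from a trained word2vec or fastText model.

```python
import math

# Toy vectors standing in for trained embeddings (hypothetical values).
TOY_EMBEDDINGS = {
    "retrieval": [0.9, 0.1, 0.0],
    "search": [0.8, 0.2, 0.1],
    "ranking": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query_terms, embeddings, k=2, normalize=str.lower):
    """Append the k vocabulary terms closest to the centroid of the query terms.

    `normalize` is an assumed stand-in for the term normalization applied
    when the embeddings were trained; query terms must be normalized the
    same way or they will not be found in the vocabulary.
    """
    terms = [normalize(t) for t in query_terms]
    known = [t for t in terms if t in embeddings]
    if not known:
        # No query term is in the vocabulary, so nothing can be expanded.
        return terms

    dim = len(embeddings[known[0]])
    centroid = [sum(embeddings[t][i] for t in known) / len(known) for i in range(dim)]

    candidates = [
        (cosine(centroid, vec), term)
        for term, vec in embeddings.items()
        if term not in terms
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return terms + [term for _, term in candidates[:k]]


if __name__ == "__main__":
    print(expand_query(["Retrieval"], TOY_EMBEDDINGS, k=2))
```

With the identity function passed as `normalize`, the capitalized query term no longer matches the lowercased vocabulary and no expansion takes place, which is one simple way a mismatch in term normalization can change retrieval behaviour.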

First Page

1835

Last Page

1838

DOI

10.1145/3269206.3269277

Publication Date

10-17-2018

