Using word embeddings for information retrieval: How collection and term normalization choices affect performance
International Conference on Information and Knowledge Management, Proceedings
Neural word embedding approaches, owing to their ability to capture the semantics of vocabulary terms, have recently gained the attention of the information retrieval (IR) community and have shown promising results in improving ad hoc retrieval performance. It has been observed that these approaches are sensitive to various choices made while learning and using word embeddings, which often leads to poor reproducibility. We study the effect of varying the following two parameters on ad hoc retrieval performance with word2vec and fastText embeddings: (i) term normalization and (ii) the choice of training collection. We present quantitative estimates of the similarity of word vectors obtained under different settings, and use an embedding-based query expansion task to understand the effects of these parameters on IR effectiveness.
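As an illustration of what embedding-based query expansion can look like, the following is a minimal sketch, not the procedure used in the paper. It assumes a small hand-written table of toy vectors (`EMB`) in place of trained word2vec or fastText embeddings, and the function names `cosine` and `expand_query` are hypothetical. Candidate terms are ranked by cosine similarity to the centroid of the query term vectors.

```python
import numpy as np

# Hypothetical toy embeddings for illustration only; real vectors would
# come from a model trained on a chosen collection with a chosen
# term normalization (e.g. stemmed or unstemmed tokens).
EMB = {
    "car": np.array([0.9, 0.1, 0.0]),
    "automobile": np.array([0.85, 0.15, 0.05]),
    "vehicle": np.array([0.8, 0.2, 0.1]),
    "banana": np.array([0.0, 0.1, 0.9]),
    "fruit": np.array([0.05, 0.2, 0.85]),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def expand_query(query_terms, emb, k=2):
    """Return the k vocabulary terms closest to the query centroid.

    Query terms missing from the vocabulary are ignored; if none are
    known, no expansion terms are returned.
    """
    known = [t for t in query_terms if t in emb]
    if not known:
        return []
    centroid = np.mean([emb[t] for t in known], axis=0)
    candidates = [
        (term, cosine(centroid, vec))
        for term, vec in emb.items()
        if term not in query_terms
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in candidates[:k]]


if __name__ == "__main__":
    print(expand_query(["car"], EMB, k=2))
```

Under this sketch, changing the training collection or the term normalization would change the contents of `EMB`, and therefore the expansion terms returned for the same query, which is the kind of sensitivity the abstract describes.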
Roy, Dwaipayan; Ganguly, Debasis; Bhatia, Sumit; Bedathur, Srikanta; and Mitra, Mandar, "Using word embeddings for information retrieval: How collection and term normalization choices affect performance" (2018). Conference Articles. 47.