An improved test collection and baselines for bibliographic citation recommendation

Document Type

Conference Article

Publication Title

International Conference on Information and Knowledge Management, Proceedings


The problem of recommending bibliographic citations to an author who is writing an article has been well studied. However, different researchers have used different datasets to evaluate proposed techniques, and have sometimes reported contradictory findings regarding the relative effectiveness of various approaches. In addition, these datasets are problematic in one way or another (e.g., in terms of size or availability), precluding the possibility of adopting one (or some) of them as standard benchmarks. A recently created test collection that makes use of data from CiteSeerX is large, heterogeneous, and publicly available, but has certain other limitations. In this paper, we propose a way to modify this test collection to address these limitations. We also use the improved test collection to establish a set of baseline results using elementary content-based techniques, as well as reference directed indexing.
