From a "scholarly big dataset" to a test collection for bibliographic citation recommendation
Document Type
Conference Article
Publication Title
AAAI Workshop - Technical Report
Abstract
The problem of designing recommender systems for scholarly article citations has been actively researched, with more than 200 publications appearing in the last two decades. In spite of this, no definitive results are available about what approaches work best. Arguably the most important reason for this lack of consensus is the dearth of standardised test collections and evaluation protocols, such as those provided by TREC-like forums. CiteSeerX, a "scholarly big dataset", has recently become available. However, this collection provides only the raw material that is yet to be moulded into Cranfield-style test collections. In this paper, we discuss the limitations of test collections used in earlier work, and describe how we used CiteSeerX to design a test collection with a well-defined evaluation protocol. The collection consists of over 600,000 research papers and over 2,500 queries. We report some preliminary experimental results using this collection, which are indicative of the performance of elementary content-based techniques. These experiments also made us aware of some shortcomings of CiteSeerX itself.
First Page
705
Last Page
710
Publication Date
1-1-2016
Recommended Citation
Roy, Dwaipayan; Ray, Kunal; and Mitra, Mandar, "From a 'scholarly big dataset' to a test collection for bibliographic citation recommendation" (2016). Conference Articles. 718.
https://digitalcommons.isical.ac.in/conf-articles/718