Dataset of scholarly publications for empirical evaluation of research productivity and trends of the ISI over two decades (1991-2010)




Quantifying research performance is a necessity for scientific institutions and enterprises across many academic pursuits. Scientometric measurements are recognized as an indispensable tool for evidence-based judgment of the research activities of an individual or an institute. Scholarly publications, often combined with citation counts, are the most widely accepted basis for evaluating research productivity. This dataset enumerates quantifiable characteristics of the scholarly publications of the Indian Statistical Institute (ISI) from 1991 to 2010. It documents the publications together with their citations, mapping the research of ISI over two distinct decades. It also presents the publications along different dimensions for empirical evaluation of research productivity and trends in relation to the citation impact of the Institute. Compiling the dataset was time-consuming and was carried out rigorously. Scholarly publications with at least one author affiliated to ISI in the by-line that appeared during the period were first retrieved from several sources (viz. Scopus®, Web of Science™, MathSciNet®, EconLit), then converted and captured in MS-Excel format. The data conversion technique (from BibTeX to Excel, through CDS/ISIS using Fangorn) made it possible to capture large amounts of data in the least possible time. Considerable effort then went into filtering the data and validating it against the annual reports of the Institute. Validation was tedious, but it gave the author valuable experience in making the data elements findable and accessible using robust techniques. Finally, citations were counted (as of December 2017) through Google Scholar, Scopus®, and Web of Science™ to measure the academic influence of ISI publications.
The empirical dataset (comprising 7188 records) was finally consolidated and organized systematically for sharing and reuse. It includes the ‘raw data’ prepared for the author's doctoral dissertation. A detailed description of the data collection (i.e. identification, gathering, conversion, filtering, validation, and consolidation) is provided in Chapter 4 of the dissertation. The author has also used the dataset in some of his articles. The dataset is, therefore, authoritative for conducting studies on different aspects of evaluative scientometrics. The author believes that researchers will benefit from this dataset, and that it will be used and re-used to widen the coverage of research and develop better insights.
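
The filtering and consolidation steps described above can be illustrated with a minimal Python sketch. It is not the author's actual procedure (which used BibTeX, CDS/ISIS, Fangorn, and manual validation against annual reports); it only shows one plausible way to keep in-scope records and merge duplicates retrieved from several sources. The record fields (`title`, `year`, `affiliations`, `source`) and the function names are hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation/whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def consolidate(records, affiliation="Indian Statistical Institute",
                first_year=1991, last_year=2010):
    """Keep in-scope records and merge duplicates found in several sources.

    Each record is a dict with hypothetical keys:
    'title', 'year', 'affiliations' (list of str), 'source'.
    """
    merged = {}
    for rec in records:
        # Filter 1: publication year must fall inside the study period.
        if not first_year <= rec["year"] <= last_year:
            continue
        # Filter 2: at least one by-line affiliation must mention the institute.
        if not any(affiliation.lower() in a.lower() for a in rec["affiliations"]):
            continue
        # Duplicates are matched on normalized title plus year (an assumption).
        key = (normalize_title(rec["title"]), rec["year"])
        if key in merged:
            merged[key]["sources"].add(rec["source"])
        else:
            merged[key] = {
                "title": rec["title"],
                "year": rec["year"],
                "sources": {rec["source"]},
            }
    return list(merged.values())
```

Matching on normalized title and year is a simplification; records that differ only by a subtitle or a typo would still need manual review.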

Publication Title

Mendeley Data (Elsevier)

Document Type


Subject Category

Library and Information Science

Specific Subject

Quantitative study, Research evaluation, Scientometrics, Bibliometrics





Application Tool


Data Type




Publication Date

11-26-2019

Steps to reproduce

The dataset can be downloaded in CSV (Comma Separated Values, plain text) and XLSX (Microsoft Excel worksheet, binary) formats for academic/research purposes. In addition to the dataset, the Excel file contains two separate sheets: a ‘guide-to-use’ and an ‘authorship counting chart’. The dataset must not be used in connection with any personal treatment or psychosocial intervention. It is accompanied by a statement that does not allow third parties (other than academic researchers) to use it for their own purposes without prior consultation with the Author. No part of this dataset may be used in any way that disregards the intent of the researcher and the academic interest it serves.
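
As a starting point for reuse, the following Python sketch reads the CSV file and tallies publications and citations for each of the two decades. The column names `Year` and `Citations`, the decade boundaries (1991-2000 and 2001-2010), and the file name in the usage note are assumptions; consult the ‘guide-to-use’ sheet for the actual field names.

```python
import csv
import io
from collections import defaultdict


def decade_of(year):
    """Map a year in 1991-2010 to its decade label (assumed split)."""
    return "1991-2000" if year <= 2000 else "2001-2010"


def summarize(csv_file):
    """Count publications and sum citations per decade.

    `csv_file` is any open text file or file-like object whose header
    row contains the hypothetical columns 'Year' and 'Citations'.
    """
    totals = defaultdict(lambda: {"publications": 0, "citations": 0})
    for row in csv.DictReader(csv_file):
        year = int(row["Year"])
        if not 1991 <= year <= 2010:
            continue  # ignore rows outside the study period
        bucket = totals[decade_of(year)]
        bucket["publications"] += 1
        # An empty citation cell is treated as zero citations.
        bucket["citations"] += int(row["Citations"] or 0)
    return dict(totals)


# Dummy rows for illustration only; they are not taken from the dataset.
sample = io.StringIO("Year,Citations\n1995,3\n2005,\n2005,7\n")
print(summarize(sample))
```

With the downloaded file, the same function could be called as `summarize(open("isi_publications.csv", newline="", encoding="utf-8"))`, where the file name is a placeholder.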

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.