NLP-IISERB@Simpletext2022: To Explore the Performance of BM25 and Transformer Based Frameworks for Automatic Simplification of Scientific Texts

Document Type

Conference Article

Publication Title

CEUR Workshop Proceedings

Abstract

The CLEF SimpleText 2022 lab focuses on developing effective systems to identify relevant passages from a given set of scientific articles, and organized three tasks this year. Task 1 focuses on retrieving passages from the given data for a query text; these passages can be complex and hence require the further simplification carried out in Tasks 2 and 3. The BioNLP research group at the Indian Institute of Science Education and Research Bhopal (IISERB), in collaboration with two information retrieval research groups at IISER Kolkata and ISI Kolkata, participated only in Task 1 of this challenge and submitted three runs using three different retrieval models. This paper explores the performance of these retrieval models on the given task. As our first run, we used a standard BM25 model to retrieve 1000 relevant passages for each query, ranked by the similarity scores the BM25 model generated. For our second run, we used a BERT (Bidirectional Encoder Representations from Transformers) based re-ranking method, called MonoBERT, to re-rank the 1000 passages retrieved in the first run for each query. A pre-trained sequence-to-sequence re-ranking method, called MonoT5, was used as our third run to reorder the 1000 passages re-ranked by MonoBERT for each query. As the official results of this task have not yet been announced, we cannot report the performance of our submissions. However, we manually inspected the retrieved results for many queries in each run, and these checks indicate that performance improved from run 1 to run 2 and further to run 3.
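As a rough illustration of run 1, the following is a minimal BM25 ranking sketch using the open-source rank_bm25 package. The abstract does not say which BM25 implementation the authors used, so the package choice, the toy passages, and the query below are all assumptions, not the paper's actual setup:

```python
from rank_bm25 import BM25Okapi

# Hypothetical toy collection; the actual run indexes the scientific-article
# passages provided by the SimpleText lab.
passages = [
    "BM25 is a bag-of-words ranking function used in ad hoc retrieval.",
    "Transformer models rely on self-attention over token sequences.",
    "Text simplification rewrites complex scientific prose in plain language.",
]
tokenized = [p.lower().split() for p in passages]
bm25 = BM25Okapi(tokenized)

query = "ranking passages with bm25"
scores = bm25.get_scores(query.lower().split())

# Rank passages by BM25 score; the submitted run keeps the top 1000 per
# query (here the toy corpus is smaller than that cutoff).
top_k = 1000
ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)[:top_k]
for rank, i in enumerate(ranked, start=1):
    print(rank, f"{scores[i]:.3f}", passages[i])
```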
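Runs 2 and 3 can be sketched with the pygaggle toolkit, which ships MonoBERT and MonoT5 re-rankers. Whether the authors used pygaggle, and which pre-trained checkpoints they loaded, is not stated in the abstract, so the toolkit, the default checkpoints, and the toy data below are assumptions:

```python
from pygaggle.rerank.base import Query, Text
from pygaggle.rerank.transformer import MonoBERT, MonoT5

# Toy query and candidates; in the actual runs the candidates are the
# 1000 BM25 passages retrieved for each SimpleText query.
query = Query("how are scientific passages ranked for a query")
candidates = [
    Text("BM25 is a bag-of-words ranking function used in ad hoc retrieval.",
         {"docid": "d1"}, 0),
    Text("Transformer models rely on self-attention over token sequences.",
         {"docid": "d2"}, 0),
]

# Run 2: MonoBERT re-ranks the BM25 candidates using its default
# pre-trained checkpoint.
monobert = MonoBERT()
run2 = monobert.rerank(query, candidates)
run2.sort(key=lambda t: t.score, reverse=True)

# Run 3: MonoT5 reorders the MonoBERT output.
monot5 = MonoT5()
run3 = monot5.rerank(query, run2)
run3.sort(key=lambda t: t.score, reverse=True)

for text in run3:
    print(text.metadata["docid"], f"{text.score:.4f}")
```

This cascade mirrors the standard retrieve-then-rerank design: a cheap lexical model narrows the corpus to 1000 candidates, and progressively stronger neural models reorder that short list at acceptable cost.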

First Page

2852

Last Page

2857

Publication Date

1-1-2022
