Recurrent Neural Network Based Information Extraction for Cancer Genetics.

Date of Submission

December 2015

Date of Award

Winter 12-12-2016

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)


Garain, Utpal (CVPR-Kolkata; ISI)

Abstract (Summary of the Work)

With the growing volume of scholarly articles on Cancer Genetics (CG) published every year, there is a pressing need for a robust system that extracts information from these articles. Such a system would let the user's information need, expressed as a query, be processed accordingly, so that the articles returned are highly relevant to the user. Although it resembles other Information Extraction tasks in the BioNLP domain, the CG extraction task must generalize across events that capture interactions between entities throughout the entire biological hierarchy, from simple chemicals to whole organisms. This work explores the design and implementation of a supervised, learning-based sequence classification technique to advance the automatic extraction of information (events and their arguments) from statements about biological processes. It discusses the use of efficient word embeddings in vector space via distributed representations [Mikolov et al., 2013a] [Mikolov et al., 2013b], learned on the pre-processed CG corpus to capture semantic relations between words. These embeddings are then fed as input to Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) [Hochreiter and Schmidhuber, 1997] to extract information (viz. events and their arguments), and the results are compared with current state-of-the-art techniques.
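The pipeline summarized above (word vectors fed to a recurrent network that labels each token) can be illustrated with a minimal, dependency-free sketch. This is not the dissertation's model: it is a single untrained LSTM layer with random weights, the random word vectors merely stand in for embeddings learned with word2vec on the CG corpus, and the label set (`O`, `TRIGGER`, `ARGUMENT`), the class name `LSTMTagger`, and all dimensions are hypothetical choices made for illustration. In practice the weights would be trained with backpropagation on an annotated corpus.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTMTagger:
    """Hypothetical, untrained single-layer LSTM that assigns one label per token."""

    def __init__(self, emb_dim, hidden_dim, labels, seed=0):
        rng = random.Random(seed)
        self.labels = labels
        self.hidden_dim = hidden_dim
        in_dim = emb_dim + hidden_dim  # each gate sees [x_t; h_{t-1}]
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {k: rand_matrix(hidden_dim, in_dim, rng) for k in "ifoc"}
        self.b = {k: [0.0] * hidden_dim for k in "ifoc"}
        # Projection from the hidden state to one score per label.
        self.W_out = rand_matrix(len(labels), hidden_dim, rng)

    def step(self, x, h, c):
        z = x + h  # list concatenation [x_t; h_{t-1}]
        pre = {k: [a + bias for a, bias in zip(matvec(self.W[k], z), self.b[k])]
               for k in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        # Cell state: keep part of the old memory, add part of the candidate.
        c_new = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, cand)]
        h_new = [ot * math.tanh(ct) for ot, ct in zip(o, c_new)]
        return h_new, c_new

    def tag(self, embeddings):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        tags = []
        for x in embeddings:
            h, c = self.step(x, h, c)
            scores = matvec(self.W_out, h)
            # Pick the highest-scoring label for this token.
            tags.append(self.labels[scores.index(max(scores))])
        return tags

# Usage: random vectors stand in for trained word2vec embeddings.
rng = random.Random(42)
sentence = ["BRCA1", "mutation", "promotes", "tumour", "growth"]
emb_dim = 8
embeddings = {w: [rng.gauss(0, 1) for _ in range(emb_dim)] for w in sentence}
tagger = LSTMTagger(emb_dim, hidden_dim=16, labels=["O", "TRIGGER", "ARGUMENT"])
print(list(zip(sentence, tagger.tag([embeddings[w] for w in sentence]))))
```

Because the weights are untrained, the printed labels are arbitrary; the sketch only shows how each token's vector updates the LSTM's hidden and cell states and how the hidden state is mapped to a per-token label.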



Creative Commons License

Creative Commons Attribution 4.0 International License


This document is currently not available here.