Study of Neural Learning in Text Processing.
Date of Submission
December 2016
Date of Award
Winter 12-12-2017
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)
Supervisor
Garain, Utpal (CVPR-Kolkata; ISI)
Abstract (Summary of the Work)
This study explores different neural learning frameworks in Natural Language Processing and Information Retrieval. The distributed neural language model Word2Vec has been reported to provide elegant word embeddings, as they capture semantic and syntactic information. Recent studies have also shown that such feature embeddings, coupled with various neural network models, have set new benchmarks in several text-processing problems. The aim of this research is to study different neural models and the word embedding framework, and to explore their effectiveness and limitations across different challenges in text processing. Three problems are explored in this study: (i) learning document embeddings from word embeddings and analyzing their effectiveness in document classification; (ii) automatic query expansion using neural word embeddings; (iii) biomedical information extraction for cancer genetics. Effective use of a neural framework for learning document representations for classification is challenging, as existing techniques perform remarkably well and the extension from the word embedding model is not straightforward. Our study has found that learning such document embeddings does not yield any advantage in document classification when compared with a naive Term Frequency-Inverse Document Frequency (TF-IDF) representation. Semantically related terms can be obtained by finding the terms most similar to the query terms using word embeddings. In the second problem, query expansion using the K semantically nearest-neighbor terms in the vocabulary does improve results over baseline retrieval using a language model. However, it is found that query expansion for ad-hoc retrieval requires terms that co-occur with high frequency alongside the query terms in the relevant documents, in addition to terms that are semantically related.
Query expansion using word embeddings nevertheless fails to include terms that co-occur with high frequency elsewhere in the relevant documents, as the Word2Vec model measures co-occurrence only within a limited context window. Our third problem, biomedical information extraction, essentially requires identifying events and finding relations among events and entities. We find that word embeddings are extremely useful in biomedical relation extraction, and that neural architectures such as Convolutional Neural Networks provide superior results in event identification. We propose a parser architecture for biomedical documents concerning cancer genetics, using a neural architecture with word vectors as features. The parser outperforms the state-of-the-art results.
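The query-expansion idea described above can be illustrated with a minimal sketch: each query term is expanded with its K nearest neighbors in the embedding space, ranked by cosine similarity. The vocabulary and vector values below are hypothetical toy data for illustration only; the dissertation's actual system would use Word2Vec embeddings trained on the retrieval corpus.

```python
import math

# Hypothetical toy word vectors (illustration only; a real system would
# load Word2Vec embeddings trained on the document collection).
vectors = {
    "cancer":   [0.90, 0.10, 0.20],
    "tumor":    [0.85, 0.15, 0.25],
    "gene":     [0.20, 0.90, 0.10],
    "mutation": [0.25, 0.80, 0.20],
    "car":      [0.10, 0.10, 0.90],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_query(query_terms, vectors, k=2):
    """Expand each query term with its k nearest vocabulary terms
    by cosine similarity in the embedding space."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in vectors:
            continue  # out-of-vocabulary terms cannot be expanded
        neighbours = sorted(
            (w for w in vectors if w != term and w not in query_terms),
            key=lambda w: cosine(vectors[term], vectors[w]),
            reverse=True,
        )
        expanded.extend(neighbours[:k])
    return expanded

print(expand_query(["cancer"], vectors, k=1))  # → ['cancer', 'tumor']
```

As the abstract notes, this kind of expansion only surfaces terms that Word2Vec places nearby, i.e. terms sharing narrow context windows with the query term; it cannot recover terms that co-occur with the query only at the document level.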
Control Number
ISI-DISS-2016-351
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/6511
Recommended Citation
Paul, Debjyoti, "Study of Neural Learning in Text Processing." (2017). Master’s Dissertations. 76.
https://digitalcommons.isical.ac.in/masters-dissertations/76
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843089