Conference Articles

Retrievability of code mixed microblogs

Debasis Ganguly, Dublin City University
Mandar Mitra, Indian Statistical Institute, Kolkata
Ayan Bandyopadhyay, Indian Statistical Institute, Kolkata
Gareth J.F. Jones, Dublin City University

Document Type

Conference Article

Publication Title

SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval

Abstract

Mixing multiple languages within the same document, a phenomenon called (linguistic) code mixing or code switching, is a frequent trend among multilingual users of social media. In the context of information retrieval (IR), code mixing may affect retrieval effectiveness due to the mixing of different vocabularies with different collection statistics within a single collection of documents. In this paper, we investigate indexing and retrieval strategies for a mixed collection of documents comprising code-mixed and monolingual documents. In particular, we address three alternative modes of indexing, namely (a) a single index for the two sub-collections; (b) a separate index for each sub-collection; and (c) a clustered index with two individual sub-collection statistics coupled with the overall one. We make use of the expected retrievability scores of the two classes of documents to empirically show that indexing strategies (a) and (b) mostly retrieve the monolingual documents at top ranks with standard retrieval approaches. Our experiments show that, by contrast, the clustered index (c) is able to alleviate this problem by improving the retrievability of the code-mixed documents.
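As a rough illustration of the kind of comparison the abstract describes, the sketch below computes a cumulative retrievability score per document (the number of queries for which the document appears within a rank cutoff) and then averages it per document class. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the cutoff value, the class labels, and the toy rankings are all hypothetical, and the exact retrievability measure used in the paper may differ.

```python
from collections import defaultdict


def retrievability(ranked_lists, cutoff):
    """Count, for each document, the queries that rank it within `cutoff`.

    Assumption: a simple cumulative measure with a hard rank cutoff;
    each ranking is a list of unique document ids, best first.
    """
    scores = defaultdict(int)
    for ranking in ranked_lists:
        for doc_id in ranking[:cutoff]:
            scores[doc_id] += 1
    return dict(scores)


def mean_retrievability_by_class(scores, doc_class):
    """Average the retrievability scores over the documents of each class.

    Documents that were never retrieved within the cutoff count as 0.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for doc_id, label in doc_class.items():
        totals[label] += scores.get(doc_id, 0)
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in counts}


# Hypothetical toy data: two monolingual and two code-mixed documents.
doc_class = {"d1": "mono", "d2": "mono", "d3": "mixed", "d4": "mixed"}

# Hypothetical ranked result lists for three queries.
ranked_lists = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d1", "d4", "d3"],
    ["d1", "d3", "d2", "d4"],
]

scores = retrievability(ranked_lists, cutoff=2)
means = mean_retrievability_by_class(scores, doc_class)
print(means)
```

In this toy setup the monolingual documents dominate the top ranks, so their mean score is higher than that of the code-mixed documents. Running the same computation over the rankings produced by each indexing strategy would give one way to compare how evenly the strategies expose the two classes.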

First Page

973

Last Page

976

DOI

10.1145/2911451.2914727

Publication Date

7-7-2016

Comments

Open Access; Green Open Access

Recommended Citation

Ganguly, Debasis; Mitra, Mandar; Bandyopadhyay, Ayan; and Jones, Gareth J.F., "Retrievability of code mixed microblogs" (2016). Conference Articles. 793.
https://digitalcommons.isical.ac.in/conf-articles/793
