Author (Researcher Name)

Date of Submission

6-11-2026

Date of Award

6-17-2026

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Linguistic Research Unit (LRU-Kolkata)

Supervisor

Dash, Niladri Sekhar

Co-Supervisor (if any)

Bapuji, Mendem

Abstract (Summary of the Work)

Conversational artificial intelligence has become the primary interface through which hundreds of millions of users in India seek information and customer support. Yet the way these users actually write and speak is fundamentally at odds with the monolingual assumptions built into most retrieval and generation systems: they code-switch, fluidly mixing one or more of the twenty-two scheduled languages of India with English, and frequently type Indic words in the Roman script ("mera refund kab tak aayega"). Standard Retrieval-Augmented Generation (RAG) pipelines fail silently on such input: the retriever returns off-topic passages because the query and the knowledge base live in different representation spaces, and the generator replies in a register that does not match the user's. This thesis presents SETU-RAG (from setu, a bridge), a code-switching-aware multilingual RAG system engineered to run end-to-end on a single commodity GPU (a Google Colab T4 with 16 GB of memory). The system makes three novel contributions. First, a CMI-Adaptive Retrieval Router selects the retrieval strategy from the linguistic profile of the query (its Code-Mixing Index (CMI) and matrix language) rather than from reasoning complexity, so monolingual queries stay cheap while genuinely code-mixed queries trigger a cross-lingual fan-out. Second, a Transliteration-Robust Multi-View Query expands every query into up to four parallel views (surface, native-script, matrix-canonical, and English-pivot), each embedded separately, so that at least one view lands in the knowledge base's representation space regardless of script. Third, a CMI-Conditioned Generation stage conditions the answer on the user's measured matrix language and mix ratio so that the reply mirrors their register while remaining grounded in retrieved evidence.
Around this text core we build a speech-to-speech (VANI) layer that adds two further contributions, acoustic–lexical language-identification fusion and CMI-conditioned text-to-speech, enabling code-switched voice in and style-matched voice out. We additionally introduce CS-RAGAS, an evaluation harness that augments the standard RAGAS quality axes (faithfulness, answer relevancy, context precision/recall) with code-switching-native metrics: CMI-alignment, language-consistency, and transliteration-robustness. Every model in the pipeline is wired in a real-with-fallback manner: the live path loads strong open-weight models (BGE-M3, BGE-reranker-v2-m3, IndicLID, IndicXlit, IndicTrans2, and a 4-bit instruction-tuned generator), while a deterministic stand-in keeps the entire system runnable offline and on CPU for testing and reproducibility. We describe the design, the mathematical formulation of each stage, the corrective (CRAG) and faithfulness (Self-RAG) gates that guard against the documented failure modes of code-switched retrieval, and the memory policy that keeps peak usage under 16 GB. Experiments on a code-switched customer-support corpus demonstrate that the router produces well-calibrated routing decisions, that the multi-view expansion recovers retrieval hits that a single-view retriever misses, and that the CS-native metrics capture register-mirroring behaviour that conventional metrics miss entirely. The work is a step toward conversational AI that meets Indian users in the language, the script, and the register in which they actually speak.
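
As a rough illustration of the routing idea summarised above, the following minimal Python sketch computes a Code-Mixing Index from per-token language tags using the commonly cited formulation CMI = 100 × (1 − max_i w_i / (n − u)), where n is the number of tokens, u the number of language-independent tokens, and w_i the token count of language i. The abstract does not state which CMI variant, tag set, or thresholds the thesis uses, so the function names, the "univ" tag, and the threshold of 10.0 are assumptions made only for this example, not the author's implementation.

```python
from collections import Counter

UNIVERSAL = "univ"  # hypothetical tag for language-independent tokens (numbers, punctuation)


def code_mixing_index(tags):
    """Return (CMI, matrix_language) for a list of per-token language tags.

    CMI = 100 * (1 - max_i(w_i) / (n - u)); defined as 0 when every token
    is language-independent. The matrix language is taken here to be the
    language with the most tokens, which is a simplifying assumption.
    """
    lang_counts = Counter(t for t in tags if t != UNIVERSAL)
    tagged = sum(lang_counts.values())  # this is n - u
    if tagged == 0:
        return 0.0, None
    matrix_lang, max_count = lang_counts.most_common(1)[0]
    return 100.0 * (tagged - max_count) / tagged, matrix_lang


def route(tags, threshold=10.0):
    """Pick a retrieval strategy from the CMI (threshold is hypothetical)."""
    cmi, _ = code_mixing_index(tags)
    return "cross_lingual_fanout" if cmi >= threshold else "single_view"


# Tags for "mera refund kab tak aayega", assuming a token-level language
# identifier labelled "refund" as English and the other tokens as Hindi.
example_tags = ["hi", "en", "hi", "hi", "hi"]
cmi, matrix = code_mixing_index(example_tags)
```

Under these assumptions the example query has one English token out of five, so it is routed to the fan-out path, while a query tagged entirely in one language stays on the cheaper single-view path.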

Control Number

CS2410

DOI

https://dspace.isical.ac.in/items/a5737c2e-4861-4e94-b33b-3c97d5e42653

DSpace Identifier

http://hdl.handle.net/10263/7722
