Date of Submission
6-2025
Date of Award
6-2025
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Master's Dissertation
Degree Name
Master of Technology
Subject Name
Computer Science
Department
Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)
Supervisor
Majumdar, Debapriyo
Abstract (Summary of the Work)
Due to the limited capabilities of single Large Language Models (LLMs), multiple LLMs can be employed in tandem for better reliability of answers. Blending refers to combining the strengths of various LLMs to make use of their complementary capabilities for generating high-quality responses. It is a non-trivial problem, and the task becomes even more difficult when aiming for minimal latency and minimal supervision of the blending components. The standard framework, LLM-Blender, approaches this in three stages: response generation, candidate selection via ranking, and response fusion through summarization. However, this pipeline faces two critical limitations: high latency due to repeated ranking steps, and heavy reliance on external, supervised components, including a learned encoder for ranking and a separate sequence-to-sequence summarizer for fusion. In this thesis, we propose novel, efficient alternatives to overcome these challenges. This thesis comprises two works. First, we show that reducing the frequency of ranking within multi-turn conversations significantly improves latency with minimal degradation in output quality. Second, we introduce a peer-review-based response fusion mechanism, where LLMs collectively evaluate and revise each other’s responses, removing the need for any externally trained rankers or summarizers. This collaborative method enables fully self-contained LLM blending without additional training or supervision. We assess our proposed methods on the task of Conversational Question Answering across five multi-turn conversational benchmarks (ConvQuestions, Atlas-Converse, CoQA, QuAC, and DoQA) using ten diverse, publicly available open-weight LLMs. Experimental results demonstrate that our peer-review-driven framework with reduced ranking achieves quality on par with existing approaches while being substantially more efficient. Our work presents a step toward scalable, modular LLM ensembling for real-world open-domain dialogue systems.
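The first idea in the abstract, ranking less often within a multi-turn conversation, can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function `blend_conversation`, the parameter `rank_every`, the rule of reusing the last top-ranked model between ranking turns, and the toy models and ranker are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Model = Callable[[str], str]                    # question -> answer (stand-in for an LLM)
Ranker = Callable[[str, Dict[str, str]], str]   # (question, candidates) -> best model name


def blend_conversation(
    questions: List[str],
    models: Dict[str, Model],
    ranker: Ranker,
    rank_every: int = 3,
) -> Tuple[List[str], int]:
    """Answer each turn, invoking the ranker only on every `rank_every`-th turn.

    Assumption (not from the source): between ranking turns, the model that
    won the most recent ranking answers alone.
    """
    answers: List[str] = []
    ranker_calls = 0
    best: Optional[str] = None
    for turn, question in enumerate(questions):
        if best is None or turn % rank_every == 0:
            # Ranking turn: query every model, then let the ranker pick one.
            candidates = {name: model(question) for name, model in models.items()}
            best = ranker(question, candidates)
            ranker_calls += 1
            answers.append(candidates[best])
        else:
            # Non-ranking turn: reuse the last winner, skipping the ranker.
            answers.append(models[best](question))
    return answers, ranker_calls


# Toy stand-ins: two fake "LLMs" and a ranker that prefers the longest answer.
toy_models: Dict[str, Model] = {
    "upper": lambda q: q.upper(),
    "echo": lambda q: q + " " + q,
}


def longest_answer_ranker(question: str, candidates: Dict[str, str]) -> str:
    return max(candidates, key=lambda name: len(candidates[name]))


demo_answers, demo_calls = blend_conversation(
    ["a", "b", "c", "d", "e"], toy_models, longest_answer_ranker, rank_every=2
)
```

In this toy run the ranker is invoked on turns 0, 2, and 4 only; setting `rank_every=1` recovers ranking at every turn. Whether the thesis schedules ranking in this way is not stated in the abstract.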
Control Number
CS2318
DOI
https://dspace.isical.ac.in/items/6fd6baba-c61d-47bc-9c80-85c7827f2e95
DSpace Identifier
http://hdl.handle.net/10263/7592
Recommended Citation
Chatterjee, Sandeep, "Efficient Blending of Large Language Models" (2025). Master’s Dissertations. 425.
https://digitalcommons.isical.ac.in/masters-dissertations/425