Date of Submission

6-2025

Date of Award

6-2025

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)

Supervisor

Majumdar, Debapriyo

Abstract (Summary of the Work)

Because individual Large Language Models (LLMs) have limited capabilities, multiple LLMs can be employed in tandem to obtain more reliable answers. Blending refers to combining the strengths of various LLMs, exploiting their complementary capabilities to generate high-quality responses. It is a non-trivial problem, and it becomes even more difficult when the goal is to minimize both latency and the supervision required by the blending components. The standard framework, LLM-Blender, approaches this in three stages: response generation, candidate selection via ranking, and response fusion through summarization. However, this pipeline faces two critical limitations: high latency due to repeated ranking steps, and heavy reliance on external, supervised components, including a learned encoder for ranking and a separate sequence-to-sequence summarizer for fusion. In this thesis, we propose novel, efficient alternatives to overcome these challenges. The thesis comprises two works. First, we show that reducing the frequency of ranking within multi-turn conversations substantially reduces latency with minimal degradation in output quality. Second, we introduce a peer-review-based response fusion mechanism, in which LLMs collectively evaluate and revise each other's responses, removing the need for any externally trained rankers or summarizers. This collaborative method enables fully self-contained LLM blending without additional training or supervision. We assess our proposed methods on the task of Conversational Question Answering across five multi-turn conversational benchmarks (ConvQuestions, Atlas-Converse, CoQA, QuAC, and DoQA) using ten diverse, publicly available open-weight LLMs. Experimental results demonstrate that our peer-review-driven framework with reduced ranking achieves quality on par with existing approaches while being substantially more efficient. Our work presents a step toward scalable, modular LLM ensembling for real-world open-domain dialogue systems.
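The abstract describes the generate, rank, fuse pipeline and the idea of ranking less often across conversation turns, but it does not specify the ranking schedule or the fusion details. The Python sketch below is a hypothetical illustration only, not the thesis's implementation. It assumes a fixed schedule that re-ranks every `rank_every` turns and reuses the top `top_k` models in between. The names `blend_conversation`, `Ranker`, `Fuser`, and the toy models are invented placeholders; real LLM calls, a learned ranker, and a summarizer or peer-review fusion step would replace the stubs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces; none of these names come from the thesis or from LLM-Blender.
Model = Callable[[str], str]                    # prompt -> response
Ranker = Callable[[str, List[str]], List[int]]  # (question, candidates) -> candidate indices, best first
Fuser = Callable[[str, List[str]], str]         # (question, top candidates) -> single answer


def blend_conversation(
    questions: List[str],
    models: Dict[str, Model],
    rank: Ranker,
    fuse: Fuser,
    rank_every: int = 3,
    top_k: int = 2,
) -> Tuple[List[str], int]:
    """Blend answers over a multi-turn conversation, ranking only every `rank_every` turns.

    Returns the per-turn answers and the number of ranking calls made.
    """
    if rank_every < 1:
        raise ValueError("rank_every must be >= 1")
    names = list(models)
    selected: List[str] = []  # models chosen at the most recent ranking turn
    answers: List[str] = []
    rank_calls = 0
    history = ""
    for turn, question in enumerate(questions):
        prompt = f"{history}Q: {question}\nA:"
        if turn % rank_every == 0:
            # Ranking turn: query every model, rank all candidates, keep the top_k models.
            outputs = [models[n](prompt) for n in names]
            order = rank(question, outputs)[:top_k]
            rank_calls += 1
            selected = [names[i] for i in order]
            candidates = [outputs[i] for i in order]
        else:
            # Non-ranking turn: query only the previously selected models and skip the ranker.
            candidates = [models[n](prompt) for n in selected]
        # Fusion step: a summarizer (as in LLM-Blender) or a peer-review round would plug in here.
        answer = fuse(question, candidates)
        answers.append(answer)
        history = f"{prompt} {answer}\n"
    return answers, rank_calls


if __name__ == "__main__":
    # Toy stand-ins: constant "models", a ranker that prefers longer answers,
    # and a fuser that simply returns the top-ranked candidate.
    toy_models = {
        "a": lambda p: "short",
        "b": lambda p: "a longer answer",
        "c": lambda p: "mid answer",
    }
    by_length = lambda q, outs: sorted(range(len(outs)), key=lambda i: len(outs[i]), reverse=True)
    take_first = lambda q, cands: cands[0]
    answers, calls = blend_conversation(["q1", "q2", "q3", "q4", "q5"], toy_models, by_length, take_first)
    print(calls)  # ranking runs on turns 0 and 3 only → 2
```

With five questions and `rank_every=3`, the ranker is invoked on turns 0 and 3 only, so the sketch makes two ranking calls instead of five. How much answer quality such reuse costs is what the thesis evaluates empirically; the toy stubs here say nothing about it.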

Control Number

CS2318

DOI

https://dspace.isical.ac.in/items/6fd6baba-c61d-47bc-9c80-85c7827f2e95

DSpace Identifier

http://hdl.handle.net/10263/7592

Recommended Citation

Chatterjee, Sandeep, "Efficient Blending of Large Language Models" (2025). Master’s Dissertations. 425.
https://digitalcommons.isical.ac.in/masters-dissertations/425


Included in

Computer Sciences Commons
