Author (Researcher Name)

Date of Submission

6-10-2026

Date of Award

6-16-2026

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)

Supervisor

Bhattacharya, Ujjwal

Abstract (Summary of the Work)

Parameter-efficient fine-tuning (PEFT) adapts a frozen pre-trained language model by training only a small number of additional parameters. Among PEFT approaches, prompt tuning prepends trainable continuous vectors (soft prompts) to the input. A recurring finding in the literature is that prompt tuning is strongly scale dependent: it rivals full fine-tuning on very large models but lags on smaller ones. This dissertation studies prompt tuning specifically in the small-language-model (SLM) regime. We (i) re-implement a representative set of prompt-tuning methods (Prompt Tuning, P-Tuning v2, LoPT, DPT, DePT, ACCEPT, Residual Prompt Tuning, and PARA) within a single controlled harness, enabling a fair head-to-head comparison against full fine-tuning; (ii) propose IA-DePT, a lightweight instance-aware extension of Decomposed Prompt Tuning that conditions the short soft prompt on each input through a small, zero-initialised gate; and (iii) extend the benchmark beyond a single backbone and task, evaluating the full method suite on six backbone/task settings that span encoder–decoder (t5-small), encoder-only (BERT-base, RoBERTa-base, ELECTRA-small), and decoder-only (DistilGPT-2) architectures across the GLUE/SuperGLUE tasks RTE, WSC, CB, COPA, WiC, and MRPC. On RTE with t5-small, IA-DePT is the strongest parameter-efficient method in our benchmark (55.6% single-seed accuracy) and improves over its own base, DePT, by 6.5 points (53.6% vs. 47.1%, mean over three seeds) while adding only ≈16.9k parameters, for a total trainable footprint of 0.05% of the backbone. Because the gate reduces exactly to DePT at initialisation, the comparison is a clean single-variable ablation. The cross-architecture study shows that the instance gate improves on DePT in five of the six settings on each setting’s primary metric (it ties or marginally regresses only on WiC, where every PEFT method sits at chance), so the benefit is broad but not universal.
Our analysis characterises the accuracy/parameter trade-offs across method families, the strong effect of task difficulty on the small-model regime, and the role of instance-conditioning, including an honest discussion of why many prompt-tuning methods remain close to the chance baseline at this scale.
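
The abstract does not spell out the form of the instance gate, so the following is only a minimal sketch of the general idea, under stated assumptions: a shared soft prompt is shifted by an input-dependent offset produced by a small bottleneck gate whose output projection starts at zero. The function names (`init_gate`, `gate_offset`, `instance_aware_prompt`), the mean pooling, the tanh bottleneck, and the toy dimensions are all hypothetical choices for illustration and are not taken from the dissertation.

```python
import math
import random


def init_gate(d_model, bottleneck, seed=0):
    """Hypothetical gate parameters: small random down-projection, zero up-projection."""
    rng = random.Random(seed)
    w_down = [[rng.gauss(0.0, 0.02) for _ in range(bottleneck)] for _ in range(d_model)]
    # Zero initialisation: the gate contributes nothing before training starts.
    w_up = [[0.0] * d_model for _ in range(bottleneck)]
    return w_down, w_up


def gate_offset(token_embeddings, w_down, w_up):
    """Map one input's token embeddings to a single d_model-sized offset vector."""
    d_model, bottleneck, n = len(w_down), len(w_up), len(token_embeddings)
    # Assumption: the input is summarised by mean pooling over its tokens.
    pooled = [sum(tok[j] for tok in token_embeddings) / n for j in range(d_model)]
    hidden = [
        math.tanh(sum(pooled[j] * w_down[j][k] for j in range(d_model)))
        for k in range(bottleneck)
    ]
    return [sum(hidden[k] * w_up[k][j] for k in range(bottleneck)) for j in range(d_model)]


def instance_aware_prompt(soft_prompt, token_embeddings, w_down, w_up):
    """Add the same input-dependent offset to every position of the shared prompt."""
    offset = gate_offset(token_embeddings, w_down, w_up)
    return [[p + o for p, o in zip(row, offset)] for row in soft_prompt]


# Toy sizes chosen for readability only.
d_model, bottleneck, prompt_len, seq_len = 8, 2, 3, 4
rng = random.Random(1)
soft_prompt = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(prompt_len)]
tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(seq_len)]

w_down, w_up = init_gate(d_model, bottleneck)
prompt_at_init = instance_aware_prompt(soft_prompt, tokens, w_down, w_up)
gate_param_count = d_model * bottleneck * 2  # down- plus up-projection weights
```

With the up-projection at zero, `prompt_at_init` equals `soft_prompt` exactly, which mirrors the property the abstract relies on: at initialisation the gated model coincides with the ungated base, so any later difference is attributable to the gate alone.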

Control Number

CS2419

DOI

https://dspace.isical.ac.in/items/742cb355-eda2-4e04-9301-3fca3f31d3db

DSpace Identifier

http://hdl.handle.net/10263/7740
