Author (Researcher Name)

Date of Submission

5-2025

Date of Award

6-11-2025

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Computer Vision and Pattern Recognition Unit (CVPR-Kolkata)

Supervisor

Bhattacharya, Ujjwal

Abstract (Summary of the Work)

The rise of generative models and affordable video editing tools has fueled the spread of fake and manipulated videos, undermining information reliability, especially on social media. Traditional detection methods, focused on single modalities like visual artifacts or text cues, often struggle with diverse, user-generated content. This dissertation presents a unified framework for fake video detection that integrates multimodal semantics, narrative structure, and propagation behavior. Visual, audio, text, and OCR features are extracted using pretrained models (CLIP, Wav2Vec2), and segment-level graphs are built to model narrative flow using Graph Attention Networks (GATv2Conv). User engagement dynamics are modeled via a bidirectional LSTM. A cross-modal consistency loss encourages semantic alignment across modalities, improving representational coherence. The end-to-end model is evaluated on heterogeneous datasets like FakeTT, demonstrating strong generalization and robustness. Results show the proposed system outperforms existing baselines, especially in challenging cases with asynchronous or fragmented content. By combining content, structure, and behavioral cues, the framework enables more reliable and interpretable fake video detection.
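
The abstract does not give the form of the cross-modal consistency loss. As a minimal illustrative sketch, and not the dissertation's actual formulation, one common choice is to penalize the mean pairwise cosine distance between per-modality embeddings of the same video segment. The function names, the dictionary layout, and the toy vectors below are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def consistency_loss(embeddings):
    """Mean of (1 - cosine similarity) over all pairs of modality embeddings.

    `embeddings` maps a modality name to its embedding vector. This is an
    assumed loss form for illustration only; the source does not define it.
    """
    pairs = list(combinations(embeddings.values(), 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings for one segment, already projected to a shared space.
segment = {
    "visual": [1.0, 0.0],
    "audio": [1.0, 0.0],
    "text": [0.0, 1.0],
}
loss = consistency_loss(segment)
```

Under this assumed form, the loss is zero when all modalities point in the same direction and grows as they diverge, so minimizing it alongside the classification objective would push the modality encoders toward agreement.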

Control Number

CS2311

DOI

https://dspace.isical.ac.in/items/3fc70ae1-8996-4f2b-9e71-7005f1673d3a

DSpace Identifier

http://hdl.handle.net/10263/7586