Author (Researcher Name)

Date of Submission

6-11-2026

Date of Award

6-17-2026

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science

Department

Machine Intelligence Unit (MIU-Kolkata)

Supervisor

Triesch, Jochen

Co-Supervisor (if any)

Bhattacharyya, Malay

Abstract (Summary of the Work)

Traditional artificial intelligence models learn by passively digesting large datasets. In contrast, human infants discover skills by actively interacting with their bodies and environments without explicit external rewards. This thesis introduces the Composer architecture, a machine learning framework designed to mimic this autonomous, open-ended development. The Composer architecture operates in a multi-stage loop. First, a latent model uses Contrastive Learning Through Time (CLTT) to compress high-dimensional raw data from visual, proprioceptive, and touch sensors into a low-dimensional space. To preserve data relationships and prevent topological collapse, a softmax activation forces these latent representations to lie smoothly on a probability simplex. A goal-conditioned reinforcement learning policy then trains on this space by targeting randomly sampled one-hot goals. We evaluated the architecture on the MIMo platform, a highly realistic simulation of an 18-month-old child embedded in the MuJoCo physics engine. Testing progressed from primitive shapes to complex multi-joint control channels on the robot. On a single-finger setup, the architecture mapped latent extrema directly to full extension and flexion, while five-finger trials isolated whole-hand opening and closing configurations. To scale control, a hierarchical extension called Hi-Composer coordinates complex limb motions by routing high-level commands as sub-goals to localized lower-level finger policies. Finally, full-body exploration benchmarks on a rollover task validated the "thin pancake" hypothesis. Compared to white-noise walks, the Composer architecture expands the agent’s spatial reach by 134 percent while restricting local exploration to an organized, lower-dimensional submanifold. This demonstrates that self-generated latent goals effectively guide open-ended exploration into structured movements.
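The simplex-constrained latent space and one-hot goal sampling described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the encoder logits are made-up numbers, the function names are hypothetical, and the negative Euclidean distance used as a goal-reaching signal is an assumption, since this record does not state the reward.

```python
import math
import random


def softmax(logits):
    """Map raw encoder outputs onto the probability simplex."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_one_hot_goal(dim, rng):
    """Pick one vertex of the simplex uniformly at random as the goal."""
    idx = rng.randrange(dim)
    return [1.0 if i == idx else 0.0 for i in range(dim)]


def goal_reward(latent, goal):
    """Assumed reward: negative Euclidean distance to the goal vertex."""
    return -math.sqrt(sum((z - g) ** 2 for z, g in zip(latent, goal)))


rng = random.Random(0)
latent = softmax([2.0, 0.5, -1.0])            # hypothetical encoder logits
goal = sample_one_hot_goal(len(latent), rng)  # randomly sampled one-hot goal
reward = goal_reward(latent, goal)
```

Because every softmax output is non-negative and the outputs sum to one, each one-hot goal is a vertex of the same simplex the latent states live on, so a policy can in principle approach any sampled goal without leaving the latent space.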

Control Number

CS2435

DOI

https://dspace.isical.ac.in/items/365952e6-7590-4263-8b9e-f4f5b9130739

DSpace Identifier

http://hdl.handle.net/10263/7717
