Rethinking Latent Representations in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation

¹Xi'an Jiaotong University  ²Westlake University

Abstract

Behavior Cloning (BC) is a widely adopted visual imitation learning method in robot manipulation. Current BC approaches often enhance generalization by leveraging large datasets and incorporating additional visual and textual modalities to capture more diverse information. However, these methods overlook whether the learned representations contain redundant information, and they lack a solid theoretical foundation to guide the learning process. To address these limitations, we adopt an information-theoretic perspective and introduce mutual information to quantify and mitigate redundancy in latent representations. Building on this, we incorporate the Information Bottleneck (IB) principle into BC, which extends the idea of reducing redundancy by providing a structured framework for compressing irrelevant information while preserving task-relevant features. This work presents the first comprehensive study of redundancy in latent representations across various methods, backbones, and experimental settings, while extending the generalizability of the IB to BC. Extensive experiments and analyses on the CortexBench and LIBERO benchmarks demonstrate significant performance improvements with IB, underscoring the importance of reducing input data redundancy and highlighting the value of IB for practical applications.

Introduction

  • We extend the Information Bottleneck (IB) principle to Behavior Cloning (BC) and provide a comprehensive study on the impact of latent representation redundancy in Behavior Cloning for robot manipulation.
  • We empirically demonstrate that minimizing redundancy in latent representations helps existing BC algorithms significantly improve generalization performance on the CortexBench and LIBERO benchmarks across various settings, indirectly highlighting the considerable redundancy present in current robot trajectory datasets.
  • We provide a detailed theoretical analysis explaining why the Information Bottleneck principle enhances the transferability of Behavior Cloning methods.

Policy architecture of BC and BC+IB.

Vanilla Behavior Cloning Loss:

\[ \mathcal{L}_{\mathrm{BC}}=\mathbb{E}_{\left(x_t, a_t\right) \sim \mathcal{D}_e}\left[\left\|\pi\left(x_t\right)-a_t\right\|^2\right]. \]

Behavior Cloning with Information Bottleneck (BC-IB):

\[ \mathcal{L}_{\mathrm{BC-IB}}=\mathbb{E}_{(x_t, a_t) \sim \mathcal{D}_e}\left[\beta I(x_t, z_t) + \|\pi(x_t)-a_t\|^2\right]. \]
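Here \( z_t \) denotes the latent representation of the input \( x_t \), \( I(x_t, z_t) \) is the mutual information between them, and \( \beta \) weights the compression term. We apply these two optimization objectives to commonly used BC algorithms, which we categorize by fusion method into two types: spatial fusion and temporal fusion.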
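To make the two objectives concrete, here is a minimal sketch in plain Python. This page does not state how \( I(x_t, z_t) \) is estimated, so the sketch swaps in a common stand-in rather than the authors' estimator: assuming a diagonal-Gaussian encoder \( q(z \mid x) = \mathcal{N}(\mu, \sigma^2) \), the batch-averaged KL divergence to a standard normal prior upper-bounds the mutual information (the variational IB bound). The function names `bc_loss`, `gaussian_kl`, and `bc_ib_loss`, and all toy numbers, are hypothetical.

```python
import math

def bc_loss(pred_actions, expert_actions):
    """Mean squared L2 distance between predicted and expert actions."""
    total = 0.0
    for pred, target in zip(pred_actions, expert_actions):
        total += sum((p - a) ** 2 for p, a in zip(pred, target))
    return total / len(pred_actions)

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one sample."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def bc_ib_loss(pred_actions, expert_actions, mus, log_vars, beta):
    """BC loss plus beta times a KL-based upper bound on I(x; z).

    Assumption: the KL bound stands in for the mutual information term;
    the source page does not specify the estimator.
    """
    mi_bound = sum(gaussian_kl(m, lv) for m, lv in zip(mus, log_vars)) / len(mus)
    return beta * mi_bound + bc_loss(pred_actions, expert_actions)

# Toy batch of two samples with 2-D actions and 2-D latents (made-up values).
pred = [[0.0, 1.0], [1.0, 1.0]]
expert = [[0.0, 0.0], [1.0, 1.0]]
mus = [[0.0, 0.0], [1.0, 0.0]]
log_vars = [[0.0, 0.0], [0.0, 0.0]]

vanilla = bc_loss(pred, expert)
with_ib = bc_ib_loss(pred, expert, mus, log_vars, beta=0.1)
no_compression = bc_ib_loss(pred, expert, mus, log_vars, beta=0.0)
```

With `beta=0.0` the second objective reduces to the vanilla BC loss, which matches the two formulas above; larger `beta` penalizes latents that retain more information about the input.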

Simulation Experiments

We first evaluate the baselines and the baselines with IB on the single-task benchmark CortexBench. We selected four imitation learning-related simulators, encompassing a total of 14 tasks.
MetaWorld: Assembly, Bin-picking, Button-press-topdown, Drawer-open, Hammer
DMControl: Cheetah Run, Finger Spin, Reacher Easy, Walker Stand, Walker Walk
Adroit & Trifinger: Reorient Pen, Relocate, Move Cube, Reach Cube
Although only MetaWorld is strictly a robot manipulation benchmark, we include all tasks to comprehensively demonstrate the effectiveness of IB.
Results on CortexBench.
We then evaluate both the baselines and the baselines enhanced with IB on the language-conditioned multi-task benchmark LIBERO. We selected four suites, with each suite comprising a total of 10 tasks.
Suites (10 task GIFs each): LIBERO-Goal, LIBERO-Object, LIBERO-Spatial, LIBERO-Long
Results on LIBERO.

Analysis Experiments

1. Reducing redundancy improves performance.
Comparison of vanilla BC and BC+IB on LIBERO in terms of success rate (SR) and mutual information \( I(X, Z) \).
2. This also holds in the few-shot setting.
3. \( \beta \) is the Lagrange multiplier that balances compression against predictive power. Performance peaks when it is set to an appropriate value.
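As a toy illustration of what "redundancy" in \( I(X, Z) \) means in the analysis above, the sketch below estimates mutual information from paired discrete samples with a simple plug-in (count-based) estimator. This is only an illustration under stated assumptions: the page does not describe how mutual information is measured, and a count-based estimator is not practical for high-dimensional image inputs. The function `mutual_information` and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, zs):
    """Plug-in estimate of I(X; Z) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, zs))
    count_x = Counter(xs)
    count_z = Counter(zs)
    mi = 0.0
    for (x, z), count in joint.items():
        p_xz = count / n
        # p(x, z) / (p(x) p(z)) written with raw counts
        mi += p_xz * math.log2(p_xz * n * n / (count_x[x] * count_z[z]))
    return mi

# Toy input: each x is (task_bit, nuisance_bit), uniform over four values.
xs = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
actions = [x[0] for x in xs]          # the action depends only on the task bit

z_copy = list(xs)                     # latent that keeps everything
z_compressed = [x[0] for x in xs]     # latent that drops the nuisance bit

mi_copy = mutual_information(xs, z_copy)              # 2.0 bits
mi_compressed = mutual_information(xs, z_compressed)  # 1.0 bit
task_info_copy = mutual_information(z_copy, actions)              # 1.0 bit
task_info_compressed = mutual_information(z_compressed, actions)  # 1.0 bit
```

In this toy setup both latents carry the same one bit about the action, but the compressed latent holds half as much information about the input. That is the kind of trade-off \( \beta \) controls: more compression lowers \( I(X, Z) \), and it is only harmful once task-relevant information starts being discarded.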

Real-world Experiments

Real-world robot experiments.

BibTeX

@article{bai2025rethinking,
        title={Rethinking Latent Representations in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation},
        author={Bai, Shuanghao and Zhou, Wanqi and Ding, Pengxiang and Zhao, Wei and Wang, Donglin and Chen, Badong},
        journal={arXiv preprint arXiv:2502.02853},
        year={2025}
      }