BC-IB for Robot Manipulation

Abstract

Behavior Cloning (BC) is a widely adopted visual imitation learning method in robot manipulation. Current BC approaches often improve generalization by leveraging large datasets and incorporating additional visual and textual modalities to capture more diverse information. However, these methods overlook whether the learned representations contain redundant information, and they lack a solid theoretical foundation to guide the learning process. To address these limitations, we adopt an information-theoretic perspective and introduce mutual information to quantify and mitigate redundancy in latent representations. Building on this, we incorporate the Information Bottleneck (IB) principle into BC, which extends the idea of reducing redundancy by providing a structured framework for compressing irrelevant information while preserving task-relevant features. This work presents the first comprehensive study of redundancy in latent representations across various methods, backbones, and experimental settings, while extending the generalizability of IB to BC. Extensive experiments and analyses on the CortexBench and LIBERO benchmarks show significant performance improvements with IB, underscoring the importance of reducing input data redundancy and highlighting its practical value.

Introduction

We extend the Information Bottleneck (IB) principle to Behavior Cloning (BC) and provide a comprehensive study of how latent representation redundancy affects BC for robot manipulation.
We empirically demonstrate that minimizing redundancy in latent representations significantly improves the generalization of existing BC algorithms on the CortexBench and LIBERO benchmarks across various settings, which indirectly indicates considerable redundancy in current robot trajectory datasets.
We provide a detailed theoretical analysis explaining why the Information Bottleneck principle enhances the transferability of BC methods.

Policy architecture of BC and BC+IB.

Vanilla Behavior Cloning Loss:

\[ \mathcal{L}_{\mathrm{BC}}=\mathbb{E}_{\left(x_t, a_t\right) \sim \mathcal{D}_e}\left[\left\|\pi\left(x_t\right)-a_t\right\|^2\right]. \]
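Because the vanilla objective is plain regression onto expert actions, it can be illustrated with a toy example. The sketch below is a minimal, self-contained illustration under stated assumptions: a hypothetical linear policy \( \pi(x) = Wx \), synthetic demonstrations produced by a made-up expert matrix `W_EXPERT`, and full-batch gradient descent on \( \mathcal{L}_{\mathrm{BC}} \). It is not the architecture used in this work.

```python
import random


def linear_policy(W):
    """Hypothetical policy pi(x) = W x, with W given as a list of rows."""
    return lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in W]


def bc_loss(W, dataset):
    """Vanilla BC loss: mean over (x_t, a_t) of ||pi(x_t) - a_t||^2."""
    policy = linear_policy(W)
    total = 0.0
    for x, a in dataset:
        total += sum((p - ai) ** 2 for p, ai in zip(policy(x), a))
    return total / len(dataset)


def bc_gradient_step(W, dataset, lr):
    """One full-batch gradient descent step on the vanilla BC loss."""
    policy = linear_policy(W)
    n = len(dataset)
    grad = [[0.0] * len(W[0]) for _ in W]
    for x, a in dataset:
        pred = policy(x)
        for i, row_grad in enumerate(grad):
            err = pred[i] - a[i]
            for j, xj in enumerate(x):
                # Derivative of (pred_i - a_i)^2 w.r.t. W_ij, averaged over the batch.
                row_grad[j] += 2.0 * err * xj / n
    return [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W, grad)]


# Hypothetical expert: actions are a fixed linear function of the observation.
W_EXPERT = [[1.0, -2.0], [0.5, 0.0]]
expert = linear_policy(W_EXPERT)
rng = random.Random(0)
observations = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(50)]
demos = [(x, expert(x)) for x in observations]

# Clone the expert from its demonstrations, starting from a zero policy.
W = [[0.0, 0.0], [0.0, 0.0]]
initial_loss = bc_loss(W, demos)
for _ in range(500):
    W = bc_gradient_step(W, demos, lr=0.2)
final_loss = bc_loss(W, demos)
print(f"BC loss: {initial_loss:.4f} -> {final_loss:.2e}")
```

In a real BC pipeline, `W` would be replaced by a visual encoder plus an action head, and the hand-written gradient by automatic differentiation; the objective itself stays the same squared error.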

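Behavior Cloning with Information Bottleneck (BC-IB), where \( z_t \) is the latent representation of \( x_t \) and \( \beta \) weights the compression term: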

\[ \mathcal{L}_{\mathrm{BC-IB}}=\mathbb{E}_{(x_t, a_t) \sim \mathcal{D}_e}\left[\beta I(x_t, z_t) + \|\pi(x_t)-a_t\|^2\right]. \]
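One standard way to read \( \mathcal{L}_{\mathrm{BC-IB}} \) is as a tractable version of the IB Lagrangian, with \( \beta \) placed on the compression term as above. The derivation below is a sketch under assumptions not stated here: a decoder \( q(a_t \mid z_t) \) that is Gaussian with mean \( \pi(x_t) \) (the action head applied to \( z_t \)) and fixed variance \( \sigma^2 I \), with \( c_\sigma \) denoting the resulting normalization constant. It may differ from the authors' own analysis.

\[ \min \; \beta I(x_t, z_t) - I(z_t, a_t) \]

\[ I(z_t, a_t) = H(a_t) - H(a_t \mid z_t) \ge H(a_t) + \mathbb{E}\left[\log q(a_t \mid z_t)\right] \]

\[ -\log q(a_t \mid z_t) = \frac{\|\pi(x_t)-a_t\|^2}{2\sigma^2} + c_\sigma \]

\[ \beta I(x_t, z_t) - I(z_t, a_t) \le \beta I(x_t, z_t) + \frac{1}{2\sigma^2}\mathbb{E}\left[\|\pi(x_t)-a_t\|^2\right] + c_\sigma - H(a_t) \]

With \( \sigma^2 = \tfrac{1}{2} \) the coefficient of the squared error is 1, and because \( H(a_t) \) and \( c_\sigma \) do not depend on the learned parameters, minimizing the right-hand side coincides with minimizing \( \mathcal{L}_{\mathrm{BC-IB}} \) up to an additive constant.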

We apply these two optimization objectives to commonly used BC algorithms, which we categorize by their fusion method into two types: spatial fusion and temporal fusion.
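Plugging \( \mathcal{L}_{\mathrm{BC-IB}} \) into an existing BC algorithm requires a tractable estimate of \( I(x_t, z_t) \), which this page does not specify. A common choice, swapped in here as an assumption, is the variational Information Bottleneck (VIB) bound: with a stochastic encoder \( q(z_t \mid x_t) = \mathcal{N}(\mu, \mathrm{diag}(\sigma^2)) \) and a standard normal prior, \( I(x_t, z_t) \le \mathbb{E}_{x_t}\left[\mathrm{KL}\left(q(z_t \mid x_t) \,\|\, \mathcal{N}(0, I)\right)\right] \). The sketch computes this surrogate loss for one batch; the encoder outputs, `toy_action_head`, and all function names are hypothetical.

```python
import math
import random


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions.

    Under the VIB assumption, its expectation over x_t upper-bounds I(x_t, z_t).
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Sample z_t = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def bc_ib_loss(batch, action_head, beta, seed=0):
    """Single-sample Monte Carlo estimate of beta * KL + ||pi(x_t) - a_t||^2, batch mean.

    Each batch item is (mu, logvar, action): the hypothetical encoder's Gaussian
    parameters for observation x_t and the expert action a_t.
    """
    rng = random.Random(seed)
    total = 0.0
    for mu, logvar, action in batch:
        z = reparameterize(mu, logvar, rng)
        pred = action_head(z)
        squared_error = sum((p - a) ** 2 for p, a in zip(pred, action))
        total += beta * kl_to_standard_normal(mu, logvar) + squared_error
    return total / len(batch)


def toy_action_head(z):
    """Hypothetical 1-D action head: sums the two latent coordinates."""
    return [z[0] + z[1]]


# Hypothetical encoder outputs (mu, logvar) paired with expert actions.
batch = [
    ([0.2, -0.1], [-1.0, -1.0], [0.1]),
    ([1.0, 0.5], [-2.0, -0.5], [1.4]),
]
loss_bc_only = bc_ib_loss(batch, toy_action_head, beta=0.0)
loss_bc_ib = bc_ib_loss(batch, toy_action_head, beta=0.01)
print(f"beta=0: {loss_bc_only:.4f}, beta=0.01: {loss_bc_ib:.4f}")
```

Because the same seed yields the same latent samples, the two losses differ exactly by \( \beta \) times the mean KL term. Using a different mutual-information estimator would only change `kl_to_standard_normal`; the squared-error term is the vanilla BC loss evaluated on a sampled latent.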

Simulation Experiments

We first evaluate the baselines and the baselines with IB on the single-task benchmark CortexBench. We select four imitation-learning-related simulators, covering 14 tasks in total.

MetaWorld

Although only MetaWorld is strictly a robot manipulation benchmark, we include all tasks to demonstrate the effectiveness of IB comprehensively.

DMControl

Adroit & Trifinger

Results on CortexBench.

We then evaluate the baselines and the IB-enhanced baselines on the language-conditioned multi-task benchmark LIBERO. We select four suites, each comprising 10 tasks.

LIBERO-Goal

LIBERO-Object

LIBERO-Spatial

LIBERO-Long

Results on LIBERO.

Analysis Experiments

1. Reducing redundancy improves performance.

Comparison of vanilla BC and BC+IB on LIBERO in terms of success rate (SR) and mutual information \( I(X, Z) \).

2. The same improvement holds in the few-shot setting.

3. \( \beta \) is the Lagrange multiplier that balances the trade-off between compression and predictive power. Setting it to an appropriate value yields peak performance.
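The comparison above relies on measuring \( I(X, Z) \). The estimator used for high-dimensional visual latents is not described here, so the following is only a toy illustration of what the quantity measures, under the assumption of small discrete variables: a plug-in estimate from empirical counts, in nats. Coarser latents keep less information about the input; whether discarding information helps depends on whether the discarded part is task-relevant, which is exactly the trade-off \( \beta \) controls. The function name and toy variables are hypothetical.

```python
import math
from collections import Counter


def plugin_mutual_information(xs, zs):
    """Plug-in estimate of I(X, Z) in nats from paired samples of discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, zs))
    count_x = Counter(xs)
    count_z = Counter(zs)
    mi = 0.0
    for (x, z), c in joint.items():
        p_xz = c / n
        # p(x, z) * log[p(x, z) / (p(x) p(z))], all probabilities from empirical counts.
        mi += p_xz * math.log(p_xz / ((count_x[x] / n) * (count_z[z] / n)))
    return mi


# Toy input: four equally likely discrete observations.
xs = [0, 1, 2, 3] * 25
z_identity = list(xs)            # keeps all information about X
z_coarse = [x // 2 for x in xs]  # merges observations pairwise, dropping one bit
z_constant = [0] * len(xs)       # keeps no information about X

for name, zs in [("identity", z_identity), ("coarse", z_coarse), ("constant", z_constant)]:
    print(f"{name:>8}: I(X, Z) = {plugin_mutual_information(xs, zs):.3f} nats")
```

Here the three latents give \( \log 4 \), \( \log 2 \), and \( 0 \) nats respectively. Plug-in estimates like this are biased for continuous or high-dimensional data, which is why learned variational or neural estimators are typically used for real latent representations.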

Real-world Experiments

As in the simulation experiments, we conduct real-world experiments in both single-task and language-conditioned multi-task settings.

Real-world robot experiments.

1. Single-task Setting

Pick the red cup

✅ VC1 case 1

❌ VC1 case 2

✅ VC1+IB case 1

✅ VC1+IB case 2

Put the red cup into a bowl

✅ VC1 case 1

❌ VC1 case 2

✅ VC1+IB case 1

✅ VC1+IB case 2

2. Language-conditioned Multi-task Setting

Put wood block into red bowl

✅ CogAct

✅ CogAct+IB

Put corn into red bowl

❌ CogAct

✅ CogAct+IB

Additional cases of CogAct+IB

Put wood block into blue bowl

✅ CogAct+IB

Put corn into blue bowl

✅ CogAct+IB

Put wood block into green bowl (unseen)

✅ CogAct+IB

Put corn into green bowl (unseen)

✅ CogAct+IB

BibTeX

@article{bai2025rethinking,
        title={Rethinking Latent Redundancy in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation},
        author={Bai, Shuanghao and Zhou, Wanqi and Ding, Pengxiang and Zhao, Wei and Wang, Donglin and Chen, Badong},
        journal={arXiv preprint arXiv:2502.02853},
        year={2025}
      }