CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning



Abstract

We study imitation learning for teaching robots from expert demonstrations. During execution, compounding errors from hardware noise and external disturbances, coupled with incomplete data coverage, can drive the agent into unfamiliar states and cause unpredictable behavior. To address this challenge, we propose a framework, CCIL: Continuity-based data augmentation for Corrective Imitation Learning. It leverages the local continuity inherent in dynamical systems to synthesize corrective labels: CCIL learns a dynamics model from the expert data and uses it to generate labels that guide the agent back to expert states. Our approach makes minimal assumptions, requiring neither expert re-labeling nor a ground-truth dynamics model. By exploiting local continuity, we derive provable bounds on the errors of the synthesized labels. Through evaluations across diverse robotic domains in simulation and the real world, we demonstrate CCIL's effectiveness in improving imitation learning performance.

Impact of Corrective Labels

Grasp Cube

[Videos: rollouts with vs. without corrective labels]

Without corrective labels, the agent can knock over the cube while trying to grasp it.

Gear Insert

Without corrective labels, the agent is not precise enough to reliably insert the gear.

Grasp Coin

[Videos: rollouts with vs. without corrective labels]

Without corrective labels, the agent is not able to precisely grasp the coin in the right place.


CCIL Framework Overview

[Figure: the three-step CCIL label-generation pipeline]

Our label generation algorithm consists of three steps: learning a dynamics model, generating corrective labels, and filtering out high-error labels.

Learning a Dynamics Model

We learn a dynamics model by minimizing the following loss: $$\mathbb{E}_{(s_t^*,a_t^*,s_{t+1}^*)\sim\mathcal{D}^*}\left[\left\|\hat{f}(s_t^*,a_t^*)+s_t^*-s_{t+1}^*\right\|\right]$$ where $\hat{f}$ predicts the change in state, i.e., $s_{t+1}\approx s_t+\hat{f}(s_t,a_t)$. Notably, a learned dynamics model can only yield reliable predictions near its data support, not on arbitrary states and actions. CCIL decides where to query the learned dynamics model by leveraging the local Lipschitz continuity present in the system dynamics. CCIL encourages the learned dynamics function to exhibit local Lipschitz continuity by modifying the training objective, specifically by regularizing the continuity of the learned model with spectral normalization. Concretely, to train a dynamics model $\hat{f}$ using a neural network of $n$ layers with weight matrices $W_1,\ldots,W_n$, one can iteratively minimize the above training objective while regularizing the model by setting $$W_i\leftarrow \frac{W_i}{\max\left(\|W_i\|_2,K^{1/n}\right)}\cdot K^{1/n}$$ for every $W_i$, where $K$ is the Lipschitz constraint hyperparameter. Clipping each layer's spectral norm to at most $K^{1/n}$ bounds the Lipschitz constant of the $n$-layer composition by $K$.
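To make this concrete, below is a minimal PyTorch sketch of this training loop. It is our illustration, not the released CCIL implementation; the names `DynamicsModel`, `clip_lipschitz`, and `lipschitz_k` are ours. The model predicts the state delta, and after each gradient step every weight matrix is rescaled so its spectral norm stays at most $K^{1/n}$.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """MLP predicting the state delta, so that s_{t+1} ~ s_t + f(s_t, a_t)."""

    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Linear(hidden_dim, state_dim),
        ])

    def forward(self, s, a):
        x = torch.cat([s, a], dim=-1)
        for layer in self.layers[:-1]:
            x = torch.relu(layer(x))  # ReLU is 1-Lipschitz
        return self.layers[-1](x)

    @torch.no_grad()
    def clip_lipschitz(self, k):
        """Rescale each W_i so ||W_i||_2 <= K^(1/n); with 1-Lipschitz
        activations, the composed network is then K-Lipschitz."""
        per_layer = k ** (1.0 / len(self.layers))
        for layer in self.layers:
            sigma = torch.linalg.matrix_norm(layer.weight, ord=2)
            layer.weight.mul_(per_layer / torch.clamp(sigma, min=per_layer))

def train_step(model, optimizer, s, a, s_next, lipschitz_k=1.0):
    """One gradient step on the dynamics loss || f(s_t, a_t) + s_t - s_{t+1} ||."""
    loss = (model(s, a) + s - s_next).norm(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    model.clip_lipschitz(lipschitz_k)  # spectral-normalization-style projection
    return loss.item()
```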

Generating Corrective Labels

With a learned dynamics model $\hat{f}$, we can generate a corrective label $(s_t^\mathcal{G}, a_t^\mathcal{G})$ for every expert data point $(s_t^*, a_t^*)$ such that $s_t^\mathcal{G}+\hat{f}(s_t^\mathcal{G},a_t^\mathcal{G})\approx s_t^*$: executing the label's action from the generated state brings the agent approximately back to the expert state. One of our label generation methods is BackTrack, inspired by the backward Euler method used in modern simulators: \begin{align*} s_t^\mathcal{G} &\leftarrow s_t^* - \hat{f}(s_t^*, a_t^*) \\ a_t^\mathcal{G} &\leftarrow a_t^* \end{align*}
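Under the delta-dynamics convention above, BackTrack is a one-line computation. The sketch below assumes the `DynamicsModel` from the previous snippet; the function name `backtrack_labels` is ours:

```python
import torch

@torch.no_grad()
def backtrack_labels(model, expert_states, expert_actions):
    """BackTrack: for each expert pair (s*, a*), synthesize
    s_G = s* - f(s*, a*) and reuse a_G = a*, so that
    s_G + f(s_G, a_G) ~ s* wherever f is locally Lipschitz."""
    gen_states = expert_states - model(expert_states, expert_actions)
    gen_actions = expert_actions.clone()  # the expert action is kept unchanged
    return gen_states, gen_actions
```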

Filtering High-Error Labels

By leveraging the local continuity in the environment dynamics, we can derive provable bounds on the error of the generated labels. Armed with this error bound, we can filter out high-error labels and keep only those that are likely to be correct. Concretely, we set a maximum allowable error, which translates into a maximum allowable distance between a generated state and its expert state. This can be viewed as a trust region around each expert data point, within which the generated labels can be trusted to be accurate.
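In the paper this threshold falls out of the provable error bound; the sketch below collapses it into a single distance hyperparameter `max_dist` (our name, with a hypothetical value in the usage comments) and shows how the three steps compose into one augmentation pipeline:

```python
import torch

@torch.no_grad()
def filter_labels(expert_states, gen_states, gen_actions, max_dist):
    """Keep only labels whose generated state lies inside the trust region
    (distance <= max_dist) around its expert anchor state."""
    dist = (gen_states - expert_states).norm(dim=-1)
    keep = dist <= max_dist
    return gen_states[keep], gen_actions[keep]

# Illustrative end-to-end usage:
# gen_s, gen_a = backtrack_labels(model, expert_s, expert_a)
# aug_s, aug_a = filter_labels(expert_s, gen_s, gen_a, max_dist=0.1)
# A policy can then be trained with behavior cloning on the union of
# (expert_s, expert_a) and (aug_s, aug_a).
```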


Real Robot Experiments

[Videos: the grasp cube, Lego insert, and grasp coin tasks]

CCIL improves imitation learning, especially in low-data regimes

[Plots: task performance vs. number of demonstrations for the cube, Lego, and coin tasks]

Compared to standard behavior cloning, CCIL yields a substantial performance boost in low-data regimes, showcasing its data efficiency and robustness.

CCIL's Robustness to the Lipschitz Constraint

[Plots: ablation over the Lipschitz constraint (local L) and label error threshold hyperparameters]

CCIL makes a critical assumption: that the system dynamics contain local continuity. In practice, however, it is relatively insensitive to the choice of the Lipschitz constraint hyperparameter used to learn the dynamics model. As long as the generated labels are filtered with an appropriate label error threshold, CCIL yields a significant performance boost.

CCIL's Robustness to Disturbance

[Videos: recovery behavior with vs. without corrective labels under disturbance]

CCIL's corrective labels expand the support of the demonstration data, allowing the policy to recover from significant disturbances that push it outside of the original expert state distribution.


Simulation Experiments

F1Tenth

[Figures: the F1Tenth racecar and its LiDAR observations]
Method    Success Rate    Avg. Score
Expert    100.0%          1.00
BC        31.9%           0.58 ± 0.25
MOReL     0.0%            0.001 ± 0.001
MILO      0.0%            0.21 ± 0.003
NoiseBC   39.3%           0.62 ± 0.28
CCIL      56.4%           0.75 ± 0.25

Drone

[Figure: the drone tasks (Hover, Circle, FlyThrough)]
Method    Hover          Circle         FlyThrough
Expert    -1104          -10            -4351
BC        -1.08 × 10^8   -9.56 × 10^7   -1.06 × 10^8
MOReL     -1.25 × 10^8   -1.24 × 10^8   -1.25 × 10^8
MILO      -1.26 × 10^8   -1.25 × 10^8   -1.25 × 10^8
NoiseBC   -1.13 × 10^8   -9.88 × 10^7   -1.07 × 10^8
CCIL      -0.96 × 10^8   -8.03 × 10^7   -0.78 × 10^8

Mujoco and Metaworld

Mujoco
Method    Hopper             Walker             Ant                Halfcheetah
Expert    3234.30            4592.30            3879.70            12135.00
BC        1983.98 ± 672.66   1922.55 ± 1410.09  2965.20 ± 202.71   8309.31 ± 795.30
MOReL     152.19 ± 34.12     70.27 ± 3.59       1000.77 ± 15.21    -2.24 ± 0.02
MILO      566.98 ± 100.32    526.72 ± 127.99    1006.53 ± 160.43   151.08 ± 117.06
NoiseBC   1563.56 ± 1012.02  2893.21 ± 1076.89  3776.65 ± 442.13   8468.98 ± 738.83
CCIL      2631.25 ± 303.86   3538.48 ± 573.23   3338.35 ± 474.17   8757.38 ± 379.12

Metaworld
Method    CoffeePull         ButtonPress        CoffeePush         DrawerClose
Expert    4409.95            3895.82            4488.29            4329.34
BC        3552.59 ± 233.41   3693.02 ± 104.99   1288.19 ± 746.37   3247.06 ± 468.73
MOReL     18.78 ± 0.09       14.85 ± 17.08      18.66 ± 0.02       1222.23 ± 1241.47
MILO      232.49 ± 110.44    986.46 ± 105.79    230.62 ± 19.37     4621.11 ± 39.68
NoiseBC   3072.86 ± 785.91   3663.44 ± 63.10    2551.11 ± 857.79   4226.71 ± 18.90
CCIL      4168.46 ± 192.98   3775.22 ± 91.24    2484.19 ± 976.03   4145.45 ± 76.23

BibTeX

@inproceedings{ke2024ccil,
    title={CCIL: Continuity-Based Data Augmentation for Corrective Imitation Learning},
    author={Liyiming Ke and Yunchu Zhang and Abhay Deshpande and Siddhartha Srinivasa and Abhishek Gupta},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=LQ6LQ8f4y8}
}