CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning



Abstract

We study imitation learning for teaching robots from expert demonstrations. During execution, compounding errors from hardware noise and external disturbances, coupled with incomplete data coverage, can drive the agent into unfamiliar states and cause unpredictable behavior. To address this challenge, we propose a framework, CCIL: Continuity-based data augmentation for Corrective Imitation Learning. It leverages the local continuity inherent in dynamical systems to synthesize corrective labels: CCIL learns a dynamics model from the expert data and uses it to generate labels that guide the agent back to expert states. Our approach makes minimal assumptions, requiring neither expert re-labeling nor a ground-truth dynamics model. By exploiting local continuity, we derive provable bounds on the errors of the synthesized labels. Through evaluations across diverse robotic domains in simulation and the real world, we demonstrate CCIL's effectiveness in improving imitation learning performance.

Impact of Corrective Labels

Grasp Cube

[Videos: rollouts with vs. without corrective labels]

Without corrective labels, the agent can knock over the cube while trying to grasp it.

Gear Insert

Without corrective labels, the agent is not precise enough to reliably insert the gear.

Grasp Coin

[Videos: rollouts with vs. without corrective labels]

Without corrective labels, the agent is not able to precisely grasp the coin in the right place.


CCIL Framework Overview

[Figure: the three-step CCIL label-generation pipeline]

Our label generation algorithm consists of three steps: learning a dynamics model, generating corrective labels, and filtering out high-error labels.

Learning a Dynamics Model

We learn a dynamics model by minimizing the following loss: $$\mathbb{E}_{(s_t^*,a_t^*,s_{t+1}^*)\sim\mathcal{D}^*}\left[\left\|\hat{f}(s_t^*,a_t^*)+s_t^*-s_{t+1}^*\right\|\right]$$ where $\hat{f}$ predicts the change in state, i.e., $s_{t+1}\approx s_t+\hat{f}(s_t,a_t)$. Notably, a learned dynamics model can only yield reliable predictions near its data support, not on arbitrary states and actions. CCIL decides where to query the learned dynamics model by leveraging the local Lipschitz continuity present in the system dynamics. CCIL encourages the learned dynamics function to exhibit local Lipschitz continuity by modifying the training objective, specifically by regularizing the continuity of the learned model with spectral normalization. Concretely, to train a dynamics model $\hat{f}$ using a neural network of $n$ layers with weight matrices $W_1,\ldots,W_n$, one can iteratively minimize the above training objective while regularizing the model by setting $$W_i\leftarrow \frac{W_i}{\max\left(\|W_i\|_2,K^{1/n}\right)}\cdot K^{1/n}$$ for every $W_i$, where $K$ is the Lipschitz constraint hyperparameter. Clipping each layer's spectral norm to at most $K^{1/n}$ bounds the Lipschitz constant of the $n$-layer composition by $K$.
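To make this concrete, below is a minimal PyTorch sketch of this training loop. It is our illustration, not the released CCIL implementation; the names `DynamicsModel`, `clip_lipschitz`, and `lipschitz_k` are ours. The model predicts the state delta, and after each gradient step every weight matrix is rescaled so its spectral norm stays at most $K^{1/n}$.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """MLP predicting the state delta, so that s_{t+1} ~ s_t + f(s_t, a_t)."""

    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Linear(hidden_dim, state_dim),
        ])

    def forward(self, s, a):
        x = torch.cat([s, a], dim=-1)
        for layer in self.layers[:-1]:
            x = torch.relu(layer(x))  # ReLU is 1-Lipschitz
        return self.layers[-1](x)

    @torch.no_grad()
    def clip_lipschitz(self, k):
        """Rescale each W_i so ||W_i||_2 <= K^(1/n); with 1-Lipschitz
        activations, the composed network is then K-Lipschitz."""
        per_layer = k ** (1.0 / len(self.layers))
        for layer in self.layers:
            sigma = torch.linalg.matrix_norm(layer.weight, ord=2)
            layer.weight.mul_(per_layer / torch.clamp(sigma, min=per_layer))

def train_step(model, optimizer, s, a, s_next, lipschitz_k=1.0):
    """One gradient step on the dynamics loss || f(s_t, a_t) + s_t - s_{t+1} ||."""
    loss = (model(s, a) + s - s_next).norm(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    model.clip_lipschitz(lipschitz_k)  # spectral-normalization-style projection
    return loss.item()
```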

Generating Corrective Labels

With a learned dynamics model $\hat{f}$, we can generate a corrective label $(s_t^\mathcal{G}, a_t^\mathcal{G})$ for every expert data point $(s_t^*, a_t^*)$ such that $s_t^\mathcal{G}+\hat{f}(s_t^\mathcal{G},a_t^\mathcal{G})\approx s_t^*$: executing the label's action from the generated state brings the agent approximately back to the expert state. One of our label generation methods is BackTrack, inspired by the backward Euler method used in modern simulators: \begin{align*} s_t^\mathcal{G} &\leftarrow s_t^* - \hat{f}(s_t^*, a_t^*) \\ a_t^\mathcal{G} &\leftarrow a_t^* \end{align*}
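Under the delta-dynamics convention above, BackTrack is a one-line computation. The sketch below assumes the `DynamicsModel` from the previous snippet; the function name `backtrack_labels` is ours:

```python
import torch

@torch.no_grad()
def backtrack_labels(model, expert_states, expert_actions):
    """BackTrack: for each expert pair (s*, a*), synthesize
    s_G = s* - f(s*, a*) and reuse a_G = a*, so that
    s_G + f(s_G, a_G) ~ s* wherever f is locally Lipschitz."""
    gen_states = expert_states - model(expert_states, expert_actions)
    gen_actions = expert_actions.clone()  # the expert action is kept unchanged
    return gen_states, gen_actions
```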

Filtering High-Error Labels

By leveraging the local continuity in the environment dynamics, we can derive provable bounds on the error of the generated labels. Armed with this error bound, we can filter out high-error labels and keep only those that are likely to be correct. Concretely, we set a maximum allowable error, which translates into a maximum allowable distance between a generated state and its expert state. This can be viewed as a trust region around each expert data point, within which the generated labels can be trusted to be accurate.
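In the paper this threshold falls out of the provable error bound; the sketch below collapses it into a single distance hyperparameter `max_dist` (our name, with a hypothetical value in the usage comments) and shows how the three steps compose into one augmentation pipeline:

```python
import torch

@torch.no_grad()
def filter_labels(expert_states, gen_states, gen_actions, max_dist):
    """Keep only labels whose generated state lies inside the trust region
    (distance <= max_dist) around its expert anchor state."""
    dist = (gen_states - expert_states).norm(dim=-1)
    keep = dist <= max_dist
    return gen_states[keep], gen_actions[keep]

# Illustrative end-to-end usage:
# gen_s, gen_a = backtrack_labels(model, expert_s, expert_a)
# aug_s, aug_a = filter_labels(expert_s, gen_s, gen_a, max_dist=0.1)
# A policy can then be trained with behavior cloning on the union of
# (expert_s, expert_a) and (aug_s, aug_a).
```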


Real Robot Experiments

[Videos: the grasp cube, Lego insert, and grasp coin tasks]

CCIL improves imitation learning, especially in low-data regimes

[Plots: task performance vs. number of demonstrations for the cube, Lego, and coin tasks]

Compared to standard behavior cloning, CCIL yields a substantial performance boost in low-data regimes, showcasing its data efficiency and robustness.

CCIL's Robustness to the Lipschitz Constraint

[Plots: ablation over the Lipschitz constraint (local L) and label error threshold hyperparameters]

CCIL makes a critical assumption: that the system dynamics contain local continuity. In practice, however, it is relatively insensitive to the choice of the Lipschitz constraint hyperparameter used to learn the dynamics model. As long as the generated labels are filtered with an appropriate label error threshold, CCIL yields a significant performance boost.

CCIL's Robustness to Disturbance

[Videos: recovery behavior with vs. without corrective labels under disturbance]

CCIL's corrective labels expand the support of the demonstration data, allowing the policy to recover from significant disturbances that push it outside of the original expert state distribution.


Simulation Experiments

F1Tenth

[Figures: the F1Tenth racecar and its LiDAR observations]
Method    Success Rate    Avg. Score
Expert    100.0%          1.00
BC        31.9%           0.58 ± 0.25
MOReL     0.0%            0.001 ± 0.001
MILO      0.0%            0.21 ± 0.003
NoiseBC   39.3%           0.62 ± 0.28
CCIL      56.4%           0.75 ± 0.25

Drone

[Figure: the drone tasks (Hover, Circle, FlyThrough)]
Method    Hover          Circle         FlyThrough
Expert    -1104          -10            -4351
BC        -1.08 × 10^8   -9.56 × 10^7   -1.06 × 10^8
MOReL     -1.25 × 10^8   -1.24 × 10^8   -1.25 × 10^8
MILO      -1.26 × 10^8   -1.25 × 10^8   -1.25 × 10^8
NoiseBC   -1.13 × 10^8   -9.88 × 10^7   -1.07 × 10^8
CCIL      -0.96 × 10^8   -8.03 × 10^7   -0.78 × 10^8

Mujoco and Metaworld

Mujoco
Method    Hopper             Walker             Ant                Halfcheetah
Expert    3234.30            4592.30            3879.70            12135.00
BC        1983.98 ± 672.66   1922.55 ± 1410.09  2965.20 ± 202.71   8309.31 ± 795.30
MOReL     152.19 ± 34.12     70.27 ± 3.59       1000.77 ± 15.21    -2.24 ± 0.02
MILO      566.98 ± 100.32    526.72 ± 127.99    1006.53 ± 160.43   151.08 ± 117.06
NoiseBC   1563.56 ± 1012.02  2893.21 ± 1076.89  3776.65 ± 442.13   8468.98 ± 738.83
CCIL      2631.25 ± 303.86   3538.48 ± 573.23   3338.35 ± 474.17   8757.38 ± 379.12

Metaworld
Method    CoffeePull         ButtonPress        CoffeePush         DrawerClose
Expert    4409.95            3895.82            4488.29            4329.34
BC        3552.59 ± 233.41   3693.02 ± 104.99   1288.19 ± 746.37   3247.06 ± 468.73
MOReL     18.78 ± 0.09       14.85 ± 17.08      18.66 ± 0.02       1222.23 ± 1241.47
MILO      232.49 ± 110.44    986.46 ± 105.79    230.62 ± 19.37     4621.11 ± 39.68
NoiseBC   3072.86 ± 785.91   3663.44 ± 63.10    2551.11 ± 857.79   4226.71 ± 18.90
CCIL      4168.46 ± 192.98   3775.22 ± 91.24    2484.19 ± 976.03   4145.45 ± 76.23

BibTeX

@inproceedings{ke2024ccil,
    title={CCIL: Continuity-Based Data Augmentation for Corrective Imitation Learning},
    author={Liyiming Ke and Yunchu Zhang and Abhay Deshpande and Siddhartha Srinivasa and Abhishek Gupta},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=LQ6LQ8f4y8}
}