Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion Planning

Zengjie Zhang, Jayden Hong, Amir M. Soufi Enayati, Homayoun Najjaran (Corresponding author)

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

Reinforcement learning (RL) for motion planning of multi-degree-of-freedom robots still suffers from low efficiency, in terms of slow training and poor generalizability. In this article, we propose a novel RL-based robot motion planning framework that uses implicit behavior cloning (IBC) and dynamic movement primitive (DMP) to improve the training speed and generalizability of an off-policy RL agent. IBC exploits human demonstration data to accelerate RL training, while the DMP serves as a heuristic model that transfers the motion planning problem into a simpler planning space. To support this, we also create a human demonstration dataset from a pick-and-place experiment that can be reused in similar studies. Comparison studies show the advantage of the proposed method over conventional RL agents, with faster training and higher scores. A real-robot experiment demonstrates the applicability of the proposed method to a simple assembly task. Our work offers a new perspective on using motion primitives and human demonstrations to improve the performance of RL for robot applications.
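
The abstract describes planning in a DMP parameter space rather than in raw joint commands. As a rough illustration of that idea only (not the authors' implementation), the sketch below shows a minimal one-dimensional discrete DMP whose forcing-term weights could serve as the low-dimensional action of an off-policy RL agent; the gains, basis count, and the random stand-in for the policy are all assumptions for illustration.

```python
import numpy as np

# Minimal one-dimensional discrete DMP (Ijspeert-style transformation and
# canonical systems). It is meant only to show how an RL agent could plan in
# the small space of forcing-term weights instead of raw joint trajectories.
# All gains and the basis count below are illustrative assumptions, not
# values taken from the paper.

class DMP1D:
    def __init__(self, n_basis=10, alpha_z=25.0, beta_z=6.25, alpha_s=4.0, tau=1.0):
        self.n_basis = n_basis
        self.alpha_z, self.beta_z = alpha_z, beta_z
        self.alpha_s, self.tau = alpha_s, tau
        # Gaussian basis centres spread over the canonical phase s in (0, 1].
        self.centers = np.exp(-alpha_s * np.linspace(0.0, 1.0, n_basis))
        diffs = np.diff(self.centers)
        self.widths = 1.0 / (np.concatenate([diffs, diffs[-1:]]) ** 2 + 1e-6)

    def rollout(self, weights, x0, goal, dt=0.01, steps=200):
        """Integrate the DMP forward given forcing-term weights (the 'action')."""
        x, v, s = x0, 0.0, 1.0
        traj = []
        for _ in range(steps):
            psi = np.exp(-self.widths * (s - self.centers) ** 2)
            f = (psi @ weights) / (psi.sum() + 1e-10) * s * (goal - x0)
            dv = (self.alpha_z * (self.beta_z * (goal - x) - v) + f) / self.tau
            v += dv * dt
            x += (v / self.tau) * dt
            s += (-self.alpha_s * s / self.tau) * dt
            traj.append(x)
        return np.array(traj)


if __name__ == "__main__":
    dmp = DMP1D()
    # In the paper's setting, an off-policy RL agent (warm-started with
    # demonstration data via behavior cloning) would produce these weights;
    # here they are drawn at random purely to exercise the reduced action space.
    weights = np.random.randn(dmp.n_basis) * 5.0
    trajectory = dmp.rollout(weights, x0=0.0, goal=1.0)
    print("final position:", trajectory[-1])
```

Because the DMP attractor already guarantees convergence toward the goal, the agent only shapes how the motion gets there, which is one way a heuristic motion primitive can simplify the RL planning problem described above.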

Original language: English
Pages (from-to): 4733-4749
Number of pages: 17
Journal: IEEE Transactions on Robotics
Volume: 40
DOIs
Publication status: Published - 26 Sept 2024

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Keywords

  • Behavior cloning (BC)
  • heuristic method
  • human motion
  • learning from demonstration
  • motion primitive
  • reinforcement learning (RL)
  • robot motion planning
