Learning-based adaptive and stable in-hand manipulation in simulation and real-world environments
Tao, Lingfeng
Date Issued: 2023
Embargo Expires: 2025-06-24
Abstract
Dexterous in-hand manipulation is an essential capability for intelligent robots, but it is challenging to achieve due to the high degrees of freedom in control and the complex interactions with objects. Deep Reinforcement Learning (DRL) has shown the ability to solve dexterous in-hand manipulation by enabling the robot to learn a control policy through interaction with the environment. Although learning-based in-hand manipulation is promising for the broad adoption of dexterous robot hands, training and deploying DRL policies still face significant challenges: 1) existing approaches train a single, robot-structure-specific policy through a centralized learning mechanism, which lacks adaptability to changes such as robot malfunction; 2) sparse rewards are often preferred to dense rewards because they focus on task completion without constraining manipulation behaviors, which makes training easier; however, training without behavior constraints can produce aggressive, unstable policies that are unsuitable for safety-critical tasks; 3) current simulation-to-real transfer methods force the DRL policy to adapt to the limited, ambiguous, and high-dimensional inputs available in the real world, failing to exploit the simulation's rich information and reducing both data efficiency and learning efficiency. Three research objectives are developed to address these issues: 1) developing a multi-agent approach that models in-hand manipulation as a cooperative task and uses local observations and experience synchronization to improve policy adaptability and generalizability; 2) constraining manipulation behavior with finger-specific shadow rewards constructed from the state-action occupancy measure, and sharing information across policy updates for consensus training; 3) proposing a curriculum-based sensing-reduction method that lets the DRL agent start training with a rich feature space for higher training performance and then removes hard-to-extract features step by step to gradually adapt to the real world. The removed sensor signals are replaced with random signals produced by a deep random generator, which eliminates the dependency between the policy output and the removed sensors while avoiding the creation of new dependencies. Overall, this dissertation presents a comprehensive learning-based framework that improves the practicability of the training-to-deployment process for DRL-based in-hand manipulation.
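As a concrete illustration of the third objective, the following is a minimal, hypothetical Python sketch, not the dissertation's actual implementation, of curriculum-based sensing reduction: an observation wrapper progressively replaces blocks of hard-to-extract sensor channels with signals from a small deep random generator. The class names, stage schedule, and network sizes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DeepRandomGenerator(nn.Module):
        """Maps fresh i.i.d. noise to a replacement signal, so the policy
        output cannot depend on the sensors that were removed (sketch)."""

        def __init__(self, noise_dim: int, out_dim: int):
            super().__init__()
            self.noise_dim = noise_dim
            self.net = nn.Sequential(
                nn.Linear(noise_dim, 64),
                nn.ReLU(),
                nn.Linear(64, out_dim),
            )

        def forward(self, batch_size: int) -> torch.Tensor:
            z = torch.randn(batch_size, self.noise_dim)  # new noise every call
            return self.net(z)

    class SensingReductionCurriculum:
        """At each curriculum stage, one more block of hard-to-extract
        sensor channels is replaced with generated random signals."""

        def __init__(self, stages: list[list[int]]):
            # stages[k]: observation indices removed once stage k is reached
            self.stages = stages
            self.stage = 0
            self.generators = [
                DeepRandomGenerator(noise_dim=8, out_dim=len(idx))
                for idx in stages
            ]

        def advance(self) -> None:
            """Advance the curriculum, e.g., when performance plateaus."""
            self.stage = min(self.stage + 1, len(self.stages))

        def __call__(self, obs: torch.Tensor) -> torch.Tensor:
            obs = obs.clone()
            for k in range(self.stage):  # earlier stages stay reduced
                with torch.no_grad():
                    obs[:, self.stages[k]] = self.generators[k](obs.shape[0])
            return obs

Drawing fresh noise on every call is the key design point in this sketch: because the replacement signal carries no information about the removed sensors, the policy cannot retain a dependency on them, and because the noise is unstructured, no new input dependency is introduced.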
Rights
Copyright of the original work is retained by the author.