Transfer Learning for Prosthetics Using Imitation Learning

Abstract

In this paper, we apply reinforcement learning (RL) techniques to train a realistic biomechanical model to work with different people and in different walking environments. We benchmark three RL algorithms, Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), in the OpenSim environment. We also apply imitation learning to the prosthetics domain to reduce the training time needed to design customized prosthetics. We use the DDPG algorithm to train an original expert agent. We then propose a modification to the Dataset Aggregation (DAgger) algorithm that reuses the expert knowledge to train a new target agent to replicate that behaviour in fewer than 5 iterations, compared to the 100 iterations taken by the expert agent, which reduces training time by 95%. Our modifications to the DAgger algorithm improve the balance between exploiting the expert policy and exploring the environment. We show empirically that these modifications improve the convergence time of the target agent, particularly when there is some degree of variation between the expert and the naive agent.
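
For readers unfamiliar with DAgger, the following is a minimal sketch of the standard algorithm's loop on a toy one-dimensional control problem. It is not the modified algorithm proposed in this paper and it does not use OpenSim. The expert controller, the noise level, the linear learner and the decay schedule for the mixing coefficient beta are all assumptions chosen for illustration. At each iteration, the executed policy follows the expert with probability beta and the learner otherwise, which is the exploit/explore balance mentioned in the abstract.

```python
import random


def expert_policy(state):
    # Hypothetical expert: a proportional controller that drives the state to 0.
    return -0.5 * state


def rollout(policy, steps=20, seed=0):
    # Run the given policy in a toy 1-D environment and return the visited states.
    rng = random.Random(seed)
    state = rng.uniform(-1.0, 1.0)
    visited = []
    for _ in range(steps):
        visited.append(state)
        # Assumed dynamics: the action is added to the state, plus small noise.
        state = state + policy(state) + rng.gauss(0.0, 0.01)
    return visited


def fit_gain(dataset):
    # Least-squares fit of a linear learner: action = gain * state.
    numerator = sum(s * a for s, a in dataset)
    denominator = sum(s * s for s, _ in dataset)
    return numerator / denominator if denominator > 0 else 0.0


def dagger(iterations=5, decay=0.5, seed=0):
    rng = random.Random(seed)
    dataset = []  # aggregated (state, expert action) pairs
    gain = 0.0    # the learner starts with no knowledge
    for i in range(iterations):
        beta = decay ** i  # probability of executing the expert's action

        def mixed_policy(state, gain=gain, beta=beta):
            # Exploit the expert with probability beta, else use the learner.
            if rng.random() < beta:
                return expert_policy(state)
            return gain * state

        states = rollout(mixed_policy, seed=seed + i)
        # Label every visited state with the expert's action and aggregate.
        dataset.extend((s, expert_policy(s)) for s in states)
        # Retrain the learner on the whole aggregated dataset.
        gain = fit_gain(dataset)
    return gain


learned_gain = dagger()
```

In this toy setting the expert is itself linear, so the learner should recover a gain close to the expert's -0.5 after the first iteration. With a more expressive expert and learner, later iterations matter because the learner's own rollouts visit states the expert alone would not.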
