Physics-informed reinforcement learning via probabilistic co-adjustment functions

Abstract

Reinforcement learning of real-world tasks is very data-inefficient, and extensive simulation-based modelling has become the dominant approach for training systems. However, in human-robot interaction and many other real-world settings, there is no appropriate one-model-for-all, due to differences between individual instances of the system (e.g. different people) or necessary oversimplifications in the simulation models. This leaves two options: (1) learning the individual system's dynamics approximately from data, which requires data-intensive training, or (2) using a complete digital twin of each instance, which may not be realisable in many cases. We introduce two methods, co-kriging adjustment (CKA) and ridge regression adjustment (RRA), as novel ways to combine the advantages of both options. Our adjustment methods are based on an auto-regressive AR1 co-kriging model that we integrate with GP priors. This yields a data- and simulation-efficient way of using simplistic simulation models (e.g., a simple two-link model) and rapidly adapting them to individual instances (e.g., the biomechanics of individual people). Using CKA and RRA, we obtain more accurate uncertainty quantification of the entire system's dynamics than pure GP-based and AR1 methods. We demonstrate the efficiency of co-kriging adjustment with an interpretable reinforcement learning control example: learning to control a biomechanical human arm using only a two-link arm simulation model (the offline part) and CKA derived from a small amount of interaction data (the online, on-the-fly part). Our method unlocks an efficient and uncertainty-aware way to implement reinforcement learning methods in complex real-world systems for which only imperfect simulation models exist.
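
For orientation, the AR1 co-kriging idea the abstract builds on can be sketched in a few lines. The sketch below uses the standard auto-regressive form y_high(x) = rho * y_low(x) + delta(x), with delta modelled as a zero-mean Gaussian process. It is not the paper's CKA or RRA method. The toy functions low_fidelity and high_fidelity, the RBF kernel, its hyperparameters, and the least-squares estimate of rho are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def low_fidelity(x):
    # Hypothetical cheap simulation model (stand-in for a simplistic simulator).
    return np.sin(2.0 * np.pi * x)

def high_fidelity(x):
    # Hypothetical "real system": scaled simulator output plus a smooth discrepancy.
    return 1.5 * low_fidelity(x) + 0.3 * x

def fit_ar1(x_train, y_train, noise=1e-6):
    """Fit y_high(x) ~ rho * y_low(x) + delta(x), with delta a zero-mean GP."""
    y_low = low_fidelity(x_train)
    # Assumption: rho is estimated by ordinary least squares, not jointly with the GP.
    rho = float(y_low @ y_train / (y_low @ y_low))
    residual = y_train - rho * y_low
    # Small diagonal jitter keeps the kernel matrix well conditioned.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, residual)
    return rho, alpha, K

def predict_ar1(x_test, x_train, rho, alpha, K):
    """Posterior mean and variance of the high-fidelity output at x_test."""
    K_s = rbf_kernel(x_test, x_train)
    mean = rho * low_fidelity(x_test) + K_s @ alpha
    v = np.linalg.solve(K, K_s.T)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(K_s.T * v, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative values from round-off

# A handful of "real system" observations adjusts the simulator.
x_train = np.linspace(0.0, 1.0, 8)
y_train = high_fidelity(x_train)
rho, alpha, K = fit_ar1(x_train, y_train)

x_test = np.linspace(0.0, 1.0, 50)
mean, var = predict_ar1(x_test, x_train, rho, alpha, K)
```

The predictive variance returned here is what makes the adjusted model uncertainty-aware: it is close to zero at the observed inputs and grows between them. Only the discrepancy term is modelled as a GP in this sketch, so the variance does not account for uncertainty in rho or in the simulator itself.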
