Enhancing Hardware Fault Tolerance in Machines with Reinforcement Learning Policy Gradient Algorithms

Abstract

Industry is rapidly moving towards fully autonomous and interconnected systems that can detect and adapt to changing conditions, including machine hardware faults. Traditional methods for adding hardware fault tolerance to machines involve duplicating components and algorithmically reconfiguring a machine's processes when a fault occurs. The growing interest in reinforcement learning-based robotic control offers a new perspective on achieving hardware fault tolerance, yet limited research has explored the potential of these approaches for hardware fault tolerance in machines. This paper investigates the potential of two state-of-the-art reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), to enhance hardware fault tolerance in machines. We assess the performance of these algorithms in two OpenAI Gym simulated environments, Ant-v2 and FetchReach-v1. Robot models in these environments are subjected to six simulated hardware faults. Additionally, we conduct an ablation study to determine the optimal method for transferring an agent's knowledge, acquired through learning in a normal (pre-fault) environment, to a (post-)fault environment in a continual learning setting. Our results demonstrate that reinforcement learning-based approaches can enhance hardware fault tolerance in simulated machines, with adaptation occurring within minutes. Specifically, PPO exhibits the fastest adaptation when retaining the knowledge within its models, while SAC performs best when discarding all acquired knowledge. Overall, this study highlights the potential of reinforcement learning-based approaches, such as PPO and SAC, for hardware fault tolerance in machines. These findings pave the way for the development of robust and adaptive machines capable of effectively operating in real-world scenarios.
