Abstract
Despite numerous advances, reinforcement learning remains far from widespread acceptance for autonomous controller design compared to classical methods, owing to its inability to effectively tackle the reality gap. The reliance on an absolute or deterministic reward as the metric for the optimization process renders reinforcement learning highly susceptible to changes in problem dynamics. We introduce a novel framework that effectively quantizes the uncertainty of the design space and induces robustness in controllers by switching to a reliability-based optimization routine. The data efficiency of the method is maintained to match reward-based optimization methods by employing a model-based approach. We prove the stability of learned neuro-controllers in both static and dynamic environments on classical reinforcement learning tasks such as Cart Pole balancing and Inverted Pendulum.