Gym-Ignition: Reproducible Robotic Simulations for Reinforcement Learning

  • 2019-11-05 11:19:58
  • Diego Ferigo, Silvio Traversaro, Daniele Pucci
  • 15

Abstract

In this paper we present Gym-Ignition, a new framework to create reproducible robotic environments for reinforcement learning research. It interfaces with the new generation of Gazebo, part of the Ignition Robotics suite. The new Ignition Gazebo simulator mainly provides three improvements for reinforcement learning applications compared to the alternatives: 1) the modular architecture enables using the simulator as a C++ library, simplifying the interconnection with external software; 2) multiple physics and rendering engines are supported as plugins, and they can be switched during runtime; 3) the new distributed simulation capability permits simulating complex scenarios while sharing the load on multiple workers and machines. The core of Gym-Ignition is a component that contains the Ignition Gazebo simulator, and it simplifies its configuration and usage. We provide a Python package that permits developers to create robotic environments simulated in Ignition Gazebo. Environments expose the common OpenAI Gym interface, making them compatible out-of-the-box with third-party frameworks containing reinforcement learning algorithms. Simulations can be executed in both headless and GUI mode, and the physics engine can run in accelerated mode and instances can be parallelized. Furthermore, the Gym-Ignition software architecture provides abstraction of the Robot and the Task, making environments agnostic from the specific runtime. This allows their execution also in a real-time setting on actual robotic platforms.

 


Diego Ferigo *†, Silvio Traversaro *, Daniele Pucci *
* Dynamic Interaction Control, Italian Institute of Technology, Genova, IT, 16163
† University of Manchester, Machine Learning and Optimisation, Manchester, UK, M13 9PL

 



 A Preprint


Keywords simulation   robotics   reinforcement learning   system integration

1 Introduction

Simulations have always been a key component in robotics. Over the years, their accuracy and efficiency have constantly improved, and nowadays there are numerous valid physics engines and simulators. They have become part of every roboticist's toolbox and have always attracted great interest and contributions from the entire community.

Agents trained with Reinforcement Learning (RL) algorithms solve their decision-making problems by dynamically interacting with an environment: they take actions and gather information about their consequences, i.e. they sample experience. Classical benchmarks used in this research field typically involve grid-worlds or simple toy problems. However, the advent of Deep Learning (DL) and its combination with RL has allowed machines to solve complex decision-making tasks that had previously been out of their reach.

New benchmarks involving harder and more complex scenarios (environments) have been developed, mainly originating from the gaming realm. A virtuous cycle in this domain is the constant push towards novel algorithms thanks to more complex environments, and vice versa. The community has been very prolific in constantly extending the range of available environments. Typical examples are the Arcade Learning Environment [1], OpenAI Gym [2], DeepMind Lab [3], and the DeepMind Control Suite [4], to name just a few.

The boosted capability of Deep RL has attracted much interest from many research fields. Robotics is one of those that can benefit the most from the freedom offered by the formulation of the Reinforcement Learning problem. Applications range from manipulation to locomotion, both characterized by a very high complexity that generally demands tedious heuristic tuning. However, interaction with the real world poses a set of challenges that differ considerably from the typical experimental setup of reinforcement learning, which involves gaming-like simulations. In [5], the authors outline nine high-level challenges of applying RL to the real world. Robotics suffers from all of them.

As in classic robotic research, the application of RL to robotics has always tried to take advantage of simulated environments. The motivations, in this case, are even more compelling, since the systems to control are costly and delicate. The intrinsic need for exploration during the training phase might be dangerous for the robotic platforms and their surroundings. Moreover, even if safety constraints are enforced while training, collecting experience only from real-world interactions is often not sufficient. Simulations can generate large amounts of synthetic experience that can be used to train policies. On the one hand, simulations help overcome the limitations of real-time data collection; on the other, they introduce a bias caused by intrinsic and unavoidable modeling approximations. The process of training a policy in simulation and then transferring it to the real world is known as sim-to-real [6, 7], and the difference between simulation and reality is typically referred to as the reality gap.

The recent literature contains many examples of successful attempts to port simulated policies to the real world. Common techniques to bridge the reality gap include improving the description of the simulated robot [8], learning from real-world data the effects that are difficult to characterize and then using the learned models in simulation [9, 10], massively randomizing simulation parameters [11, 12], imitating behaviour from experts or existing controllers [13, 14], and applying hierarchical architectures to decompose complex tasks [15].

A common thread shared by all these works is the extensive use of complex simulations, most of the time based on open-source software. However, in many cases the authors did not release their experimental setup, making the reproduction of their results very difficult, if not impossible. Reproducibility can be improved in two directions. Firstly, the entire community would benefit from a standardized platform for applying reinforcement learning techniques to simulated robots. We believe that, in the robotics domain too, standardized environments would trigger the same virtuous cycle that characterized the past breakthroughs in reinforcement learning. Secondly, the community would benefit from a platform that is versatile enough to minimize the system integration effort that is always required when dealing with real robotic platforms. Roboticists know that real-world applicability involves a considerable amount of custom code and heuristic tuning. Simulation frameworks, however, can at least try to abstract low-level details as much as possible and provide generic interfaces that can then be customized as needed.

In this work we present Gym-Ignition (https://github.com/robotology/gym-ignition), a framework to create reproducible reinforcement learning environments for robotic research. The environments created with Gym-Ignition target both simulated and real-time settings. The modular project architecture makes it possible to switch seamlessly between the two domains without adapting the logic of the decision-making task to the real platform. Simulated environments run in the new generation of the Gazebo simulator [16], called Ignition Gazebo, part of the Ignition Robotics suite (https://ignitionrobotics.org). It features a new abstraction layer that makes it easy to integrate new physics engines in C++ and to switch between them on-the-fly during the simulation. Alternatively, new physics engines that provide Python bindings can be integrated at the Python level by exploiting the Gym-Ignition abstractions.

Gym-Ignition aims to narrow the gap between reinforcement learning and robotic research. It permits roboticists to create simulated environments using familiar tools like Gazebo, SDF, and URDF files to model both the robot and the scenario. Environment developers can choose either Python or C++, depending on the stage of development: the former is convenient for quick prototyping, the latter for deployment. In both languages, we support exposing the common OpenAI Gym interface [2]. This makes the environments compatible with the majority of the projects developed by the reinforcement learning community that provide algorithms.

To the best of our knowledge, Gym-Ignition is the first project that integrates with the new Ignition Robotics suite developed by Open Robotics (https://www.openrobotics.org). We believe that Ignition Gazebo will progressively take over from the current generation of the simulator, providing new features, enhanced performance, and improved user experience.

This paper is structured as follows. Firstly, we identify a set of useful properties that characterize projects that provide robotic environments. Then, we select the main available projects that provide robotic environments compatible with OpenAI Gym and briefly describe their properties and shortcomings. We proceed by presenting Gym-Ignition, explaining its architecture and describing its features. Finally, we conclude by discussing the current limitations and outlining future activities.

2 Similar Software

The rise of Deep Reinforcement Learning has constantly been accompanied by the development of novel environments. In fact, a virtuous cycle experienced in past years is the constant push towards novel algorithms thanks to more complex environments, and vice versa. The application to robotics has seen a similar trend, moving from simple control problems in 2D toy environments [2] to more accurate 3D simulations with detailed models and realistic physics [12].

In this section we select a few common suites that provide robotic environments for reinforcement learning research. A complete comparison is reported in Table 1. Below, for each project, we briefly comment on notable properties and shortcomings.

OpenAI Robotic Environments (https://openai.com/blog/ingredients-for-robotics-research) are part of the official OpenAI environments, which have become the de facto standard for benchmarking algorithms. They are simulated with the Mujoco simulator, which has become one of the most common solutions for continuous control tasks. Unfortunately, the simulator is proprietary software, a constraint that greatly limits its use. Furthermore, its capabilities for realistic rendering are limited.

Bullet3 Environments [22] are part of the Bullet3 project and use Bullet as physics engine. Given the active development and open-source nature of the project, a large community has formed around this physics engine. Simulations are reliable and fast, but the default rendering capabilities are not photorealistic. The provided robotic environments are valid, even if documentation and modularity can be improved. There are a few third-party projects that use both the Bullet engine and the PyBullet Python bindings to provide alternative robotic environments.

PyBullet Gymperium (https://github.com/benelot/pybullet-gym) is a collection of environments similar to the official ones from OpenAI and PyBullet. In fact, it provides robotic environments that use PyBullet and Mujoco. It was developed with a modular software architecture that abstracts the robot. However, both supported simulators lack realistic rendering.

Gym-Gazebo2 [23] has targeted robotic applications from the beginning. It interfaces with the Gazebo simulator, which is widely used in robotics. The environments are very easy to create thanks to the SDF format. Gazebo is a familiar tool to many roboticists, and many laboratories can reuse existing resources to interface their robotic platforms with the RL framework. The biggest drawback of the software architecture is the socket-based communication between the Python code of the environment and the simulator, which makes the simulation not fully reproducible.

OpenAI ROS (http://wiki.ros.org/openai_ros) provides RL environments for ROS robots running in the Gazebo simulator. It shares the Gazebo drawbacks described for Gym-Gazebo2 and, in addition, Open Robotics has not yet implemented the network segmentation that enables parallel simulations. Since the communication with the robot is based on the ROS middleware, this project might in theory support applications on real robots. However, there is neither a suitable environment nor any documentation on how to execute or improve a policy on real robotic devices.

Nvidia ISAAC (https://www.nvidia.com/en-us/deep-learning-ai/industries/robotics) is the new Nvidia toolbox for AI applications in robotics. The simulations are executed in Nvidia's PhysX engine and provide impressive photorealistic rendering. It was announced very recently, and there are not yet any mature examples or complete documentation. The environments compatible with OpenAI Gym have not yet been released, and Table 1 was filled in from the declared specifications. ISAAC is one of the most promising projects aiming to provide a unified framework for robotics and AI, but unfortunately its closed-source nature might limit the possibility of extending it.

Unity ML-Agents [19] is another novel and promising toolkit for creating environments using the Unity platform. It supports Nvidia PhysX out-of-the-box, and plugins exist for Bullet and Mujoco. Being based on a gaming engine, its rendering is very photorealistic. Although the agent code and the physics engine reside in different processes, the selected gRPC communication protocol, in its synchronous variant, should ensure determinism. However, custom actions and observations require the manual creation of the protobuf message.

Gibson [24] is another recent framework for active agents with main focus on real-world perception. Its rendering capabilities are highly photorealistic and they can be considered state-of-the-art.

RaiSim [25] is a recently released simulator specific to robotics. Its main advantage is an efficient contact solver that greatly speeds up the simulation. Due to its very recent release, there are not many examples available. As with other frameworks, its closed-source nature might limit applications.

[Figure 1 diagram. Top: Agent, interfacing with Gym.Env. Python column: GazeboEnv and RTEnv; gym_ignition.Task (Task #1, Task #2, Task #3, …); gym_ignition.Robot. C++ column: GymppEnv; gympp::Environment (IgnitionEnvironment, RealTimeEnvironment); gympp::Task (Task #1, Task #2, Task #3, …); gympp::Robot. Bottom: Gazebo and Real Robot.]

Figure 1: Gym-Ignition software architecture. The two columns show the Python and C++ components and, following a top-down view, the language logos mark the separation between the two domains. The grey rectangles show the abstraction layers of the software architecture. Tasks can be created in both Python and C++, and thanks to the Robot abstraction, they can be executed both on simulated and real robotic platforms.
Table 1: Comparison of frameworks that provide robotic environments compatible with OpenAI Gym.

Columns: Software | Multiple Physics Engines | Photorealistic Rendering | Accelerated | Embedded Simulator | Parallel | Real-Time Compatible | Modular | Open Source
Rows: OpenAI Robotic Environments; Gym-Gazebo2; openai ros; Bullet3 Environments; Nvidia ISAAC*; Unity ML-Agents; PyBullet-Gymperium; Gibson; RaiSim; Gym-Ignition
Surviving cell entries: "?" in two columns for Nvidia ISAAC and in one column for Gibson (columns not identifiable).

3 Architecture

In this section we describe the software architecture of Gym-Ignition. Our aim is to fully implement all the properties identified above. A few of them are currently only partially supported, since they depend on the development status of Ignition Gazebo.

The architecture of Gym-Ignition makes it possible to train the same Python agent on both the simulated and the real robot, without changing the logic of the learned task. This transparency is achieved mainly by exploiting three levels of abstraction, illustrated in Figure 1:

  • Environment: The environment is the interface exposed to the agent. The agent can set an action, and gather the observation and the associated scalar reward. Actions and observations are samples belonging to a specific space.

  • Task: The task is the interface that defines how to process the action received from the agent and how to create the observation sample. It also calculates the reward and evaluates whether the simulation has reached its terminal state. In learning problems where the state is only partially observable, the task typically has access to the complete state and exposes to the agent only what is necessary.

  • Robot: The robot abstraction allows reading data from a robot and setting the references to be actuated. This unified interface makes it possible to develop tasks that are agnostic of the robot. In fact, from the point of view of the task, the same method can gather data from either the simulated or the real robot, depending on the active implementation. It also allows interfacing transparently with robots that use different middlewares. The implemented functionality includes joints, links, sensors, contacts, and base information. Since a single unified interface would be too big to maintain, we implemented a composable feature mechanism, so that tasks can require only the subset of features they need.

Taking a common toy problem in reinforcement learning as an example, the robot can be a cartpole (either simulated or real), and the tasks can be either pole balancing or swing-up, both accepting either discrete or continuous forces applied to the cart.
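The Robot abstraction and its composable features can be illustrated with a minimal Python sketch. All the names below (JointFeatures, ContactFeatures, SimulatedCartPole, RealCartPole, push_cart_towards_pole) are hypothetical and are not taken from the Gym-Ignition API; the sketch only shows how task-side logic can depend on a subset of features while remaining agnostic of the backend.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class JointFeatures(ABC):
    """Hypothetical feature: read joint positions and set force references."""

    @abstractmethod
    def joint_position(self, name: str) -> float: ...

    @abstractmethod
    def set_joint_force(self, name: str, force: float) -> None: ...


class ContactFeatures(ABC):
    """Hypothetical feature: query contacts. The cartpole logic never needs it."""

    @abstractmethod
    def links_in_contact(self) -> List[str]: ...


class SimulatedCartPole(JointFeatures):
    """Toy stand-in for a simulated robot: state lives in plain dictionaries."""

    def __init__(self) -> None:
        self.positions: Dict[str, float] = {"linear": 0.0, "pivot": 0.05}
        self.forces: Dict[str, float] = {"linear": 0.0, "pivot": 0.0}

    def joint_position(self, name: str) -> float:
        return self.positions[name]

    def set_joint_force(self, name: str, force: float) -> None:
        self.forces[name] = force


class RealCartPole(JointFeatures):
    """Toy stand-in for a real robot: a real driver would call the middleware."""

    def __init__(self, encoder_readings: Dict[str, float]) -> None:
        self._encoders = encoder_readings
        self.sent_commands: List[Tuple[str, float]] = []

    def joint_position(self, name: str) -> float:
        return self._encoders[name]

    def set_joint_force(self, name: str, force: float) -> None:
        self.sent_commands.append((name, force))


def push_cart_towards_pole(robot: JointFeatures, gain: float = 10.0) -> float:
    """Task-side logic: relies only on JointFeatures, not on the backend."""
    force = gain * robot.joint_position("pivot")
    robot.set_joint_force("linear", force)
    return force
```

The same function accepts either backend, for instance `push_cart_towards_pole(SimulatedCartPole())` or `push_cart_towards_pole(RealCartPole({"linear": 0.0, "pivot": -0.1}))`. Neither class has to implement ContactFeatures, which is the point of keeping features composable.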

Gym-Ignition is a project that aims to connect reinforcement learning libraries containing algorithms to common tools used in robotic research and industry. These two domains are historically rooted in the Python and C++ languages, respectively. Gym-Ignition allows implementing environments in both languages through the following three main components:

  • gympp: A C++ port of the Python OpenAI Gym that provides the same environment and spaces APIs (e.g. gympp::Environment, gympp::spaces::Box). It also contains gympp::Robot and gympp::Task, the interfaces that abstract the robot and the task respectively.

  • gympp-gazebo: A C++ library containing the gympp::gazebo::GazeboWrapper and the gympp::gazebo::IgnitionEnvironment classes. The former is a class that wraps Ignition Gazebo, simplifies its configuration, and allows stepping the physics. The latter is an implementation of the gympp::Environment interface to create simulated environments in pure C++.

  • gym-ignition: The main Python package containing the resources necessary to create environments compatible with OpenAI Gym. It includes the Python version of the robot and task interfaces, the classes that interface with Ignition Gazebo and the other supported physics engines, and a few implementations of the robot interface. It also contains some examples of tasks and robot models.

This architecture allows taking the best from the two domains: Python and C++. On the one hand, the majority of the open-source machine learning frameworks and libraries of reinforcement learning algorithms target Python. They are mature and well documented, with an active community behind their development. On the other hand, C++ is the main language of robotics. The majority of robotic middlewares are implemented in C++, and most robotic laboratories have their entire infrastructure implemented in this language. The possibility to interact and natively interface with such infrastructure might help accelerate system integration and real-time applicability.

Environment developers can choose either Python or C++. C++ environments are then loaded into Python, wrapped by the OpenAI Gym interface, and registered in the environment factory. To expose C++ code to Python, such as the environments implemented as gympp::Environment and the gympp::gazebo::GazeboWrapper, we use SWIG [26].
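
The load, wrap, and register flow described above can be illustrated with a minimal, self-contained sketch. It does not use the actual gympp or SWIG-generated classes: FakeCppEnvironment, CppState, GymStyleWrapper, and the toy register and make functions are hypothetical stand-ins introduced only for illustration. In a real setup, the registration would presumably go through OpenAI Gym's own gym.register and gym.make.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CppState:
    """Hypothetical stand-in for the state a wrapped C++ step() might return."""
    observation: List[float]
    reward: float
    done: bool


class FakeCppEnvironment:
    """Hypothetical stand-in for a C++ environment exposed to Python.

    This is NOT the gympp API; it only mimics an object with reset/step.
    """

    def __init__(self) -> None:
        self._counter = 0

    def reset(self) -> List[float]:
        self._counter = 0
        return [0.0]

    def step(self, action: float) -> CppState:
        self._counter += 1
        return CppState(
            observation=[float(self._counter)],
            reward=-abs(action),
            done=self._counter >= 3,  # toy episode of three steps
        )


class GymStyleWrapper:
    """Adapts the stand-in to the classic Gym tuple (obs, reward, done, info)."""

    def __init__(self, cpp_env: FakeCppEnvironment) -> None:
        self._cpp_env = cpp_env

    def reset(self) -> List[float]:
        return self._cpp_env.reset()

    def step(self, action: float) -> Tuple[List[float], float, bool, dict]:
        state = self._cpp_env.step(action)
        return state.observation, state.reward, state.done, {}


# Toy environment factory: maps an id to a callable that builds the environment.
_REGISTRY: Dict[str, Callable[[], GymStyleWrapper]] = {}


def register(env_id: str, entry_point: Callable[[], GymStyleWrapper]) -> None:
    _REGISTRY[env_id] = entry_point


def make(env_id: str) -> GymStyleWrapper:
    return _REGISTRY[env_id]()


register("FakeCpp-v0", lambda: GymStyleWrapper(FakeCppEnvironment()))
```

With this sketch, calling make("FakeCpp-v0") returns an object that a Gym-style training loop can drive with reset and step, without knowing that the underlying environment is implemented in another language.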

Most of the alternative solutions for creating robotic environments involve network communication with the simulator. This approach has both benefits and limitations. The network acts as an abstraction layer, so Python code does not need to interface directly with low-level code. However, scaling up simulations with distributed and parallel environments might require complex network segmentation. Ignition Gazebo can be used as a library, and hence integrated into the same software component as the agent. As a consequence, obtaining a reproducible simulation becomes considerably easier.

Tasks developed with the abstraction layers described above are independent of the robot on which they operate. Environment developers do not need to deal with any simulator-specific operation. In fact, a new environment can be created just by implementing the task interface, which consists of only five methods: set_action, get_observation, get_reward, reset, and is_done. In full compliance with the OpenAI Gym approach, the task class is then wrapped by the selected runtime, which provides either simulated or real-time execution. The runtime is the class that actually implements the gym.Env interface, and it is the object returned to the user through the environment factory. In the simulated case, the runtime is the only class that contains simulator-specific logic. This separation allows the same runtime class to be applied to any task and, hence, to any environment. In the real-time case, instead, the runtime forces the Python code to run at the configured rate.
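
The split between task and runtime can be sketched in a few lines of plain Python. The five method names come from the text; their signatures, the PointRobot class, the ReachTargetTask example, and the ToySimRuntime class are assumptions made for illustration and are not the gym-ignition implementation. The toy runtime stands in for a simulated runtime: it is the only class that advances the (here trivial) physics.

```python
import abc
from typing import List, Tuple


class Task(abc.ABC):
    """The five task methods named in the text (signatures are assumptions)."""

    @abc.abstractmethod
    def set_action(self, action: float) -> None: ...

    @abc.abstractmethod
    def get_observation(self) -> List[float]: ...

    @abc.abstractmethod
    def get_reward(self) -> float: ...

    @abc.abstractmethod
    def reset(self) -> None: ...

    @abc.abstractmethod
    def is_done(self) -> bool: ...


class PointRobot:
    """Hypothetical 1-D robot: a position and a commanded velocity."""

    def __init__(self) -> None:
        self.position = 0.0
        self.velocity = 0.0


class ReachTargetTask(Task):
    """Toy task: drive the point robot to a target position."""

    def __init__(self, robot: PointRobot, target: float = 1.0,
                 tolerance: float = 0.05) -> None:
        self.robot = robot
        self.target = target
        self.tolerance = tolerance

    def set_action(self, action: float) -> None:
        self.robot.velocity = action

    def get_observation(self) -> List[float]:
        return [self.robot.position]

    def get_reward(self) -> float:
        return -abs(self.target - self.robot.position)

    def reset(self) -> None:
        self.robot.position = 0.0
        self.robot.velocity = 0.0

    def is_done(self) -> bool:
        return abs(self.target - self.robot.position) <= self.tolerance


class ToySimRuntime:
    """Stand-in for a simulated runtime exposing Gym-style reset/step.

    It wraps a task and is the only place where the 'physics' is stepped.
    """

    def __init__(self, robot: PointRobot, task: Task, dt: float = 0.1) -> None:
        self.robot = robot
        self.task = task
        self.dt = dt

    def reset(self) -> List[float]:
        self.task.reset()
        return self.task.get_observation()

    def step(self, action: float) -> Tuple[List[float], float, bool, dict]:
        self.task.set_action(action)
        # Advance the toy physics by one time step (explicit Euler).
        self.robot.position += self.robot.velocity * self.dt
        return (self.task.get_observation(), self.task.get_reward(),
                self.task.is_done(), {})
```

Because ReachTargetTask contains no stepping logic, the same task object could be handed to a different runtime (for instance, one that waits on real hardware instead of integrating the position) without being modified.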

Currently, Gym-Ignition supports the following three runtimes: GazeboEnv, which steps the task using the Ignition Gazebo simulator; RTEnv, which steps the task on real-time robots; and PyBulletEnv, which steps the task with PyBullet. While waiting for the Bullet3 physics engine to be integrated into Ignition Gazebo, we developed the PyBullet integration for benchmarking purposes.
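
As an illustration of what running at a configured rate can mean for a real-time runtime, the following sketch paces a loop with the Python standard library. It is not the RTEnv implementation: RateLimiter and run_at_rate are hypothetical names, and on a general-purpose operating system time.sleep offers only soft timing, not hard real-time guarantees.

```python
import time
from typing import Callable


class RateLimiter:
    """Sleeps so that successive iterations are spaced 1 / rate_hz apart."""

    def __init__(self, rate_hz: float) -> None:
        self.period = 1.0 / rate_hz
        self._next_deadline = time.monotonic() + self.period

    def sleep(self) -> None:
        remaining = self._next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        # Schedule from the previous deadline so that small delays do not
        # accumulate as drift.
        self._next_deadline += self.period
        if self._next_deadline < time.monotonic():
            # The iteration overran its budget: resynchronise with the clock.
            self._next_deadline = time.monotonic() + self.period


def run_at_rate(step_fn: Callable[[], None], rate_hz: float,
                n_steps: int) -> None:
    """Calls step_fn n_steps times, pacing the loop at roughly rate_hz."""
    limiter = RateLimiter(rate_hz)
    for _ in range(n_steps):
        step_fn()
        limiter.sleep()
```

Here step_fn would be the point where a runtime sets the action on the robot and reads back the observation; the limiter only guarantees that the loop does not run faster than the configured rate.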

As a final note, although Gym-Ignition is mainly centered around the Ignition Gazebo simulator and promotes its usage, it is not locked to it. In fact, as with the PyBullet implementation, integrating a new physics engine only requires implementing the corresponding runtime. The modular software architecture ensures that all the existing tasks (and therefore all the environments) stay compatible and can be executed in all the supported runtimes. This opens up the possibility of extending support in the future to physics engines that will never be implemented inside Ignition Gazebo.

4 Limitations, Future Work, and Conclusions

This work presents Gym-Ignition, a novel framework to create robotic environments for reinforcement learning applications. The main aim of the project is to narrow the gap between reinforcement learning and robotics research, allowing roboticists to create environments with familiar tools. We believe that the quality and difficulty of the environments (i.e. the problems to solve) provided to the community are related to the scientific advances of this domain. We would also like to push the research outside the simulation realm, an extremely delicate step in the field of robotics. We hope that Gym-Ignition can motivate researchers in these two important directions.

Gym-Ignition enables the development of robotic environments with great flexibility. Environments can be created in either C++ or Python, and they can target either simulated or real robotic platforms. In simulation, thanks to Ignition Gazebo, physics engines can be switched on-the-fly, and the modular nature of the simulator ensures fully reproducible results. The framework supports most of the simulation properties that enable scaling up, such as accelerated, parallel, and distributed execution.

The limitations of the project are still significant. Neither Ignition Gazebo nor Gym-Ignition has yet reached a mature and stable development status. Both projects are still in a preview stage, and the installation procedure can be simplified. Currently, only the DART [27] physics engine is officially supported in Ignition Gazebo, but support for Bullet3 is almost finalized. While waiting for its integration into Ignition Gazebo, we implemented Bullet3 support in Gym-Ignition using the PyBullet bindings. From the perception point of view, Ignition Gazebo already supports most of the sensors typically mounted on robots, such as IMUs, lidars, and cameras. A few more sensors, such as force-torque sensors, will be implemented in future releases. As far as rendering capabilities are concerned, Ignition Gazebo fully supports OGRE (http://www.ogre3d.org). The implementation of the Nvidia OptiX Engine [28], a photorealistic rendering engine, is only partial.

The ongoing and future activities on Gym-Ignition will introduce generic low-level controllers that can be used on all the supported physics engines. We also plan to expose the physics engine parameters to Python in order to simplify the implementation of domain randomization. Given the early development status, the documentation of many components is either missing or in need of improvement. Last but not least, we want to explore the integration with Ignition Fuel (https://app.ignitionrobotics.org), the new database maintained by Open Robotics containing ready-to-use worlds and 3D models of common objects and robots. This integration can ease the creation of unstructured scenarios where robots operate and interact.

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731540 (An.Dy).
The content of this publication is the sole responsibility of the authors. The European Commission or its services cannot be held responsible for any use that may be made of the information it contains.

References

  • [1] Bellemare MG, Naddaf Y, Veness J, Bowling M. The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research. 2013 Jun;47:253–279. Available from: https://www.jair.org/index.php/jair/article/view/10819.
  • [2] Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, et al. OpenAI Gym. arXiv:160601540 [cs]. 2016 Jun;ArXiv: 1606.01540. Available from: http://arxiv.org/abs/1606.01540.
  • [3] Beattie C, Leibo JZ, Teplyashin D, Ward T, Wainwright M, Küttler H, et al. DeepMind Lab. arXiv:161203801 [cs]. 2016 Dec;ArXiv: 1612.03801. Available from: http://arxiv.org/abs/1612.03801.
  • [4] Tassa Y, Doron Y, Muldal A, Erez T, Li Y, Casas DdL, et al. DeepMind Control Suite. arXiv:180100690 [cs]. 2018 Jan;ArXiv: 1801.00690. Available from: http://arxiv.org/abs/1801.00690.
  • [5] Dulac-Arnold G, Mankowitz D, Hester T. Challenges of Real-World Reinforcement Learning. arXiv:190412901 [cs, stat]. 2019 Apr;ArXiv: 1904.12901. Available from: http://arxiv.org/abs/1904.12901.
  • [6] Christiano P, Shah Z, Mordatch I, Schneider J, Blackwell T, Tobin J, et al. Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model. arXiv:161003518 [cs]. 2016 Oct;ArXiv: 1610.03518. Available from: http://arxiv.org/abs/1610.03518.
  • [7] Muratore F, Gienger M, Peters J. Assessing Transferability from Simulation to Reality for Reinforcement Learning. arXiv:190704685 [cs]. 2019 Jul;ArXiv: 1907.04685. Available from: http://arxiv.org/abs/1907.04685.
  • [8] Tan J, Zhang T, Coumans E, Iscen A, Bai Y, Hafner D, et al. Sim-to-Real: Learning Agile Locomotion For Quadruped Robots. arXiv:180410332 [cs]. 2018 Apr;ArXiv: 1804.10332. Available from: http://arxiv.org/abs/1804.10332.
  • [9] Chebotar Y, Handa A, Makoviychuk V, Macklin M, Issac J, Ratliff N, et al. Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience. arXiv:181005687 [cs]. 2018 Oct;ArXiv: 1810.05687. Available from: http://arxiv.org/abs/1810.05687.
  • [10] Hwangbo J, Lee J, Dosovitskiy A, Bellicoso D, Tsounis V, Koltun V, et al. Learning agile and dynamic motor skills for legged robots. Science Robotics. 2019 Jan;4(26):eaau5872. Available from: http://robotics.sciencemag.org/lookup/doi/10.1126/scirobotics.aau5872.
  • [11] Peng XB, Andrychowicz M, Zaremba W, Abbeel P. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. 2018 IEEE International Conference on Robotics and Automation (ICRA). 2018 May;p. 1–8. ArXiv: 1710.06537. Available from: http://arxiv.org/abs/1710.06537.
  • [12] OpenAI, Andrychowicz M, Baker B, Chociej M, Jozefowicz R, McGrew B, et al. Learning Dexterous In-Hand Manipulation. arXiv:180800177 [cs, stat]. 2018 Aug;ArXiv: 1808.00177. Available from: http://arxiv.org/abs/1808.00177.
  • [13] Li T, Rai A, Geyer H, Atkeson CG. Using Deep Reinforcement Learning to Learn High-Level Policies on the ATRIAS Biped. arXiv:180910811 [cs]. 2018 Sep;ArXiv: 1809.10811. Available from: http://arxiv.org/abs/1809.10811.
  • [14] Xie Z, Clary P, Dao J, Morais P, Hurst J, van de Panne M. Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie. arXiv:190309537 [cs]. 2019 Mar;ArXiv: 1903.09537. Available from: http://arxiv.org/abs/1903.09537.
  • [15] Jain D, Iscen A, Caluwaerts K. Hierarchical Reinforcement Learning for Quadruped Locomotion. arXiv:190508926 [cs]. 2019 May;ArXiv: 1905.08926. Available from: http://arxiv.org/abs/1905.08926.
  • [16] Koenig N, Howard A. Design and use paradigms for gazebo, an open-source multi-robot simulator; 2004.
  • [17] Ivaldi S, Peters J, Padois V, Nori F. Tools for simulating humanoid robot dynamics: A survey based on user feedback. In: 2014 IEEE-RAS International Conference on Humanoid Robots; 2014. p. 842–849.
  • [18] Erez T, Tassa Y, Todorov E. Simulation tools for model-based robotics: Comparison of Bullet, Havok, MuJoCo, ODE and PhysX. In: 2015 IEEE International Conference on Robotics and Automation (ICRA); 2015. p. 4397–4404.
  • [19] Juliani A, Berges VP, Vckay E, Gao Y, Henry H, Mattar M, et al. Unity: A General Platform for Intelligent Agents. arXiv:180902627 [cs, stat]. 2018 Sep;ArXiv: 1809.02627. Available from: http://arxiv.org/abs/1809.02627.
  • [20] Krammer M, Schuch K, Kater C, Alekeish K, Blochwitz T, Materne S, et al. Standardized Integration of Real-Time and Non-Real-Time Systems: The Distributed Co-Simulation Protocol; 2019. p. 87–96. Available from: http://www.ep.liu.se/ecp/article.asp?issue=157%26article=9.
  • [21] Ramos F, Possas RC, Fox D. BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators. arXiv:190601728 [cs]. 2019 Jun;ArXiv: 1906.01728. Available from: http://arxiv.org/abs/1906.01728.
  • [22] Coumans E, Bai Y. Pybullet, a python module for physics simulation in robotics, games and machine learning; 2016.
  • [23] Lopez NG, Nuin YLE, Moral EB, Juan LUS, Rueda AS, Vilches VM, et al. gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo. arXiv:190306278 [cs]. 2019 Mar;ArXiv: 1903.06278. Available from: http://arxiv.org/abs/1903.06278.
  • [24] Xia F, Zamir AR, He Z, Sax A, Malik J, Savarese S. Gibson Env: Real-World Perception for Embodied Agents. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT: IEEE; 2018. p. 9068–9079. Available from: https://ieeexplore.ieee.org/document/8579043/.
  • [25] Hwangbo J, Lee J, Hutter M. Per-Contact Iteration Method for Solving Contact Dynamics. IEEE Robotics and Automation Letters. 2018 Apr;3(2):895–902.
  • [26] Beazley DM. SWIG: An Easy to Use Tool for Integrating Scripting Languages with C and C++. In: Tcl/Tk Workshop; 1996. p. 43.
  • [27] Lee J, X Grey M, Ha S, Kunz T, Jain S, Ye Y, et al. DART: Dynamic Animation and Robotics Toolkit. The Journal of Open Source Software. 2018 Feb;3(22):500. Available from: http://joss.theoj.org/papers/10.21105/joss.00500.
  • [28] Parker SG, Bigler J, Dietrich A, Friedrich H, Hoberock J, Luebke D, et al. OptiX: A General Purpose Ray Tracing Engine. In: ACM SIGGRAPH 2010 Papers. SIGGRAPH ’10. New York, NY, USA: ACM; 2010. p. 66:1–66:13. Event-place: Los Angeles, California. Available from: http://doi.acm.org/10.1145/1833349.1778803.