Symbolic Network: Generalized Neural Policies for Relational MDPs

Abstract

A Relational Markov Decision Process (RMDP) is a first-order representation for expressing all instances of a single probabilistic planning domain with a possibly unbounded number of objects. Early work in RMDPs outputs generalized (instance-independent) first-order policies or value functions as a means to solve all instances of a domain at once. Unfortunately, this line of work met with limited success due to inherent limitations of the representation space used in such policies or value functions. Can neural models provide the missing link by easily representing more complex generalized policies, thus making them effective on all instances of a given domain? We present SymNet, the first neural approach for solving RMDPs that are expressed in the probabilistic planning language of RDDL. SymNet trains a set of shared parameters for an RDDL domain using training instances from that domain. For each instance, SymNet first converts it to an instance graph and then uses relational neural models to compute node embeddings. It then scores each ground action as a function over the first-order action symbols and node embeddings related to the action. Given a new test instance from the same domain, the SymNet architecture with pre-trained parameters scores each ground action and chooses the best action. This can be accomplished in a single forward pass without any retraining on the test instance, thus implicitly representing a neural generalized policy for the whole domain. Our experiments on nine RDDL domains from IPPC demonstrate that SymNet policies are significantly better than random and sometimes even more effective than training a state-of-the-art deep reactive policy from scratch.
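
To make the pipeline described in the abstract concrete (instance graph, node embeddings, ground-action scoring with shared parameters), here is a minimal, illustrative Python sketch. It is not the authors' implementation. The mean-aggregation message passing, the linear per-symbol scoring, the feature layout, and all names (`make_params`, `embed_nodes`, `choose_action`, the action symbols `repair` and `reboot`) are assumptions chosen for illustration. The one property it aims to show is that parameter shapes depend only on the domain (feature size, action symbols), never on the number of objects, so the same parameters can score actions on an instance of any size.

```python
import math
import random


def make_params(feat_dim, hid_dim, action_symbols, seed=0):
    """Shared, instance-independent parameters (hypothetical layout).

    Shapes depend only on the feature size and the set of first-order
    action symbols, not on how many objects an instance contains.
    """
    rng = random.Random(seed)

    def rand_mat(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_self": rand_mat(feat_dim, hid_dim),
        "W_nbr": rand_mat(feat_dim, hid_dim),
        # One scoring vector per first-order action symbol.
        "score": {a: [rng.uniform(-1, 1) for _ in range(hid_dim)]
                  for a in action_symbols},
    }


def matvec(vec, mat):
    """Row vector times matrix."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec))
            for j in range(len(mat[0]))]


def embed_nodes(features, edges, params):
    """One round of mean-aggregation message passing (an assumed stand-in
    for the relational neural model mentioned in the abstract)."""
    nbrs = {n: [] for n in features}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    feat_dim = len(next(iter(features.values())))
    emb = {}
    for n, x in features.items():
        if nbrs[n]:
            mean = [sum(features[m][k] for m in nbrs[n]) / len(nbrs[n])
                    for k in range(feat_dim)]
        else:
            mean = [0.0] * feat_dim
        h_self = matvec(x, params["W_self"])
        h_nbr = matvec(mean, params["W_nbr"])
        emb[n] = [math.tanh(a + b) for a, b in zip(h_self, h_nbr)]
    return emb


def choose_action(features, edges, ground_actions, params):
    """Score each ground action (symbol, object) and return the best one."""
    emb = embed_nodes(features, edges, params)
    scores = {}
    for symbol, obj in ground_actions:
        w = params["score"][symbol]
        scores[(symbol, obj)] = sum(wi * hi for wi, hi in zip(w, emb[obj]))
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    symbols = ["repair", "reboot"]  # hypothetical action symbols
    params = make_params(feat_dim=2, hid_dim=4, action_symbols=symbols)
    # Two toy instances of different sizes share the same parameters.
    small = ({"c1": [1.0, 0.0], "c2": [0.0, 1.0]}, [("c1", "c2")])
    large = ({f"c{i}": [float(i % 2), 1.0] for i in range(5)},
             [(f"c{i}", f"c{i + 1}") for i in range(4)])
    for feats, edges in (small, large):
        actions = [(s, o) for s in symbols for o in feats]
        best, _ = choose_action(feats, edges, actions, params)
        print(len(feats), "objects ->", best)
```

In this sketch the parameters are random rather than trained; in the setting the abstract describes, they would be learned on training instances of the domain and then reused unchanged on a new test instance.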
