Scalable Submodular Policy Optimization via Pruned Submodularity Graph

Abstract

In Reinforcement Learning (abbreviated as RL), an agent interacts with theenvironment via a set of possible actions, and a reward is generated from someunknown distribution. The task here is to find an optimal set of actions suchthat the reward after a certain time step gets maximized. In a traditionalsetup, the reward function in an RL Problem is considered additive. However, inreality, there exist many problems, including path planning, coverage control,etc., the reward function follows the diminishing return, which can be modeledas a submodular function. In this paper, we study a variant of the RL Problemwhere the reward function is submodular, and our objective is to find anoptimal policy such that this reward function gets maximized. We have proposeda pruned submodularity graph-based approach that provides a provablyapproximate solution in a feasible computation time. The proposed approach hasbeen analyzed to understand its time and space requirements as well as aperformance guarantee. We have experimented with a benchmark agent-environmentsetup, which has been used for similar previous studies, and the results arereported. From the results, we observe that the policy obtained by our proposedapproach leads to more reward than the baseline methods.

Quick Read (beta)

loading the full paper ...