Deep Ordinal Reinforcement Learning

Abstract

Reinforcement learning usually makes use of numerical rewards, which have nice properties but also come with drawbacks and difficulties. Using rewards on an ordinal scale (ordinal rewards) is an alternative to numerical rewards that has received more attention in recent years. In this paper, a general approach to adapting reinforcement learning problems to the use of ordinal rewards is presented and motivated. We show how to convert common reinforcement learning algorithms to an ordinal variation by the example of Q-learning and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we run evaluations on problems provided by the OpenAI Gym framework, showing that our ordinal variants exhibit a performance that is comparable to the numerical variations for a number of problems. We also give first evidence that our ordinal variant is able to produce better results for problems with less engineered and simpler-to-design reward signals.
