Abstract
We study the adaptation of soft actor-critic (SAC) from continuous action space to discrete action space. We revisit vanilla SAC and provide an in-depth understanding of its Q value underestimation and performance instability issues when applied to discrete settings. We thereby propose entropy-penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action space, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at: https://github.com/coldsummerday/Revisiting-Discrete-SAC.