### Abstract

Value estimation is a key problem in Reinforcement Learning. Although Deep Reinforcement Learning (DRL) has achieved many successes across different fields, the underlying structure and learning dynamics of value functions, especially under complex function approximation, are not fully understood. In this paper, we report that the rank of the $Q$-matrix decreases during learning, and that this happens widely across a series of continuous control tasks and for several popular algorithms. We hypothesize that this low-rank phenomenon reflects a common learning dynamic in which the $Q$-matrix moves from a stochastic, high-dimensional space to a smooth, low-dimensional one. Moreover, we reveal a positive correlation between value matrix rank and value estimation uncertainty. Inspired by this evidence, we propose a novel Uncertainty-Aware Low-rank Q-matrix Estimation (UA-LQE) algorithm as a general framework to facilitate the learning of value functions. By quantifying the uncertainty of state-action value estimates, we selectively erase the highly uncertain entries of the state-action value matrix and recover their values through low-rank matrix reconstruction. This reconstruction exploits the underlying structure of the value matrix to improve value approximation, leading to a more efficient learning process for the value function. In the experiments, we evaluate the efficacy of UA-LQE on several representative OpenAI MuJoCo continuous control tasks.
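To make the erase-and-reconstruct idea concrete, here is a minimal NumPy sketch on a small, discretized $Q$-matrix (states × actions). It rests on assumptions that the abstract does not specify: uncertainty is taken as the standard deviation across an ensemble of $Q$ estimates, the top `erase_frac` fraction of most uncertain entries is erased, and the erased entries are refilled by hard-impute (iterative truncated SVD). The function names `ua_lqe_sketch`, `low_rank_reconstruct`, and `numerical_rank` are hypothetical and are not the authors' implementation.

```python
import numpy as np


def numerical_rank(m, tol=1e-6):
    """Count singular values above tol relative to the largest one."""
    s = np.linalg.svd(m, compute_uv=False)
    return int((s > tol * s[0]).sum())


def low_rank_reconstruct(q, observed, rank, n_iters=100):
    """Hard-impute (assumed technique): refill unobserved entries of q
    with a rank-`rank` SVD approximation, keeping observed entries fixed."""
    filled = np.where(observed, q, q[observed].mean())  # initial guess
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank]  # truncated SVD
        filled = np.where(observed, q, approx)  # only erased entries change
    return filled


def ua_lqe_sketch(q_ensemble, erase_frac=0.1, rank=2):
    """q_ensemble: array of shape (n_models, n_states, n_actions).

    Hypothetical pipeline: ensemble mean as the Q estimate, ensemble std
    as its uncertainty, erase the most uncertain entries, reconstruct them.
    """
    q_mean = q_ensemble.mean(axis=0)
    uncertainty = q_ensemble.std(axis=0)
    k = int(round(erase_frac * uncertainty.size))
    # Flattened indices of the k most uncertain entries.
    erase_idx = np.argsort(uncertainty, axis=None)[::-1][:k]
    observed = np.ones(q_mean.shape, dtype=bool)
    observed.flat[erase_idx] = False
    return low_rank_reconstruct(q_mean, observed, rank), observed
```

In a toy setting where the true $Q$-matrix has rank 2 and a subset of entries is estimated with much larger ensemble disagreement, the sketch erases exactly those entries and refills them from the remaining low-rank structure. The choice of rank and erase fraction here is illustrative only.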