Abstract
Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success achieved in applications of single-agent reinforcement learning (RL). In this study, we consider a distributed Q-learning scenario, wherein a number of agents cooperatively solve a sequential decision-making problem without access to the central reward function, which is the average of the local rewards. In particular, we carry out a finite-time analysis of a distributed Q-learning algorithm and provide a new sample complexity result of $\tilde{\mathcal{O}}\left(\min\left\{\frac{1}{\epsilon^2}\frac{t_{\text{mix}}}{(1-\gamma)^6 d_{\min}^4},\frac{1}{\epsilon}\frac{\sqrt{|\gS||\gA|}}{(1-\sigma_2(\boldsymbol{W}))(1-\gamma)^4 d_{\min}^3}\right\}\right)$ under tabular lookup