### Abstract

This paper addresses the average cost minimization problem for discrete-time systems with multiplicative and additive noises via reinforcement learning. Using the Q-function, we propose an online learning scheme that estimates the kernel matrix of the Q-function and updates the control gain from data collected along the system trajectories. The resulting control gain and kernel matrix are proved to converge to the optimal ones. To implement the proposed learning scheme, an online model-free reinforcement learning algorithm is given, in which the recursive least squares method is used to estimate the kernel matrix of the Q-function. A numerical example is presented to illustrate the proposed approach.
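
As a rough illustration of the recursive least squares step mentioned above, the following minimal sketch fits the symmetric kernel matrix of a generic quadratic form z^T H z from sampled values, one sample at a time. It is not the algorithm of this paper: the regression targets here are plain quadratic-form values rather than the paper's Q-function data, and the names `quad_features`, `rls_step`, `fit_quadratic_kernel`, and the parameters `forgetting` and `p0` are hypothetical choices made only for this example.

```python
import numpy as np


def quad_features(z):
    """Regressor phi such that z^T H z = phi @ theta, where theta stacks
    the upper-triangular entries of the symmetric matrix H."""
    iu, ju = np.triu_indices(z.size)
    # Off-diagonal entries of H appear twice in the quadratic form.
    scale = np.where(iu == ju, 1.0, 2.0)
    return scale * z[iu] * z[ju]


def rls_step(theta, P, phi, y, forgetting=1.0):
    """One recursive least squares update for the model y ~ phi @ theta.
    `forgetting` is an assumed forgetting factor (1.0 means none)."""
    P_phi = P @ phi
    gain = P_phi / (forgetting + phi @ P_phi)
    theta = theta + gain * (y - phi @ theta)
    # P is symmetric, so phi^T P equals (P phi)^T.
    P = (P - np.outer(gain, P_phi)) / forgetting
    return theta, P


def fit_quadratic_kernel(samples, targets, p0=1e6):
    """Estimate symmetric H from pairs (z_k, y_k) with y_k ~ z_k^T H z_k.
    `p0` scales the initial covariance (an assumed, weakly informative start)."""
    n = samples.shape[1]
    iu, ju = np.triu_indices(n)
    theta = np.zeros(iu.size)
    P = p0 * np.eye(iu.size)
    for z, y in zip(samples, targets):
        theta, P = rls_step(theta, P, quad_features(z), y)
    # Rebuild the full symmetric matrix from its upper-triangular entries.
    H = np.zeros((n, n))
    H[iu, ju] = theta
    H[ju, iu] = theta
    return H
```

Each update uses a single new sample, so the estimate can be refined online as trajectory data arrive, without storing past data or inverting a matrix at every step.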

