In this paper, we consider the state estimation problem for nonlinear stochastic discrete-time systems. We combine Lyapunov's method in control theory and deep reinforcement learning to design the state estimator. We theoretically prove the convergence of the bounded estimate error solely using the data simulated from the model. An actor-critic reinforcement learning algorithm is proposed to learn the state estimator approximated by a deep neural network. The convergence of the algorithm is analysed. The proposed Lyapunov-based reinforcement learning state estimator is compared with a number of existing nonlinear filtering methods through Monte Carlo simulations, showing its advantage in terms of estimate convergence even under some system uncertainties such as covariance shift in system noise and randomly missing measurements. To the best of our knowledge, this is the first reinforcement-learning-based nonlinear state estimator with a bounded estimate error performance guarantee.