There is great interest in applying reinforcement learning (RL) to recommendation systems, but also many challenges. In this setting, an online user is the environment; neither the reward function nor the environment dynamics are clearly defined, making the application of RL challenging. In this paper, we propose a novel model-based reinforcement learning framework for recommendation systems, where we develop a generative adversarial network to imitate user behavior dynamics and learn the user's reward function. Using this user model as the simulation environment, we develop a novel Cascading DQN algorithm to obtain a combinatorial recommendation policy which can handle a large number of candidate items efficiently. In our experiments with real data, we show that this generative adversarial user model can better explain user behavior than alternatives, and that the RL policy based on this model can lead to a better long-term reward for the user and a higher click rate for the system.