Inspired by developments in deep generative models, we propose a model-based RL approach, coined the Reinforced Deep Markov Model (RDMM), designed to integrate desirable properties of a reinforcement learning algorithm acting as an automatic trading system. The network architecture allows for the possibility that market dynamics are only partially visible and are potentially modified by the agent's actions. The RDMM filters incomplete and noisy data to create better-behaved input data for RL planning. The policy search optimisation also properly accounts for state uncertainty. Due to the complexity of the RDMM architecture, we performed ablation studies to better understand the contributions of individual components of the approach. To test the financial performance of the RDMM, we implement policies using variants of the Q-Learning, DynaQ-ARIMA and DynaQ-LSTM algorithms. The experiments show that the RDMM is data-efficient and provides financial gains compared to the benchmarks in the optimal execution problem. The performance improvement becomes more pronounced when price dynamics are more complex, as demonstrated using real data sets from the limit order books of Facebook, Intel, Vodafone and Microsoft.