Reinforcement learning has driven impressive advances in machine learning. Simultaneously, quantum-enhanced machine learning algorithms using quantum annealing are under heavy development. Recently, a multi-agent reinforcement learning (MARL) architecture combining both paradigms has been proposed. This novel algorithm, which utilizes Quantum Boltzmann Machines (QBMs) for Q-value approximation, has outperformed regular deep reinforcement learning in terms of the time-steps needed to converge. However, this algorithm was restricted to single-agent and small 2x2 multi-agent grid domains. In this work, we propose an extension to the original concept in order to solve more challenging problems. Similar to classic DQNs, we add an experience replay buffer and use different networks for approximating the target and policy values. The experimental results show that learning becomes more stable and that agents are able to find optimal policies in grid domains with higher complexity. Additionally, we assess how parameter sharing influences the agents' behavior in multi-agent domains. Quantum sampling proves to be a promising method for reinforcement learning tasks, but is currently limited by the QPU size and therefore by the size of the input and Boltzmann machine.
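The two DQN-style additions named above, an experience replay buffer and separate policy and target approximators, can be illustrated with a minimal sketch. It is not the implementation used in this work: a plain Python dictionary stands in for the QBM-based Q-value approximator, and all names (`ReplayBuffer`, `td_update`, `sync_target`) and hyperparameters (`alpha`, `gamma`) are hypothetical choices made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive steps.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_update(policy_q, target_q, batch, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step on a sampled batch.

    Assumption: policy_q and target_q are dicts mapping (state, action) to a
    Q-value. They stand in for the policy and target approximators; the QBM
    itself is not modeled here.
    """
    for state, action, reward, next_state, done in batch:
        # The bootstrap value is read from the target table, which stays fixed
        # between synchronizations.
        if done:
            bootstrap = 0.0
        else:
            bootstrap = max(target_q.get((next_state, a), 0.0) for a in actions)
        td_target = reward + gamma * bootstrap
        old = policy_q.get((state, action), 0.0)
        policy_q[(state, action)] = old + alpha * (td_target - old)


def sync_target(policy_q, target_q):
    """Copy the policy values into the target table (periodic hard update)."""
    target_q.clear()
    target_q.update(policy_q)
```

In a training loop of this kind, each environment step would typically be pushed into the buffer, a batch sampled once enough transitions are stored, and `sync_target` called every fixed number of steps, so that the bootstrap targets change more slowly than the policy values.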