Reinforcement learning in large action spaces is a challenging problem. Cooperative multi-agent reinforcement learning (MARL) exacerbates matters by imposing various constraints on communication and observability. In this work, we consider the fundamental hurdle affecting both value-based and policy-gradient approaches: an exponential blowup of the action space with the number of agents. For value-based methods, it poses challenges in accurately representing the optimal value function. For policy-gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic. We show that, from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function with a low-complexity hypothesis class. This requires accurately modelling the agent interactions in a sample-efficient way. To this end, we propose a novel tensorised formulation of the Bellman equation. This gives rise to our method Tesseract, which views the Q-function as a tensor whose modes correspond to the action spaces of different agents. Algorithms derived from Tesseract decompose the Q-tensor across agents and utilise low-rank tensor approximations to model agent interactions relevant to the task. We provide a PAC analysis for Tesseract-based algorithms and highlight their relevance to the class of rich-observation MDPs. Empirical results in different domains confirm Tesseract's gains in sample efficiency predicted by the theory.
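
To make the tensor view concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single fixed state, three agents with four actions each, and a rank-2 CP (CANDECOMP/PARAFAC) decomposition as one possible choice of low-rank approximation. The names `cp_reconstruct` and `q_value`, the sizes, and the random factors are all hypothetical and chosen only for illustration.

```python
import numpy as np

def cp_reconstruct(factors, weights):
    """Rebuild the full joint-action Q-tensor from rank-k CP factors.

    factors: list of n arrays, one per agent, each of shape (num_actions_i, k)
    weights: array of shape (k,), one weight per rank-1 component
    """
    shape = tuple(f.shape[0] for f in factors)
    q = np.zeros(shape)
    for r in range(weights.shape[0]):
        component = weights[r]
        # Outer product of the r-th column of every agent's factor matrix.
        for f in factors:
            component = np.multiply.outer(component, f[:, r])
        q += component
    return q

def q_value(factors, weights, joint_action):
    """Evaluate one joint action directly from the factors,
    without materialising the exponentially large tensor."""
    prod = weights.copy()
    for f, a in zip(factors, joint_action):
        prod = prod * f[a, :]
    return prod.sum()

# Illustrative sizes (assumptions): 3 agents, 4 actions each, rank 2.
n_agents, n_actions, rank = 3, 4, 2
rng = np.random.default_rng(0)
factors = [rng.normal(size=(n_actions, rank)) for _ in range(n_agents)]
weights = rng.normal(size=rank)

q = cp_reconstruct(factors, weights)

# The full tensor grows exponentially with the number of agents,
# while the CP parameter count grows linearly.
full_entries = n_actions ** n_agents
cp_params = n_agents * n_actions * rank + rank
```

In this sketch each mode of `q` indexes one agent's action, so `q[1, 2, 3]` is the value of the joint action (1, 2, 3) and agrees with `q_value(factors, weights, (1, 2, 3))`. The comparison between `full_entries` and `cp_params` illustrates why a low-rank hypothesis class can be much smaller than the full joint-action table.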