Applying probabilistic models to reinforcement learning (RL) has become an exciting direction of research owing to powerful optimisation tools such as variational inference becoming applicable to RL. However, due to their formulation, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, for example, the absence of mode-capturing behaviour in pseudo-likelihood methods and difficulties in optimising the learning objective in maximum entropy RL-based approaches. We propose VIREL, a novel, theoretically grounded probabilistic inference framework for RL that utilises the action-value function in a parametrised form to capture future dynamics of the underlying Markov decision process. Owing to its generality, our framework lends itself to current advances in variational inference. Applying the variational expectation-maximisation algorithm to our framework, we show that the actor-critic algorithm can be reduced to expectation-maximisation. We derive a family of methods from our framework, including state-of-the-art methods based on soft value functions. We evaluate two actor-critic algorithms derived from this family, which perform on par with soft actor-critic, demonstrating that our framework offers a promising perspective on RL as inference.