Deploying the idea of long-term cumulative return, reinforcement learning has shown remarkable performance in various fields. We propose a formulation of landmark localization in 3D medical images as a reinforcement learning problem. Whereas value-based methods have been widely used to solve similar problems, we adopt an actor-critic based direct policy search method framed in a temporal difference learning approach. Successful behavior learning is challenging in large state and/or action spaces, requiring many trials. We introduce partial policy-based reinforcement learning, which makes the large localization problem tractable by learning the optimal policy on smaller partial domains. Independent actors efficiently learn the corresponding partial policies, each utilizing its own independent critic. The proposed policy reconstruction from the partial policies ensures robust and efficient localization, with the sub-agents solving simple binary decision problems in their corresponding partial action spaces. The proposed reinforcement learning requires fewer trials to learn the optimal behavior than the original behavior learning scheme.
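As a rough illustration of the partial-policy idea, the following minimal sketch trains one actor-critic pair per axis with a one-step temporal difference error and then rebuilds a joint 3D move from the three binary decisions. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the names `AxisAgent`, `observe`, `train_axis`, `joint_policy`, and `localize`, the tabular actor and critic, the one-bit observation that stands in for an image patch, the distance-reduction reward, the learning rates and discount factor, and the reconstruction of the joint policy as a product of independent per-axis policies.

```python
import itertools
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class AxisAgent:
    """Hypothetical sub-agent: one actor and one critic for a single axis."""

    def __init__(self, lr_actor=0.1, lr_critic=0.2, gamma=0.5):
        self.logit = [0.0, 0.0]  # actor: logit of P(step = +1) per observation
        self.value = [0.0, 0.0]  # critic: state-value estimate per observation
        self.lr_actor, self.lr_critic, self.gamma = lr_actor, lr_critic, gamma

    def p_plus(self, obs):
        return sigmoid(self.logit[obs])

    def update(self, obs, action, reward, next_obs, done):
        # One-step TD error computed from this agent's own critic.
        bootstrap = 0.0 if done else self.gamma * self.value[next_obs]
        td_error = reward + bootstrap - self.value[obs]
        self.value[obs] += self.lr_critic * td_error
        # d/d(logit) of log Bernoulli(action; sigmoid(logit)) equals (action - p).
        self.logit[obs] += self.lr_actor * td_error * (action - self.p_plus(obs))


def observe(pos, target):
    # Toy stand-in for an image patch: 1 if the landmark lies in the + direction.
    return 1 if target > pos else 0


def train_axis(agent, size=16, episodes=300, max_steps=50, rng=random):
    for _ in range(episodes):
        target = rng.randrange(size)
        pos = rng.choice([p for p in range(size) if p != target])
        for _ in range(max_steps):
            obs = observe(pos, target)
            action = 1 if rng.random() < agent.p_plus(obs) else 0
            new_pos = min(size - 1, max(0, pos + (1 if action else -1)))
            reward = abs(pos - target) - abs(new_pos - target)  # distance reduction
            done = new_pos == target
            agent.update(obs, action, reward, observe(new_pos, target), done)
            pos = new_pos
            if done:
                break


def joint_policy(p_plus_per_axis):
    """Rebuild a joint policy over sign combinations as a product of partial policies."""
    joint = {}
    for signs in itertools.product((-1, +1), repeat=len(p_plus_per_axis)):
        prob = 1.0
        for sign, p in zip(signs, p_plus_per_axis):
            prob *= p if sign == +1 else 1.0 - p
        joint[signs] = prob
    return joint


def localize(agents, start, target, max_steps=64):
    pos = list(start)
    for _ in range(max_steps):
        if tuple(pos) == tuple(target):
            break
        probs = [a.p_plus(observe(x, t)) for a, x, t in zip(agents, pos, target)]
        # Greedy joint action: the sign combination with the highest joint probability.
        move = max(joint_policy(probs).items(), key=lambda kv: kv[1])[0]
        # Toy simplification: an axis already at its coordinate stays put.
        pos = [x if x == t else x + m for x, t, m in zip(pos, target, move)]
    return tuple(pos)


random.seed(0)
agents = [AxisAgent() for _ in range(3)]
for agent in agents:
    train_axis(agent)
print(localize(agents, start=(2, 13, 7), target=(10, 4, 7)))
```

In this toy setting each sub-agent only ever faces a two-way choice, so its actor needs one logit per observation, while the joint action space of eight sign combinations is never learned directly. That is the intuition behind learning on smaller partial domains, although the actual state representation and reconstruction rule of the proposed method may differ from this sketch.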