### Abstract

Model-based reinforcement learning has proven to be more sample efficient than model-free methods. On the other hand, constructing a dynamics model adds complexity to model-based reinforcement learning. Data processing tasks in radio astronomy are one such situation, where the original problem being solved by reinforcement learning is itself the creation of a model. Fortunately, many methods based on heuristics or signal processing exist to perform the same tasks, and we can leverage them to propose the best action to take, or in other words, to provide a 'hint'. We propose to use 'hints' generated by the environment as an aid to the reinforcement learning process, mitigating the complexity of model construction. We modify the soft actor-critic algorithm to use hints and train the agent with the alternating direction method of multipliers algorithm with inequality constraints. Results in several environments show that using hints increases sample efficiency compared to model-free methods.