Modern radio telescopes produce unprecedented amounts of data, which are passed through many processing pipelines before the delivery of scientific results. Hyperparameters of these pipelines need to be tuned by hand to produce optimal results. Because many thousands of observations are taken during the lifetime of a telescope and because each observation has its own unique settings, fine-tuning the pipelines is a tedious task. To automate this process of hyperparameter selection in data calibration pipelines, we introduce the use of reinforcement learning. We test two reinforcement learning techniques, twin delayed deep deterministic policy gradient (TD3) and soft actor-critic (SAC), to train an autonomous agent to perform this fine-tuning. For the sake of generalization, we consider the pipeline to be a black-box system in which the autonomous agent uses a summarized state of the pipeline's performance. The autonomous agent trained in this manner is able to determine optimal settings for diverse observations and is therefore able to perform 'smart' calibration, minimizing the need for human intervention.