We propose a reinforcement-learning algorithm to tackle the challenge ofreconstructing phylogenetic trees. The search for the tree that best describesthe data is algorithmically challenging, thus all current algorithms forphylogeny reconstruction use various heuristics to make it feasible. In thisstudy, we demonstrate that reinforcement learning can be used to learn anoptimal search strategy, thus providing a novel paradigm for predicting themaximum-likelihood tree. Our proposed method does not require likelihoodcalculation with every step, nor is it limited to greedy uphill moves in thelikelihood space. We demonstrate the use of the developed deep-Q-learning agenton a set of unseen empirical data, namely, on unseen environments defined bynucleotide alignments of up to 20 sequences. Our results show that thelikelihood scores of the inferred phylogenies are similar to those obtainedfrom widely-used software. It thus establishes a proof-of-concept that it isbeneficial to optimize a sequence of moves in the search-space, rather thanoptimizing the progress made in every single move only. This suggests that areinforcement-learning based method provides a promising direction forphylogenetic reconstruction.