Abstract
This paper evaluates and compares the performance of model-free and model-based reinforcement learning for the attitude control of fixed-wing unmanned aerial vehicles, using a PID controller as a baseline. The comparison focuses on their ability to handle varying flight dynamics and wind disturbances in a simulated environment. Our results show that the Temporal Difference Model Predictive Control agent outperforms both the PID controller and the model-free reinforcement learning methods in tracking accuracy and robustness across reference trajectories of varying difficulty, particularly in nonlinear flight regimes. Furthermore, we introduce actuation fluctuation as a key metric for assessing energy efficiency and actuator wear, and we test two approaches from the literature for reducing it: an action variation penalty and conditioning for action policy smoothness. We also evaluate all control methods under stochastic turbulence and under gusts separately, in order to measure their effects on tracking performance, identify the limitations of each method, and outline their implications for the Markov decision process formalism.
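The abstract does not give exact formulas for actuation fluctuation or for the action variation penalty. As a rough illustration only, the sketch below assumes that actuation fluctuation is the mean absolute change between consecutive control commands, and that the penalty subtracts a weighted L1 norm of the action change from the tracking reward. The function names `actuation_fluctuation` and `penalized_reward` and the `weight` parameter are hypothetical, and the paper's actual definitions may differ.

```python
import numpy as np


def actuation_fluctuation(actions):
    """Hypothetical fluctuation metric: mean absolute change between
    consecutive actions. `actions` has shape (T, n_actuators)."""
    actions = np.asarray(actions, dtype=float)
    if actions.shape[0] < 2:
        # A single command has no step-to-step variation.
        return 0.0
    # np.diff along time gives a_{t+1} - a_t for every actuator.
    return float(np.mean(np.abs(np.diff(actions, axis=0))))


def penalized_reward(tracking_reward, action, prev_action, weight=0.1):
    """Assumed form of an action variation penalty: subtract the weighted
    L1 norm of the change in action from the tracking reward."""
    delta = np.abs(np.asarray(action, dtype=float) - np.asarray(prev_action, dtype=float))
    return float(tracking_reward - weight * np.sum(delta))


if __name__ == "__main__":
    # Two actuators (e.g. aileron and elevator) over three control steps.
    commands = [[0.0, 0.0], [0.2, -0.1], [0.1, -0.1]]
    print(actuation_fluctuation(commands))
    print(penalized_reward(1.0, [0.2, -0.1], [0.0, 0.0], weight=0.5))
```

Under these assumptions, a smoother command sequence gives a lower fluctuation score, and a larger `weight` trades tracking reward for smaller step-to-step changes in the commands.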