The study and benchmarking of Deep Reinforcement Learning (DRL) models has become a trend in many industries, including aerospace engineering and communications. Recent studies in these fields propose such models to address complex real-time decision-making problems in which classic approaches do not meet time requirements or fail to obtain optimal solutions. While the good performance of DRL models has been demonstrated for specific use cases or scenarios, most studies do not discuss the compromises and generalizability of such models during real operations. In this paper we explore the tradeoffs of different elements of DRL models and how they might impact the final performance. To that end, we choose the Frequency Plan Design (FPD) problem in the context of multibeam satellite constellations as our use case and propose a DRL model to address it. We identify six core elements that have a major effect on its performance: the policy, the policy optimizer, the state, action, and reward representations, and the training environment. We analyze different alternatives for each of these elements and characterize their effect. We also use multiple environments to account for different scenarios in which we vary the dimensionality or make the environment nonstationary. Our findings show that DRL is a potential method to address the FPD problem in real operations, especially because of its speed in decision-making. However, no single DRL model outperforms the rest in all scenarios, and the best approach for each of the six core elements depends on the features of the operation environment. While we agree on the potential of DRL to solve future complex problems in the aerospace industry, we also reflect on the importance of designing appropriate models and training procedures, understanding the applicability of such models, and reporting the main performance tradeoffs.