Abstract
Crop production management is essential for optimizing yield and minimizing the environmental impact of crop fields, yet it remains challenging due to the complex and stochastic processes involved. Recently, researchers have turned to machine learning to address these complexities. Specifically, reinforcement learning (RL), an approach designed to learn optimal decision-making strategies through trial and error in dynamic environments, has emerged as a promising tool for developing adaptive crop management policies. RL models aim to optimize long-term rewards by continuously interacting with the environment, making them well suited to the uncertainties and variability inherent in crop management. Studies have shown that RL can generate crop management policies that compete with, and even outperform, expert-designed policies within simulation-based crop models. In the gym-DSSAT crop model environment, one of the most widely used simulators for crop management, proximal policy optimization (PPO) and deep Q-networks (DQN) have shown promising results. However, these methods have not yet been systematically evaluated under identical conditions. In this study, we evaluated PPO and DQN against static baseline policies across the three RL tasks provided by the gym-DSSAT environment: fertilization, irrigation, and mixed management. To ensure a fair comparison, we used consistent default parameters, identical reward functions, and the same environment settings. Our results indicate that PPO outperforms DQN in the fertilization and irrigation tasks, while DQN excels in the mixed management task. This comparative analysis provides critical insights into the strengths and limitations of each approach, advancing the development of more effective RL-based crop management strategies.