Abstract
We present a novel reinforcement learning (RL) approach to the classical 2-level atom non-LTE radiative transfer problem. We frame it as a control task in which an RL agent learns a depth-dependent source function $S(\tau)$ that self-consistently satisfies the equation of statistical equilibrium (SE). The agent's policy is optimized entirely through reward-based interactions with a radiative transfer (RT) engine, without explicit knowledge of the ground truth. This method bypasses the construction of the approximate lambda operators ($\Lambda^*$) common in accelerated iterative schemes. It also requires no large precomputed labeled dataset to supply a supervisory signal, and it avoids backpropagating gradients through the complex RT solver itself. Finally, we show experimentally that a simple feedforward neural network trained greedily fails to reach SE, possibly because of the moving-target nature of the problem. Our $\Lambda^*$-free method offers potential advantages in complex scenarios (e.g., atmospheres with enhanced velocity fields, multi-dimensional geometries, or complex microphysics) where constructing $\Lambda^*$ or making the solver differentiable is challenging. The agent can also be incentivized to find more efficient policies by adjusting the discount factor, which changes how heavily immediate rewards are weighted relative to future ones. If it is shown to generalize beyond its training data, this RL framework could serve as an alternative, or accelerated, formalism for achieving SE. To the best of our knowledge, this study is the first application of reinforcement learning in solar physics that directly solves for a fundamental physical constraint.
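To make the control framing concrete, here is a minimal sketch of how an SE-based reward could be scored for a candidate $S(\tau)$. For the 2-level atom with complete redistribution, SE reduces to $S = (1-\epsilon)\bar{J} + \epsilon B$, where $\epsilon$ is the photon destruction probability and $B$ the Planck function. Everything in the code is an illustrative assumption, not the method or RT engine used in this work: the toy formal solver (Gauss-Legendre angles, cell-averaged $S$, no incident radiation at the surface, $I^+ = S$ at the bottom), the use of a monochromatic $J$ in place of the profile-averaged $\bar{J}$, and the hypothetical `reward` defined as minus the largest relative SE residual. The direct linear solve appears only to check the reward; it plays the role of the ground truth that the agent never sees.

```python
import numpy as np


def formal_solution_J(S, tau, n_mu=8):
    """Toy Lambda operator: mean intensity J = Lambda[S] on a 1D slab.

    Assumptions (illustrative only): no incident radiation at tau[0] = 0,
    I+ = S at the bottom boundary, S taken as the cell average.
    """
    x, w = np.polynomial.legendre.leggauss(n_mu)
    mus, wts = 0.5 * (x + 1.0), 0.5 * w  # map nodes to mu in (0, 1]; weights sum to 1
    S_mid = 0.5 * (S[:-1] + S[1:])       # cell-averaged source function
    dtau = np.diff(tau)
    J = np.zeros_like(S)
    for mu, wt in zip(mus, wts):
        att = np.exp(-dtau / mu)          # attenuation across each cell
        I_down = np.zeros_like(S)         # incoming ray, zero at the surface
        for k in range(len(tau) - 1):
            I_down[k + 1] = I_down[k] * att[k] + (1.0 - att[k]) * S_mid[k]
        I_up = np.zeros_like(S)
        I_up[-1] = S[-1]                  # bottom boundary: thermalized intensity
        for k in range(len(tau) - 2, -1, -1):
            I_up[k] = I_up[k + 1] * att[k] + (1.0 - att[k]) * S_mid[k]
        J += 0.5 * wt * (I_up + I_down)   # J = 1/2 * integral of I over mu in [-1, 1]
    return J


def lambda_matrix(tau, n_mu=8):
    """Explicit Lambda matrix, built column by column from unit vectors."""
    return np.column_stack(
        [formal_solution_J(e, tau, n_mu) for e in np.eye(len(tau))]
    )


def se_residual(S, tau, eps, B):
    """2-level atom SE (complete redistribution): S - [(1 - eps) J + eps B]."""
    return S - ((1.0 - eps) * formal_solution_J(S, tau) + eps * B)


def reward(S, tau, eps, B):
    """Hypothetical reward: minus the largest relative SE residual (0 is best)."""
    return -np.max(np.abs(se_residual(S, tau, eps, B)) / np.abs(S))


# Usage: compare an LTE guess with the exact solution of the toy problem.
tau = np.concatenate(([0.0], np.logspace(-4, 4, 60)))
eps, B = 1e-3, np.ones_like(tau)
L = lambda_matrix(tau)
S_exact = np.linalg.solve(np.eye(len(tau)) - (1.0 - eps) * L, eps * B)
print(reward(B, tau, eps, B))        # -0.4995 up to round-off: J = B/2 at the open surface
print(reward(S_exact, tau, eps, B))  # ~0: only round-off remains
```

In an RL setting, the agent's action would propose or update $S(\tau)$ and receive a signal like this after each call to the RT engine; the explicit `lambda_matrix` is built here only for the sanity check and is not something the $\Lambda^*$-free approach needs.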