Abstract
Stochastic optimizers are central to deep learning, yet widely used methods such as Adam and Adan can degrade in non-stationary or noisy environments, partly due to their reliance on momentum-based magnitude estimates. We introduce Ano, a novel optimizer that decouples direction and magnitude: momentum is used for directional smoothing, while instantaneous gradient magnitudes determine step size. This design improves robustness to gradient noise while retaining the simplicity and efficiency of first-order methods. We further propose Anolog, which removes sensitivity to the momentum coefficient by expanding its window over time via a logarithmic schedule. We establish non-convex convergence guarantees with a convergence rate similar to other sign-based methods, and empirically show that Ano provides substantial gains in noisy and non-stationary regimes such as reinforcement learning, while remaining competitive on low-noise tasks such as standard computer vision benchmarks.
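The decoupling of direction and magnitude described above can be illustrated with a minimal NumPy sketch. This is a simplified illustration under stated assumptions, not the full Ano update: it keeps only the two ingredients named in the abstract (the sign of a momentum average for direction, the absolute value of the current gradient for magnitude) and omits any further normalization the complete method may apply. The function name `decoupled_step` and the values chosen for `lr` and `beta` are hypothetical.

```python
import numpy as np

def decoupled_step(param, grad, momentum, lr=0.1, beta=0.9):
    """One simplified update step (illustrative, not the full Ano rule).

    Direction comes from the sign of the momentum average;
    magnitude comes from the instantaneous gradient |grad|.
    """
    # Exponential moving average of gradients, used only for direction.
    momentum = beta * momentum + (1.0 - beta) * grad
    # Step size is set by the current gradient magnitude, not by the average.
    step = lr * np.abs(grad) * np.sign(momentum)
    return param - step, momentum

# Toy usage: noisy gradients of f(x) = 0.5 * ||x||^2 (an assumed test problem).
rng = np.random.default_rng(0)
x = np.array([3.0, -2.0])
m = np.zeros_like(x)
for _ in range(200):
    g = x + 0.1 * rng.standard_normal(x.shape)
    x, m = decoupled_step(x, g, m)
```

Note that when the current gradient disagrees in sign with the momentum average, the step still follows the momentum direction but is scaled by the current gradient's magnitude, which is the separation of roles the abstract describes.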