Covariant Gradient Descent

Abstract

We present a manifestly covariant formulation of the gradient descent method, ensuring consistency across arbitrary coordinate systems and general curved trainable spaces. The optimization dynamics is defined using a covariant force vector and a covariant metric tensor, both computed from the first and second statistical moments of the gradients. These moments are estimated through time-averaging with an exponential weight function, which preserves linear computational complexity. We show that commonly used optimization methods such as RMSProp and Adam correspond to special limits of the covariant gradient descent (CGD) and demonstrate how these methods can be further generalized and improved.
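
To make the RMSProp and Adam limits concrete, the following is a minimal sketch of the familiar diagonal case only, not the full CGD construction. It assumes the force is the exponentially averaged first moment of the gradient and the metric is diagonal, given by the square root of the exponentially averaged second moment. The names `ema_moment_step` and `minimize_quadratic`, the hyperparameter values, and the test function are illustrative assumptions; bias correction is omitted for brevity.

```python
import math


def ema_moment_step(theta, grad, m, v, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update with a diagonal metric built from exponentially averaged moments.

    Assumption: this is the diagonal special case only. beta1 = 0 gives an
    RMSProp-style step; beta1 > 0 gives an Adam-style step without bias correction.
    """
    new_theta, new_m, new_v = [], [], []
    for t, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1.0 - beta1) * g        # first moment (the "force")
        vi = beta2 * vi + (1.0 - beta2) * g * g    # second moment, diagonal entries only
        metric = math.sqrt(vi) + eps               # diagonal metric entry
        new_theta.append(t - lr * mi / metric)     # step = force divided by metric
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v


def minimize_quadratic(steps=2000):
    """Minimize the hypothetical test function f(x, y) = 0.5 * (x**2 + 100 * y**2)."""
    theta, m, v = [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        grad = [theta[0], 100.0 * theta[1]]        # analytic gradient of f
        theta, m, v = ema_moment_step(theta, grad, m, v)
    return theta


if __name__ == "__main__":
    print(minimize_quadratic())
```

Both moments are updated in place per parameter, so the cost per step stays linear in the number of parameters. A diagonal metric of this kind compensates only for independent rescalings of each coordinate, so it cannot be consistent under general coordinate changes; a full metric tensor would be needed for that.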
