### Abstract

What is the information leakage of an iterative learning algorithm about itstraining data, when the internal state of the algorithm is \emph{not}observable? How much is the contribution of each specific training epoch to thefinal leakage? We study this problem for noisy gradient descent algorithms, andmodel the \emph{dynamics} of R\'enyi differential privacy loss throughout thetraining process. Our analysis traces a provably tight bound on the R\'enyidivergence between the pair of probability distributions over parameters ofmodels with neighboring datasets. We prove that the privacy loss convergesexponentially fast, for smooth and strongly convex loss functions, which is asignificant improvement over composition theorems. For Lipschitz, smooth, andstrongly convex loss functions, we prove optimal utility for differentialprivacy algorithms with a small gradient complexity.