Training Data Attribution via Approximate Unrolled Differentiation

Abstract

Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimization algorithm, or multi-stage training pipelines. By contrast, methods based on unrolling address these issues but face scalability challenges. In this work, we connect the implicit-differentiation-based and unrolling-based approaches and combine their benefits by introducing Source, an approximate unrolling-based TDA method that is computed using an influence-function-like formula. While remaining computationally efficient relative to unrolling-based approaches, Source is suitable in cases where implicit-differentiation-based approaches struggle, such as non-converged models and multi-stage training pipelines. Empirically, Source outperforms existing TDA techniques in counterfactual prediction, especially in settings where implicit-differentiation-based approaches fall short.
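To make the contrast between the two families concrete, here is a minimal sketch under a hypothetical toy setup: ridge regression trained by full-batch gradient descent, where removing training example j means moving its weight w_j from 1 to 0. The data, `lam`, `lr`, and every function name are illustrative assumptions, not taken from the paper, and the code does not implement Source. `influence_delta` differentiates the optimality condition (implicit differentiation), so it implicitly assumes the model sits at its optimum. `unrolled_delta` instead differentiates through every gradient-descent step, so it describes the model that was actually trained, even if training stopped early.

```python
# Hypothetical toy setup for illustration only (NOT the paper's Source method):
# L2-regularized linear regression with per-example weights w_i,
#   J(theta; w) = sum_i w_i * 0.5 * (x_i . theta - y_i)^2 + 0.5 * lam * ||theta||^2.
# Removing training example j corresponds to moving w_j from 1 to 0.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def hessian(X, lam):
    """H = sum_i x_i x_i^T + lam * I (constant, since the loss is quadratic)."""
    d = len(X[0])
    return [[sum(x[r] * x[c] for x in X) + (lam if r == c else 0.0)
             for c in range(d)] for r in range(d)]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def fit_exact(X, Y, lam):
    """Closed-form minimizer theta* = H^{-1} X^T y (the fully converged model)."""
    d = len(X[0])
    b = [sum(x[k] * y for x, y in zip(X, Y)) for k in range(d)]
    return solve(hessian(X, lam), b)

def grad_point(x, y, theta):
    """Gradient of 0.5 * (x . theta - y)^2 with respect to theta."""
    r = dot(x, theta) - y
    return [r * xi for xi in x]

def gd_step(X, Y, lam, lr, theta, w):
    """One full-batch gradient-descent step on the weighted objective J(theta; w)."""
    g = [lam * t for t in theta]
    for x, y, wi in zip(X, Y, w):
        g = [a + wi * b for a, b in zip(g, grad_point(x, y, theta))]
    return [t - lr * gk for t, gk in zip(theta, g)]

def train_gd(X, Y, lam, lr, steps, w):
    """Run `steps` gradient-descent steps from theta = 0 (possibly non-converged)."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        theta = gd_step(X, Y, lam, lr, theta, w)
    return theta

def influence_delta(X, Y, lam, j):
    """Implicit differentiation: Delta theta ~= H^{-1} grad_j(theta*) for w_j: 1 -> 0.
    Only meaningful if the trained model actually sits at the optimum theta*."""
    theta_opt = fit_exact(X, Y, lam)
    return solve(hessian(X, lam), grad_point(X[j], Y[j], theta_opt))

def unrolled_delta(X, Y, lam, lr, steps, j):
    """Unrolled differentiation: carry v_k = d theta_k / d w_j through every step,
    v_{k+1} = (I - lr * H) v_k - lr * grad_j(theta_k), and return -v_K."""
    H = hessian(X, lam)
    w = [1.0] * len(X)
    theta = [0.0] * len(X[0])
    v = [0.0] * len(X[0])
    for _ in range(steps):
        gj = grad_point(X[j], Y[j], theta)  # uses theta_k before it is updated
        Hv = [dot(row, v) for row in H]
        v = [vk - lr * (hv + g) for vk, hv, g in zip(v, Hv, gj)]
        theta = gd_step(X, Y, lam, lr, theta, w)
    return [-vk for vk in v]

if __name__ == "__main__":
    X = [[1.0, 0.5], [0.3, -1.2], [-0.7, 0.8], [1.5, 1.1], [-1.0, -0.4], [0.2, 0.9]]
    Y = [1.2, -0.8, 0.5, 2.3, -1.1, 0.7]
    lam, lr, j = 0.1, 0.1, 3
    print("influence (assumes convergence):", influence_delta(X, Y, lam, j))
    for steps in (1, 3, 500):
        w_minus = [0.0 if i == j else 1.0 for i in range(len(X))]
        actual = [a - b for a, b in zip(train_gd(X, Y, lam, lr, steps, w_minus),
                                        train_gd(X, Y, lam, lr, steps, [1.0] * len(X)))]
        print(steps, "steps | unrolled:", unrolled_delta(X, Y, lam, lr, steps, j),
              "| retrained:", actual)
```

After only a few steps, the unrolled estimate tracks the retrained model's parameter change to first order in the removed weight, while the influence estimate describes the optimum instead. After many steps, the unrolled estimate approaches the influence estimate. In this toy the extra bookkeeping is cheap, but propagating such derivatives through every training step is costly for large networks, which is consistent with the scalability concern the abstract raises for unrolling.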
