Abstract
Backpropagation provides a generalized configuration for overcoming catastrophic forgetting. For example, optimizers such as SGD and Adam are commonly used for weight updates in continual learning and continual pre-training. In practice, however, access to gradient information is not always available (the gradient ban), for example with black-box APIs, hardware limitations, or non-differentiable systems. To bridge this gap, we introduce ZeroFlow, the first benchmark for evaluating gradient-free optimization algorithms for overcoming forgetting. The benchmark examines a suite of forward-pass methods across multiple forgetting scenarios and datasets. We find that forward passes alone are enough to overcome forgetting. Our findings reveal new optimization principles that highlight the potential of forward-pass methods in mitigating forgetting, managing task conflicts, and reducing memory demands, alongside novel enhancements that further mitigate forgetting with just one forward pass. This work provides essential insights and tools for advancing forward-pass methods to overcome forgetting.