Catastrophic forgetting remains a severe hindrance to the broad application of artificial neural networks (ANNs); however, it continues to be a poorly understood phenomenon. Despite the extensive amount of work on catastrophic forgetting, we argue that it is still unclear how exactly the phenomenon should be quantified and, moreover, to what degree the choices we make when designing learning systems affect the amount of catastrophic forgetting. We use various testbeds from the reinforcement learning and supervised learning literature to (1) provide evidence that the choice of modern gradient-based optimization algorithm used to train an ANN has a significant impact on the amount of catastrophic forgetting, and show that, surprisingly, in many instances classical algorithms such as vanilla SGD experience less catastrophic forgetting than more modern algorithms such as Adam. We empirically compare four existing metrics for quantifying catastrophic forgetting and (2) show that the degree to which learning systems experience catastrophic forgetting is sufficiently sensitive to the metric used that a change from one principled metric to another is enough to change the conclusions of a study dramatically. Our results suggest that a much more rigorous experimental methodology is required when studying catastrophic forgetting. Based on our results, we recommend that inter-task forgetting in supervised learning be measured with both retention and relearning metrics concurrently, and that intra-task forgetting in reinforcement learning be measured, at the very least, with pairwise interference.