Towards Scaling Deep Neural Networks with Predictive Coding: Theory and Practice

Abstract

Backpropagation (BP) is the standard algorithm for training the deep neural networks that power modern artificial intelligence, including large language models. However, BP is energy inefficient and unlikely to be implemented by the brain. This thesis studies an alternative, potentially more efficient brain-inspired algorithm called predictive coding (PC). Unlike BP, PC networks (PCNs) perform inference by iterative equilibration of neuron activities before learning or weight updates. Recent work has suggested that this iterative inference procedure provides a range of benefits over BP, such as faster training. However, these advantages have not been consistently observed, the inference and learning dynamics of PCNs are still poorly understood, and deep PCNs remain practically untrainable. Here, we make significant progress towards scaling PCNs by taking a theoretical approach grounded in optimisation theory. First, we show that the learning dynamics of PC can be understood as an approximate trust-region method using second-order information, despite explicitly using only first-order local updates. Second, going beyond this approximation, we show that PC can in principle make use of arbitrarily higher-order information, such that for feedforward networks the effective landscape on which PC learns is far more benign and robust to vanishing gradients than the (mean squared error) loss landscape. Third, motivated by a study of the inference dynamics of PCNs, we propose a new parameterisation called "$\mu$PC", which for the first time allows stable training of 100+ layer networks with little tuning and competitive performance on simple tasks. Overall, this thesis significantly advances our fundamental understanding of the inference and learning dynamics of PCNs, while highlighting the need for future research to focus on hardware co-design if PC is to compete with BP at scale.
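
To make the "equilibrate activities first, then update weights" idea concrete, the following is a minimal sketch for a toy case only: a chain of scalar linear layers with input and output clamped, and an energy equal to the sum of squared layer-wise prediction errors. It is not the thesis's implementation and does not include $\mu$PC; the function names (`pc_energy`, `pc_infer`, `pc_learn_step`), the step sizes, and the number of inference steps are all illustrative assumptions.

```python
def pc_energy(w, z):
    """Energy of a scalar linear chain: half the sum of squared prediction errors.

    w[l] is the (hypothetical) scalar weight mapping activity z[l] to a
    prediction of z[l + 1]; z[0] is the input and z[-1] is the output.
    """
    return 0.5 * sum((z[l + 1] - w[l] * z[l]) ** 2 for l in range(len(w)))


def pc_infer(w, x, y, n_steps=200, lr=0.1):
    """Inference: relax hidden activities by gradient descent on the energy.

    The input z[0] = x and the output z[-1] = y stay clamped throughout.
    """
    n_layers = len(w)
    # Initialise activities with a feedforward pass, then clamp the output.
    z = [x]
    for l in range(n_layers):
        z.append(w[l] * z[l])
    z[-1] = y
    for _ in range(n_steps):
        # eps[l] is the prediction error at activity z[l + 1].
        eps = [z[l + 1] - w[l] * z[l] for l in range(n_layers)]
        # Each hidden activity is pulled by its own error and the error it causes above.
        for l in range(1, n_layers):
            grad = eps[l - 1] - w[l] * eps[l]
            z[l] -= lr * grad
    return z


def pc_learn_step(w, x, y, weight_lr=0.05):
    """Learning: after inference, take one local gradient step on each weight."""
    z = pc_infer(w, x, y)
    new_w = [w[l] + weight_lr * (z[l + 1] - w[l] * z[l]) * z[l]
             for l in range(len(w))]
    return new_w, z
```

Each weight update uses only the error and activity of its own layer, which is the sense in which the updates are local. In this toy setting, repeating `pc_learn_step` on a fixed `(x, y)` pair should lower the equilibrated energy, though nothing here speaks to the deep, nonlinear networks the thesis is concerned with.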
