Tucker Tensor Layer in Fully Connected Neural Networks

Abstract

We introduce the Tucker Tensor Layer (TTL), an alternative to the dense weight-matrices of the fully connected layers of feed-forward neural networks (NNs), to answer the long-standing quest to compress NNs and improve their interpretability. This is achieved by treating these weight-matrices as the unfolding of a higher-order weight-tensor. This enables us to introduce a framework for exploiting the multi-way nature of the weight-tensor in order to efficiently reduce the number of parameters, by virtue of the compression properties of tensor decompositions. The Tucker Decomposition (TKD) is employed to decompose the weight-tensor into a core tensor and factor matrices. We re-derive back-propagation within this framework by extending the notion of matrix derivatives to tensors. In this way, the physical interpretability of the TKD is exploited to gain insights into training, through the process of computing gradients with respect to each factor matrix. The proposed framework is validated on synthetic data and on the Fashion-MNIST dataset, emphasizing the relative importance of various data features in training, hence mitigating the "black-box" issue inherent to NNs. Experiments on both MNIST and Fashion-MNIST illustrate the compression properties of the TTL, achieving a 66.63-fold compression whilst maintaining comparable performance to the uncompressed NN.
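To make the parameter-saving idea concrete, the following is a minimal NumPy sketch of a weight matrix stored as a Tucker core plus factor matrices and then unfolded back into a dense matrix. The layer size (784 x 256), the tensor shape (28 x 28 x 256), the multilinear ranks (4, 4, 8) and the helper name `tucker_reconstruct` are illustrative assumptions, not the configuration or code used in the paper, so the resulting ratio is not the 66.63-fold figure reported above.

```python
import numpy as np

def tucker_reconstruct(core, factors):
    """Rebuild a full tensor by multiplying the core by one factor matrix per mode."""
    tensor = core
    for mode, factor in enumerate(factors):
        # Mode-n product: contract the factor's columns with the tensor's mode-n axis,
        # then move the new axis back to position `mode`.
        tensor = np.moveaxis(np.tensordot(factor, tensor, axes=(1, mode)), 0, mode)
    return tensor

# Hypothetical sizes: a 784 x 256 weight matrix viewed as a 28 x 28 x 256 tensor.
dims = (28, 28, 256)
ranks = (4, 4, 8)  # assumed multilinear ranks, chosen only for illustration

rng = np.random.default_rng(0)
core = rng.standard_normal(ranks)
factors = [rng.standard_normal((d, r)) for d, r in zip(dims, ranks)]

# Full weight tensor and its unfolding into the usual dense-layer matrix.
weight_tensor = tucker_reconstruct(core, factors)
weight_matrix = weight_tensor.reshape(dims[0] * dims[1], dims[2])

# Parameter counts: dense matrix versus core plus factor matrices.
dense_params = weight_matrix.size
tucker_params = core.size + sum(f.size for f in factors)
compression = dense_params / tucker_params

print(f"dense: {dense_params}, Tucker: {tucker_params}, ratio: {compression:.2f}")
```

In this sketch only the core and the three factor matrices would be trained; the dense matrix is rebuilt (or never formed explicitly) when the layer is applied, which is where the reduction in stored parameters comes from.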
