Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition

Abstract

Recently, the connectionist temporal classification (CTC) model, coupled with recurrent (RNN) or convolutional (CNN) neural networks, has made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time-frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first- and second-order derivatives, are processed as individual elements, whereas a natural alternative is to process such components as composite entities. We propose to group such elements in the form of quaternions and to process these quaternions using established quaternion algebra. Quaternion numbers and quaternion neural networks have been shown to process multidimensional inputs efficiently as single entities, to encode internal dependencies, and to solve many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN) used for sequence-to-sequence mapping with the CTC model. Promising results are reported for simple QCNNs in phoneme recognition experiments on the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs.
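A minimal pure-Python sketch of the idea described above follows. It rests on assumptions that are ours, not the paper's. Each (frame, filter) pair becomes one quaternion with a zero real part, and its i, j and k components hold the static energy, its first derivative and its second derivative. Deltas use the standard regression formula with a window of one frame. The helpers `hamilton`, `deltas`, `to_quaternions`, `qconv1d` and `weight_counts` are hypothetical names chosen for illustration. The Hamilton product lets a single quaternion weight mix all four components of an input. `weight_counts` shows where the parameter saving comes from. A quaternion layer between n_in and n_out quaternions needs 4 * n_in * n_out real weights. A real-valued layer with the same number of real inputs and outputs needs 16 * n_in * n_out.

```python
from typing import List, Tuple

# A quaternion q = r + x*i + y*j + z*k stored as the tuple (r, x, y, z).
Quaternion = Tuple[float, float, float, float]


def hamilton(q: Quaternion, p: Quaternion) -> Quaternion:
    """Hamilton product q ⊗ p; note that it is not commutative (i ⊗ j = k, j ⊗ i = -k)."""
    r1, x1, y1, z1 = q
    r2, x2, y2, z2 = p
    return (
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
    )


def add(q: Quaternion, p: Quaternion) -> Quaternion:
    """Component-wise quaternion addition."""
    return (q[0] + p[0], q[1] + p[1], q[2] + p[2], q[3] + p[3])


def deltas(frames: List[List[float]]) -> List[List[float]]:
    """Regression deltas with window N = 1: d[t] = (c[t+1] - c[t-1]) / 2, edge frames repeated."""
    T = len(frames)
    out = []
    for t in range(T):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, T - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def to_quaternions(frames: List[List[float]]) -> List[List[Quaternion]]:
    """Hypothetical grouping: one quaternion (0, static, delta, delta-delta) per frame and filter."""
    d1 = deltas(frames)
    d2 = deltas(d1)  # second-order derivative as the delta of the deltas
    return [
        [(0.0, c, d, dd) for c, d, dd in zip(frames[t], d1[t], d2[t])]
        for t in range(len(frames))
    ]


def qconv1d(x: List[Quaternion], w: List[Quaternion]) -> List[Quaternion]:
    """Valid 1-D quaternion convolution over time (cross-correlation): y[t] = sum_k w[k] ⊗ x[t + k]."""
    K = len(w)
    y = []
    for t in range(len(x) - K + 1):
        acc: Quaternion = (0.0, 0.0, 0.0, 0.0)
        for k in range(K):
            acc = add(acc, hamilton(w[k], x[t + k]))
        y.append(acc)
    return y


def weight_counts(n_in: int, n_out: int) -> Tuple[int, int]:
    """Weights (no biases) of a quaternion layer n_in -> n_out vs. a real layer 4*n_in -> 4*n_out."""
    return 4 * n_in * n_out, (4 * n_in) * (4 * n_out)


if __name__ == "__main__":
    frames = [[1.0, 0.5], [2.0, 0.5], [4.0, 1.0]]  # 3 frames, 2 made-up filter-bank energies
    q = to_quaternions(frames)
    filter0 = [q[t][0] for t in range(len(q))]  # quaternion sequence for the first filter
    print(qconv1d(filter0, [(1.0, 0.0, 0.0, 0.0)]))  # identity kernel leaves the input unchanged
    print(weight_counts(64, 64))  # (16384, 65536): a 4x reduction in weights
```

Representing each input as one quaternion rather than four independent reals is what forces every quaternion weight to act on the static, delta and delta-delta values jointly. The same sharing is the source of the 4x weight reduction.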
