Neuroimaging studies based on magnetic resonance imaging (MRI) typically employ rigorous forms of preprocessing. Images are spatially normalized to a standard template using linear and non-linear transformations. Thus, one can assume that a patch at location (x, y, height, width) contains the same brain region across the entire data set. Most analyses applied on brain MRI using convolutional neural networks (CNNs) ignore this distinction from natural images. Here, we suggest a new layer type called patch individual filter (PIF) layer, which trains higher-level filters locally as we assume that more abstract features are locally specific after spatial normalization. We evaluate PIF layers on three different tasks, namely sex classification as well as either Alzheimer's disease (AD) or multiple sclerosis (MS) detection. We demonstrate that CNNs using PIF layers outperform their counterparts in several, especially low sample size settings.
Harnessing spatial MRI normalization: patch individual filter layers for CNNs
Fabian Eitel Humboldt Universität Berlin Berlin, 10117 Jan Philipp Albrecht Freie Universität Berlin Berlin, 14195 Friedemann Paul Charité - Universitätsmedizin Berlin Berlin, 10117 Kerstin Ritter Charité - Universitätsmedizin Berlin Berlin, 10117 [email protected]
Preprint. Under review.
CNNs have been successfully applied to neuroimaging data Jo2019Review; VIEIRA2017Review. However, several challenges remain. First, sample sizes are small: public data sets typically contain no more than 1,000 patients with a specific disease. Second, MR sequences are three-dimensional and can contain up to 1 million non-zero voxels, so the number of features far exceeds the number of samples. Lastly, many features of the brain and of neurological or psychiatric diseases are not fully understood. Even though there are guidelines for the neurological assessment of diseases, these change over time (e.g. the McDonald criteria polman2011diagnostic; thompson2018diagnosis) and only represent our current understanding of a disease.
Previous approaches using CNNs in neuroimaging tend to convert architectures that are successful on natural images into 3D MRI classifiers Guan2019comprehensive by replacing 2D with 3D operations. In doing so, the special properties of MRI data are typically ignored. Here, we specifically make use of the spatial homogeneity of brain MRI data. Through linear and non-linear transformations, MR images are normalized to a shared template within the MNI space such as the ICBM 152 atlas john_phd; AVANTS2008; FONOV2011313. This ensures that a voxel at a given location (x, y, z) contains more or less the same brain region in every image and allows researchers to investigate a specific region (e.g. the hippocampus) across subjects. We propose to exploit this spatial homogeneity by training within patches of the image, and we evaluate this approach on three different tasks: sex classification, AD detection and MS detection. Unlike patch-based approaches (see Section Related Work), we only train patch-wise in higher layers. This is motivated by the idea of abstraction: whereas lower-level features such as edge detectors might be globally relevant, some higher-level features might be more locally relevant. Because higher-level filters in the PIF setting train on patches, which contain less information and noise than the entire image, we show here that they require fewer iterations over the training set as well as fewer samples to converge compared to vanilla CNNs.
2 Related Work
PIF layers are different from patch-based training. In patch-based training Kamnitsas2016; Ghafoorian2017; Yoo2018, multiple patches are sampled from the dataset and fed into the same classifier regardless of the position of each patch. Therefore, the classifier’s filters share weights between different patches. Conversely, within PIF layers, weights are only shared within a spatially restricted patch. PatchGANs li2016precomputed; isola2017image use Markovian patches as input for a discriminator network in order to focus penalization on high-frequency structure.
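The difference in weight sharing can be made concrete by counting weights. The following is a minimal sketch, not taken from the paper: the function names and all sizes are hypothetical, biases are ignored, and it assumes that every patch owns one full set of kernels.

```python
def conv_weights(kernel_size, in_channels, out_channels, dims=3):
    """Weights of a regular convolution: one kernel set shared over the whole image."""
    return kernel_size ** dims * in_channels * out_channels


def pif_weights(kernel_size, in_channels, out_channels, n_patches, dims=3):
    """Weights when each patch owns its own kernel set (assumption of this sketch)."""
    return n_patches * conv_weights(kernel_size, in_channels, out_channels, dims)


# Hypothetical sizes: 3x3x3 kernels, 8 input and 8 output channels, 8 patches.
shared = conv_weights(3, 8, 8)
per_patch = pif_weights(3, 8, 8, n_patches=8)
print(shared, per_patch)
```

Under these assumptions the weight count grows linearly with the number of patches, so whether a complete PIF model ends up with more or fewer parameters than a baseline depends on the rest of the architecture.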
PIF layers are a generalization of local convolutions as implemented in Lasagne (https://lasagne.readthedocs.io/en/latest/modules/layers/local.html) and Keras (https://keras.io/layers/local/). Local convolutions are similar to regular convolutions but do not share weights. They are a special case of PIF layers where $k = s + 2p$, with $s$ the patch size, $p$ the padding size and $k$ the kernel size. Thus, the convolution kernel does not slide over the selected patch (because they are congruent).
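The special case can be checked by counting how many positions a kernel can take along one axis of a padded patch. This is a small sketch using the standard output-size formula for convolutions; the function name and the stride parameter are illustrative assumptions, not part of the paper.

```python
def kernel_positions(patch_size, padding, kernel_size, stride=1):
    """Number of positions a kernel can take along one axis of a zero-padded patch."""
    padded = patch_size + 2 * padding
    if kernel_size > padded:
        raise ValueError("kernel is larger than the padded patch")
    return (padded - kernel_size) // stride + 1


# Kernel congruent with the padded patch: a single position, so nothing slides
# and the layer behaves like a local convolution.
print(kernel_positions(patch_size=5, padding=0, kernel_size=5))

# Smaller kernel: it slides inside the patch, so weights are shared within it.
print(kernel_positions(patch_size=5, padding=1, kernel_size=3))
```

When the kernel size equals the patch size plus twice the padding, the count is one, which is the congruent case described above.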
The heterogeneity of natural images depicting the same object requires filters to be convolved with the entire image. In a cat detection model, for example, we would expect some higher-level filters to detect cat ears. In this case, it is necessary to convolve those cat ear filters with the entire image, because cat ears might be located anywhere in the image. However, if all images were spatially standardized, i.e. objects were shown at the same angle, viewpoint and distance and all major facial features of the cats were at the same location in each image, it would suffice for the cat ear filter to search a small subspace of the image. This drastic form of spatial normalization is unlikely to be achievable for natural images, but in neuroimaging it is the de facto standard and a major requirement for mass-univariate and multivariate pattern analysis (MVPA).
For the analysis of spatially normalized MRI data, we suggest a new CNN architecture relying on PIF layers. PIF layers consist of 3 stages: (i) split, (ii) process and (iii) reassemble. Each output feature map of the previous layer is first split (i) into patches of size $s$. Next, the patches at row $r$ and column $c$ of all feature maps are processed (ii) with a series of local convolutions of kernel size $k$. This is repeated for all patches. When $k < s + 2p$, where $p$ is the padding size, weights are shared within each patch but not across patches. Lastly, all patches are reassembled (iii) in the same order as they were split. Figure 1 shows an overview of the layer design. The final model consists of 4 convolutional blocks (Conv-BatchNorm-ReLU) followed by a PIF layer with a single convolutional block between the split and reassemble phases.
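The three stages can be sketched in plain Python for a single 2D feature map. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses one input and one output channel, square patches that tile the map exactly, and zero padding chosen so that each processed patch keeps its size (kernel size equal to twice the padding plus one). All function names are hypothetical.

```python
def conv2d_valid(patch, kernel):
    """'Valid' 2D cross-correlation of one patch with one square kernel."""
    k = len(kernel)
    out_h = len(patch) - k + 1
    out_w = len(patch[0]) - k + 1
    return [
        [
            sum(patch[r + a][c + b] * kernel[a][b] for a in range(k) for b in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def zero_pad(patch, p):
    """Surround a patch with p rows and columns of zeros on every side."""
    width = len(patch[0]) + 2 * p
    rows = [[0.0] * p + list(row) + [0.0] * p for row in patch]
    top = [[0.0] * width for _ in range(p)]
    bottom = [[0.0] * width for _ in range(p)]
    return top + rows + bottom


def pif_layer(feature_map, patch_size, kernels, padding):
    """Split, process and reassemble; kernels[i][j] belongs to patch (i, j)."""
    s = patch_size
    n_rows, n_cols = len(feature_map) // s, len(feature_map[0]) // s
    out = [[0.0] * (n_cols * s) for _ in range(n_rows * s)]
    for i in range(n_rows):
        for j in range(n_cols):
            kernel = kernels[i][j]
            # Assumption of this sketch: "same" padding keeps the patch size.
            assert len(kernel) == 2 * padding + 1
            # (i) split: cut out the patch at grid position (i, j)
            patch = [row[j * s:(j + 1) * s] for row in feature_map[i * s:(i + 1) * s]]
            # (ii) process: convolve with the kernel owned by this patch only
            result = conv2d_valid(zero_pad(patch, padding), kernel)
            # (iii) reassemble: write the result back at the same position
            for r, row in enumerate(result):
                out[i * s + r][j * s:(j + 1) * s] = row
    return out


def scaled_identity(scale, k=3):
    """Toy kernel that multiplies every value by `scale`."""
    kernel = [[0.0] * k for _ in range(k)]
    kernel[k // 2][k // 2] = scale
    return kernel


fmap = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
kernels = [[scaled_identity(1.0), scaled_identity(2.0)],
           [scaled_identity(0.0), scaled_identity(-1.0)]]
out = pif_layer(fmap, patch_size=2, kernels=kernels, padding=1)
for row in out:
    print(row)
```

In this toy usage each 2x2 patch is scaled by its own factor, which makes it visible that weights are shared inside a patch but not across patches. A trainable 3D version would replace the toy kernels with learned multi-channel filters.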
We evaluated PIF layers on three different data sets and tasks: 1) sex classification on a subset of the UK Biobank (F=1005, M=849), 2) AD detection on data of the Alzheimer’s Disease Neuroimaging Initiative (ADNI; AD=475, HC=494) and 3) MS detection on a small private data set (MS=76, HC=71). We evaluate performance in terms of balanced accuracy, iterations until early stopping and performance on a smaller subset. The subset contains a randomly drawn 20% of the samples of the full data set. As the number of samples in the MS data set is already small, we did not use a subset there. We compare results to a simple 5-layer CNN architecture that has shown good results on the ADNI data set and is adjusted slightly for each task. For UK Biobank and ADNI the number of parameters is smaller in the PIF architecture, whereas for MS it is larger. After finding suitable hyperparameters (learning rate, weight decay, number of filters, dropout) for each task, all experiments were repeated 10 times and averages over all repetitions are reported. Data for baseline model training was augmented using horizontal flips and translations, whereas for PIF model training only horizontal flips were used to avoid misaligned images.
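For reference, balanced accuracy is the mean of the per-class recalls, which makes it robust to class imbalance. The sketch below computes it and draws a 20% subset; the toy labels, the seed and the use of plain (non-stratified) random sampling are assumptions of this example, since the text does not specify how the subset was drawn.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def random_subset(indices, fraction=0.2, seed=0):
    """Draw a fraction of the indices without replacement (seed is hypothetical)."""
    indices = list(indices)
    rng = random.Random(seed)
    return rng.sample(indices, round(len(indices) * fraction))


# Toy labels: recall is 3/4 for MS and 1/2 for HC.
y_true = ["MS", "MS", "MS", "MS", "HC", "HC"]
y_pred = ["MS", "MS", "MS", "HC", "HC", "MS"]
print(balanced_accuracy(y_true, y_pred))

# 20% of the 1005 + 849 UK Biobank samples reported above.
subset = random_subset(range(1005 + 849))
print(len(subset))
```

On the toy labels plain accuracy would be 4/6, whereas balanced accuracy weights both classes equally.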
Table 1 shows the balanced accuracy and the iteration at which training finished using early stopping. On the sex classification task (UK Biobank), the PIF model performs almost identically to the baseline on the full data set, with an accuracy of almost 90%. When using the small subset of only 20% of the original data set, the balanced accuracy increased from 64.47% with the baseline to 78.11% with the PIF model. On the full data set, the iterations required until early stopping almost halve, whereas on the subset they increase from 40.7 to 69.5 iterations. This is the only case where the PIF architecture requires a higher number of iterations, which might be due to the task-specific PIF model having a higher number of parameters than on the other data sets. On the AD classification task (ADNI), the baseline and the PIF model perform similarly, with a balanced accuracy of around 84.5%, but the PIF model requires fewer iterations (22.3 compared to 31.8). On the ADNI subset, the baseline outperformed the PIF architecture with an accuracy of 81.09% versus 76.65%, but early stopping with the PIF model occurred on average in iteration 71.9 compared to iteration 106.4 for the baseline model. Lastly, for MS detection, balanced accuracy increased from 75.04% to 80.92% when using the PIF architecture, and the required number of iterations decreased on average from 83.7 to 53.5.
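Table 1. Balanced accuracy and early stopping iteration per data set and model.
|Data||Model||Large data set: Bal. acc.||Large data set: Early stopping iter.||Small data set: Bal. acc.||Small data set: Early stopping iter.|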
In this work, we have introduced a new CNN architecture relying on PIF layers to harness the established techniques of spatial normalization in neuroimaging. In multiple experiments, we have shown that PIF layers can outperform simple CNNs, especially in low sample size scenarios. Further experiments are required to investigate whether the success of PIF layers is task-specific, for example whether it depends on learning regional differences such as MS lesions as opposed to global atrophy in AD.