Electroencephalography (EEG) allows for source measurement of electrical brain activity. Particularly for inverse localization, the electrode positions on the scalp need to be known. Often, systems such as optical digitizing scanners are used for accurate localization with a stylus. However, the approach is time-consuming as each electrode needs to be scanned manually and the scanning systems are expensive. We propose using an RGBD camera to directly track electrodes in the images using deep learning methods. Studying and evaluating deep learning methods requires large amounts of labeled data. To overcome the time-consuming data annotation, we generate a large number of ground-truth labels using a robotic setup. We demonstrate that deep learning-based electrode detection is feasible with a mean absolute error of 5.69 ± 6.1 mm and that our annotation scheme provides a useful environment for studying deep learning methods for electrode detection.
Towards Deep Learning-Based EEG Electrode Detection Using Automatically Generated Labels
N. Gessert, M. Gromniak, M. Bengs, L. Matthäus, A. Schlaefer
Institute of Medical Technology, Hamburg University of Technology, Hamburg, Germany
eemagine Medical Imaging Solutions GmbH, Berlin, Germany
Contact: [email protected]
Keywords: Deep Learning, CNN, Electrode Detection, Generated Labels
1 Introduction
Electroencephalography (EEG) is a method for measuring electrical brain activity, e.g., to assess patients’ motor function impairment or to monitor progress in patients’ recovery process [1]. For accurate brain current estimation based on the measured signals on the scalp, knowledge of the electrodes’ location is required [2].
A typical method for electrode placement is the 10-20 system [3] or its refined variants [4], where the positions are determined based on anatomical landmarks. Identification of anatomical landmarks relies on visual inspection and palpation by the practitioner, which is error-prone. Instead, accurate localization systems have been proposed, e.g., optical digitizing scanners with a stylus [5] or MRI-based localization [6]. Often, these systems are expensive and recording all electrodes’ locations is time-consuming. Therefore, photogrammetric methods have been proposed where a single camera [7] or multiple cameras [8] are used to localize the electrodes on the head. These methods are advantageous as cheap cameras can be used for accurate localization. Recent methods often rely on depth (time-of-flight) and/or multiple RGB images for reconstruction of the 3D electrode positions [9].
Previous photogrammetric methods come with two major drawbacks. First, computer vision techniques for 3D reconstruction and electrode detection rely on handcrafted features and algorithms, which are often limited to the specific scenarios they were engineered for and often come with long execution times. Second, previous approaches usually assume a fixed head location. Both hinder application in mobile and changing environments such as ambulances, where head movement is inevitable and fast detection is needed. Thus, fast algorithms that deal with large head pose variation are required. In recent years, deep learning methods have shown remarkable performance for a variety of computer vision tasks such as real-time object detection [10] and head pose estimation [11]. In this paper, we study the feasibility of electrode detection using convolutional neural networks (CNNs). To facilitate and study this approach, large amounts of annotated data are required. Therefore, we propose a setup using a robot with a head phantom attached to the robot’s endeffector. An EEG electrode cap is placed on the head phantom. Then, an RGBD camera acquires images of the head phantom, which is moved to different positions and orientations. The electrodes are first labeled in a single image. Then, the electrode locations are transformed to each head pose using a hand-eye calibration and the initial markings. We study whether these automatically generated labels can be learned by a CNN which directly predicts the electrode locations from the images.
2 Material and Methods
2.1 Experimental Setup
Our experimental setup for data acquisition is shown in Figure 1. First, we perform a hand-eye calibration between the robot (UR3, Universal Robots) and the camera (Kinect V2, Microsoft) using a checkerboard mounted to the robot. Camera poses of the checkerboard are obtained with OpenCV [12] and the calibration transformations are obtained with QR24 [13] using robot and camera poses. Then, the head phantom wearing the electrode cap (waveguard touch, eemagine) is mounted to the robot. We then move the robot into different endeffector positions and orientations while continuously acquiring RGB and depth images and logging the endeffector poses.
2.2 Automatic Data Annotation
After acquiring a set of images, we map the RGB images to the depth sensor’s coordinate frame using calibrations provided by the manufacturer. The RGB images now have the same resolution as the depth images (512 × 424 pixels for the Kinect V2 depth sensor). Next, we annotate all electrodes in a single image. As the RGB image was transformed to the depth sensor’s coordinate frame, we can obtain the 3D coordinates of each annotated electrode from the depth image and its corresponding point cloud using the 2D pixel locations from the RGB image. Using this 3D position and assuming identity orientation, we obtain the poses of all electrodes in the camera frame. Using the hand-eye calibration between robot and camera and the current endeffector pose, we then obtain each electrode pose with respect to the robot endeffector.
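Next, we can automatically obtain the electrode poses for all other robot poses in the dataset by combining this endeffector-relative electrode pose with the logged endeffector pose and the hand-eye calibration.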
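The label propagation described here amounts to chaining rigid transformations. The following is a minimal sketch under stated assumptions, not the authors' implementation: poses are 4x4 homogeneous matrices, and the names `base_T_cam` (camera pose in the robot base frame, taken from the hand-eye calibration), `base_T_ee` (logged endeffector pose in the robot base frame) and `cam_T_elec` (annotated electrode pose in the camera frame) are hypothetical.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def invert_pose(pose):
    """Invert a rigid transform via R^T and -R^T t (no general matrix inverse)."""
    inverse = np.eye(4)
    inverse[:3, :3] = pose[:3, :3].T
    inverse[:3, 3] = -pose[:3, :3].T @ pose[:3, 3]
    return inverse


def electrode_in_endeffector(cam_T_elec, base_T_cam, base_T_ee_ref):
    """Express an electrode annotated in the camera frame relative to the endeffector.

    base_T_ee_ref is the endeffector pose at which the single image was annotated.
    """
    return invert_pose(base_T_ee_ref) @ base_T_cam @ cam_T_elec


def electrode_in_camera(ee_T_elec, base_T_cam, base_T_ee):
    """Map the endeffector-relative electrode pose into the camera frame
    for any other logged endeffector pose base_T_ee."""
    return invert_pose(base_T_cam) @ base_T_ee @ ee_T_elec
```

In this sketch, `electrode_in_endeffector` would be called once per electrode for the manually annotated image, and `electrode_in_camera` once per electrode and per acquired image; the translation part of the result (`pose[:3, 3]`) then serves as the 3D label.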
The position of the pose is now used as a 3D label for each electrode position for each RGBD image in the dataset. Besides the 3D labels, we also consider image-level (pixel) labels by projecting the 3D points back into the RGB images. Here, we perform nearest-neighbor matching, i.e., we assign the electrodes’ transformed 3D location to the closest point in the point cloud. Then, we project this point back on the RGB image. While the 3D labels are ultimately used for EEG, the pixel labels can be useful for purely image-based algorithms. The automatically generated labels will likely be affected by calibration errors. Thus, we also compare to a more accurate ground-truth by manually labeling a small set of images. To ensure consistent labels, the annotator selects the center pixel of each electrode.
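The nearest-neighbor matching used for the pixel labels can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes an organized point cloud of shape (H, W, 3) in which entry (row, col) is the 3D point of depth pixel (row, col), so that the grid index of the nearest point directly gives the pixel label in the registered RGB image. The function name and the NaN convention for missing depth are assumptions.

```python
import numpy as np


def pixel_labels_from_points(point_cloud, electrode_xyz):
    """Return the (row, col) of the nearest valid point-cloud entry per electrode.

    point_cloud:   (H, W, 3) array, one 3D point per depth pixel, NaN where depth is missing.
    electrode_xyz: (N, 3) array of transformed electrode positions in the camera frame.
    """
    height, width, _ = point_cloud.shape
    points = point_cloud.reshape(-1, 3)
    pixels = []
    for xyz in electrode_xyz:
        # Squared Euclidean distance from this electrode to every point.
        dist2 = np.sum((points - xyz) ** 2, axis=1)
        # Pixels without depth must never be selected as nearest neighbor.
        dist2[np.isnan(dist2)] = np.inf
        pixels.append(np.unravel_index(np.argmin(dist2), (height, width)))
    return np.array(pixels)
```

A brute-force search is used here for clarity; for many electrodes and images, a spatial index such as a k-d tree would be a natural replacement.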
2.3 Deep Learning Models
We employ two state-of-the-art CNNs, Densenet121 [14] and SE-Resnext50 [15]. The input to the network is an RGB image, the depth image or a full RGBD image. The images are cropped to the relevant region around the robot workspace based on the extent of the ground-truth annotations. Including a margin, this results in a network input size of pixels. All models are pretrained on ImageNet to overcome relatively small dataset sizes. Using an EEG cap with electrodes, the model output is of size for 3D point prediction and for 2D pixel location prediction. As we solve a regression problem, the loss is the mean squared error, minimized using the Adam algorithm with an initial learning rate of and a batch size of . We train for epochs and halve the learning rate after epochs each. For implementation we use PyTorch [16]. Training, evaluation and inference time measurements are performed on an NVIDIA GTX 1080 Ti.
In terms of evaluation metrics we follow [17] and use the mean absolute error (MAE) as an absolute metric, either in pixels or in mm for 2D and 3D positions, respectively. To compare 2D and 3D labels, we consider the relative MAE (rMAE), which is the absolute error divided by the targets’ standard deviation. This metric is unitless as it is relative. Last, we consider the average correlation coefficient (aCC) between predictions and targets as a further relative metric. Values close to 1 indicate that a regression task was generally learned well.
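The three metrics can be computed as in the following sketch. The exact aggregation is an assumption: here the rMAE is computed per output dimension (the mean absolute error of that dimension divided by the standard deviation of its targets) and then averaged, and the aCC is the Pearson correlation per output dimension averaged over all outputs. The function name is hypothetical.

```python
import numpy as np


def regression_metrics(predictions, targets):
    """Return (MAE, rMAE, aCC) for arrays of shape (n_samples, n_outputs)."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    abs_err = np.abs(predictions - targets)

    # Absolute metric, in the unit of the targets (pixels or mm).
    mae = abs_err.mean()

    # Relative MAE: per-output error scaled by that output's target spread.
    rmae = (abs_err.mean(axis=0) / targets.std(axis=0)).mean()

    # Average Pearson correlation between predictions and targets.
    pred_c = predictions - predictions.mean(axis=0)
    targ_c = targets - targets.mean(axis=0)
    corr = (pred_c * targ_c).sum(axis=0) / np.sqrt(
        (pred_c ** 2).sum(axis=0) * (targ_c ** 2).sum(axis=0)
    )
    return mae, rmae, corr.mean()
```

Because the rMAE and aCC are normalized per output, they allow a comparison between pixel labels and 3D labels even though the two live on different scales.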
Note that our fixed-size CNN output always forces the CNN to make a prediction for all electrode locations, even when they are not visible. Also, our automatic labeling strategy can provide annotations for learning even if some electrode locations are not visible, as they are still transformed to their corresponding 3D location. Thus, the model can learn to be robust to partial electrode occlusion.
3 Results
Table 1: Results with respect to the generated labels and the manual labels.
To evaluate our setup we generate a set of images for training and validation and we manually annotate images for testing. The positions cover a range of which corresponds to pixels in 2D images.
Using pairs of poses for calibration and pairs for evaluation, the hand-eye calibration between the robot and the camera results in a position error of and a rotation error of .
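One common way to quantify such calibration errors is to compare, for each held-out pose pair, the pose predicted through the calibration with the pose measured by the camera. The sketch below illustrates this under assumptions; it is not necessarily the evaluation protocol used here, and the function name is hypothetical. The position error is the Euclidean distance between the translations, and the rotation error is the angle of the relative rotation.

```python
import numpy as np


def pose_errors(pose_estimated, pose_measured):
    """Return (translation error, rotation error in degrees) between two 4x4 poses."""
    translation_error = np.linalg.norm(pose_estimated[:3, 3] - pose_measured[:3, 3])
    # Relative rotation between the two orientations.
    rotation_delta = pose_estimated[:3, :3].T @ pose_measured[:3, :3]
    # Rotation angle from the trace; clipping guards against round-off outside [-1, 1].
    cos_angle = np.clip((np.trace(rotation_delta) - 1.0) / 2.0, -1.0, 1.0)
    return translation_error, np.degrees(np.arccos(cos_angle))
```

Averaging both values over all evaluation pose pairs yields a single position and rotation error for the calibration.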
Quantitative results are shown in Table 1. We provide errors with respect to both the generated labels and the manually annotated labels. In general, our automatically generated labels are close to the real labels. Comparing Densenet121 and SE-Resnext50, both models perform similarly, with Densenet121 showing the best performance on 2D labels and SE-Resnext50 showing the best performance on 3D labels. With respect to color channels, using RGB and RGBD images performs similarly. Both our pixel labels and the real-world 3D coordinates are learned well by the CNNs with aCCs close to . For pixel labels, the relative metrics indicate a higher performance than for real-world 3D coordinates. Inference times are for Densenet121 and for SE-Resnext50. Training times are and for Densenet121 and SE-Resnext50, respectively.
Qualitative results are shown in Figure 2. We show an RGB image and a point cloud of the head with the EEG cap and the manually annotated electrode locations, the generated labels and the predicted locations. Qualitatively, the predicted electrode locations are close to the manually annotated labels. Also, note that our approach is able to provide a reasonable prediction although one of the electrodes at the back of the head is only partially visible.
4 Discussion
In this paper, we address deep learning-based electrode detection using 2D camera images. This approach is particularly promising as CNNs can provide fast predictions and can be adjusted to versatile environments without requiring manual feature handcrafting. However, their main drawback is the large amount of annotated data that is usually required. We address this issue with a robotic setup for automatic data and label acquisition. We evaluate the approach by using different types of input images, labels and CNN architectures.
In general, our automatic label generation framework works well although the setup is affected by calibration errors between the robot and the camera. The generated labels closely match the more accurate manual ground-truth with an MAE of and pixels while our labels cover a range of approximately and pixels. Also the point cloud plots in Figure 2 demonstrate that the actual electrode locations are well matched. Overall, the deep learning models approximate the ground-truth well, although there is a large variation in the target positions. Notably, there is a performance difference between using the generated and the manual labels for evaluation. This reflects the calibration errors in the setup which mainly cause the difference between generated and manual labels. Thus, the error between the generated labels and the manual labels can be seen as an upper bound for model performance.
At the same time, predictions are fast with a range of to which indicates real-time capability. Other photogrammetric methods typically require seconds up to minutes for detection .
Using either pixel or real-world 3D coordinates works well, while predicting 2D labels appears to be easier with an average aCC of compared to . Intuitively, deriving 3D coordinates from a 2D image is more difficult and thus the results match expectations. In terms of application, the 3D coordinates are more relevant as the overall goal is to obtain the electrode locations with respect to a 3D head coordinate frame. Adding head coordinates for deriving a head coordinate frame is straightforward with our approach and could be addressed in future work.
In terms of CNN models, the performance with respect to the actual labels is very similar as both models achieve aCCs close to . SE-Resnext50 performs slightly better with respect to the 3D labels while Densenet121 shows the best performance for 2D pixel labels. Notably, the task of predicting 3D real-world coordinates appears to be more difficult as the rMAE and aCC are generally lower for this task. Considering that SE-Resnext50 has more parameters than Densenet121, the additional capacity might be beneficial for solving the more difficult problem. However, the slight increase in performance comes at the cost of a substantial increase in inference time, which needs to be carefully traded off for application.
For the different types of input modalities, the performance with respect to the manual labels is very similar. Adding the depth channel to the RGB images does not appear to be beneficial in our setup. This might suggest that depth information is not helpful; however, our current setup utilizes a particular head shape and EEG cap model. When generalizing to different head sizes, shapes or EEG caps, depth information will likely be more important, as it is difficult to differentiate, from RGB images alone, between a smaller head close to the camera and a larger head further away. Our automatic acquisition and labeling approach is well suited for covering more diverse scenarios, e.g., with different head models, which can be addressed in future work.
Overall, we demonstrate that predicting electrode locations works well and enables studying deep learning methods for EEG electrode detection. The setup allows for considering variations such as head shape and size or different EEG caps. However, a clear drawback of our method is the fact that it is limited to the use of head phantoms for automatic label generation. Thus, future work needs to incorporate our approach in real-world settings. This could be facilitated by applying a pre-segmentation of the electrode cap, which would make the approach independent of the underlying head. Another approach would be to employ transfer learning techniques [18] or few-shot learning [19], where a CNN that is pretrained with our setup is adapted to the real-world scenarios with a few new images.
5 Conclusion
We study deep learning-based EEG electrode detection from camera images. To facilitate the approach, we propose a robotic setup for automatic data and label generation. This allows for quick generation of arbitrarily-sized datasets including ground-truth annotations. We demonstrate that CNNs are able to detect the electrodes using either RGB or depth images. Furthermore, our automatically generated labels closely match a more accurate, manual ground-truth annotation. Thus, our setup allows for developing and studying deep learning-based electrode detection approaches. Future work could study more extended scenarios, e.g., with different head shapes and sizes.
This work was partially funded by AiF research grant number ZF4026302CR7.
References
- [1] Otten, P., Kim, J., Son, S. (2015) A framework to automate assessment of upper-limb motor function impairment: A feasibility study. Sensors 15(8), 20097–20114
- [2] Plummer, C., Harvey, A.S., Cook, M. (2008) EEG source localization in focal epilepsy: where are we now? Epilepsia 49(2), 201–218
- [3] Jasper, H. (1958) Report of the committee on methods of clinical examination in electroencephalography. Electroencephalography and Clinical Neurophysiology 10, 370–375
- [4] Jurcak, V., Tsuzuki, D., Dan, I. (2007) 10/20, 10/10, and 10/5 systems revisited: their validity as relative head-surface-based positioning systems. NeuroImage 34(4), 1600–1611
- [5] Towle, V.L., Bolaños, J., Suarez, D., Tan, K., Grzeszczuk, R., Levin, D.N., Cakmur, R., Frank, S.A., Spire, J.P. (1993) The spatial location of EEG electrodes: locating the best-fitting sphere relative to cortical anatomy. Electroencephalography and Clinical Neurophysiology 86(1), 1–6
- [6] Brinkmann, B.H., O’Brien, T.J., Dresner, M.A., Lagerlund, T.D., Sharbrough, F.W., Robb, R.A. (1998) Scalp-recorded EEG localization in MRI volume data. Brain Topography 10(4), 245–253
- [7] Qian, S., Sheng, Y. (2011) A single camera photogrammetry system for multi-angle fast localization of EEG electrodes. Annals of Biomedical Engineering 39(11), 2844
- [8] Reis, P.M., Lochmann, M. (2015) Using a motion capture system for spatial localization of EEG electrodes. Frontiers in Neuroscience 9, 130
- [9] Clausner, T., Dalal, S.S., Crespo-García, M. (2017) Photogrammetry-based head digitization for rapid and accurate localization of EEG electrodes and MEG fiducial markers using a single digital SLR camera. Frontiers in Neuroscience 11, 264
- [10] Ren, S., He, K., Girshick, R., Sun, J. (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99
- [11] Ranjan, R., Patel, V.M., Chellappa, R. (2019) HyperFace: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(1), 121–135
- [12] Bradski, G. (2000) The OpenCV Library. Dr. Dobb’s Journal of Software Tools
- [13] Ernst, F., Richter, L., Matthäus, L., Martens, V., Bruder, R., Schlaefer, A., Schweikard, A. (2012) Non-orthogonal tool/flange and robot/world calibration. The International Journal of Medical Robotics and Computer Assisted Surgery 8(4), 407–420
- [14] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q. (2017) Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708
- [15] Hu, J., Shen, L., Sun, G. (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141
- [16] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A. (2017) Automatic differentiation in PyTorch. In: NIPS Autodiff Workshop
- [17] Borchani, H., Varando, G., Bielza, C., Larrañaga, P. (2015) A survey on multi-output regression. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 5(5), 216–233
- [18] Oquab, M., Bottou, L., Laptev, I., Sivic, J. (2014) Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724
- [19] Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., Hospedales, T.M. (2018) Learning to compare: Relation network for few-shot learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199–1208