Abstract
Self-supervised learning (SSL) methods are popular since they can address situations with limited annotated data by directly utilising the underlying data distribution. However, the adoption of such methods remains underexplored in ultrasound (US) imaging, especially for fetal assessment. We investigate the potential of dual-encoder SSL in utilising unlabelled US video data to improve the performance of the challenging downstream task of Standard Fetal Cardiac Planes (SFCP) classification using limited labelled 2D US images. We study 7 SSL approaches based on reconstruction, contrastive loss, distillation, and information theory and evaluate them extensively on a large private US dataset. Our observations and findings are consolidated from more than 500 downstream training experiments under different settings. Our primary observation is that, for SSL training, the variance of the dataset is more crucial than its size because it allows the model to learn generalisable representations, which improve the performance of downstream tasks. Overall, the BarlowTwins method shows robust performance, irrespective of the training settings and data variations, when used as an initialisation for downstream tasks. Notably, with full fine-tuning on 1% of the labelled data, BarlowTwins initialisation outperforms ImageNet initialisation by 12% in F1-score and other SSL initialisations by at least 4% in F1-score, making it a promising candidate for transfer learning from US video to image data.