Unsupervised domain adaptation for clinician pose estimation and instance segmentation in the operating room

Abstract

The fine-grained localization of clinicians in the operating room (OR) is a key component in designing the new generation of OR support systems. Computer vision models for pixel-based person segmentation and body-keypoint detection are needed to better understand the clinical activities and the spatial layout of the OR. This is challenging, not only because OR images are very different from traditional vision datasets, but also because data and annotations are hard to collect and generate in the OR due to privacy concerns. To address these concerns, we first study how joint person pose estimation and instance segmentation can be performed on low-resolution images with downsampling factors from 1x to 12x. Second, to address the domain shift and the lack of annotations, we propose a novel unsupervised domain adaptation method, called AdaptOR, to adapt a model from an in-the-wild labeled source domain to a statistically different unlabeled target domain. We propose to exploit explicit geometric constraints on the different augmentations of the unlabeled target domain image to generate accurate pseudo labels, and to use these pseudo labels to train the model on high- and low-resolution OR images in a self-training framework. Furthermore, we propose disentangled feature normalization to handle the statistically different source and target domain data. Extensive experimental results with detailed ablation studies on the two OR datasets MVOR+ and TUM-OR-test show the effectiveness of our approach against strongly constructed baselines, especially on the low-resolution privacy-preserving OR images. Finally, we show the generality of our method as a semi-supervised learning (SSL) method on the large-scale COCO dataset, where we achieve comparable results with as few as 1% of labeled supervision against a model trained with 100% labeled supervision.
