Abstract
Egocentric open-surgery videos capture rich, fine-grained details essential for accurately modeling surgical procedures and human behavior in the operating room. A detailed, pixel-level understanding of hands and surgical tools is crucial for interpreting a surgeon's actions and intentions. We introduce EgoSurgery-HTS, a new dataset with pixel-wise annotations and a benchmark suite for segmenting surgical tools, hands, and interacting tools in egocentric open-surgery videos. Specifically, we provide a labeled dataset for (1) tool instance segmentation of 14 distinct surgical tools, (2) hand instance segmentation, and (3) hand-tool segmentation to label hands and the tools they manipulate. Using EgoSurgery-HTS, we conduct extensive evaluations of state-of-the-art segmentation methods and demonstrate significant improvements in the accuracy of hand and hand-tool segmentation in egocentric open-surgery videos compared to existing datasets. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.