Abstract
Rip currents are strong, localized and narrow currents of water that flow outwards into the sea, causing numerous beach-related injuries and fatalities worldwide. Accurate identification of rip currents remains challenging due to their amorphous nature and the lack of annotated data, which often requires expert knowledge. To address these issues, we present RipVIS, a large-scale video instance segmentation benchmark explicitly designed for rip current segmentation. RipVIS is an order of magnitude larger than previous datasets, featuring $184$ videos ($212,328$ frames), of which $150$ videos ($163,528$ frames) contain rip currents, collected from various sources, including drones, mobile phones, and fixed beach cameras. Our dataset encompasses diverse visual contexts, such as wave-breaking patterns, sediment flows, and water color variations, across multiple global locations, including the USA, Mexico, Costa Rica, Portugal, Italy, Greece, Romania, Sri Lanka, Australia and New Zealand. Most videos are annotated at $5$ FPS to ensure accuracy in dynamic scenarios, supplemented by an additional $34$ videos ($48,800$ frames) without rip currents. We conduct comprehensive experiments with Mask R-CNN, Cascade Mask R-CNN, SparseInst and YOLO11, fine-tuning these models for the task of rip current segmentation. Results are reported in terms of multiple metrics, with a particular focus on the $F_2$ score to prioritize recall and reduce false negatives. To enhance segmentation performance, we introduce a novel post-processing step based on Temporal Confidence Aggregation (TCA). RipVIS aims to set a new standard for rip current segmentation, contributing towards safer beach environments. We offer a benchmark website to share data, models, and results with the research community, encouraging ongoing collaboration and future contributions, at https://ripvis.ai.
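
To illustrate why the $F_2$ score favors recall, the sketch below computes the standard $F_\beta$ score, $F_\beta = (1+\beta^2) PR / (\beta^2 P + R)$, directly from detection counts. The function name `f_beta`, the two example models and their counts are hypothetical and are not taken from the RipVIS experiments; returning 0 when there are no counts at all is an assumed convention.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts.

    Equivalent to (1 + beta^2) * P * R / (beta^2 * P + R), written in terms of
    counts so that precision and recall never have to be divided by zero.
    """
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    # Assumed convention: with no detections and no ground truth, score 0.
    return (1.0 + b2) * tp / denom if denom > 0 else 0.0


# Hypothetical counts for two models with mirrored precision and recall.
model_a = dict(tp=80, fp=40, fn=20)  # higher recall: misses fewer rip currents
model_b = dict(tp=80, fp=20, fn=40)  # higher precision: fewer false alarms

for name, counts in (("A", model_a), ("B", model_b)):
    f1 = f_beta(**counts, beta=1.0)
    f2 = f_beta(**counts, beta=2.0)
    print(f"model {name}: F1={f1:.3f}  F2={f2:.3f}")
```

Both hypothetical models have the same $F_1$ score, but model A, which misses fewer rip currents, has the higher $F_2$ score (about $0.769$ versus $0.690$), because $\beta = 2$ weights false negatives four times as heavily as false positives.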