A Comprehensive Survey on Segment Anything Model for Vision and Beyond

Abstract

Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence similar to that of a human being. This is in contrast to narrow or specialized AI, which is designed to perform specific tasks with a high degree of efficiency. It is therefore urgent to design a general class of models, which we term foundation models, trained on broad data and adaptable to various downstream tasks. The recently proposed segment anything model (SAM) has made significant progress in breaking the boundaries of segmentation, greatly promoting the development of foundation models for computer vision. To fully comprehend SAM, we conduct a survey study. As the first comprehensive review of progress on the segment anything task for vision and beyond based on the SAM foundation model, this work focuses on its applications to various tasks and data types, discussing its historical development, recent progress, and profound impact on broad applications. We first introduce the background and terminology of foundation models, including SAM, as well as state-of-the-art methods contemporaneous with SAM that are significant for the segment anything task. Then, we analyze and summarize the advantages and limitations of SAM across various image processing applications, including software scenes, real-world scenes, and complex scenes. Importantly, we draw insights to guide future research toward more versatile foundation models and improvements to the architecture of SAM. We also summarize many other applications of SAM in vision and beyond.
