Abstract
This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model optimized for faster speed demanded in many applications requiring edge deployment. The Grounding DINO 1.5 Pro model advances its predecessor by scaling up the model architecture, integrating an enhanced vision backbone, and expanding the training dataset to over 20 million images with grounding annotations, thereby achieving a richer semantic understanding. The Grounding DINO 1.5 Edge model, while designed for efficiency with reduced feature scales, maintains robust detection capabilities by being trained on the same comprehensive dataset. Empirical results demonstrate the effectiveness of Grounding DINO 1.5, with the Grounding DINO 1.5 Pro model attaining a 54.3 AP on the COCO detection benchmark and a 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection. Furthermore, the Grounding DINO 1.5 Edge model, when optimized with TensorRT, achieves a speed of 75.2 FPS while attaining a zero-shot performance of 36.2 AP on the LVIS-minival benchmark, making it more suitable for edge computing scenarios. Model examples and demos with API will be released at https://github.com/IDEA-Research/Grounding-DINO-1.5-API