YOLOv11-RGBT: Towards a Comprehensive Single-Stage Multispectral Object Detection Framework

Abstract

Multispectral object detection, which integrates information from multiple bands, can enhance detection accuracy and environmental adaptability, and holds great application potential across various fields. Although existing methods have made progress in cross-modal interaction, low-light conditions, and model lightweighting, challenges remain, such as the lack of a unified single-stage framework, difficulty in balancing performance and fusion strategy, and unreasonable modality weight allocation. To address these, we present YOLOv11-RGBT, a new comprehensive multimodal object detection framework based on YOLOv11. We designed six multispectral fusion modes and successfully applied them to models from YOLOv3 to YOLOv12 and RT-DETR. After reevaluating the importance of the two modalities, we proposed a P3 mid-fusion strategy and a multispectral controllable fine-tuning (MCF) strategy for multispectral models. These improvements optimize feature fusion, reduce redundancy and mismatches, and boost overall model performance. Experiments show that our framework excels on three major open-source multispectral object detection datasets, including LLVIP and FLIR. In particular, the multispectral controllable fine-tuning strategy significantly enhanced model adaptability and robustness. On the FLIR dataset, it consistently improved the mAP of YOLOv11 models by 3.41%-5.65%, reaching a maximum of 47.61%, verifying the effectiveness of the framework and strategies. The code is available at: https://github.com/wandahangFY/YOLOv11-RGBT.
