Abstract
Combining multiple object detection datasets offers a path to improved generalisation but is hindered by inconsistencies in class semantics and bounding box annotations. Some methods to address this assume shared label taxonomies and address only spatial inconsistencies; others require manual relabelling, or produce a unified label space, which may be unsuitable when a fixed target label space is required. We propose Label-Aligned Transfer (LAT), a label transfer framework that systematically projects annotations from diverse source datasets into the label space of a target dataset. LAT begins by training dataset-specific detectors to generate pseudo-labels, which are then combined with ground-truth annotations via a Privileged Proposal Generator (PPG) that replaces the region proposal network in two-stage detectors. To further refine region features, a Semantic Feature Fusion (SFF) module injects class-aware context and features from overlapping proposals using a confidence-weighted attention mechanism. This pipeline preserves dataset-specific annotation granularity while enabling many-to-one label space transfer across heterogeneous datasets, resulting in a semantically and spatially aligned representation suitable for training a downstream detector. LAT thus jointly addresses both class-level misalignments and bounding box inconsistencies without relying on shared label spaces or manual annotations. Across multiple benchmarks, LAT demonstrates consistent improvements in target-domain detection performance, achieving gains of up to +4.8 AP over semi-supervised baselines.
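
As a rough illustration of the two ideas named above (proposals built from ground-truth boxes plus pseudo-labels, and feature fusion across overlapping proposals weighted by confidence), the following minimal sketch may help. It is not the LAT implementation: the function names `build_proposals` and `fuse_features`, the thresholds `min_score` and `min_iou`, and the fixed weighting rule (confidence times IoU, in place of a learned attention) are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Proposal = Tuple[Box, float]             # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_proposals(
    gt_boxes: Sequence[Box],
    pseudo_boxes: Sequence[Box],
    pseudo_scores: Sequence[float],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Proposal]:
    """Hypothetical stand-in for a proposal generator: ground-truth boxes
    get confidence 1.0, pseudo-labels are kept if their score is high enough."""
    proposals: List[Proposal] = [(box, 1.0) for box in gt_boxes]
    proposals += [
        (box, score)
        for box, score in zip(pseudo_boxes, pseudo_scores)
        if score >= min_score
    ]
    return proposals


def fuse_features(
    proposals: Sequence[Proposal],
    features: Sequence[Sequence[float]],
    min_iou: float = 0.3,  # assumed overlap threshold
) -> List[List[float]]:
    """For each proposal, return the average of the features of overlapping
    proposals, weighted by (confidence * IoU). A fixed rule used here in
    place of a learned attention mechanism."""
    fused: List[List[float]] = []
    for i, (box_i, _) in enumerate(proposals):
        weights = []
        for box_j, conf_j in proposals:
            overlap = iou(box_i, box_j)
            weights.append(conf_j * overlap if overlap >= min_iou else 0.0)
        total = sum(weights)
        if total == 0.0:
            # Degenerate box with no overlap at all: keep its own features.
            fused.append(list(features[i]))
            continue
        dim = len(features[i])
        fused.append(
            [
                sum(w * features[j][d] for j, w in enumerate(weights)) / total
                for d in range(dim)
            ]
        )
    return fused
```

In this sketch a proposal that overlaps nothing else keeps its own features, while a ground-truth box that overlaps a pseudo-label pulls in that pseudo-label's features in proportion to the pseudo-label's score and the overlap between the two boxes.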