Abstract
In this work, we propose a progressive scaling training strategy for visual object tracking, systematically analyzing the influence of training data volume, model size, and input resolution on tracking performance. Our empirical study reveals that while scaling each factor leads to significant improvements in tracking accuracy, naive training suffers from suboptimal optimization and limited iterative refinement. To address this issue, we introduce DT-Training, a progressive scaling framework that integrates small teacher transfer and dual-branch alignment to maximize model potential. The resulting scaled tracker consistently outperforms state-of-the-art methods across multiple benchmarks, demonstrating strong generalization and transferability of the proposed method. Furthermore, we validate the broader applicability of our approach to additional tasks, underscoring its versatility beyond tracking.