Slot Attention with Re-Initialization and Self-Distillation

  • 2025-07-31 17:41:18
  • Rongzhen Zhao, Yi Zhao, Juho Kannala, Joni Pajarinen

Abstract

Unlike popular solutions based on dense feature maps, Object-Centric Learning (OCL) represents visual scenes as sub-symbolic object-level feature vectors, termed slots, which are highly versatile for tasks involving visual modalities. OCL typically aggregates object superpixels into slots by iteratively applying competitive cross attention, known as Slot Attention, with the slots as the query. However, once initialized, these slots are reused naively, causing redundant slots to compete with informative ones for representing objects. This often results in objects being erroneously segmented into parts. Additionally, mainstream methods derive supervision signals solely from decoding slots into the input's reconstruction, overlooking potential supervision based on internal information. To address these issues, we propose Slot Attention with re-Initialization and self-Distillation (DIAS): i) We reduce redundancy in the aggregated slots and re-initialize extra aggregation to update the remaining slots; ii) We drive the bad attention map at the first aggregation iteration to approximate the good one at the last iteration to enable self-distillation. Experiments demonstrate that DIAS achieves state-of-the-art performance on OCL tasks like object discovery and recognition, while also improving advanced visual prediction and reasoning. Our code is available at https://github.com/Genera1Z/DIAS.
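
The two mechanisms named in the abstract, competitive cross attention with slots as the query and a first-to-last attention distillation signal, can be illustrated with a minimal NumPy sketch. This is not the DIAS implementation: the function names `slot_attention` and `distill_loss` are hypothetical, the slot update is a plain attention-weighted mean rather than a learned recurrent update, there are no learned projections, the cross-entropy form of the loss is an assumption, and the redundancy reduction and re-initialization step is not covered.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(features, slots, num_iters=3, eps=1e-8):
    """Simplified slot attention (assumed form, no learned weights).

    features: (n, d) array of superpixel features.
    slots:    (k, d) array of initial slot vectors (the queries).
    Returns the updated slots and the attention map of every iteration.
    """
    d = features.shape[1]
    attn_maps = []
    for _ in range(num_iters):
        logits = slots @ features.T / np.sqrt(d)  # (k, n)
        # Softmax over the slot axis: slots compete for each feature.
        attn = softmax(logits, axis=0)
        attn_maps.append(attn)
        # Normalize over features so each slot is a weighted mean.
        weights = attn / (attn.sum(axis=1, keepdims=True) + eps)
        slots = weights @ features  # (k, d)
    return slots, attn_maps


def distill_loss(first, last, eps=1e-8):
    """Cross-entropy pushing the first attention map toward the last.

    The last map is treated as a fixed target (assumed loss form).
    """
    return float(-(last * np.log(first + eps)).sum(axis=0).mean())


rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8))  # 16 superpixels, 8 channels
slots = rng.normal(size=(4, 8))      # 4 slots
new_slots, maps = slot_attention(features, slots)
loss = distill_loss(maps[0], maps[-1])
```

Taking the softmax over the slot axis, rather than over the features, is what makes the attention competitive: the attention that each superpixel distributes across slots sums to one.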

 
