Cross-Architecture Distillation Made Simple with Redundancy Suppression

  • 2025-07-29 14:21:40
  • Weijia Zhang, Yuehao Liu, Wu Ran, Chao Ma

Abstract

We describe a simple method for cross-architecture knowledge distillation, where the knowledge transfer is cast into a redundant information suppression formulation. Existing methods introduce sophisticated modules, architecture-tailored designs, and excessive parameters, which impair their efficiency and applicability. We propose to extract the architecture-agnostic knowledge in heterogeneous representations by reducing the redundant architecture-exclusive information. To this end, we present a simple redundancy suppression distillation (RSD) loss, which comprises cross-architecture invariance maximisation and feature decorrelation objectives. To prevent the student from entirely losing its architecture-specific capabilities, we further design a lightweight module that decouples the RSD objective from the student's internal representations. Our method is devoid of the architecture-specific designs and complex operations in the pioneering method of OFA. It outperforms OFA on CIFAR-100 and ImageNet-1k benchmarks with only a fraction of its parameter overhead, which highlights its potential as a simple and strong baseline for the cross-architecture distillation community.
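The abstract names the two terms of the RSD loss but does not give a formula. As a rough illustration only, the sketch below assumes a cross-correlation form: student and teacher features are standardised per dimension over the batch, the diagonal of their cross-correlation matrix is pushed towards 1 (invariance), and the off-diagonal entries are pushed towards 0 (decorrelation). The function names, the weight `decor_weight`, and the assumption that both feature sets have already been projected to the same dimension are hypothetical and not taken from the paper.

```python
import math


def standardize(features):
    """Return per-dimension columns with zero mean and unit variance over the batch."""
    n = len(features)
    dims = len(features[0])
    columns = []
    for j in range(dims):
        col = [row[j] for row in features]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        std = math.sqrt(var) + 1e-8  # small constant avoids division by zero
        columns.append([(v - mean) / std for v in col])
    return columns


def rsd_loss_sketch(student, teacher, decor_weight=0.005):
    """Hypothetical invariance + decorrelation loss on a batch of features.

    student, teacher: lists of equal-length feature vectors (same batch size,
    same dimension, e.g. after a projection layer that is not shown here).
    """
    s_cols = standardize(student)
    t_cols = standardize(teacher)
    n = len(student)
    invariance = 0.0
    decorrelation = 0.0
    for i, s_col in enumerate(s_cols):
        for j, t_col in enumerate(t_cols):
            # Entry (i, j) of the student-teacher cross-correlation matrix.
            c = sum(a * b for a, b in zip(s_col, t_col)) / n
            if i == j:
                invariance += (1.0 - c) ** 2  # matched dimensions should agree
            else:
                decorrelation += c ** 2  # mismatched dimensions should not correlate
    return invariance + decor_weight * decorrelation


# Usage: identical features with uncorrelated dimensions give a near-zero loss.
batch = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_same = rsd_loss_sketch(batch, batch)
loss_flipped = rsd_loss_sketch(batch, [[-v for v in row] for row in batch])
```

In this sketch the loss is applied to projected features rather than to the student's internal representations directly, which loosely mirrors the decoupling idea described in the abstract; the actual module design is not specified there.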

 
