Abstract
Vector-quantized networks (VQNs) have exhibited remarkable performance across various tasks, yet they are prone to training instability, which complicates the training process because it requires techniques such as subtle initialization and model distillation. In this study, we identify the local minima issue as the primary cause of this instability. To address this, we replace the nearest neighbor search with an optimal transport method to achieve a more globally informed assignment. We introduce OptVQ, a novel vector quantization method that employs the Sinkhorn algorithm to solve the optimal transport problem, thereby enhancing the stability and efficiency of the training process. To mitigate the influence of diverse data distributions on the Sinkhorn algorithm, we implement a straightforward yet effective normalization strategy. Our comprehensive experiments on image reconstruction tasks demonstrate that OptVQ achieves 100% codebook utilization and surpasses current state-of-the-art VQNs in reconstruction quality.
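To make the contrast between nearest neighbor search and an optimal-transport assignment concrete, the following is a minimal sketch rather than the OptVQ implementation. The function names, the entropic regularization weight `eps`, the iteration count, the uniform marginals, and the min-max scaling of the distance matrix are all assumptions chosen for illustration; the abstract does not specify the normalization strategy.

```python
import math

def sq_dists(feats, codes):
    """Squared Euclidean distance between every feature and every code."""
    return [[sum((a - b) ** 2 for a, b in zip(z, c)) for c in codes] for z in feats]

def nearest_assign(dist):
    """Baseline: each feature independently picks its closest code."""
    return [min(range(len(row)), key=row.__getitem__) for row in dist]

def sinkhorn_assign(dist, eps=0.1, iters=200):
    """Assign features to codes via an entropy-regularized transport plan.

    Assumptions (not taken from the paper): uniform marginals over features
    and codes, and min-max scaling of the distances so that `eps` has a
    comparable meaning across data scales.
    """
    n, m = len(dist), len(dist[0])
    lo = min(min(row) for row in dist)
    hi = max(max(row) for row in dist)
    scale = (hi - lo) or 1.0
    # Gibbs kernel built from the normalized cost matrix.
    K = [[math.exp(-((d - lo) / scale) / eps) for d in row] for row in dist]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternate scalings so rows sum to 1/n and columns sum to 1/m.
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    # Each feature takes the code that receives most of its mass.
    return [max(range(m), key=row.__getitem__) for row in plan]

# Toy data: all features lie near code 0, while code 1 is far away.
feats = [[0.0], [0.1], [0.2], [0.3]]
codes = [[0.0], [2.0]]
dist = sq_dists(feats, codes)
print(nearest_assign(dist))   # [0, 0, 0, 0]: code 1 is never used
print(sinkhorn_assign(dist))  # [0, 0, 1, 1]: both codes are used
```

In this toy case the nearest neighbor rule leaves one code unused, whereas the column constraint of the transport plan forces every code to receive an equal share of the features, which is the intuition behind the improved codebook utilization.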