Abstract
Blind face restoration is a highly ill-posed problem that often requires auxiliary guidance to 1) improve the mapping from degraded inputs to desired outputs, or 2) complement high-quality details lost in the inputs. In this paper, we demonstrate that a learned discrete codebook prior in a small proxy space largely reduces the uncertainty and ambiguity of the restoration mapping by casting blind face restoration as a code prediction task, while providing rich visual atoms for generating high-quality faces. Under this paradigm, we propose a Transformer-based prediction network, named CodeFormer, to model the global composition and context of low-quality faces for code prediction, enabling the discovery of natural faces that closely approximate the target faces even when the inputs are severely degraded. To enhance the adaptiveness to different degradations, we also propose a controllable feature transformation module that allows a flexible trade-off between fidelity and quality. Thanks to the expressive codebook prior and global modeling, CodeFormer outperforms state-of-the-art methods in both quality and fidelity, showing superior robustness to degradation. Extensive experimental results on synthetic and real-world datasets verify the effectiveness of our method.