Abstract
The diversity and complexity of degradations in real-world video super-resolution (VSR) pose non-trivial challenges in inference and training. First, while long-term propagation leads to improved performance in cases of mild degradations, severe in-the-wild degradations could be exaggerated through propagation, impairing output quality. To balance the tradeoff between detail synthesis and artifact suppression, we found an image pre-cleaning stage indispensable to reduce noise and artifacts prior to propagation. Equipped with a carefully designed cleaning module, our RealBasicVSR outperforms existing methods in both quality and efficiency. Second, real-world VSR models are often trained with diverse degradations to improve generalizability, requiring increased batch size to produce a stable gradient. Inevitably, the increased computational burden results in various problems, including 1) speed-performance tradeoff and 2) batch-length tradeoff. To alleviate the first tradeoff, we propose a stochastic degradation scheme that reduces training time by up to 40% without sacrificing performance. We then analyze different training settings and suggest that employing longer sequences rather than larger batches during training allows more effective use of temporal information, leading to more stable performance during inference. To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences containing rich textures and patterns. Our dataset can serve as a common ground for benchmarking. Code, models, and the dataset will be made publicly available.