Abstract
We present UltraZoom, a system for generating gigapixel-resolution images of objects from casually captured inputs, such as handheld phone photos. Given a full-shot image (global, low-detail) and one or more close-ups (local, high-detail), UltraZoom upscales the full image to match the fine detail and scale of the close-up examples. To achieve this, we construct a per-instance paired dataset from the close-ups and adapt a pretrained generative model to learn object-specific low-to-high resolution mappings. At inference, we apply the model in a sliding-window fashion over the full image. Constructing these pairs is non-trivial: it requires registering the close-ups within the full image for scale estimation and degradation alignment. We introduce a simple, robust method for registration on arbitrary materials in casual, in-the-wild captures. Together, these components form a system that enables seamless pan and zoom across the entire object, producing consistent, photorealistic gigapixel imagery from minimal input.
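To make the sliding-window inference step concrete, the following is a minimal, dependency-free sketch and not the implementation used by UltraZoom. The abstract does not specify the tile size, the overlap, or how overlapping outputs are merged, so the names `tile`, `overlap`, `tile_starts`, and `sliding_window_upscale`, as well as the uniform averaging of overlaps, are all assumptions made for illustration. The adapted generative model is replaced by a hypothetical stand-in, `nearest_upscale`, which performs plain nearest-neighbour upsampling on a grayscale image stored as a list of rows.

```python
def nearest_upscale(tile, scale):
    """Hypothetical stand-in for the adapted generative model.

    Repeats every pixel `scale` times along both axes (nearest neighbour).
    """
    return [[v for v in row for _ in range(scale)]
            for row in tile for _ in range(scale)]


def tile_starts(length, tile, stride):
    """Start offsets of windows of size `tile` that cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # last window sits flush with the border
    return starts


def sliding_window_upscale(image, scale, tile=64, overlap=16,
                           model=nearest_upscale):
    """Upscale `image` tile by tile and average the overlapping outputs.

    `tile`, `overlap`, and the uniform averaging are assumptions; the
    abstract only states that the model is applied in a sliding window.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    h, w = len(image), len(image[0])
    stride = tile - overlap
    acc = [[0.0] * (w * scale) for _ in range(h * scale)]  # summed outputs
    cnt = [[0] * (w * scale) for _ in range(h * scale)]    # tiles per pixel
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            patch = [row[x:x + tile] for row in image[y:y + tile]]
            up = model(patch, scale)
            for dy, up_row in enumerate(up):
                for dx, v in enumerate(up_row):
                    acc[y * scale + dy][x * scale + dx] += v
                    cnt[y * scale + dy][x * scale + dx] += 1
    # Every output pixel is covered by at least one tile, so cnt is never 0.
    return [[a / c for a, c in zip(acc_row, cnt_row)]
            for acc_row, cnt_row in zip(acc, cnt)]
```

With the nearest-neighbour stand-in, tiling and averaging reproduce exactly what upscaling the whole image at once would give, which makes the tiling logic easy to check in isolation. A learned model would generally produce different outputs in the overlap regions, and a smoother blend (for example, weights that taper toward tile borders) may be preferable there to avoid visible seams; that choice is likewise an assumption rather than something stated in the abstract.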