Abstract
We present a technique and benchmark dataset for estimating the relative 3D orientation between a pair of Internet images captured in an extreme setting, where the images have limited or non-overlapping fields of view. Prior work targeting extreme rotation estimation assumes constrained 3D environments and emulates perspective images by cropping regions from panoramic views. However, real images captured in the wild are highly diverse, exhibiting variation in both appearance and camera intrinsics. In this work, we propose a Transformer-based method for estimating relative rotations in extreme real-world settings, and contribute the ExtremeLandmarkPairs dataset, assembled from scene-level Internet photo collections. Our evaluation demonstrates that our approach succeeds in estimating the relative rotations in a wide variety of extreme-view Internet image pairs, outperforming various baselines, including dedicated rotation estimation techniques and contemporary 3D reconstruction methods.