Abstract
Can objects that are not visible in an image -- but are in the vicinity of the camera -- be detected? This study introduces the novel tasks of 2D, 2.5D and 3D unobserved object detection for predicting the location of nearby objects that are occluded or lie outside the image frame. We adapt several state-of-the-art pre-trained generative models to address this task, including 2D and 3D diffusion models and vision-language models, and show that they can be used to infer the presence of objects that are not directly observed. To benchmark this task, we propose a suite of metrics that capture different aspects of performance. Our empirical evaluation on indoor scenes from the RealEstate10k and NYU Depth v2 datasets demonstrates results that motivate the use of generative models for the unobserved object detection task.