We propose im2nerf, a learning framework that predicts a continuous neural object representation from a single input image in the wild, supervised only by segmentation outputs from off-the-shelf recognition methods. The standard approach to constructing neural radiance fields exploits multi-view consistency and requires many calibrated views of a scene, a requirement that cannot be met when learning from large-scale image data in the wild. We take a step towards addressing this shortcoming by introducing a model that encodes the input image into a disentangled object representation consisting of a code for object shape, a code for object appearance, and an estimate of the camera pose from which the image was captured. Our model conditions a NeRF on the predicted object representation and uses volume rendering to generate images from novel views. We train the model end-to-end on a large collection of input images. Because the model is provided only with single-view images, the problem is highly under-constrained. Therefore, in addition to a reconstruction loss on the synthesized input view, we use an auxiliary adversarial loss on the rendered novel views. Furthermore, we leverage object symmetry and enforce cycle consistency of the camera pose. We conduct extensive quantitative and qualitative experiments on the ShapeNet dataset, as well as qualitative experiments on the Open Images dataset. In all cases, im2nerf achieves state-of-the-art performance for novel view synthesis from a single unposed image in the wild.
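The volume-rendering step mentioned above can be illustrated with the standard NeRF quadrature for a single ray. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the im2nerf implementation: the function name `render_ray` and the `far_delta` parameter are hypothetical, and the per-sample densities and colors are passed in directly, whereas in im2nerf they would come from the NeRF conditioned on the predicted shape and appearance codes and evaluated along rays cast from the estimated camera pose.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def render_ray(
    sigmas: Sequence[float],
    colors: Sequence[RGB],
    ts: Sequence[float],
    far_delta: float = 1e10,  # hypothetical: length assigned to the last interval
) -> Tuple[RGB, List[float]]:
    """Alpha-composite samples along one ray (standard NeRF quadrature sketch).

    sigmas: volume density at each sample, ts: sorted sample depths,
    colors: RGB emitted at each sample. Returns the pixel color and the
    per-sample compositing weights.
    """
    n = len(sigmas)
    if not (len(colors) == n and len(ts) == n):
        raise ValueError("sigmas, colors and ts must have the same length")

    weights: List[float] = []
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    rgb = [0.0, 0.0, 0.0]
    for i in range(n):
        # Interval to the next sample; the last sample extends "to infinity".
        delta = ts[i + 1] - ts[i] if i + 1 < n else far_delta
        alpha = 1.0 - math.exp(-sigmas[i] * delta)  # opacity of this interval
        w = transmittance * alpha  # contribution of sample i to the pixel
        weights.append(w)
        for k in range(3):
            rgb[k] += w * colors[i][k]
        transmittance *= 1.0 - alpha  # light remaining after this interval
    return (rgb[0], rgb[1], rgb[2]), weights


# Usage: an opaque red sample in front of a green one hides the green.
pixel, w = render_ray([1e3, 1e3], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0.0, 1.0])
print(pixel, w)
```

In a learned model the same computation is usually vectorized over many rays with a cumulative product for the transmittance; the explicit loop here is only meant to make each term of the quadrature visible.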