We consider two important aspects in understanding and editing images:modeling regular, program-like texture or patterns in 2D planes, and 3D posingof these planes in the scene. Unlike prior work on image-based programsynthesis, which assumes the image contains a single visible 2D plane, wepresent Box Program Induction (BPI), which infers a program-like scenerepresentation that simultaneously models repeated structure on multiple 2Dplanes, the 3D position and orientation of the planes, and camera parameters,all from a single image. Our model assumes a box prior, i.e., that the imagecaptures either an inner view or an outer view of a box in 3D. It uses neuralnetworks to infer visual cues such as vanishing points, wireframe lines toguide a search-based algorithm to find the program that best explains theimage. Such a holistic, structured scene representation enables 3D-awareinteractive image editing operations such as inpainting missing pixels,changing camera parameters, and extrapolate the image contents.