Abstract
Many machine learning models operate on images, but ignore the fact that images are 2D projections formed by 3D geometry interacting with light, in a process called rendering. Enabling ML models to understand image formation might be key for generalization. However, due to an essential rasterization step involving discrete assignment operations, rendering pipelines are non-differentiable and thus largely inaccessible to gradient-based ML techniques. In this paper, we present DIB-R, a differentiable rendering framework which allows gradients to be analytically computed for all pixels in an image. Key to our approach is to view foreground rasterization as a weighted interpolation of local properties and background rasterization as a distance-based aggregation of global geometry. Our approach allows for accurate optimization over vertex positions, colors, normals, light directions and texture coordinates through a variety of lighting models. We showcase our approach in two ML applications: single-image 3D object prediction and 3D textured object generation, both trained exclusively with 2D supervision. Our project website is: https://nv-tlabs.github.io/DIB-R/
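To make the foreground/background split concrete, here is a minimal pure-Python sketch for a single pixel. It is an illustration under stated assumptions, not the DIB-R implementation. A covered pixel takes its color as the barycentric-weighted interpolation of per-vertex colors; those weights are smooth functions of the vertex positions. An uncovered pixel gets a soft coverage value aggregated over all faces. The exponential falloff `exp(-d / delta)`, the use of Euclidean distance to the nearest triangle edge as `d`, the product-style aggregation, and all function names (`barycentric`, `point_segment_distance`, `shade_pixel`) and the parameter `delta` are assumptions chosen for illustration, in the spirit of soft rasterization.

```python
import math


def barycentric(p, a, b, c):
    """Barycentric weights (w0, w1, w2) of 2D point p w.r.t. triangle (a, b, c).

    Assumes a non-degenerate triangle (non-zero area).
    """
    # Twice the signed area of the full triangle.
    area = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    # Each weight is the signed area of the sub-triangle opposite a vertex.
    w0 = ((b[0] - p[0]) * (c[1] - p[1]) - (c[0] - p[0]) * (b[1] - p[1])) / area
    w1 = ((c[0] - p[0]) * (a[1] - p[1]) - (a[0] - p[0]) * (c[1] - p[1])) / area
    w2 = 1.0 - w0 - w1
    return w0, w1, w2


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / (abx * abx + aby * aby)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    qx, qy = a[0] + t * abx, a[1] + t * aby
    return math.hypot(p[0] - qx, p[1] - qy)


def shade_pixel(p, faces, colors, delta=0.01):
    """Return (rgb, alpha) for a pixel center p (hypothetical helper).

    faces:  list of 2D triangles, assumed sorted front to back.
    colors: per-face tuples of three per-vertex RGB tuples.
    delta:  assumed softness of the background falloff.
    """
    for tri, vcols in zip(faces, colors):
        w = barycentric(p, *tri)
        if min(w) >= 0.0:
            # Foreground: weighted interpolation of local vertex attributes.
            # d(rgb)/d(vertex color) is simply w, and w is smooth in the
            # vertex positions, so gradients flow to both.
            rgb = tuple(sum(wk * c[ch] for wk, c in zip(w, vcols)) for ch in range(3))
            return rgb, 1.0
    # Background: distance-based soft coverage aggregated over every face.
    keep = 1.0  # probability that no face covers this pixel
    for a, b, c in faces:
        d = min(point_segment_distance(p, a, b),
                point_segment_distance(p, b, c),
                point_segment_distance(p, c, a))
        keep *= 1.0 - math.exp(-d / delta)
    return (0.0, 0.0, 0.0), 1.0 - keep


tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
cols = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
print(shade_pixel((0.25, 0.25), [tri], [cols]))           # inside: interpolated color
print(shade_pixel((1.0, 1.0), [tri], [cols], delta=1.0))  # outside: soft alpha only
```

The inside pixel at (0.25, 0.25) has weights (0.5, 0.25, 0.25), so it renders as (0.5, 0.25, 0.25) with alpha 1. The outside pixel at (1.0, 1.0) receives a nonzero alpha that decays with its distance to the triangle. That decaying alpha is what lets a silhouette loss pass gradients to vertices even when they do not yet cover the target pixels.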