There has been a recent surge in methods that aim to decompose and segment scenes into multiple objects in an unsupervised manner, i.e., unsupervised multi-object segmentation. Performing such a task is a long-standing goal of computer vision, offering to unlock object-level reasoning without requiring dense annotations to train segmentation models. Despite significant progress, current models are developed and trained on visually simple scenes depicting mono-colored objects on plain backgrounds. The natural world, however, is visually complex with confounding aspects such as diverse textures and complicated lighting effects. In this study, we present a new benchmark called ClevrTex, designed as the next challenge to compare, evaluate and analyze algorithms. ClevrTex features synthetic scenes with diverse shapes, textures and photo-mapped materials, created using physically based rendering techniques. It includes 50k examples depicting 3-10 objects arranged on a background, created using a catalog of 60 materials, and a further test set featuring 10k images created using 25 different materials. We benchmark a large set of recent unsupervised multi-object segmentation models on ClevrTex and find all state-of-the-art approaches fail to learn good representations in the textured setting, despite impressive performance on simpler data. We also create variants of the ClevrTex dataset, controlling for different aspects of scene complexity, and probe current approaches for individual shortcomings. Dataset and code are available at https://www.robots.ox.ac.uk/~vgg/research/clevrtex.