We introduce three new robustness benchmarks consisting of naturally occurring distribution changes in image style, geographic location, camera operation, and more. Using our benchmarks, we take stock of previously proposed hypotheses for out-of-distribution robustness and put them to the test. We find that using larger models and synthetic data augmentation can improve robustness on real-world distribution shifts, contrary to claims in prior work. Motivated by this, we introduce a new data augmentation method which advances the state-of-the-art and outperforms models pretrained with 1000x more labeled data. We find that some methods consistently help with distribution shifts in texture and local image statistics, but these methods do not help with some other distribution shifts like geographic changes. We conclude that future research must study multiple distribution shifts simultaneously.