Computer-aided detection systems based on deep learning have shown great potential in breast cancer detection. However, the lack of domain generalization of artificial neural networks is an important obstacle to their deployment in changing clinical environments. In this work, we explore the domain generalization of deep learning methods for mass detection in digital mammography and analyze in depth the sources of domain shift in a large-scale multi-center setting. To this end, we compare the performance of eight state-of-the-art detection methods, including Transformer-based models, trained in a single domain and tested in five unseen domains. Moreover, a single-source mass detection training pipeline is designed to improve the domain generalization without requiring images from the new domain. The results show that our workflow generalizes better than state-of-the-art transfer learning-based approaches in four out of five domains while reducing the domain shift caused by the different acquisition protocols and scanner manufacturers. Subsequently, an extensive analysis is performed to identify the covariate shifts with the largest effects on detection performance, such as those due to differences in patient age, breast density, mass size, and mass malignancy. Ultimately, this comprehensive study provides key insights and best practices for future research on domain generalization in deep learning-based breast cancer detection.