Many real-world visual recognition use-cases can not directly benefit fromstate-of-the-art CNN-based approaches because of the lack of many annotateddata. The usual approach to deal with this is to transfer a representationpre-learned on a large annotated source-task onto a target-task of interest.This raises the question of how well the original representation is"universal", that is to say directly adapted to many different target-tasks. Toimprove such universality, the state-of-the-art consists in training networkson a diversified source problem, that is modified either by adding generic orspecific categories to the initial set of categories. In this vein, we proposeda method that exploits finer-classes than the most specific ones existing, forwhich no annotation is available. We rely on unsupervised learning and abottom-up split and merge strategy. We show that our method learns moreuniversal representations than state-of-the-art, leading to significantlybetter results on 10 target-tasks from multiple domains, using several networkarchitectures, either alone or combined with networks learned at a coarsersemantic level.