Hierarchy and compositionality are common latent properties in many natural and scientific datasets. Determining when a deep network's hidden activations represent hierarchy and compositionality is important both for understanding deep representation learning and for applying deep networks in domains where interpretability is crucial. However, current benchmark machine learning datasets either have little hierarchical or compositional structure, or their structure is not known. This gap impedes precise analysis of a network's representations and thus hinders the development of new methods that can learn such properties. To address this gap, we developed a new benchmark dataset with known hierarchical and compositional structure. The Hangul Fonts Dataset (HFD) comprises 35 fonts from the Korean writing system (Hangul), each with 11,172 blocks (syllables) composed from the product of initial consonant, medial vowel, and final consonant glyphs. All blocks can be grouped into a few geometric types, which induces a hierarchy across blocks. In addition, each block is composed of individual glyphs with rotations, translations, scalings, and naturalistic style variation across fonts. We find that both shallow and deep unsupervised methods show only modest evidence of hierarchy and compositionality in their representations of the HFD compared to supervised deep networks. Supervised deep network representations contain structure related to the geometrical hierarchy of the characters, but the compositional structure of the data is not evident. Thus, HFD enables the identification of shortcomings in existing methods, a critical first step toward developing new machine learning algorithms that extract hierarchical and compositional structure in the context of naturalistic variability.
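The figure of 11,172 blocks per font is exactly the size of the product of the three glyph slots: 19 initial consonants × 21 medial vowels × 28 final-consonant options (27 consonants plus "no final consonant"). The sketch below illustrates this compositional structure using the arithmetic from the Unicode Standard's Hangul Syllables block (U+AC00 to U+D7A3). It assumes, for illustration only, that a block's compositional label can be taken as its (initial, medial, final) index triple in Unicode order; the helper names `compose` and `decompose` are hypothetical and are not part of the HFD release.

```python
# Jamo inventory sizes used by the Unicode Hangul Syllables block (U+AC00..U+D7A3).
# Assumption: HFD compositional labels follow this same Unicode ordering.
N_INITIAL = 19      # initial consonants
N_MEDIAL = 21       # medial vowels
N_FINAL = 28        # final consonants; index 0 means "no final consonant"
SYLLABLE_BASE = 0xAC00
N_BLOCKS = N_INITIAL * N_MEDIAL * N_FINAL  # size of the product of the three slots


def compose(initial: int, medial: int, final: int) -> str:
    """Map an (initial, medial, final) index triple to its precomposed syllable block."""
    if not (0 <= initial < N_INITIAL and 0 <= medial < N_MEDIAL and 0 <= final < N_FINAL):
        raise ValueError("jamo index out of range")
    return chr(SYLLABLE_BASE + (initial * N_MEDIAL + medial) * N_FINAL + final)


def decompose(block: str) -> tuple[int, int, int]:
    """Recover the (initial, medial, final) index triple from a syllable block."""
    offset = ord(block) - SYLLABLE_BASE
    if not 0 <= offset < N_BLOCKS:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(offset, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return initial, medial, final


if __name__ == "__main__":
    print(N_BLOCKS)              # 11172
    print(decompose("\ud55c"))   # U+D55C (han): (18, 0, 4)
```

Because every block is a unique triple and every triple is a valid block, the three slots are independent factors of variation; a representation that captured this compositionality would, in principle, allow each slot to be read out separately.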