Humans are remarkably flexible when understanding new sentences that includecombinations of concepts they have never encountered before. Recent work hasshown that while deep networks can mimic some human language abilities whenpresented with novel sentences, systematic variation uncovers the limitationsin the language-understanding abilities of networks. We demonstrate that theselimitations can be overcome by addressing the generalization challenges in arecently-released dataset, gSCAN, which explicitly measures how well an agentis able to interpret novel ideas grounded in vision, e.g., novel pairings ofadjectives and nouns. The key principle we employ is compositionality: that thecompositional structure of networks should reflect the compositional structureof the problem domain they address, while allowing other parameters andproperties to be learned end-to-end with weak supervision. We build ageneral-purpose mechanism that enables robots to generalize their languageunderstanding to compositional domains. Crucially, our network has the samestate-of-the-art performance as prior work while at the same time generalizingits knowledge when prior work does not. Our network also provides a level ofinterpretability that enables users to inspect what each part of networkslearns. Robust language understanding without dramatic failures and withoutcorner cases is critical to building safe and fair robots; we demonstrate thesignificant role that compositionality can play in achieving that goal.