We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of experiments that assess the task-independence of the knowledge being acquired by the learning process. In addition to task performance, we propose a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task. Our results show that while the field has made impressive progress in terms of model architectures that generalize to many tasks, these models still require a lot of in-domain training examples (e.g., for fine tuning, training task-specific modules), and are prone to catastrophic forgetting. Moreover, we find that far from solving general tasks (e.g., document question answering), our models are overfitting to the quirks of particular datasets (e.g., SQuAD). We discuss missing components and conjecture on how to make progress toward general linguistic intelligence.