Abstract
Recent research in behaviour understanding through language grounding has shown that it is possible to automatically generate behaviour models from textual instructions. These models usually have a goal-oriented structure and are modelled with different formalisms from the planning domain, such as the Planning Domain Definition Language (PDDL). One major problem that remains is the lack of benchmark datasets for comparing the different model generation approaches, as each approach is usually evaluated on a domain-specific application. To allow an objective comparison of different methods for model generation from textual instructions, in this report we introduce a dataset consisting of 83 textual instructions in English, their refinement into a more structured form, and manually developed plans for each of the instructions. The dataset is publicly available to the community.