Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete (e.g., binary) object goal states, which poses challenges for learning complex tasks and for transferring learned policies from simulated environments to the real world. Furthermore, state discretization limits a robot's ability to follow human instructions based on the grounding of actions and states. To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes. ARNOLD comprises 8 language-conditioned tasks that involve understanding object states and learning policies for continuous goals. To promote language-instructed learning, we provide expert demonstrations with template-generated language descriptions. We assess task performance using the latest language-conditioned policy learning models. Our results indicate that current models for language-conditioned manipulation still face significant challenges in generalizing to novel goal states, scenes, and objects. These findings highlight the need to develop new algorithms that address this gap and underscore the potential for further research in this area. Project website: https://arnold-benchmark.github.io.