Abstract
This study investigates the structured generation capabilities of large language models (LLMs), focusing on producing valid JSON outputs against a given schema. Despite the widespread use of JSON in integrating language models with programs, there is a lack of comprehensive analysis and benchmarking of these capabilities. We explore various aspects of JSON generation, such as structure understanding, escaping, and natural language description, to determine how to assess and enable LLMs to generate valid responses. Building upon this, we propose SchemaBench, which features around 40K different JSON schemas to obtain and assess models' abilities in generating valid JSON. We find that the latest LLMs still struggle to generate valid JSON strings. Moreover, we demonstrate that incorporating reinforcement learning with a Fine-grained Schema Validator can further enhance models' understanding of JSON schema, leading to improved performance. Our models demonstrate significant improvement in both generating JSON outputs and downstream tasks.