Abstract
Recent work shows that Large Language Models (LLMs) struggle to understand natural language constraints for various text generation tasks in zero- and few-shot settings. In the code domain, meanwhile, constraints in code format are widely used to maintain the integrity of code written in Domain-Specific Languages (DSLs) like JSON and YAML, which are widely used for system-level programming tasks in enterprises. Given that LLMs are increasingly used for system-level code tasks, evaluating whether they can comprehend these code constraints is crucial. However, no work has been done to evaluate their controllability over code constraints. Hence, we introduce ConCodeEval, a first-of-its-kind benchmark with two novel tasks for code constraints across five representations. Our findings suggest that language models struggle with code constraints. Code languages that perform excellently for normal code tasks do not perform well when the same languages represent fine-grained constraints.