This work shows how one can use large-scale language models (LMs) to synthesize programming problems with verified solutions, in the form of programming puzzles, which can then in turn be used to fine-tune those same models, improving their performance. This work builds on two recent developments. First, LMs have achieved breakthroughs in non-trivial reasoning and algorithm implementation, generating code that can solve some intermediate-level competitive programming problems. However, training code LMs involves curated sets of natural-language problem descriptions and source-code tests and solutions, which are limited in size. Second, a new format of programming challenge called a programming puzzle was introduced, which does not require a natural language description and is directly specified by a source-code test. In this work we show how generating synthetic programming puzzles and solutions, verified for correctness by a Python interpreter, can be used to improve performance in solving test puzzles from P3, a public benchmark set of Python Programming Puzzles. Additionally, we release a dataset of 1 million puzzles and solutions generated by the Codex model, which we show can improve smaller models through fine-tuning.
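To make the idea of a puzzle "directly specified by a source-code test" concrete, the following is a minimal sketch of how a puzzle and a candidate solution might be checked by the Python interpreter. It assumes the convention that a puzzle is a function `f` returning a boolean and a solution is a zero-argument function `g` whose output satisfies `f`. The specific puzzle and the helper names `verify` and `filter_verified` are hypothetical illustrations, not the pipeline used in this work.

```python
from typing import Callable, List, Tuple

# Hypothetical puzzle: the test itself is the full problem statement.
def f(s: str) -> bool:
    """Find a string of length 5 containing exactly three 'a' characters."""
    return len(s) == 5 and s.count("a") == 3

# Hypothetical candidate solution, e.g. as an LM might generate it.
def g() -> str:
    return "aaabb"

def verify(puzzle: Callable, solution: Callable) -> bool:
    """Return True only if the puzzle accepts the solution's output.

    Any exception raised while running the solution or the puzzle
    counts as a failed verification.
    """
    try:
        return puzzle(solution()) is True
    except Exception:
        return False

def filter_verified(
    pairs: List[Tuple[Callable, Callable]],
) -> List[Tuple[Callable, Callable]]:
    """Keep only the (puzzle, solution) pairs that pass verification."""
    return [(p, s) for p, s in pairs if verify(p, s)]

if __name__ == "__main__":
    candidates = [(f, g), (f, lambda: "abc"), (f, lambda: 1 / 0)]
    kept = filter_verified(candidates)
    print(f"{len(kept)} of {len(candidates)} candidate pairs verified")
```

Only pairs that pass such a check would be kept as synthetic training data. A practical setup would presumably also run untrusted generated code in a sandbox with a time limit, which this sketch omits.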