Few-shot learning with large-scale, pre-trained language models is a powerful way to answer questions about code, e.g., how to complete a given code example, or even how to generate code snippets from scratch. The success of these models raises the question of whether they could serve as a basis for building a wide range of code generation tools. Traditionally, such tools are built manually and separately for each task. Instead, few-shot learning may make it possible to obtain different tools from a single pre-trained language model by simply providing a few examples or a natural language description of the expected tool behavior. This paper studies to what extent a state-of-the-art, pre-trained language model of code, Codex, may serve this purpose. We consider three code manipulation and code generation tasks targeted by a range of traditional tools: (i) code mutation; (ii) test oracle generation from natural language documentation; and (iii) test case generation. For each task, we compare few-shot learning to a manually built tool. Our results show that the model-based tools complement (code mutation), are on par with (test oracle generation), or even outperform (test case generation) their respective traditionally built tools, while requiring far less effort to develop. By comparing the effectiveness of different variants of the model-based tools, we provide insights into how to design an appropriate input ("prompt") to the model and what influence the size of the model has. For example, we find that providing a short natural language description of the code generation task is an easy way to improve predictions. Overall, we conclude that few-shot language models are surprisingly effective, yet there is still more work to be done, such as exploring more diverse ways of prompting and tackling even more involved tasks.