Unit Test Case Generation with Transformers

Abstract

Automated unit test case generation has been the focus of extensive literature within the research community. Existing approaches are usually guided by test coverage criteria, generating synthetic test cases that are often difficult for developers to read or understand. In this paper, we propose AthenaTest, an approach that aims at generating unit test cases by learning from real-world, developer-written test cases. Our approach relies on a state-of-the-art sequence-to-sequence transformer model which is able to write useful test cases for a given method under test (i.e., focal method). We also introduce methods2test, the largest publicly available supervised parallel corpus of unit test case methods and corresponding focal methods in Java, which comprises 630k test cases mined from 70k open-source repositories hosted on GitHub. We use this dataset to train a transformer model to translate focal methods into the corresponding test cases. We evaluate the ability of our model to generate test cases using natural language processing as well as code-specific criteria. First, we assess the quality of the translation compared to the target test case; then we analyze properties of the test case such as syntactic correctness and the number and variety of testing APIs (e.g., asserts). We execute the test cases, collect test coverage information, and compare them with test cases generated by EvoSuite and GPT-3. Finally, we survey professional developers on their preference in terms of readability, understandability, and testing effectiveness of the generated test cases.
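
To make the translation framing concrete, the following is a minimal sketch of what one supervised example might look like: a focal method as the source sequence and a developer-style JUnit test as the target sequence. The class `FocalTestPairSketch`, the `Calculator` focal method, and the `countAsserts` helper are all hypothetical and chosen for illustration; they are not taken from methods2test or from the AthenaTest implementation. The assert counter is only a crude stand-in for the "number of testing APIs" property mentioned above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from the paper.
final class FocalTestPairSketch {

    /** One parallel-corpus entry: source = focal method, target = test case. */
    static final class Pair {
        final String focalMethod;
        final String testCase;

        Pair(String focalMethod, String testCase) {
            this.focalMethod = focalMethod;
            this.testCase = testCase;
        }
    }

    /** Assumed focal method source (the input the model would translate). */
    static final String FOCAL = String.join("\n",
        "public int add(int a, int b) {",
        "    return a + b;",
        "}");

    /** Assumed developer-written test (the target the model would learn to produce). */
    static final String TEST = String.join("\n",
        "@Test",
        "public void testAdd() {",
        "    Calculator calculator = new Calculator();",
        "    assertEquals(5, calculator.add(2, 3));",
        "}");

    static Pair examplePair() {
        return new Pair(FOCAL, TEST);
    }

    /** Counts JUnit-style assert calls such as assertEquals( or assertTrue(. */
    static int countAsserts(String testCase) {
        Matcher m = Pattern.compile("\\bassert[A-Z]\\w*\\s*\\(").matcher(testCase);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pair pair = examplePair();
        System.out.println("SOURCE (focal method):\n" + pair.focalMethod);
        System.out.println("TARGET (test case):\n" + pair.testCase);
        System.out.println("assert calls in target: " + countAsserts(pair.testCase));
    }
}
```

The pair is stored as plain source text because a sequence-to-sequence model consumes and emits token sequences; the regular expression is a simplification and would miss assertions written in other styles.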
