Abstract
We study the ability of Wasserstein Generative Adversarial Networks (WGANs) to generate missing audio content that is, in context, statistically similar to the sound and the neighboring borders. We address the challenge of inpainting long-range audio gaps (500 ms) using WGAN models. We improve the quality of the inpainted part with a newly proposed WGAN architecture that, unlike the classical WGAN model, uses both short-range and long-range neighboring borders. Performance was compared on two instruments (piano and guitar) and on recordings of virtuoso pianists together with a string orchestra. The objective difference grade (ODG) was used to evaluate the performance of both architectures. The proposed model outperforms the classical WGAN model and improves the reconstruction of high-frequency content. Further, we obtained better results for instruments whose frequency spectrum lies mainly in the lower range, where small noises are less annoying to the human ear and the inpainted part is more perceptible. Finally, we show that, for audio datasets in which a particular instrument is accompanied by other instruments, better test results are reached if the network is trained only on that particular instrument, neglecting the other instruments.