Abstract
Generating multi-instrument music from symbolic music representations is an important task in Music Information Retrieval (MIR). A central but still largely unsolved problem in this context is musically and acoustically informed control in the generation process. As the main contribution of this work, we propose enhancing control of multi-instrument synthesis by conditioning a generative model on a specific performance and recording environment, thus allowing for better guidance of timbre and style. Building on state-of-the-art diffusion-based music generative models, we introduce performance conditioning, a simple tool that instructs the generative model to synthesize music with the style and timbre of specific instruments taken from specific performances. Our prototype is evaluated using uncurated performances with diverse instrumentation and achieves state-of-the-art Fréchet Audio Distance (FAD) realism scores while allowing novel timbre and style control. Our project page, including samples and demonstrations, is available at benadar293.github.io/midipm