Abstract
In face-to-face conversations, individuals need to switch between speaking and listening roles seamlessly. Existing 3D talking head generation models focus solely on speaking or listening, neglecting the natural dynamics of interactive conversation, which leads to unnatural interactions and awkward transitions. To address this issue, we propose a new task -- multi-round dual-speaker interaction for 3D talking head generation -- which requires models to handle and generate both speaking and listening behaviors in continuous conversation. To solve this task, we introduce DualTalk, a novel unified framework that integrates the dynamic behaviors of speakers and listeners to simulate realistic and coherent dialogue interactions. This framework not only synthesizes lifelike talking heads when speaking but also generates continuous and vivid non-verbal feedback when listening, effectively capturing the interplay between the roles. We also create a new dataset featuring 50 hours of multi-round conversations with over 1,000 characters, where participants continuously switch between speaking and listening roles. Extensive experiments demonstrate that our method significantly enhances the naturalness and expressiveness of 3D talking heads in dual-speaker conversations. We recommend watching the supplementary video: https://ziqiaopeng.github.io/dualtalk.