Abstract
Empathetic response generation, aiming to understand the user's situation and feelings and respond empathically, is crucial in building human-like dialogue systems. Traditional approaches typically employ maximum likelihood estimation as the optimization objective during training, yet fail to align the empathy levels between generated and target responses. To this end, we propose an empathetic response generation framework using reinforcement learning (EmpRL). The framework develops an effective empathy reward function and generates empathetic responses by maximizing the expected reward through reinforcement learning. EmpRL utilizes the pre-trained T5 model as the generator and further fine-tunes it to initialize the policy. To align the empathy levels between generated and target responses within a given context, an empathy reward function containing three empathy communication mechanisms (emotional reaction, interpretation, and exploration) is constructed using pre-designed and pre-trained empathy identifiers. During reinforcement learning training, the proximal policy optimization algorithm is used to fine-tune the policy, enabling the generation of empathetic responses. Both automatic and human evaluations demonstrate that the proposed EmpRL framework significantly improves the quality of generated responses, enhances the similarity in empathy levels between generated and target responses, and produces empathetic responses covering both affective and cognitive aspects.
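
As an illustration of the reward idea described above, the following minimal Python sketch scores a generated response by how closely its empathy levels match those of the target response across the three mechanisms. It is not the EmpRL implementation: the function names, the identifier signature (context, response) -> level, the negative mean absolute gap used as the reward, and the keyword-based stand-in identifiers are all assumptions made for illustration. In the framework itself, pre-trained empathy identifiers would take the place of the toy ones.

```python
from typing import Callable, Dict

# The three empathy communication mechanisms named in the abstract.
MECHANISMS = ("emotional_reaction", "interpretation", "exploration")

# Hypothetical identifier signature: (context, response) -> empathy level.
Identifier = Callable[[str, str], float]


def empathy_reward(
    context: str,
    generated: str,
    target: str,
    identifiers: Dict[str, Identifier],
) -> float:
    """Return a reward that is highest (zero) when the generated and target
    responses have identical empathy levels on every mechanism.

    Assumption: the reward is the negative mean absolute gap in levels.
    The exact functional form used by EmpRL is not given in the abstract.
    """
    total_gap = 0.0
    for name in MECHANISMS:
        identify = identifiers[name]
        total_gap += abs(identify(context, generated) - identify(context, target))
    return -total_gap / len(MECHANISMS)


def make_toy_identifier(keyword: str) -> Identifier:
    """Keyword-based stand-in for a pre-trained identifier (illustration only)."""
    def identify(context: str, response: str) -> float:
        return 1.0 if keyword in response.lower() else 0.0
    return identify


toy_identifiers = {
    "emotional_reaction": make_toy_identifier("sorry"),
    "interpretation": make_toy_identifier("understand"),
    "exploration": make_toy_identifier("?"),
}

context = "I failed my driving test today."
target = "I am sorry to hear that. What happened?"

matched = empathy_reward(context, target, target, toy_identifiers)
mismatched = empathy_reward(context, "Okay.", target, toy_identifiers)
```

Under these assumptions, a response whose empathy levels match the target receives the maximum reward of zero, while a response that omits the emotional reaction and the exploration receives a lower reward. A scalar of this kind is what a policy-gradient method such as proximal policy optimization would maximize in expectation.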