Mitigating Sycophancy in Decoder-Only Transformer Architectures: Synthetic Data Intervention

Abstract

To address the sycophancy problem caused by reinforcement learning from human feedback in large language models, this research applies synthetic data intervention (SDI) to the decoder-only transformer architecture. Motivated by gaps in the existing literature, the researcher designed an experimental process that generates diversified data to reduce the model's tendency to cater to users, and used GPT-4o as the experimental tool for verification. The experiment used 100 true-or-false questions and compared the model trained with synthetic data intervention against the original untrained model on multiple indicators. The results for accuracy rate and sycophancy rate support the technique and show that SDI training is significantly effective in reducing sycophancy. Notably, the dataset, experimental process, code, and results have been uploaded to GitHub: https://github.com/brucewang123456789/GeniusTrail.git
