Abstract
Deep reinforcement learning has demonstrated superhuman performance in complex decision-making tasks, but it struggles with generalization and knowledge reuse, both key aspects of true intelligence. This article introduces a novel approach that modifies Cycle Generative Adversarial Networks specifically for reinforcement learning, enabling effective one-to-one knowledge transfer between two tasks. Our method extends the loss function with two new components: a model loss, which captures the dynamic relationships between the source and target tasks, and a Q-loss, which identifies the states that significantly influence the target decision policy. Tested on the 2-D Atari game Pong, our method achieved 100% knowledge transfer between identical tasks and, for a rotated task, either 100% knowledge transfer or a 30% reduction in training time, depending on the network architecture. In contrast, using standard Generative Adversarial Networks or Cycle Generative Adversarial Networks led to worse performance than training from scratch in the majority of cases. These results demonstrate that the proposed method improves knowledge generalization in deep reinforcement learning.