Abstract
The nonlinear and unstable aerodynamic interference generated by the tandem wings of such biomimetic systems poses substantial challenges for motion control, especially under multiple random operating conditions. To address these challenges, the Concerto Reinforcement Learning Extension (CRL2E) algorithm has been developed. This plug-and-play, fully on-the-job, real-time reinforcement learning algorithm incorporates a novel Physics-Inspired Rule-Based Policy Composer Strategy with a Perturbation Module, alongside a lightweight network optimized for real-time control. To validate the algorithm's performance and the soundness of the module design, experiments were conducted under six challenging operating conditions, comparing seven different algorithms. The results demonstrate that CRL2E achieves safe and stable training within the first 500 steps and improves tracking accuracy by 14 to 66 times compared with the Soft Actor-Critic (SAC), Proximal Policy Optimization (PPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. Additionally, CRL2E significantly enhances performance under various random operating conditions, improving tracking accuracy by 8.3% to 60.4% relative to the Concerto Reinforcement Learning (CRL) algorithm. CRL2E also converges 36.11% to 57.64% faster than CRL with only the Composer Perturbation introduced, and 43.52% to 65.85% faster than CRL with both the Composer Perturbation and the Time-Interleaved Capability Perturbation introduced, particularly in conditions where standard CRL struggles to converge. Hardware tests indicate that the optimized lightweight network structure performs well in both weight loading and average inference time, meeting real-time control requirements.