Reinforcement learning (RL) is often the preferred approach for constructing control strategies for complex tasks such as asymmetric assembly. However, the slow convergence of RL severely restricts its practical application. In this paper, convergence is first accelerated by combining RL with compliance control. A novel progressive extension of action dimension (PEAD) mechanism is then proposed to further improve the convergence of RL algorithms. The PEAD method is verified with DDPG and PPO. The results demonstrate that the PEAD method enhances the data efficiency and time efficiency of RL algorithms and increases the stable reward, which broadens the potential for applying RL.