Abstract
Alpha factor mining aims to discover investment signals from historical financial market data that can be used to predict asset returns and earn excess profits. Powerful deep learning methods for alpha factor mining lack interpretability, making them unacceptable in risk-sensitive real markets. Formulaic alpha factors are preferred for their interpretability, but their search space is complex and calls for powerful exploration methods. Recently, a promising framework was proposed for generating formulaic alpha factors using deep reinforcement learning, and it quickly attracted research attention from both academia and industry. This paper first argues that the originally employed policy training method, Proximal Policy Optimization (PPO), faces several important issues in the context of alpha factor mining. A novel reinforcement learning algorithm based on the well-known REINFORCE algorithm is therefore proposed. REINFORCE employs Monte Carlo sampling to estimate the policy gradient, yielding unbiased but high-variance estimates. The minimal environmental variability inherent in the underlying state transition function, which follows a Dirac distribution, helps alleviate this high-variance issue, making REINFORCE more appropriate than PPO. A new dedicated baseline is designed that theoretically reduces the high variance commonly suffered by REINFORCE. Moreover, the information ratio is introduced as a reward shaping mechanism to encourage the generation of steady alpha factors that can better adapt to changes in market volatility. Evaluations on real asset data indicate that the proposed algorithm boosts correlation with returns by 3.83% and shows a stronger ability to obtain excess returns than the latest alpha factor mining methods, which agrees well with the theoretical results.