Abstract
Deep Reinforcement Learning (DRL) has been extensively used to address portfolio optimization problems. DRL agents acquire knowledge and make decisions through unsupervised interactions with their environment, without requiring explicit knowledge of the joint dynamics of portfolio assets. Among DRL algorithms, the combination of actor-critic algorithms and deep function approximators is the most widely used. Here, we find that training a DRL agent with an actor-critic algorithm and deep function approximators may lead to scenarios where the improvement in the agent's risk-adjusted profitability is not significant. We propose that such situations primarily arise from two problems: sparsity in positive reward and the curse of dimensionality. These limitations prevent DRL agents from comprehensively learning asset price change patterns in the training environment. As a result, the agents cannot explore dynamic portfolio optimization policies that improve risk-adjusted profitability during training. To address these problems, we propose a novel multi-agent Hierarchical Deep Reinforcement Learning (HDRL) algorithmic framework. Under this framework, the agents work together as a learning system for portfolio optimization. Specifically, by designing an auxiliary agent that works together with the executive agent for optimal policy exploration, the learning system can focus on exploring policies with higher risk-adjusted return in the region of the action space with positive return and low variance. In this way, we can overcome the curse of dimensionality and improve training efficiency in environments where positive rewards are sparse.