Abstract
This study presents a Reinforcement Learning (RL)-based portfolio management model tailored for high-risk environments. It addresses the limitations of traditional RL models and exploits market opportunities through two-sided transactions and lending. Our approach integrates a new environmental formulation with a Profit and Loss (PnL)-based reward function, enhancing the RL agent's ability to manage downside risk and optimize capital. We implemented the model using a Soft Actor-Critic (SAC) agent with a Convolutional Neural Network with Multi-Head Attention (CNN-MHA). This setup manages a diversified portfolio of 12 crypto assets in the Binance perpetual futures market, using USDT for both granting and receiving loans and rebalancing every 4 hours based on market data from the preceding 48 hours. Tested over two 16-month periods of varying market volatility, the model significantly outperformed benchmarks, particularly in high-volatility scenarios, achieving higher return-to-risk ratios and demonstrating robust profitability. These results confirm the model's effectiveness in leveraging market dynamics and managing risks in volatile environments like the cryptocurrency market.