Abstract
Deep reinforcement learning (DRL) has reached superhuman levels in complex tasks such as game solving (Go) and autonomous driving. However, it remains an open question whether DRL can reach human level in financial applications, in particular in detecting crisis patterns and disinvesting accordingly. In this paper, we present an innovative DRL framework consisting of two sub-networks, fed respectively with the past performances and standard deviations of portfolio strategies and with additional contextual features. The second sub-network plays an important role, as it captures dependencies on common financial indicators such as risk aversion, the economic surprise index, and correlations between assets, which allows the model to take context-based information into account. We compare different network architectures, using either convolutional layers to reduce the network's complexity or LSTM blocks to capture time dependency, and we examine whether previous allocations are important in the modeling. We also use adversarial training to make the final model more robust. Results on the test set show that this approach substantially outperforms traditional portfolio optimization methods such as Markowitz and is able to detect and anticipate crises such as the current Covid one.
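To make the two-sub-network idea concrete, the following is a minimal sketch of a forward pass in NumPy. It is an illustration under stated assumptions, not the architecture used in the paper: plain dense layers stand in for the convolutional or LSTM blocks, the softmax output (long-only weights summing to one) is an assumption, and all names (`two_branch_allocation`, `init_params`, the layer sizes) are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: outputs are non-negative and sum to 1.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def init_params(n_in1, n_in2, n_hidden, n_strategies, seed=0):
    # Hypothetical random initialisation; sizes are illustrative only.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(n_hidden, n_in1)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, n_in2)),
        "b2": np.zeros(n_hidden),
        "Wo": rng.normal(scale=0.1, size=(n_strategies, 2 * n_hidden)),
        "bo": np.zeros(n_strategies),
    }

def two_branch_allocation(perf_window, context, params):
    """Map observations to portfolio weights over the strategies.

    perf_window: array (n_strategies, n_lags, 2) holding past returns
                 and standard deviations of each strategy (assumed layout).
    context:     array (n_context,) of contextual features, e.g. risk
                 aversion, economic surprise index, asset correlations.
    """
    # Sub-network 1: strategy performances and standard deviations.
    h1 = np.tanh(params["W1"] @ perf_window.ravel() + params["b1"])
    # Sub-network 2: contextual features.
    h2 = np.tanh(params["W2"] @ context + params["b2"])
    # Merge both branches and produce one weight per strategy.
    merged = np.concatenate([h1, h2])
    return softmax(params["Wo"] @ merged + params["bo"])

# Usage with random inputs: 4 strategies, 10 lags, 3 contextual features.
params = init_params(n_in1=4 * 10 * 2, n_in2=3, n_hidden=8, n_strategies=4)
rng = np.random.default_rng(1)
weights = two_branch_allocation(
    rng.normal(size=(4, 10, 2)), rng.normal(size=3), params
)
```

In a DRL setting these weights would be the agent's action, and the parameters would be trained against a portfolio reward rather than left at their random initial values.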