Reinforcement Learning (RL) is a semi-supervised learning paradigm in which an agent learns by interacting with an environment. Combining deep learning with RL provides an efficient method for learning how to interact with the environment; this combination is called Deep Reinforcement Learning (deep RL). Deep RL has achieved tremendous success in gaming, such as with AlphaGo, but its potential has rarely been explored for challenging tasks like Speech Emotion Recognition (SER). Using deep RL for SER can potentially improve the performance of an automated call centre agent by dynamically learning emotion-aware responses to customer queries. While the policy employed by the RL agent plays a major role in action selection, there is currently no RL policy tailored for SER. In addition, an extended learning period is a general challenge for deep RL, which can slow learning for SER. Therefore, in this paper, we introduce a novel policy, the "Zeta policy", which is tailored for SER, and apply pre-training in deep RL to achieve faster learning. Pre-training with a cross dataset was also studied to assess the feasibility of pre-training the RL agent with a similar dataset in a scenario where real environmental data is not available. The IEMOCAP and SAVEE datasets were used for the evaluation, with the problem being to recognize four emotions (happy, sad, angry, and neutral) in the utterances provided. Experimental results show that the proposed "Zeta policy" performs better than existing policies. The results also support that pre-training can reduce the training time by reducing the warm-up period and is robust to the cross-corpus scenario.