Cooperative communication is an effective approach to improving spectrum utilization. Most existing studies of relay selection and power allocation in cooperative communication assume that channel state information (CSI) is available. However, accurate CSI is difficult to obtain in practice. In this paper, we consider an outage-based method subject to a total transmission power constraint in a two-hop cooperative communication scenario. We use reinforcement learning (RL) methods to learn strategies for optimal relay selection and power allocation that require no prior knowledge of CSI and rely only on interaction with the communication environment. Conventional RL methods, including common deep reinforcement learning (DRL) methods, perform poorly when the search space is large. Therefore, we first propose a practical DRL framework with an outage-based reward function, which serves as a baseline. We then propose a novel hierarchical reinforcement learning (HRL) algorithm for dynamic relay selection and power allocation. A key difference from other RL-based methods in the existing literature is that our HRL approach decomposes relay selection and power allocation into two hierarchical optimization objectives, which are trained at different levels. Simulation results show that our HRL algorithm trains faster and achieves a lower outage probability than traditional DRL methods, especially in a sparse-reward environment.
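
As an illustration of what an outage-based reward under a total power constraint might look like, the following is a minimal Python sketch, not the reward function used in the paper. It assumes a decode-and-forward relay, a two-slot transmission (hence the factor 1/2), a fixed rate threshold, and a reward of +1 or -1; all of these, together with the names `two_hop_outage_reward`, `g_sr`, `g_rd`, `noise_power`, and `rate_threshold`, are hypothetical. The channel gains would live inside the environment: the learning agent chooses only the relay and the power split and observes only the resulting reward, in line with the no-CSI setting described above.

```python
import math


def two_hop_outage_reward(g_sr, g_rd, p_source, p_total,
                          noise_power=1.0, rate_threshold=1.0):
    """Hypothetical outage-based reward for one two-hop transmission.

    g_sr, g_rd : channel power gains of the source-relay and
                 relay-destination hops (hidden from the agent).
    p_source   : power assigned to the source; the relay receives the
                 remainder, so the total power constraint always holds.
    Returns +1.0 on success and -1.0 on outage (assumed reward values).
    """
    if not 0.0 < p_source < p_total:
        raise ValueError("p_source must lie strictly between 0 and p_total")

    p_relay = p_total - p_source
    snr_sr = p_source * g_sr / noise_power
    snr_rd = p_relay * g_rd / noise_power

    # Assumed decode-and-forward model: the weaker hop limits the
    # end-to-end rate, and two time slots halve the spectral efficiency.
    rate = 0.5 * math.log2(1.0 + min(snr_sr, snr_rd))

    return 1.0 if rate >= rate_threshold else -1.0


if __name__ == "__main__":
    # Equal power split over two strong hops: no outage.
    print(two_hop_outage_reward(g_sr=10.0, g_rd=10.0, p_source=0.5, p_total=1.0))
    # Same split with a weak first hop: outage.
    print(two_hop_outage_reward(g_sr=0.1, g_rd=10.0, p_source=0.5, p_total=1.0))
```

In a hierarchical arrangement of the kind the abstract describes, a higher-level policy could select the relay (which fixes `g_sr` and `g_rd` inside the environment) while a lower-level policy selects `p_source`; how the two levels are actually trained is not specified here.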