This work demonstrates the potential of deep reinforcement learning techniques for transmit power control in wireless networks. Existing techniques typically find near-optimal power allocations by solving a challenging optimization problem. Most of these algorithms are not scalable to large networks in real-world scenarios because of their computational complexity and instantaneous cross-cell channel state information (CSI) requirement. In this paper, a distributively executed dynamic power allocation scheme is developed based on model-free deep reinforcement learning. Each transmitter collects CSI and quality of service (QoS) information from several neighbors and adapts its own transmit power accordingly. The objective is to maximize a weighted sum-rate utility function, which can be particularized to achieve maximum sum-rate or proportionally fair scheduling. Both random variations and delays in the CSI are inherently addressed using deep Q-learning. For a typical network architecture, the proposed algorithm is shown to achieve near-optimal power allocation in real time based on delayed CSI measurements available to the agents. The proposed scheme is especially suitable for practical scenarios where the system model is inaccurate and CSI delay is non-negligible.
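To make the objective concrete, the following is a minimal sketch of a weighted sum-rate utility under a standard interference model. It is an illustration, not the paper's implementation: the function name `weighted_sum_rate`, the gain-matrix layout (`gains[i][j]` is the gain from transmitter `j` to receiver `i`), and the Shannon-rate form log2(1 + SINR) are assumptions introduced here. Setting all weights to 1 gives the plain sum-rate; choosing each weight as the inverse of a link's long-term average rate is one common way to obtain proportional fairness.

```python
import math


def weighted_sum_rate(gains, powers, weights, noise_power):
    """Return sum_i w_i * log2(1 + SINR_i) for n interfering links.

    Assumed (hypothetical) conventions:
      gains[i][j]  -- channel power gain from transmitter j to receiver i
      powers[j]    -- transmit power of transmitter j
      weights[i]   -- non-negative weight of link i
      noise_power  -- additive noise power at every receiver
    """
    n = len(powers)
    total = 0.0
    for i in range(n):
        signal = gains[i][i] * powers[i]
        # Interference is the power received from all other transmitters.
        interference = sum(gains[i][j] * powers[j] for j in range(n) if j != i)
        sinr = signal / (noise_power + interference)
        total += weights[i] * math.log2(1.0 + sinr)
    return total


if __name__ == "__main__":
    # Two links with no cross-link interference and unit weights (sum-rate).
    gains = [[1.0, 0.0],
             [0.0, 1.0]]
    print(weighted_sum_rate(gains, powers=[1.0, 3.0],
                            weights=[1.0, 1.0], noise_power=1.0))
```

In a learning setup of the kind described above, a utility of this form could serve as the basis for each agent's reward signal; how the reward is actually built from local and delayed measurements is not specified by this sketch.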