A model of music needs to have the ability to recall past details and have aclear, coherent understanding of musical structure. Detailed in the paper is adeep reinforcement learning architecture that predicts and generates polyphonicmusic aligned with musical rules. The probabilistic model presented is aBi-axial LSTM trained with a pseudo-kernel reminiscent of a convolutionalkernel. To encourage exploration and impose greater global coherence on thegenerated music, a deep reinforcement learning approach DQN is adopted. Whenanalyzed quantitatively and qualitatively, this approach performs well incomposing polyphonic music.