This paper shows how reinforcement learning can be used to derive optimal hedging strategies for derivatives when there are transaction costs. The paper illustrates the approach by comparing delta hedging with optimal hedging for a short position in a call option, when the objective is to minimize a function equal to the mean hedging cost plus a constant times the standard deviation of the hedging cost. Two situations are considered. In the first, the asset price follows a geometric Brownian motion. In the second, the asset price follows a stochastic volatility process. The paper extends the basic reinforcement learning approach in three ways. First, it uses two different Q-functions so that both the expected value of the cost and the expected value of the square of the cost are tracked for different state/action combinations. Because the standard deviation of the cost can be computed from these two quantities, this increases the range of objective functions that can be used. Second, it uses a learning algorithm that allows for continuous state and action spaces. Third, it compares the accounting P&L approach (where the hedged position is valued at each step) with the cash flow approach (where only cash inflows and outflows are used). We find that a hybrid approach, in which the accounting P&L approach incorporates a relatively simple valuation model, works well. The valuation model does not have to correspond to the process assumed for the underlying asset price.
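To make the objective concrete, the following is a minimal sketch of the delta-hedging benchmark only, not the paper's reinforcement learning method. It simulates geometric Brownian motion paths, delta hedges a short call under the cash flow approach, and recovers the objective E[C] + c * std(C) from the first two moments of the cost C using std(C) = sqrt(E[C^2] - E[C]^2). These two moments are the quantities the two Q-functions are described as tracking. Assumptions (not from the source): a proportional transaction cost `kappa` charged on the value traded, the hedge closed out at maturity (also with transaction costs), hypothetical market parameters, and c = 1.5. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()  # standard normal, used for the Black-Scholes delta


def bs_call_delta(S, K, r, sigma, tau):
    """Black-Scholes delta of a European call, used as the hedge ratio."""
    if tau <= 0:
        return 1.0 if S > K else 0.0
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return _N.cdf(d1)


def objective_from_moments(mean_cost, mean_sq_cost, c):
    """Objective E[C] + c * std(C), recovered from E[C] and E[C^2]."""
    variance = max(mean_sq_cost - mean_cost ** 2, 0.0)  # guard against rounding below zero
    return mean_cost + c * math.sqrt(variance)


def delta_hedge_cost(S0, K, r, mu, sigma, T, n_steps, kappa, rng):
    """Discounted cash-flow cost of delta hedging a short call along one GBM path.

    Assumption: kappa is a proportional transaction cost on the value traded,
    and the share position is sold at maturity (also paying kappa).
    A positive result is a cost to the hedger.
    """
    dt = T / n_steps
    S, position, cost = S0, 0.0, 0.0
    for i in range(n_steps):
        target = bs_call_delta(S, K, r, sigma, T - i * dt)
        trade = target - position
        # Buying shares is a cash outflow; transaction cost is charged on |trade| * S.
        cost += math.exp(-r * i * dt) * (trade * S + kappa * abs(trade) * S)
        position = target
        # Advance the asset price by one GBM step under the real-world drift mu.
        z = rng.gauss(0.0, 1.0)
        S *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
    disc = math.exp(-r * T)
    cost += disc * (-position * S + kappa * abs(position) * S)  # close out the hedge
    cost += disc * max(S - K, 0.0)  # pay the call payoff on the short position
    return cost


def estimate_objective(n_paths=2000, c=1.5, seed=0, **params):
    """Monte Carlo estimates of E[C], E[C^2] and the mean-plus-c-std objective."""
    rng = random.Random(seed)
    costs = [delta_hedge_cost(rng=rng, **params) for _ in range(n_paths)]
    m1 = sum(costs) / n_paths
    m2 = sum(x * x for x in costs) / n_paths
    return m1, m2, objective_from_moments(m1, m2, c)


if __name__ == "__main__":
    params = dict(S0=100.0, K=100.0, r=0.0, mu=0.05, sigma=0.2, T=1 / 12, n_steps=30)
    for kappa in (0.0, 0.01):
        m1, m2, y = estimate_objective(kappa=kappa, **params)
        print(f"kappa={kappa}: mean cost={m1:.3f}, objective={y:.3f}")
```

With `kappa = 0` the mean cost should be close to the Black-Scholes price of the call, while a positive `kappa` raises both the mean and the objective. That gap is what an optimal hedging policy trades off against hedging error. Reusing the same seed for both runs keeps the simulated price paths identical, so the difference isolates the effect of transaction costs.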