We conduct a detailed experiment on major cash FX pairs, accurately accounting for transaction and funding costs. These sources of profit and loss, including the price trends that occur in the currency markets, are made available to our recurrent reinforcement learner via a quadratic utility, which learns to target a position directly. We improve upon earlier work by casting the problem of learning to target a risk position in an online learning context. This online learning occurs sequentially in time, but also in the form of transfer learning. We transfer the output of radial basis function hidden processing units, whose means, covariances and overall size are determined by Gaussian mixture models, to the recurrent reinforcement learner and the baseline momentum trader. Thus the intrinsic nature of the feature space is learnt and made available to the upstream models. The recurrent reinforcement learning trader achieves an annualised portfolio information ratio of 0.52 with a compound return of 9.3%, net of execution and funding cost, over a 7-year test set. This is despite forcing the model to trade at the close of the trading day, 5pm EST, when trading costs are statistically the most expensive. These results are comparable with the momentum baseline trader, reflecting the low interest differential environment since the 2008 financial crisis and the very obvious currency trends since then. The recurrent reinforcement learner does nevertheless maintain an important advantage, in that the model's weights can be adapted to reflect the different sources of profit and loss variation. This is demonstrated visually by a USDRUB trading agent, which learns to target different positions that reflect trading in the absence or presence of cost.
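As an illustration of how cost-adjusted profit and loss can feed a quadratic utility, the following is a minimal sketch and not the implementation used in this work. The function names, the convention that the trader starts flat, the sign of the funding term, and the mean-minus-half-risk-aversion-times-variance form of the utility are all assumptions made for the example.

```python
from statistics import fmean


def net_returns(positions, price_changes, cost_per_unit, funding_per_unit):
    """Per-step profit and loss, net of execution cost and including funding.

    Assumed convention (hypothetical): positions[t] is held over step t,
    price_changes[t] is the price move over that step, and the trader
    starts with a flat (zero) position.
    """
    rewards = []
    previous = 0.0
    for position, change in zip(positions, price_changes):
        trend_pnl = position * change
        # Execution cost is charged on the size of the position change.
        execution_cost = cost_per_unit * abs(position - previous)
        # Funding (carry) accrues on the held position; the sign is an assumption.
        funding_pnl = funding_per_unit * position
        rewards.append(trend_pnl - execution_cost + funding_pnl)
        previous = position
    return rewards


def quadratic_utility(rewards, risk_aversion):
    """Assumed form: mean reward minus half the risk aversion times variance."""
    mean = fmean(rewards)
    variance = fmean([(r - mean) ** 2 for r in rewards])
    return mean - 0.5 * risk_aversion * variance


positions = [1.0, 1.0, -1.0]
price_changes = [0.5, -0.25, -0.5]
with_cost = net_returns(positions, price_changes, 0.125, 0.0)
without_cost = net_returns(positions, price_changes, 0.0, 0.0)
print(quadratic_utility(with_cost, 1.0), quadratic_utility(without_cost, 1.0))
```

Under these assumptions, the same position path earns a lower utility once execution cost is charged, which is the kind of signal that would let a learner target different positions in the absence or presence of cost.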