Differentiable Dynamic Programming for Structured Prediction and Attention

  • 2018-02-11 01:36:06
  • Arthur Mensch, Mathieu Blondel
  • 71

Abstract

Dynamic programming (DP) solves a variety of structured combinatorialproblems by iteratively breaking them down into smaller subproblems. In spiteof their versatility, DP algorithms are usually non-differentiable, whichhampers their use as a layer in neural networks trained by backpropagation. Toaddress this issue, we propose to smooth the max operator in the dynamicprogramming recursion, using a strongly convex regularizer. This allows torelax both the optimal value and solution of the original combinatorialproblem, and turns a broad class of DP algorithms into differentiableoperators. Theoretically, we provide a new probabilistic perspective onbackpropagating through these DP operators, and relate them to inference ingraphical models. We derive two particular instantiations of our framework, asmoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithmfor time-series alignment. We showcase these instantiations on two structuredprediction tasks and on structured and sparse attention for neural machinetranslation.

 

Quick Read (beta)

loading the full paper ...