Variational data assimilation optimizes for an initial state of a dynamical system such that its evolution fits observational data. The physical model can subsequently be evolved into the future to make predictions. This principle is a cornerstone of large-scale forecasting applications such as numerical weather prediction. As such, it is implemented in current operational systems of weather forecasting agencies across the globe. However, finding a good initial state poses a difficult optimization problem, in part due to the non-invertible relationship between physical states and their corresponding observations. We learn a mapping from observational data to physical states and show how it can be used to improve optimizability. We employ this mapping in two ways: to better initialize the non-convex optimization problem, and to reformulate the objective function in better-behaved physics space instead of observation space. Our experimental results for the Lorenz96 model and a two-dimensional turbulent fluid flow demonstrate that this procedure significantly improves forecast quality for chaotic systems.
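
To make the baseline setting concrete, the following is a minimal sketch of observation-space variational assimilation on a small Lorenz96 system. It is an illustration under stated assumptions, not the method evaluated in this work: the observation operator `observe` (every other variable), the system size, the step size, the window length, and the finite-difference gradient with a backtracking line search are all hypothetical choices made for brevity. The learned mapping described above is not implemented here; in this sketch its role would be to supply a better `x_guess`.

```python
import numpy as np


def lorenz96_rhs(x, forcing=8.0):
    # dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with periodic indices.
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing


def rk4_step(x, dt=0.01):
    # One classical fourth-order Runge-Kutta step of the Lorenz96 dynamics.
    k1 = lorenz96_rhs(x)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2)
    k4 = lorenz96_rhs(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def observe(x):
    # Assumed non-invertible observation operator: keep every other variable.
    return x[::2]


def rollout_observations(x0, n_steps):
    # Evolve the state from x0 and record one observation per time step.
    x, obs = x0, []
    for _ in range(n_steps):
        obs.append(observe(x))
        x = rk4_step(x)
    return np.stack(obs)


def objective(x0, y):
    # Observation-space misfit between the rollout from x0 and the data y.
    pred = rollout_observations(x0, len(y))
    return 0.5 * np.sum((pred - y) ** 2)


def numerical_gradient(f, x, eps=1e-6):
    # Central finite differences; a stand-in for an adjoint or autodiff gradient.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g


def assimilate(x_init, y, n_iters=50):
    # Gradient descent with backtracking: a step is accepted only if it
    # lowers the objective, so the loss never increases.
    x = x_init.copy()
    loss = objective(x, y)
    for _ in range(n_iters):
        g = numerical_gradient(lambda z: objective(z, y), x)
        step = 0.1
        while step > 1e-10:
            cand = x - step * g
            cand_loss = objective(cand, y)
            if cand_loss < loss:
                x, loss = cand, cand_loss
                break
            step *= 0.5
    return x, loss


rng = np.random.default_rng(0)
x_true = 8.0 + rng.standard_normal(8)            # hidden true initial state
y = rollout_observations(x_true, 20)             # noise-free observations
x_guess = x_true + 0.5 * rng.standard_normal(8)  # perturbed first guess
loss_init = objective(x_guess, y)
x_est, loss_final = assimilate(x_guess, y)
print(f"initial loss {loss_init:.4f}, final loss {loss_final:.4f}")
```

Because `observe` discards half of the state, the unobserved variables are constrained only through the dynamics over the assimilation window, which is one reason the quality of the first guess matters in this kind of problem.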