We propose an efficient inference procedure for non-autoregressive machine translation that iteratively refines a translation purely in the continuous space. Given a continuous latent variable model for machine translation (Shu et al., 2020), we train an inference network to approximate the gradient of the marginal log probability of the target sentence, using only the latent variable as input. This allows us to use gradient-based optimization at inference time to find the target sentence that approximately maximizes its marginal probability. As each refinement step only involves computation in a low-dimensional latent space (we use 8 dimensions in our experiments), we avoid the computational overhead incurred by existing non-autoregressive inference procedures, which often refine in token space. We compare our approach to a recently proposed EM-like inference procedure (Shu et al., 2020) that optimizes in a hybrid space consisting of both discrete and continuous variables. We evaluate our approach on WMT'14 En-De, WMT'16 Ro-En and IWSLT'16 De-En, and observe two advantages over the EM-like inference: (1) it is computationally efficient, i.e., each refinement step is twice as fast, and (2) it is more effective, resulting in higher marginal probabilities and BLEU scores with the same number of refinement steps. On WMT'14 En-De, for instance, our approach is able to decode 6.2 times faster than the autoregressive model with minimal degradation in translation quality (0.9 BLEU).
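
As a rough illustration of the refinement loop described above, the following Python sketch applies a fixed number of gradient-ascent updates to a single 8-dimensional latent vector. It is a toy under stated assumptions, not the implementation used in this work: the names `refine_latent` and `make_toy_gradient`, the step size, and the quadratic stand-in for the learned inference network are all hypothetical, and the final decoding of the refined latent variable into a target sentence is omitted.

```python
from typing import Callable, List

Vector = List[float]


def refine_latent(
    z0: Vector,
    approx_grad: Callable[[Vector], Vector],
    step_size: float = 0.5,
    num_steps: int = 4,
) -> Vector:
    """Run `num_steps` gradient-ascent updates on the latent vector only.

    `approx_grad` stands in for the trained inference network: it maps a
    latent vector to an estimate of the gradient of the marginal log
    probability with respect to that latent vector.
    """
    z = list(z0)
    for _ in range(num_steps):
        g = approx_grad(z)
        # Ascent step: move the latent vector along the estimated gradient.
        z = [zi + step_size * gi for zi, gi in zip(z, g)]
    return z


def make_toy_gradient(mode: Vector) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for the inference network.

    Returns the exact gradient of -0.5 * ||z - mode||^2, i.e. a log density
    with a single peak at `mode`. A real system would use a learned network.
    """
    def grad(z: Vector) -> Vector:
        return [m - zi for zi, m in zip(z, mode)]
    return grad


LATENT_DIM = 8  # matches the latent dimensionality quoted in the abstract
toy_mode = [0.1 * i for i in range(LATENT_DIM)]  # assumed optimum, toy only
z_init = [0.0] * LATENT_DIM
z_final = refine_latent(z_init, make_toy_gradient(toy_mode))
```

In this toy setting each update halves the distance to the assumed optimum, so a handful of steps is enough to get close. Every step touches only the 8 latent dimensions and never the token sequence, which is the property the abstract credits for the cheaper refinement.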