The weights of a deep neural network model are optimized in conjunction with the governing flow equations to provide a model for sub-grid-scale stresses in a temporally developing plane turbulent jet at Reynolds number $Re_0=6\,000$. The objective function for training is first based on the instantaneous filtered velocity fields from a corresponding direct numerical simulation, and training uses a stochastic gradient descent method in which the adjoint Navier--Stokes equations provide the end-to-end sensitivities of the velocity fields to the model weights. In-sample and out-of-sample tests on multiple dual-jet configurations show that the mesh density the model requires in each coordinate direction to predict the mean flow, Reynolds stresses, and spectra is half that needed by the dynamic Smagorinsky model for comparable accuracy. The same neural-network model trained directly to match filtered sub-grid-scale stresses, without the constraint of being embedded within the flow equations during training, fails to provide a qualitatively correct prediction. The coupled formulation is generalized to train based only on the mean flow and Reynolds stresses, which are more readily available in experiments. The mean-flow training provides a robust model, which is important, though its predictions on the same coarse meshes are somewhat less accurate, as might be anticipated given the reduced information available for training in this case. The anticipated advantage of the formulation is that including resolved physics in the training increases the model's capacity to extrapolate. This is assessed for the case of passive scalar transport, for which it outperforms established models due to improved mixing predictions.
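The idea of obtaining end-to-end sensitivities from an adjoint solve can be illustrated on a toy problem. The sketch below is not the formulation used in this work: it assumes a scalar ODE, $du/dt = (\theta - 1)u$, as a stand-in for the flow equations, a single coefficient `theta` as a stand-in for the network weights, explicit Euler time stepping, and plain (not stochastic) gradient descent. All function names are hypothetical. It shows only the structure of the approach: the trainable term is embedded in the solver, the loss is evaluated on the solver output, and a backward adjoint sweep gives the gradient of that loss with respect to the parameter.

```python
# Toy illustration only: a scalar ODE du/dt = (theta - 1) * u stands in for
# the flow equations, and the single coefficient theta stands in for the
# network weights. All names are hypothetical.

def forward(theta, u0=1.0, dt=0.01, steps=100):
    """Explicit-Euler trajectory with the trainable term embedded in the solver."""
    us = [u0]
    for _ in range(steps):
        us.append(us[-1] + dt * (theta - 1.0) * us[-1])
    return us


def loss_and_grad(theta, target, u0=1.0, dt=0.01, steps=100):
    """Return J = 0.5 * (u_N - target)^2 and dJ/dtheta from a discrete adjoint sweep."""
    us = forward(theta, u0, dt, steps)
    loss = 0.5 * (us[-1] - target) ** 2
    lam = us[-1] - target  # adjoint variable at the final step: dJ/du_N
    grad = 0.0
    for n in reversed(range(steps)):
        # Explicit dependence of step n -> n+1 on theta, weighted by dJ/du_{n+1}.
        grad += lam * dt * us[n]
        # Propagate the adjoint one step backward: dJ/du_n.
        lam *= 1.0 + dt * (theta - 1.0)
    return loss, grad


def train(target, theta=0.0, lr=0.5, iters=200):
    """Plain gradient descent on theta using the adjoint gradient."""
    for _ in range(iters):
        _, g = loss_and_grad(theta, target)
        theta -= lr * g
    return theta


if __name__ == "__main__":
    # Synthetic "reference data": final state produced by theta = 0.3.
    target = forward(0.3)[-1]
    print("recovered theta:", train(target))
```

In this toy setting the adjoint gradient can be checked against a central finite difference of the loss, which is a common sanity test before using such gradients for training.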