Convolutional neural networks are machine-learning models widely applied in various prediction tasks, such as computer vision and medical image analysis. Their great predictive power requires extensive computation, which encourages model owners to host the prediction service on a cloud platform. Recent research focuses on the privacy of the query and results, but it does not provide model privacy against the model-hosting server and may leak partial information about the results. Some of these works further require frequent interactions with the querier or heavy computation overheads, which discourages queriers from using the prediction service. This paper proposes a new scheme for privacy-preserving neural network prediction in the outsourced setting, i.e., the server cannot learn the query, the (intermediate) results, or the model. Similar to SecureML (S&P'17), a representative work that provides model privacy, we leverage two non-colluding servers with secret sharing and triplet generation to minimize the usage of heavyweight cryptography. Further, we adopt asynchronous computation to improve the throughput, and design garbled circuits for the non-polynomial activation function to keep the same accuracy as the underlying network (instead of approximating it). Our experiments on the MNIST dataset show that our scheme achieves an average of 122x, 14.63x, and 36.69x reduction in latency compared to SecureML, MiniONN (CCS'17), and EzPC (EuroS&P'19), respectively. For the communication costs, our scheme outperforms SecureML by 1.09x, MiniONN by 36.69x, and EzPC by 31.32x on average. On the CIFAR dataset, our scheme achieves 7.14x and 3.48x lower latency than MiniONN and EzPC, respectively. Our scheme also provides 13.88x and 77.46x lower communication costs than MiniONN and EzPC, respectively, on the CIFAR dataset.
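
To make the two-server building block concrete, the following is a minimal sketch of textbook additive secret sharing with multiplication triplets (Beaver triples). It is not the scheme's actual protocol. The ring size `MOD`, the helper names, and the trusted dealer in `make_triple` are all assumptions made for illustration, and both servers are simulated in a single process.

```python
import secrets

MOD = 2 ** 64  # assumed ring size; the abstract does not specify one


def share(x):
    """Split x into two additive shares that sum to x modulo MOD."""
    s0 = secrets.randbelow(MOD)
    s1 = (x - s0) % MOD
    return s0, s1


def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % MOD


def make_triple():
    """Hypothetical trusted dealer: shares of random a, b and c = a * b."""
    a = secrets.randbelow(MOD)
    b = secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def beaver_multiply(x_sh, y_sh, triple):
    """Return shares of x * y given shares of x, y and one triple."""
    a_sh, b_sh, c_sh = triple
    # Each server masks its shares locally; only e = x - a and f = y - b
    # are opened, and they reveal nothing because a and b are uniform.
    e = reconstruct((x_sh[0] - a_sh[0]) % MOD, (x_sh[1] - a_sh[1]) % MOD)
    f = reconstruct((y_sh[0] - b_sh[0]) % MOD, (y_sh[1] - b_sh[1]) % MOD)
    z_sh = []
    for i in (0, 1):
        z_i = (f * a_sh[i] + e * b_sh[i] + c_sh[i]) % MOD
        if i == 0:
            z_i = (z_i + e * f) % MOD  # public term e * f is added once
        z_sh.append(z_i)
    return tuple(z_sh)


if __name__ == "__main__":
    x_sh, y_sh = share(6), share(7)
    z_sh = beaver_multiply(x_sh, y_sh, make_triple())
    print(reconstruct(*z_sh))
```

The correctness follows from x * y = (e + a)(f + b) = e * f + e * b + f * a + a * b, so each multiplication consumes one precomputed triple and needs only cheap ring arithmetic online, which is why this style of protocol avoids heavyweight cryptography in the linear layers.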