Many applications of machine learning in science and medicine, including molecular property and protein function prediction, can be cast as problems of predicting some properties of graphs, where having good graph representations is critical. However, two key challenges in these domains are (1) extreme scarcity of labeled data due to expensive lab experiments, and (2) the need to extrapolate to test graphs that are structurally different from those seen during training. In this paper, we explore pre-training to address both of these challenges. In particular, working with Graph Neural Networks (GNNs) for representation learning of graphs, we wish to obtain node representations that (1) capture similarity of nodes' network neighborhood structure, (2) can be composed to give accurate graph-level representations, and (3) capture domain knowledge. To achieve these goals, we propose a series of methods to pre-train GNNs at both the node level and the graph level, using both unlabeled data and labeled data from related auxiliary supervised tasks. We perform extensive evaluation on two applications, molecular property and protein function prediction. We observe that performing only graph-level supervised pre-training often leads to marginal performance gains or can even worsen performance compared to non-pre-trained models. On the other hand, effectively combining both node- and graph-level pre-training techniques significantly improves generalization to out-of-distribution graphs, consistently outperforming non-pre-trained GNNs across 8 datasets in molecular property prediction (resp. 40 tasks in protein function prediction), with an average ROC-AUC improvement of 7.2% (resp. 11.7%).