Dynamic graph representation learning is an important task with widespread applications. Previous methods for dynamic graph learning are usually sensitive to noisy graph information such as missing or spurious connections, which can degrade performance and generalization. To overcome this challenge, we propose a Transformer-based dynamic graph learning method named Dynamic Graph Transformer (DGT), which uses spatial-temporal encoding to effectively learn graph topology and capture implicit links. To improve generalization ability, we introduce two complementary self-supervised pre-training tasks and show, via an information-theoretic analysis, that jointly optimizing the two pre-training tasks results in a smaller Bayesian error rate. We also propose a temporal-union graph structure and a target-context node sampling strategy for efficient and scalable training. Extensive experiments on real-world datasets show that DGT achieves superior performance compared with several state-of-the-art baselines.