Spatial structures in 3D space are important for determining molecular properties. Recent papers use geometric deep learning to represent molecules and predict their properties. These methods, however, are computationally expensive at capturing long-range dependencies between input atoms. They also ignore the non-uniformity of interatomic distances, so they fail to learn context-dependent representations at different scales. To address these issues, we introduce 3D-Transformer, a variant of the Transformer for molecular representations that incorporates 3D spatial information. 3D-Transformer operates on a fully-connected graph with direct connections between all atoms. To cope with the non-uniformity of interatomic distances, we develop a multi-scale self-attention module that exploits local fine-grained patterns at increasing contextual scales. Because molecules of different sizes rely on different kinds of spatial features, we design an adaptive position encoding module that uses different position encoding methods for small and large molecules. Finally, to obtain the molecular representation from atom embeddings, we propose an attentive farthest point sampling algorithm that selects a subset of atoms with the help of attention scores, overcoming the shortcomings of the virtual node and of previous distance-dominant downsampling methods. We validate 3D-Transformer across three important scientific domains: quantum chemistry, materials science, and proteomics. Our experiments show significant improvements over state-of-the-art models on crystal property prediction and protein-ligand binding affinity prediction, and better or competitive performance on quantum chemistry molecular datasets. This work provides clear evidence that biochemical tasks consistently benefit from 3D molecular representations and that different tasks require different position encoding methods.
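The abstract describes attentive farthest point sampling only at a high level: atoms are selected with the help of attention scores rather than by distance alone. The following is a minimal sketch under an assumed scoring rule. Each candidate atom's score is its distance to the nearest already-selected atom plus `lam` times a per-atom attention score. The function name `attentive_fps`, the weight `lam`, and the choice to start from the highest-attention atom are hypothetical illustrations, not the paper's exact formulation. With `lam = 0` the procedure reduces to plain (distance-dominant) farthest point sampling.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def attentive_fps(
    coords: Sequence[Point],
    attn: Sequence[float],
    k: int,
    lam: float = 1.0,
) -> List[int]:
    """Select k atom indices by combining geometric spread with attention.

    Hypothetical scoring rule (an assumption, not the paper's definition):
        score(i) = min distance from atom i to any selected atom + lam * attn[i]
    lam = 0 gives ordinary farthest point sampling.
    """
    n = len(coords)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of atoms")
    if len(attn) != n:
        raise ValueError("attn must contain one score per atom")

    # Assumed seed: start from the atom with the highest attention score.
    first = max(range(n), key=lambda i: attn[i])
    selected = [first]
    chosen = {first}

    # min_dist[i] tracks the distance from atom i to its nearest selected atom.
    min_dist = [math.dist(coords[i], coords[first]) for i in range(n)]

    while len(selected) < k:
        # Pick the unselected atom with the best combined distance/attention score.
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: min_dist[i] + lam * attn[i],
        )
        selected.append(best)
        chosen.add(best)
        # Update nearest-selected distances with the newly chosen atom.
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(coords[i], coords[best]))

    return selected
```

For example, with atoms at `(0,0,0)`, `(1,0,0)`, `(10,0,0)`, `(0.5,0,0)` and attention scores `[0.9, 0.1, 0.0, 0.5]`, `lam = 0` picks the distant atom 2 second, while a large `lam` favors the high-attention atom 3 even though it sits close to the seed. This illustrates how attention can override a purely distance-driven choice.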