Cross-view image geo-localization aims to determine the geo-location (GPS coordinates) of a query ground-view image by matching it against GPS-tagged aerial (satellite) images in a reference dataset. Because the viewpoint changes dramatically between the two views, matching cross-view images is challenging. In this paper, we propose GeoCapsNet, a network based on the capsule network, for ground-to-aerial image geo-localization. The network first extracts features from both ground-view and aerial images via standard convolution layers; capsule layers then further encode these features to model spatial feature hierarchies and enhance the representation power. Moreover, we introduce a simple and effective weighted soft-margin triplet loss with online batch hard sample mining, which greatly improves image retrieval accuracy. Experimental results show that GeoCapsNet significantly outperforms state-of-the-art approaches on two benchmark datasets.
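To make the loss concrete, the following is a minimal sketch of one common form of a weighted soft-margin triplet loss with batch hard mining. It is not necessarily the exact formulation used in GeoCapsNet. It assumes that the i-th ground embedding and the i-th aerial embedding in a batch form the matching pair, that distance is squared Euclidean, and that the loss per anchor is log(1 + exp(alpha * (d_pos - d_neg))). The function names and the default weight `alpha=10.0` are illustrative assumptions.

```python
import math


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def batch_hard_weighted_soft_margin_loss(ground, aerial, alpha=10.0):
    """Mean loss over a batch of paired embeddings.

    Assumptions (not taken from the paper): ground[i] matches aerial[i],
    every other aerial embedding in the batch is a negative, and alpha
    is a hypothetical weighting factor.
    """
    n = len(ground)
    if n < 2 or n != len(aerial):
        raise ValueError("need two equally sized batches with at least 2 pairs")
    total = 0.0
    for i in range(n):
        d_pos = squared_l2(ground[i], aerial[i])
        # Batch hard mining: keep only the closest (hardest) negative.
        d_neg = min(squared_l2(ground[i], aerial[j]) for j in range(n) if j != i)
        total += softplus(alpha * (d_pos - d_neg))
    return total / n


if __name__ == "__main__":
    ground = [[1.0, 0.0], [0.0, 1.0]]
    aerial = [[1.0, 0.0], [0.0, 1.0]]
    print(batch_hard_weighted_soft_margin_loss(ground, aerial))
```

In this sketch the loss approaches zero when each ground embedding is much closer to its own aerial match than to the hardest negative, and it grows roughly linearly in alpha * (d_pos - d_neg) when that ordering is violated. Only the ground-to-aerial direction is mined here; a symmetric variant would add the aerial-to-ground term.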