Abstract
We propose a method to leapfrog pixel-wise semantic segmentation of (aerial) images and predict objects in a vector representation directly. PolyMapper predicts maps of cities from aerial images as collections of polygons with a learnable framework. Instead of the usual multi-step procedure of semantic segmentation, shape improvement, conversion to polygons, and polygon refinement, our approach learns mappings with a single network architecture and directly outputs maps. We demonstrate that our method is capable of drawing polygons of buildings and road networks that very closely approximate the structure of existing online maps such as OpenStreetMap, and it does so in a fully automated manner. Validation on existing and novel large-scale datasets of several cities shows that our approach achieves good levels of performance.