Detailed environment perception is a crucial component of automated vehicles. However, to deal with the large amount of perceived information, we also require segmentation strategies. Based on a grid map environment representation, which is well suited for sensor fusion, free-space estimation and machine learning, we detect and classify objects using deep convolutional neural networks. As input for our networks, we use a multi-layer grid map that efficiently encodes 3D range sensor information. The inference output consists of a list of rotated bounding boxes with associated semantic classes. We conduct extensive ablation studies, highlight important design considerations when using grid maps and evaluate our models on the KITTI Bird's Eye View benchmark. Qualitative and quantitative benchmark results show that we achieve robust detection and state-of-the-art accuracy solely using top-view grid maps from range sensor data.
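To make the input and output representations concrete, the following is a minimal sketch, not the method described here. It rasterizes a range sensor point cloud into a multi-layer top-view grid map and computes the corners of a rotated bounding box. The layer set (detection count, maximum height, mean intensity), the grid extent and cell size, and the names `GridSpec`, `rasterize_point_cloud` and `box_corners` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GridSpec:
    """Hypothetical top-view grid extent in metres (sensor frame, x forward, y left)."""
    x_min: float = 0.0
    x_max: float = 40.0
    y_min: float = -20.0
    y_max: float = 20.0
    cell_size: float = 0.1

    @property
    def shape(self):
        nx = int(round((self.x_max - self.x_min) / self.cell_size))
        ny = int(round((self.y_max - self.y_min) / self.cell_size))
        return nx, ny


def rasterize_point_cloud(points, spec):
    """Encode (x, y, z, intensity) points as a multi-layer grid map.

    The layers are an assumed, illustrative choice: number of detections,
    maximum height and mean intensity per cell. Empty cells get 0.0 in the
    height and intensity layers.
    """
    nx, ny = spec.shape
    count = [[0] * ny for _ in range(nx)]
    max_z = [[-math.inf] * ny for _ in range(nx)]
    sum_int = [[0.0] * ny for _ in range(nx)]

    for x, y, z, intensity in points:
        # Drop points that fall outside the grid extent.
        if not (spec.x_min <= x < spec.x_max and spec.y_min <= y < spec.y_max):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        i = min(int((x - spec.x_min) / spec.cell_size), nx - 1)
        j = min(int((y - spec.y_min) / spec.cell_size), ny - 1)
        count[i][j] += 1
        max_z[i][j] = max(max_z[i][j], z)
        sum_int[i][j] += intensity

    max_height = [[z if c else 0.0 for z, c in zip(zr, cr)]
                  for zr, cr in zip(max_z, count)]
    mean_intensity = [[s / c if c else 0.0 for s, c in zip(sr, cr)]
                      for sr, cr in zip(sum_int, count)]
    return {"detections": count, "max_height": max_height,
            "mean_intensity": mean_intensity}


def box_corners(cx, cy, length, width, yaw):
    """Corners of a rotated top-view box centred at (cx, cy) with heading yaw (rad).

    Returned in the order front-left, rear-left, rear-right, front-right.
    """
    dx, dy = length / 2, width / 2
    c, s = math.cos(yaw), math.sin(yaw)
    local = [(dx, dy), (-dx, dy), (-dx, -dy), (dx, -dy)]
    # Rotate each local corner by yaw, then translate to the box centre.
    return [(cx + c * lx - s * ly, cy + s * lx + c * ly) for lx, ly in local]
```

Each layer is a 2D array of the same shape, so the layers can be stacked as image channels for a convolutional network. A predicted box, for example `box_corners(10.0, 2.0, 4.5, 1.8, 0.3)` paired with a class label, is one possible way to represent a single entry of the output list.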