When 3D point clouds from overhead sensors are used as input to remote sensing data exploitation pipelines, a large amount of effort is devoted to data preparation. Among the multiple stages of the preprocessing chain, estimating the Digital Terrain Model (DTM) is considered to be of high importance; however, this remains a challenge, especially for raw point clouds derived from optical imagery. Current algorithms either estimate the ground points using a set of geometrical rules that require tuning multiple parameters and human interaction, or cast the problem as a binary classification machine learning task in which points are assigned to ground and non-ground classes. In contrast, here we present an algorithm that operates directly on 3D point clouds and estimates the underlying DTM for the scene using an end-to-end approach, without the need to classify points into ground and non-ground cover types. Our model learns neighborhood information and seamlessly integrates it with point-wise and block-wise global features. We validate our model using the ISPRS 3D Semantic Labeling Contest LiDAR data, as well as three scenes generated using dense stereo matching, representative of high-rise buildings, lower urban structures, and a dense old-city residential area. We compare our findings with two widely used software packages for DTM extraction, namely ENVI and LAStools. Our preliminary results show that the proposed method achieves an overall Mean Absolute Error of 11.5%, compared to 29% and 16% for ENVI and LAStools, respectively.
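
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how a Mean Absolute Error between an estimated and a reference DTM might be computed. It is an illustration only, not the evaluation code behind the reported figures: the function name `dtm_mean_absolute_error` and the sample elevations are hypothetical, and because the abstract does not state how the error is normalized to a percentage, the sketch returns the plain error in elevation units.

```python
import math


def dtm_mean_absolute_error(predicted, reference):
    """Mean absolute elevation difference between two DTMs.

    Both inputs are flat sequences of elevations sampled on the same grid
    (hypothetical layout). Cells where either value is NaN are treated as
    no-data and skipped.
    """
    if len(predicted) != len(reference):
        raise ValueError("DTMs must be sampled on the same grid")

    errors = [
        abs(p - r)
        for p, r in zip(predicted, reference)
        if not (math.isnan(p) or math.isnan(r))
    ]
    if not errors:
        raise ValueError("no overlapping valid cells")

    return sum(errors) / len(errors)


# Made-up elevations in metres; the last reference cell is no-data.
reference = [10.0, 10.5, 11.0, float("nan")]
predicted = [10.2, 10.4, 11.6, 12.0]

mae = dtm_mean_absolute_error(predicted, reference)
print(f"MAE: {mae:.3f} m")
```

Skipping no-data cells keeps gaps in either surface from distorting the average; any normalization to a percentage would be applied on top of this value under whatever convention the evaluation adopts.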