Abstract
Our study introduces a novel, low-cost, and reproducible framework for real-time, object-level structural assessment and geolocation of roadside vegetation and infrastructure using commonly available but underutilized dashboard camera (dashcam) video data. We developed an end-to-end pipeline that combines monocular depth estimation, depth error correction, and geometric triangulation to generate accurate spatial and structural data from street-level video streams recorded by vehicle-mounted dashcams. Depth maps were first estimated using a state-of-the-art monocular depth model, then refined via a gradient-boosted regression framework to correct underestimations, particularly for distant objects. The depth correction model achieved strong predictive performance, with R² = 0.92 and a mean absolute error (MAE) of 0.31 on the transformed scale, significantly reducing bias beyond 15 m. Object locations were then estimated using GPS-based triangulation, while object heights were calculated using pinhole camera geometry. Our method was evaluated under varying conditions of camera placement and vehicle speed. The low-speed configuration with the camera mounted inside the vehicle gave the highest accuracy, with a mean geolocation error of 2.83 m and an MAE in height estimation of 2.09 m for trees and 0.88 m for poles. To the best of our knowledge, this is the first framework to combine monocular depth modeling, triangulated GPS-based geolocation, and real-time structural assessment for urban vegetation and infrastructure using consumer-grade video data. Our approach complements conventional remote sensing (RS) methods, such as LiDAR and imagery, by offering a fast, real-time, and cost-effective solution for object-level monitoring of vegetation risks and infrastructure exposure, making it especially valuable for utility companies and urban planners aiming for scalable and frequent assessments in dynamic urban environments.
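As a rough illustration of the pinhole-geometry step mentioned above, the following Python sketch shows how an object's height could be recovered from its height in pixels, its corrected depth, and the camera focal length expressed in pixels. The function name, parameter names, and example values are hypothetical and are not taken from the study. The sketch also assumes an upright object roughly parallel to the image plane and ignores lens distortion and camera tilt, which a full implementation would likely need to handle.

```python
def object_height_m(pixel_height: float, depth_m: float, focal_length_px: float) -> float:
    """Estimate real-world object height with the pinhole camera model.

    By similar triangles: height / depth = pixel_height / focal_length,
    so height = pixel_height * depth / focal_length.
    All names here are illustrative assumptions, not the authors' code.
    """
    if depth_m <= 0 or focal_length_px <= 0:
        raise ValueError("depth and focal length must be positive")
    if pixel_height < 0:
        raise ValueError("pixel height must be non-negative")
    return pixel_height * depth_m / focal_length_px


# Hypothetical example: an object spanning 400 px, at a corrected depth
# of 20 m, seen by a camera with a 1000 px focal length.
example_height = object_height_m(pixel_height=400, depth_m=20.0, focal_length_px=1000.0)
```

Because the estimated height scales linearly with depth, any residual depth error carries over proportionally to the height estimate, which suggests why a depth correction step would matter for distant objects.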