Object detection in challenging situations such as scale variation, occlusion, and truncation depends not only on feature details but also on contextual information. Most previous networks place too much emphasis on detailed feature extraction through deeper and wider networks, which may improve detection accuracy to a certain extent. However, feature details are easily altered or washed out after passing through complicated filtering structures. To better handle these challenges, this paper proposes a novel framework, the multi-scale deep inception convolutional neural network (MDCN), which focuses on wider and broader object regions by activating feature maps produced in the deep part of the network. Instead of applying inception modules to the inner layers in the shallow part of the network, multi-scale inceptions are introduced in the deep layers. The proposed framework integrates contextual information into the learning process through a single-shot network structure. It is computationally efficient and avoids the training difficulties of previous macro feature extraction networks designed for shallow layers. Extensive experiments demonstrate the effectiveness and superior performance of MDCN over state-of-the-art models.