We address the packet routing problem in highly dynamic mobile ad-hoc networks (MANETs). In the network routing problem, each router chooses the next-hop(s) of each packet to deliver the packet to a destination with lower delay, higher reliability, and less overhead in the network. In this paper, we present a novel framework and routing policies, DeepCQ+ routing, using multi-agent deep reinforcement learning (MADRL), which is designed to be robust and scalable for MANETs. Unlike other deep reinforcement learning (DRL)-based routing solutions in the literature, our approach enables us to train over a limited range of network parameters and conditions, yet achieve realistic routing policies for a much wider range of conditions, including a variable number of nodes, different data flows with varying data rates and source/destination pairs, diverse mobility levels, and other dynamic network topologies. We demonstrate the scalability, robustness, and performance enhancements obtained by DeepCQ+ routing over a recently proposed model-free and non-neural robust and reliable routing technique (i.e., CQ+ routing). DeepCQ+ routing outperforms non-DRL-based CQ+ routing in terms of overhead while maintaining the same goodput rate. Under a wide range of network sizes and mobility conditions, we have observed a reduction in normalized overhead of 10-15%, indicating that the DeepCQ+ routing policy delivers more packets end-to-end with less overhead. To the best of our knowledge, this is the first successful application of MADRL to the MANET routing problem that simultaneously achieves scalability and robustness under dynamic conditions while outperforming its non-neural counterpart. More importantly, we provide a framework for designing scalable and robust routing policies with any desired network performance metric of interest.