Abstract
This paper introduces a novel reinforcement learning (RL) framework, termed Reward-Guided Conservative Q-learning (RG-CQL), to enhance coordination between ride-pooling and public transit within a multimodal transportation network. We model each ride-pooling vehicle as an agent governed by a Markov Decision Process (MDP) and propose an offline training and online fine-tuning RL framework to learn the optimal operational decisions of the multimodal transportation system, including rider-vehicle matching, selection of drop-off locations for passengers, and vehicle routing decisions, with improved data efficiency. During the offline training phase, we develop a Conservative Double Deep Q Network (CDDQN) as the action executor and a supervised learning-based reward estimator, termed the Guider Network, to extract valuable insights into action-reward relationships from data batches. In the online fine-tuning phase, the Guider Network serves as an exploration guide, aiding CDDQN in effectively and conservatively exploring unknown state-action pairs. The efficacy of our algorithm is demonstrated through a realistic case study using real-world data from Manhattan. We show that integrating ride-pooling with public transit outperforms two benchmark cases, solo rides coordinated with transit and ride-pooling without transit coordination, by 17% and 22% in the achieved system rewards, respectively. Furthermore, our innovative offline training and online fine-tuning framework offers a remarkable 81.3% improvement in data efficiency compared to traditional online RL methods with adequate exploration budgets, with a 4.3% increase in total rewards and a 5.6% reduction in overestimation errors. Experimental results further demonstrate that RG-CQL effectively addresses the challenges of transitioning from offline to online RL in large-scale ride-pooling systems integrated with transit.
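
As an illustration of the kind of objective a conservative double deep Q-network might minimize, the following minimal sketch combines the standard Double DQN target with a generic CQL-style penalty for a single transition. The abstract does not give the exact CDDQN loss, so the function names, the penalty weight `alpha`, and the toy Q-values below are all assumptions chosen for illustration, not the authors' implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


def conservative_loss(q_values, action, target, alpha):
    """Squared TD error plus a CQL-style penalty (assumed form, weight alpha).

    The penalty logsumexp(Q(s, .)) - Q(s, a_data) is non-negative and pushes
    down Q-values of actions that do not appear in the logged data.
    """
    td_error = q_values[action] - target
    penalty = logsumexp(q_values) - q_values[action]
    return 0.5 * td_error ** 2 + alpha * penalty


# Toy transition with three hypothetical actions (all numbers are made up).
target = double_dqn_target(
    reward=1.0,
    gamma=0.9,
    q_online_next=[0.2, 0.8, 0.5],   # online net prefers action 1
    q_target_next=[0.3, 0.4, 0.9],   # target net evaluates action 1
    done=False,
)
loss = conservative_loss(q_values=[1.0, 2.0, 0.5], action=0, target=target, alpha=1.0)
```

In this sketch, decoupling action selection from action evaluation is what limits overestimation in the bootstrap target, while the penalty term is what makes the estimate conservative on state-action pairs that the offline batch never visited. Setting `alpha` to zero recovers a plain Double DQN loss.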