Abstract
The bisimulation metric (BSM) is a powerful tool for computing state similarities within a Markov decision process (MDP), revealing that states closer in BSM have more similar optimal value functions. While BSM has been successfully utilized in reinforcement learning (RL) for tasks like state representation learning and policy exploration, its application to multiple-MDP scenarios, such as policy transfer, remains challenging. Prior work has attempted to generalize BSM to pairs of MDPs, but a lack of rigorous analysis of its mathematical properties has limited further theoretical progress. In this work, we formally establish a generalized bisimulation metric (GBSM) between pairs of MDPs and rigorously prove that it satisfies three fundamental properties: GBSM symmetry, the inter-MDP triangle inequality, and the distance bound on identical state spaces. Leveraging these properties, we theoretically analyse policy transfer, state aggregation, and sampling-based estimation in MDPs, obtaining explicit bounds that are strictly tighter than those derived from the standard BSM. Additionally, GBSM provides a closed-form sample complexity for estimation, improving upon existing asymptotic results based on BSM. Numerical results validate our theoretical findings and demonstrate the effectiveness of GBSM in multi-MDP scenarios.