Abstract
Reinforcement learning (RL) techniques have been developed to optimize industrial cooling systems, offering substantial energy savings compared to traditional heuristic policies. A major challenge in industrial control involves learning behaviors that are feasible in the real world given machinery constraints. For example, certain actions can only be executed every few hours, while other actions can be taken more frequently. Without extensive reward engineering and experimentation, an RL agent may not learn realistic operation of machinery. To address this, we use hierarchical reinforcement learning with multiple agents that control subsets of actions according to their operation time scales. Our hierarchical approach achieves energy savings over existing baselines while maintaining constraints such as operating chillers within safe bounds in a simulated HVAC control environment.