Scarcity of health care resources could result in the unavoidable consequence of rationing. For example, ventilators are often limited in supply, especially during public health emergencies such as the COVID-19 pandemic or in resource-constrained health care settings. Currently, there is no universally accepted standard for health care resource allocation protocols, so different governments prioritize patients based on various criteria and heuristic-based protocols. In this study, we investigate the use of reinforcement learning to optimize critical care resource allocation policies so that resources are rationed fairly and effectively. We propose a transformer-based deep Q-network that integrates the disease progression of individual patients and the interaction effects among patients during critical care resource allocation. We aim to improve both the fairness of allocation and overall patient outcomes. Our experiments demonstrate that our method significantly reduces excess deaths and achieves a more equitable distribution under different levels of ventilator shortage, compared to existing severity-based and comorbidity-based methods in use by different governments. Our source code is included in the supplement and will be released on GitHub upon publication.