Abstract
Distributed access control is a crucial component of massive machine-type communication (mMTC). In this communication scenario, centralized resource allocation is not scalable because resource configurations have to be sent frequently from the base station to a massive number of devices. We investigate distributed reinforcement learning for resource selection without relying on centralized control. Another important feature of mMTC is its sporadic and dynamically changing traffic. Existing studies on distributed access control either assume that the traffic load is static or adapt only gradually to dynamic traffic. We minimize the adaptation period by training TinyQMIX, a lightweight multi-agent deep reinforcement learning model, to learn a distributed wireless resource selection policy under various traffic patterns before deployment. The trained agents are therefore able to adapt quickly to dynamic traffic and provide low access delay. Numerical results are presented to support our claims.