Abstract
Spectrum access is an essential problem in device-to-device (D2D) communications. However, with the recent growth in the number of mobile devices, the wireless spectrum is becoming scarce, resulting in low spectral efficiency for D2D communications. To address this problem, this paper integrates ambient backscatter communication technology into D2D devices, allowing them to backscatter ambient RF signals to transmit their data when the shared spectrum is occupied by mobile users. Deep reinforcement learning (DRL) can be adopted to obtain the optimal spectrum access policy, i.e., whether to stay idle, access the shared spectrum and transmit actively, or backscatter ambient RF signals, so as to maximize the average throughput of D2D users. However, DRL-based solutions may require long training times due to the curse of dimensionality and the complexity of deep neural network architectures. To this end, we develop a novel quantum reinforcement learning (RL) algorithm that can achieve a faster convergence rate with fewer training parameters than DRL, thanks to the principles of quantum superposition and quantum entanglement. Specifically, instead of using conventional deep neural networks, the proposed quantum RL algorithm uses a parametrized quantum circuit to approximate an optimal policy. Extensive simulations demonstrate that the proposed solution not only significantly improves the average throughput of D2D devices when the shared spectrum is busy, but also achieves much better convergence rate and lower learning complexity than existing DRL-based methods.
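To make the idea of a parametrized quantum circuit acting as a policy more concrete, the following is a minimal sketch only, not the circuit used in this paper. It simulates a two-qubit circuit in plain Python and turns its measurement probabilities into a distribution over the three actions named above. The circuit layout (RY rotations around one CNOT), the angle encoding of a binary "channel busy" observation, the four trainable angles in `thetas`, and the mapping of basis states to actions are all illustrative assumptions.

```python
import math


def apply_ry(state, qubit, angle):
    """Apply RY(angle) to one qubit of a 2-qubit state with real amplitudes."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    bit = 1 << qubit
    new = list(state)
    for i in range(4):
        if not i & bit:  # visit each (|..0..>, |..1..>) amplitude pair once
            a0, a1 = state[i], state[i | bit]
            new[i] = c * a0 - s * a1
            new[i | bit] = s * a0 + c * a1
    return new


def apply_cnot(state, control, target):
    """Flip the target qubit wherever the control qubit is 1 (entangling gate)."""
    new = list(state)
    for i in range(4):
        if i & (1 << control):
            new[i] = state[i ^ (1 << target)]
    return new


def policy(channel_busy, thetas):
    """Return action probabilities for a binary observation (0 = free, 1 = busy).

    thetas holds four trainable rotation angles (hypothetical parametrization).
    """
    state = [1.0, 0.0, 0.0, 0.0]  # start in |00>
    # Assumed angle encoding: rotate qubit 0 by pi when the channel is busy.
    state = apply_ry(state, 0, math.pi * channel_busy)
    # Trainable rotations, one entangling gate, then trainable rotations again.
    state = apply_ry(state, 0, thetas[0])
    state = apply_ry(state, 1, thetas[1])
    state = apply_cnot(state, 0, 1)
    state = apply_ry(state, 0, thetas[2])
    state = apply_ry(state, 1, thetas[3])
    probs = [a * a for a in state]  # Born rule; amplitudes are real here
    # Assumed mapping of 4 basis states to 3 actions: index 3 is folded into idle.
    return {
        "idle": probs[0] + probs[3],
        "active_transmit": probs[1],
        "backscatter": probs[2],
    }


if __name__ == "__main__":
    print(policy(1, [0.3, 1.1, -0.7, 0.5]))
```

In this sketch the whole policy is described by four angles. A training loop would adjust `thetas` to increase the expected throughput, for example with a policy-gradient update; that step is omitted here.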