Edge-Compatible Reinforcement Learning for Recommendations

Abstract

Most reinforcement learning (RL) recommendation systems designed for edge computing must either synchronize during recommendation selection or depend on an unprincipled patchwork collection of algorithms. In this work, we build on asynchronous coagent policy gradient algorithms \citep{kostas2020asynchronous} to propose a principled solution to this problem. The class of algorithms that we propose can be distributed over the internet and run asynchronously and in real-time. When a given edge fails to respond to a request for data with sufficient speed, this is not a problem; the algorithm is designed to function and learn in the edge setting, and network issues are part of this setting. The result is a principled, theoretically grounded RL algorithm designed to be distributed in and learn in this asynchronous environment. In this work, we describe this algorithm and a proposed class of architectures in detail, and demonstrate that they work well in practice in the asynchronous setting, even as the network quality degrades.
