Online Subset Selection using $α$-Core with no Augmented Regret

Abstract

We consider the problem of sequential sparse subset selections in an onlinelearning setup. Assume that the set $[N]$ consists of $N$ distinct elements. Onthe $t^{\text{th}}$ round, a monotone reward function $f_t: 2^{[N]} \to\mathbb{R}_+,$ which assigns a non-negative reward to each subset of $[N],$ isrevealed to a learner. The learner selects (perhaps randomly) a subset $S_t\subseteq [N]$ of $k$ elements before the reward function $f_t$ for that roundis revealed $(k \leq N)$. As a consequence of its choice, the learner receivesa reward of $f_t(S_t)$ on the $t^{\text{th}}$ round. The learner's goal is todesign an online subset selection policy to maximize its expected cumulativereward accrued over a given time horizon. In this connection, we propose anonline learning policy called SCore (Subset Selection with Core) that solvesthe problem for a large class of reward functions. The proposed SCore policy isbased on a new concept of $\alpha$-Core, which is a generalization of thenotion of Core from the cooperative game theory literature. We establish alearning guarantee for the SCore policy in terms of a new performance metriccalled $\alpha$-augmented regret. In this new metric, the power of the offlinebenchmark is suitably augmented compared to the online policy. We give severalillustrative examples to show that a broad class of reward functions, includingsubmodular, can be efficiently learned using the SCore policy. We also outlinehow the SCore policy can be used under a semi-bandit feedback model andconclude the paper with a number of open problems.

Quick Read (beta)

loading the full paper ...