Abstract
We study finite-armed semiparametric bandits, where each arm's rewardcombines a linear component with an unknown, potentially adversarial shift.This model strictly generalizes classical linear bandits and reflectscomplexities common in practice. We propose the first experimental-designapproach that simultaneously offers a sharp regret bound, a PAC bound, and abest-arm identification guarantee. Our method attains the minimax regret$\tilde{O}(\sqrt{dT})$, matching the known lower bound for finite-armed linearbandits, and further achieves logarithmic regret under a positive suboptimalitygap condition. These guarantees follow from our refined non-asymptotic analysisof orthogonalized regression that attains the optimal $\sqrt{d}$ rate, pavingthe way for robust and efficient learning across a broad class ofsemiparametric bandit problems.