Constant Regret in Online Decision-Making
Abstract: I will present a class of finite-horizon control problems, where we see a random stream of arrivals, need to select actions in each step, and where the final objective depends only on the aggregate type-action counts; this includes many widely-studied control problems including online resource-allocation, dynamic pricing, generalized assignment, online bin packing, and bandits with knapsacks. For such settings, I will introduce a unified algorithmic paradigm, and provide a simple yet general condition under which these algorithms achieve constant regret, i.e., additive loss compared to the hindsight optimal solution which is independent of the horizon and state-space. These results stem from an elementary coupling argument, which may prove useful for many other questions in online decision-making. Time permitting, I will illustrate this by showing how we can use this technique to incorporate side information and historical data in these settings, and achieve constant regret with as little as a single data trace.