When: Friday, Feb 2nd, from 2:00 PM to 3:00 PM
Where: ENGR 045
Abstract: We introduce a reinforcement learning framework for economic platform design in which the interaction between the platform designer and the participants is modeled as a Stackelberg game. In this game, the designer (leader) sets the rules of the platform, and the participants (followers) respond strategically. We integrate the algorithms that determine the followers' response strategies into the leader's learning environment, thereby formulating the leader's learning problem as a POMDP that we call the Stackelberg POMDP. We prove that optimal leader strategies in the Stackelberg game are optimal policies in the Stackelberg POMDP, establishing a connection between solving POMDPs and solving Stackelberg games. For the specific case of no-regret-learning followers, we solve an array of increasingly complex settings, including indirect mechanism design problems with turn-taking and limited agent communication. We demonstrate the effectiveness of our training framework through ablation studies. We also give convergence results for no-regret learners to a Bayesian version of coarse correlated equilibrium, extending known results to the case of correlated types.
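
Illustration: for a concrete picture of the core construction, here is a minimal sketch (not the speakers' implementation) of embedding the followers' no-regret learning loop inside the leader's learning environment, so that one "episode" of the leader's problem plays out the followers' learned responses. The payoff matrices, the Hedge learner, and the grid search standing in for the leader's RL are all illustrative assumptions.

```python
# Minimal sketch of the Stackelberg POMDP idea: the follower's no-regret
# learning runs *inside* the leader's environment, so each episode maps a
# leader commitment to a payoff against the follower's learned response.
# All game data here is hypothetical, for illustration only.
import numpy as np

# Illustrative 2x3 game: leader has 2 actions, follower has 3.
LEADER_PAYOFF = np.array([[1.0, 3.0, 0.0],
                          [2.0, 1.0, 3.0]])
FOLLOWER_PAYOFF = np.array([[0.0, 2.0, 1.0],
                            [3.0, 0.0, 2.0]])

def follower_hedge_response(leader_mixed, rounds=2000, eta=0.1):
    """Follower runs Hedge (a standard no-regret algorithm) against the
    leader's committed mixed strategy; returns time-averaged play."""
    weights = np.ones(FOLLOWER_PAYOFF.shape[1])
    avg_play = np.zeros_like(weights)
    for _ in range(rounds):
        probs = weights / weights.sum()
        avg_play += probs
        # Expected payoff of each follower action vs the commitment;
        # against a fixed commitment, Hedge converges to a best response.
        utilities = leader_mixed @ FOLLOWER_PAYOFF
        weights *= np.exp(eta * utilities)
    return avg_play / rounds

def leader_episode(theta):
    """One episode of the leader's single-agent learning problem: commit
    to a mixed strategy parameterized by theta, let the embedded follower
    learner respond, and collect the leader's payoff as reward."""
    leader_mixed = np.array([1 - theta, theta])
    follower_mixed = follower_hedge_response(leader_mixed)
    return leader_mixed @ LEADER_PAYOFF @ follower_mixed

# Crude stand-in for the leader's RL: grid search over commitments.
best = max(np.linspace(0, 1, 101), key=leader_episode)
print(f"best commitment P(action 1) = {best:.2f}, "
      f"leader payoff = {leader_episode(best):.3f}")
```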
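
Background: for the last result, it may help to recall the standard complete-information coarse correlated equilibrium, which the talk's Bayesian version generalizes to correlated types (the precise Bayesian definition is given in the talk).

```latex
% A distribution \sigma over action profiles a = (a_1, \dots, a_n) is a
% coarse correlated equilibrium if no player gains by committing to a
% fixed unilateral deviation a_i' before the profile is drawn:
\[
  \mathbb{E}_{a \sim \sigma}\big[u_i(a)\big]
  \;\ge\;
  \mathbb{E}_{a \sim \sigma}\big[u_i(a_i', a_{-i})\big]
  \quad \text{for all players } i \text{ and actions } a_i'.
\]
```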