Date: Monday, March 27, 2017 at 3:00 PM in Rice 404
Committee: Hongning Wang (Advisor), Jack Stankovic (Chair), Worthy Martin, and Yanjun Qi
Contextual Bandits in a Collaborative Environment
Contextual bandit algorithms provide principled online learning solutions for finding optimal trade-offs between exploration and exploitation when side-information is available. They have been extensively used in many important practical scenarios, such as display advertising and content recommendation. A common practice is to estimate the unknown bandit parameters for each user independently. This unfortunately ignores dependency among users and thus leads to suboptimal solutions, especially in applications with strong social components. We develop a collaborative contextual bandit algorithm, in which the adjacency graph among users is leveraged to share context and payoffs among neighboring users during online updating. We rigorously prove an improved upper regret bound for the proposed collaborative bandit algorithm compared with conventional independent bandit algorithms.
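To make the independent-estimation baseline concrete, the following is a minimal sketch of a LinUCB-style contextual bandit for a single user. The function names, the `alpha` exploration weight, and the ridge-regression bookkeeping are illustrative assumptions, not the exact algorithm from the talk.

```python
import numpy as np

def linucb_choose(A, b, contexts, alpha=1.0):
    """Pick the arm maximizing the upper-confidence-bound score for one user.

    A : (d, d) design matrix (identity plus accumulated context outer products)
    b : (d,) reward-weighted context sum
    contexts : list of (d,) feature vectors, one per candidate arm
    alpha : exploration weight (hypothetical default)
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b  # ridge-regression estimate of the user's preference
    scores = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in contexts]
    return int(np.argmax(scores))

def linucb_update(A, b, x, reward):
    """Rank-one update after observing the payoff of the chosen arm."""
    A += np.outer(x, x)
    b += reward * x
    return A, b
```

In the independent setting, each user keeps a private `(A, b)` pair; the collaborative algorithm instead lets neighboring users' observations inform each other's estimates.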
We also study the upper regret bound of the collaborative bandit algorithm when the input user dependency is erroneous, and prove that under certain mild assumptions a sub-linear upper regret bound is still achievable. Based on this theoretical analysis, we enhance the collaborative contextual bandit algorithm to automatically estimate user dependency from the interaction data accumulated on the fly. With this collaborative framework, learning complexity on a per-user basis is greatly reduced, which translates into rapid user preference learning and improved user satisfaction. Extensive experiments on both synthetic and three large-scale real-world datasets verify the improvement of our proposed algorithms over several state-of-the-art contextual bandit algorithms.
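The collaborative idea and the on-the-fly dependency estimation can be sketched as follows. This is a simplified illustration, not the estimator analyzed in the thesis: it uses row-normalized cosine similarity between the users' current preference estimates as a stand-in adjacency matrix, and smooths predictions over neighbors.

```python
import numpy as np

def estimate_adjacency(thetas, eps=1e-8):
    """Estimate user-dependency weights from learned preference vectors.

    A hypothetical stand-in for the online estimator: cosine similarity
    between users' current estimates, clipped to be non-negative and
    row-normalized so each user's weights sum to (approximately) one.
    """
    unit = thetas / (np.linalg.norm(thetas, axis=1, keepdims=True) + eps)
    sim = np.clip(unit @ unit.T, 0.0, None)
    return sim / (sim.sum(axis=1, keepdims=True) + eps)

def collaborative_score(thetas, W, x, user):
    """Expected payoff of context x for `user`, smoothed over neighbors."""
    return x @ (W[user] @ thetas)
```

The point of the sketch is that a user with few observations borrows statistical strength from similar neighbors, which is what drives the reduced per-user learning complexity.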