Contextual combinatorial cascading bandits
Refereed conference paper presented and published in conference proceedings

Full Text

Times Cited

Other information
AbstractWe propose the contextual combinatorial cascading bandits, a combinatorial online learning game, where at each time step a learning agent is given a set of contextual information, then selects a list of items, and observes stochastic outcomes of a prefix in the selected items by some stopping criterion. In online recommendation, the stopping criterion might be the first item a user selects; in network routing, the stopping criterion might be the first edge blocked in a path. We consider position discounts in the list order, so that the agent's reward is discounted depending on the position where the stopping criterion is met. We design a UCB-type algorithm, C3-UCB, for this problem, prove an n-step regret bound Õ(√n) in the general setting, and give finer analysis for two special cases. Our work generalizes existing studies in several directions, including contextual information, position discounts, and a more general cascading bandit model. Experiments on synthetic and real datasets demonstrate the advantage of involving contextual information and position discounts.
All Author(s) ListShuai Li, Baoxiang Wang, Shengyu Zhang, Wei Chen
Name of Conference33rd International Conference on Machine Learning, ICML 2016
Start Date of Conference19/06/2016
End Date of Conference24/06/2016
Place of ConferenceNew York City
Country/Region of ConferenceUnited States of America
Volume Number3
Pages1900 - 1916
LanguagesEnglish-United Kingdom

Last updated on 2020-05-08 at 03:36