UMI  
ProQuest® Dissertations & Theses
The world's most comprehensive collection of dissertations and theses. Learn more...
ProQuest  
 
 
Algorithms for bandit online linear optimization
by Dani, Varsha, Ph.D., THE UNIVERSITY OF CHICAGO, 2008, 91 pages; 3300428
 

Abstract:

The well-known "multi-armed bandit" problem is a problem of iterated decision making in a largely unknown and constantly changing environment. A decision-maker must repeatedly select one of k decisions, and incur a corresponding loss, with the eventual goal of minimizing the "regret," defined as the difference between the actual total loss of an online decision-making procedure and that of the best single decision in hindsight. This problem has been well studied, and the best algorithms achieve the optimal regret of [Special characters omitted.] (ignoring polylogarithmic factors), where T is the number of rounds for which the game is played.

We study the problem of "bandit online linear optimization," which is a generalization of the multi-armed bandit problem to a setting in which the decision set is a compact subset of [Special characters omitted.] and the loss functions to be optimized are linear. Although the algorithms from the multi-armed bandit setting can be directly applied here, they suffer from the dependence of their regret on the size of the decision set, since this may be very large or even infinite. The aim, rather, is to find algorithms whose regret is controlled by the ambient dimension n. "Bandit" refers to the setting in which the algorithm is only told its loss on each round, rather than the entire loss function. Prior to our work, algorithms for the bandit setting with bounded linear loss functions, achieved a regret bound that was polynomial in n by sacrificing their dependence on the number of rounds, the best previous bounds being of order poly(n )T2/3 . We bridge this gap by presenting an algorithm that has expected regret at most n3/2 [Special characters omitted.] (ignoring polylogarithmic factors). We also present a modification of our algorithm which achieves the same regret bound with high probability.

We provide lower bounds to show that no algorithm for bandit online linear optimization can achieve lower regret than ?(n[Special characters omitted.] ). An easy corollary of earlier work of Freund and Schapire shows that in the full-information setting, (i.e., when the algorithm observes the entire loss vector each round) a regret bound of O ([Special characters omitted.] ) is possible. Thus we show that for online linear optimization, the price of bandit information, i.e., the extra regret factor due to not observing the entire loss function is between [Special characters omitted.] and n.

Another variant of the problem has the loss functions drawn from some fixed but unknown distribution. For this case we present a deterministic algorithm which achieves a regret of O (n [Special characters omitted.] ) in general. In the special case when the decision set is a fixed polytope, this algorithm achieves regret O (n 2 log3HASH(0xa7c0b0c)T ). We also give a lower bound of ?([Special characters omitted.] ) on regret for general compact decision sets.

 
Advisor: Fortnow, Lance
School: THE UNIVERSITY OF CHICAGO
Source: DAI-B 69/02, p. , Aug 2008
Source Type: Ph.D.
Subjects: Computer science
Publication Number: 3300428
     
Adobe PDF Access the complete dissertation:
 

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3300428
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).

 
 
 

About ProQuest Dissertations & Theses
With over 2.3 million records, the ProQuest Dissertations & Theses (PQDT) database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.

The database includes citations of graduate works ranging from the first U.S. dissertation, accepted in 1861, to those accepted as recently as last semester. Of the 2.3 million graduate works included in the database, ProQuest offers more than 1.9 million in full text formats. Of those, over 860,000 are available in PDF format. More than 60,000 dissertations and theses are added to the database each year.

If you have questions, please feel free to visit the ProQuest Web site - http://www.il.proquest.com - or call ProQuest Hotline Customer Support at 1-800-521-3042.



Copyright © 2007 ProQuest. All rights reserved. Terms and Conditions

ProQuest