Model learning and application of partially observable Markov decision processes
by He, Lihan, Ph.D., DUKE UNIVERSITY, 2008, 156 pages; 3373515

Abstract:

The partially observable Markov decision process (POMDP) has been widely used in robot navigation and decision making. Learning an accurate POMDP model is a prerequisite for model-based POMDP applications. Given the definitions of states, actions and observations, learning a POMDP model means inferring the state-transition probabilities and the state-dependent observation probabilities. This dissertation presents three Bayesian methods for learning a POMDP model, based on MEDUSA (Markovian exploration with decision based on the use of sampled models algorithm), multi-task learning (MTL) and life-long learning (LLL). These learning algorithms are introduced within two POMDP applications: adaptive land-mine sensing and online target searching.
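As an illustration of what the two learned quantities are used for, the following minimal sketch shows the standard POMDP belief update, which combines the state-transition probabilities and the observation probabilities. It is not taken from the dissertation; the function name `belief_update`, the nested-list layout `T[a][s][s_next]` and `O[a][s_next][o]`, and the toy numbers are assumptions made for illustration.

```python
def belief_update(belief, T, O, action, obs):
    """Standard Bayes-filter update for a discrete POMDP.

    belief: list of probabilities over states
    T[a][s][s_next]: assumed layout for transition probabilities
    O[a][s_next][o]: assumed layout for observation probabilities
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Predict: probability of reaching s_next under the chosen action.
        predicted = sum(T[action][s][s_next] * belief[s] for s in range(n))
        # Correct: weight by the likelihood of the received observation.
        unnormalized.append(O[action][s_next][obs] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Toy model (hypothetical numbers): two states, one action, two observations.
T = [[[0.9, 0.1], [0.2, 0.8]]]
O = [[[0.8, 0.2], [0.3, 0.7]]]
new_belief = belief_update([0.5, 0.5], T, O, action=0, obs=0)
```

If `T` or `O` is inaccurate, every belief computed this way is biased, which is one way to see why an accurate model matters for model-based methods.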

After presenting background material on POMDPs, MEDUSA, MTL and LLL in Chapter 1, Chapter 2 addresses the application of multimodality sensing of landmines using two sensors. We first assume that adequate data for learning an underlying POMDP model of mines and clutter are available, and describe the method of building an appropriate model. This is then generalized by assuming that the training data for mines and clutter are not available a priori, so that the underlying POMDP model is learned online based on a modified MEDUSA approach. An oracle is employed adaptively to reveal the label information of the underground mines/clutter under interrogation, and the posterior of the underlying POMDP model is updated based on the interrogation result. Example results are presented using measured sensing data from two sensors for buried mines and clutter, to demonstrate the performance of the algorithm.
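The oracle-driven posterior update described above can be sketched with Dirichlet counts over the model parameters, in the spirit of MEDUSA. This is a simplified illustration, not the modified MEDUSA of the dissertation: the class name `DirichletPOMDP`, its methods, and the uniform prior are all assumptions, and the policy that decides when to query the oracle is omitted.

```python
import random


class DirichletPOMDP:
    """Dirichlet posterior over transition and observation probabilities."""

    def __init__(self, n_states, n_actions, n_obs, prior=1.0):
        # One Dirichlet count vector per (action, state) pair; uniform prior assumed.
        self.trans = [[[prior] * n_states for _ in range(n_states)]
                      for _ in range(n_actions)]
        self.obs = [[[prior] * n_obs for _ in range(n_states)]
                    for _ in range(n_actions)]

    def update(self, s, a, s_next, o):
        # Called when the oracle has revealed the hidden states s and s_next.
        self.trans[a][s][s_next] += 1.0
        self.obs[a][s_next][o] += 1.0

    def mean_transition(self, a, s):
        # Posterior mean of the transition distribution for (a, s).
        alpha = self.trans[a][s]
        total = sum(alpha)
        return [x / total for x in alpha]

    def sample_transition(self, a, s, rng):
        # Draw from the Dirichlet posterior via normalized gamma variates.
        draws = [rng.gammavariate(x, 1.0) for x in self.trans[a][s]]
        total = sum(draws)
        return [d / total for d in draws]


model = DirichletPOMDP(n_states=2, n_actions=1, n_obs=2)
for _ in range(3):
    model.update(s=0, a=0, s_next=1, o=1)  # three oracle-labelled steps
mean = model.mean_transition(a=0, s=0)
sample = model.sample_transition(a=0, s=0, rng=random.Random(0))
```

Sampling several models from the posterior, rather than using only the mean, is what lets a MEDUSA-style learner represent its remaining uncertainty about the model.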

Chapters 3–5 address the application of online target searching in an unknown environment. In this application, POMDPs and a simultaneous localization and mapping (SLAM) algorithm are combined to navigate a robot (searching for an acoustic source) and to build a global map simultaneously. Chapter 3 introduces the SLAM algorithm and proposes a geometric map representation, in which a map is represented by a set of geometric units. Chapter 4 presents the online target searching framework, based on a modified MEDUSA and a grid-based SLAM, under the assumption that all the possible subworlds that may be encountered are available. An accurate POMDP model for each possible subworld is built before searching. The modified MEDUSA is performed for each subworld during the searching process, to identify the correct underlying model. The assumption of knowing all the possible subworlds a priori is removed in Chapter 5, where two transfer learning approaches, multi-task learning and life-long learning, are proposed for learning a POMDP model when the training data from a single task are not sufficient. The matrix stick-breaking process prior employed in the algorithms provides a flexible sharing structure, allowing two learning tasks to share only a subset of states with associated state-transition probabilities and observation probabilities, instead of sharing the entire POMDP model. Results for several simulated environments and for a real environment show the effectiveness of the framework and the algorithms.
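For readers unfamiliar with stick-breaking priors, the sketch below shows the ordinary truncated stick-breaking construction used in Dirichlet-process-style models. It is not the matrix stick-breaking process of the dissertation, which is more elaborate; the function name `stick_breaking_weights`, the concentration parameter `alpha`, and the truncation level are assumptions for illustration only.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k}(1 - v_j)."""
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for k in range(truncation):
        # The last break takes the whole remainder so the weights sum to one.
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


weights = stick_breaking_weights(alpha=2.0, truncation=10, rng=random.Random(0))
```

Smaller values of `alpha` concentrate the mass on the first few components, which in sharing models of this kind corresponds to more tasks reusing the same parameters.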

 
Adviser: Lawrence Carin
School: DUKE UNIVERSITY
Source: DAI/B 70-08, p. , Oct 2009
Source Type: Dissertation
Subjects: Electrical engineering; Artificial intelligence
Publication Number: 3373515
Full citation record: http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3373515