Koolio: Path-planning using reinforcement learning on a real robot in a real environment
by Zamstein, Lavi Michael, Ph.D., UNIVERSITY OF FLORIDA, 2009, 320 pages; 3367596

Abstract:

There are many cases where it is not possible to program a robot with precise instructions. The environment may be unknown, or the programmer may not even know the best way to solve a problem. In such cases, machine learning is useful for providing the robot, or agent, with a policy: a fixed scheme for determining choices based on inputs.

The two primary groups of machine learning methods are Supervised Learning, in which a supervisor provides training data to help the agent learn, and Reinforcement Learning, which requires only a set of rewards for certain choices. Of the three categories of Reinforcement Learning (Dynamic Programming, Monte Carlo, and Temporal Difference), the Temporal Difference method known as Q-Learning was chosen.

Q-Learning is a Markov method that uses a weighted decision table to determine the best choice for any given set of sensor inputs. The values in this Q-table are calculated using the Q-formula, which weighs the expected value of a decision based on the known reward and uses a discounting factor to give more recent choices a greater effect on the values than older choices. The Q-table also allowed the learning to be modular, as a learning agent would need only the file containing the table to use the learned policy generated by a different agent.
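
As a rough illustration of the table update described above, the following is a minimal sketch of the standard tabular Q-Learning rule, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max Q(s',a') - Q(s,a)). The action names, the state encoding, and the values of the learning rate (alpha) and discount factor (gamma) are illustrative assumptions and are not taken from the dissertation.

```python
from collections import defaultdict

# Hypothetical action set; the dissertation's actual actions are not given here.
ACTIONS = ("forward", "left", "right")


def make_q_table():
    """Create a Q-table mapping each state to one value per action.

    A state can be any hashable summary of the sensor inputs,
    for example a tuple of discretized range readings.
    """
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def best_action(q_table, state):
    """Return the action with the highest learned value for this state."""
    values = q_table[state]
    return max(ACTIONS, key=lambda a: values[a])


def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one standard Q-Learning update and return the new value.

    alpha (learning rate) and gamma (discount factor) are assumed values.
    """
    best_next = max(q_table[next_state].values())
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

In this sketch, each call to q_update moves one table entry part of the way toward the observed reward plus the discounted value of the best choice available in the next state, so repeated experience gradually shapes the policy that best_action reads out.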

Because of the large number of iterations required for Q-Learning to reach an optimal policy, a simulator was required. This simulator provided a means by which the agent could learn behaviors without the need to worry about such things as parts wearing down or an untrained robot colliding with a wall.

After a policy was found in simulation, the Q-table was transferred into Koolio, a refrigerator robot, to allow it to navigate the hallways with the experience gathered in simulation. This Q-table was then further refined through more learning on the real robot.
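
The transfer step described above depends on the Q-table being stored in a file that another agent can load. The following is a minimal sketch of that idea, assuming a JSON file and string-valued state names; the file format, function names, and state labels are hypothetical and are not the format used in the dissertation.

```python
import json


def save_q_table(q_table, path):
    """Write a table of the form {state: {action: value}} to a JSON file.

    States must be strings here because JSON object keys are strings.
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump(q_table, f, indent=2, sort_keys=True)


def load_q_table(path):
    """Read a table previously written by save_q_table."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def greedy_action(q_table, state):
    """Pick the highest-valued action for a state in the loaded table."""
    actions = q_table[state]
    return max(actions, key=actions.get)
```

Under these assumptions, a simulator process would call save_q_table after training, and the program on the robot would call load_q_table at startup, act with greedy_action, and continue to adjust the loaded values as it gathers real experience.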

 
Adviser: A. Antonio Arroyo
School: UNIVERSITY OF FLORIDA
Source: DAI/B 70-07, p. , Sep 2009
Source Type: Dissertation
Subjects: Electrical engineering; Robotics; Artificial intelligence
Publication Number: 3367596
Access the complete dissertation:

Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3367596
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one and to access a 24-page preview for free (if available).
