Large implicit state space enumeration: Overcoming memory and disk limitations

by Robinson, Eric, Ph.D., NORTHEASTERN UNIVERSITY, 2008, 148 pages; 3315757

Abstract:

This thesis presents a software implementation that provides an application-independent method for searching and enumerating implicit state spaces too large for distributed memory. When presented with a specific enumeration application and a description of the hardware resources available for the enumeration, the software automatically chooses the fastest enumeration technique. Implicit state spaces are formally defined in Section 2.1 but can be viewed as a compact representation for a large graph using only a small subset of states and a set of functions to generate edges in the graph.
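As an illustration of the idea of an implicit state space (not the thesis's software or its formal definition from Section 2.1), the following minimal Python sketch enumerates a graph that is never stored explicitly: it starts from a small set of seed states and applies generator functions to produce edges. The function name `enumerate_implicit`, the toy generators `swap_first_two` and `rotate_left`, and the choice of an in-memory breadth-first search are all assumptions made for this example; the techniques analyzed in the thesis target spaces far too large for this in-memory approach.

```python
from typing import Callable, Hashable, Iterable, List, Sequence, Set

State = Hashable


def enumerate_implicit(
    seeds: Iterable[State],
    generators: Sequence[Callable[[State], State]],
) -> List[int]:
    """Breadth-first enumeration of an implicit state space.

    Returns the number of newly discovered states at each depth.
    Hypothetical helper for illustration only; everything is kept in memory.
    """
    seen: Set[State] = set()
    frontier: List[State] = []
    for s in seeds:
        if s not in seen:
            seen.add(s)
            frontier.append(s)

    level_sizes: List[int] = []
    while frontier:
        level_sizes.append(len(frontier))
        next_frontier: List[State] = []
        for state in frontier:
            for gen in generators:
                succ = gen(state)       # an edge of the implicit graph
                if succ not in seen:    # in-memory duplicate detection
                    seen.add(succ)
                    next_frontier.append(succ)
        frontier = next_frontier
    return level_sizes


# Toy generators acting on permutations stored as tuples (assumed example).
def swap_first_two(p):
    return (p[1], p[0]) + p[2:]


def rotate_left(p):
    return p[1:] + p[:1]


sizes = enumerate_implicit([(0, 1, 2)], [swap_first_two, rotate_left])
print(sizes, sum(sizes))  # [1, 2, 3] 6
```

Here a single seed state and two generator functions stand in for all six permutations of three elements. The `seen` set is the part that stops fitting in memory for large spaces, which is the limitation the disk-based and parallel techniques discussed in the thesis address.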

In order to determine the best enumeration technique for an application, analytical formulas are derived in this thesis to predict the time and space requirements for many well-known search and enumeration techniques. These formulas take as input parameters a description of the enumeration to be performed as well as the hardware available for that enumeration. The time and space requirements for the enumeration are then derived by applying the formulas for a given set of parameters. The software offers an application-independent approach for estimating many of these parameters where they are not provided by the user. The size of the search space itself is a prominent example of one of these estimated parameters.

In the process of analyzing the many parallel and oftentimes disk-based techniques for search and enumeration, a natural space-time search hierarchy is uncovered. With this, one can see a trade-off between the amount of storage an enumeration technique requires and the number of computations that technique must perform. A gap was discovered in this hierarchy and a new technique for search and enumeration, tiered duplicate detection, is presented in this thesis to fill that gap.

Tiered duplicate detection is novel in that it requires fewer passes through disk than many competing techniques. It uses an in-core imperfect duplicate detection method, and has an out-of-core method for determining when errors have been made. This allows it to explore multiple depths in the search tree without accessing disk between successive depths to perform duplicate detection, as many of the earlier disk-based techniques had done. It was the preferred enumeration technique for two of the computational group theory applications presented in Section 8.3, the baby monster group and Fischer's group Fi23.

With this software implementation, applications in fields such as computational group theory, puzzle search, and implicit state model checking are considered. An in-depth examination of applications in the computational group theory field gives evidence that the analysis and formulas derived in this thesis are accurate. For the applications considered in this thesis, these formulas are always accurate to within ±50%, and typically the accuracy is within ±25%. This is almost always accurate enough to choose a disk-based parallel enumeration technique for a given application that yields a run-time comparable to the best technique for that application.

With this framework, we offer the most comprehensive examination of the topic of disk-based search and enumeration to date. We present not only an analysis of existing techniques, but also introduce a new hybrid technique, tiered duplicate detection. In addition, we provide a software implementation. This implementation allows non-experts to specify their search or enumeration in terms relevant to their application, such as state representation and transitions between states. We present a wide variety of applications in several fields to show the applicability of our software while focusing on the computational group theory field to offer experimental evidence that the analysis presented is valid.

Adviser: Gene Cooperman
School: NORTHEASTERN UNIVERSITY
Source Type: Dissertation
Subjects: Computer science
Publication Number: 3315757
