Integrated approach for the exploration of geospatial datasets: The interaction of concepts, methods and data
by Dai, Xiping, Ph.D., THE PENNSYLVANIA STATE UNIVERSITY, 2007, 248 pages; 3298984

Abstract:

Categories are the basic concepts and building blocks for human knowledge in understanding, exploring and describing the world. Geographers use categories to conceptualize, interpret and communicate phenomena such as land cover, urbanization and regions of economic growth. Inductive machine learning enabled by computers has proved to be a powerful tool for categorization in increasingly complex geographical datasets. Machine-learning tools can locate clusters or partitions in the space constructed from the variables, but these results are often hard for humans to understand and interpret. Furthermore, the data-driven clusters or patterns provided as results are not guaranteed to correspond to the “information classes” that are meaningful and important to users. This research provides a two-part solution to this problem.

In the first part, a cognitive model of category development is synthesized to emphasize the integration of data-driven and theory/knowledge-based categories. Inductive learning from data examples provides data-driven classes or clusters. Human knowledge is employed in supervised category development to build generalized category models for GIS communities. The construction of categories is thus a problem of integrating “information classes” (categories) with the data-driven classes provided by machine learning. This model provides an integrated approach that combines machine learning and human expert knowledge, and thereby improves the communication, representation and sharing of categories during their development.

In the second part, the integrated category development model is structured and optimized by combining visualization techniques and exploratory statistics with machine-learning tools. The visualization interface enables preprocessing of examples, facilitates an examination of the uncertainty in category design, and allows users to visually explore feature space. The combination of visualization and machine learning supports the construction of categories, including the exploration of appropriate methods in category development, e.g., choosing between different types of classifiers, selecting appropriate training examples, and rejecting outliers. This combined method of category development incorporates human expertise into data-driven machine learning through the visual interface, which allows control over the machine-learning tools and coordination among them. Communication among the examples, descriptions, categories and human expertise is thus enhanced, and an integrated category development system is achieved. The integrated category development model is implemented as a series of visual and computational components connected into a workflow design using GeoVISTA Studio.

 
School: THE PENNSYLVANIA STATE UNIVERSITY
Source: DAI/A 69-01, p. , Apr 2008
Source Type: Dissertation
Subjects: Geography; Information science; Artificial intelligence
Publication Number: 3298984
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3298984