Preserving nearest neighbor consistency in cluster analysis
by Lee, Jong-Seok, Ph.D., IOWA STATE UNIVERSITY, 2009, 131 pages; 3369852

Abstract:

Finding cluster structure in data involves two main tasks: identifying the number of natural clusters and grouping the objects in a reasonable way. To achieve good results on both, a measure of clustering quality must be chosen before any related study begins, because it fixes a definition of cluster that could otherwise be ambiguous, since individuals hold different opinions about it. In this research we use the compactness and the connectivity of clusters as our quality measures. The former has been regarded as one of the most important properties a clustering should achieve, whereas the latter, which we consider a significant factor, has received less attention. Since we believe that each is important in its own right, we employ both to better estimate the number of clusters and to cluster objects. A new estimation method produces a set of promising estimates by measuring compactness and connectivity on clustered datasets that resemble the original data but contain some amount of perturbation, and then determines a single optimal number by a majority voting scheme. The connectivity measure newly introduced in our research is also used as an objective to be optimized when clustering objects. We propose a new clustering algorithm, named CNCLUST, that works to optimize the amount of connectivity. The proposed algorithm is a greedy heuristic that resembles the single linkage method, but it differs in that it first considers the local compactness of objects and later incorporates it into global connectivity. We conducted numerical experiments on simulated datasets and a real dataset to evaluate the performance of the proposed methods. The results appear promising.
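As a rough illustration of the two quality measures and the voting step, the following Python sketch may help. It is not the dissertation's method: the abstract does not define its connectivity measure, so the sketch assumes a simple nearest neighbor consistency score (the share of objects whose nearest neighbor lies in the same cluster) and takes compactness to be the mean squared distance to the cluster centroid. The function names nn_consistency, compactness, and majority_vote are hypothetical.

```python
import math
from collections import Counter


def nn_consistency(points, labels):
    """Assumed connectivity score: fraction of points whose nearest
    neighbor carries the same cluster label (higher is better)."""
    agree = 0
    for i, p in enumerate(points):
        # Index of the closest other point under Euclidean distance.
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        agree += labels[i] == labels[nearest]
    return agree / len(points)


def compactness(points, labels):
    """Assumed compactness score: mean squared distance from each point
    to the centroid of its cluster (lower is tighter)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    total = 0.0
    for members in groups.values():
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total / len(points)


def majority_vote(estimates):
    """Pick the most frequent candidate number of clusters."""
    return Counter(estimates).most_common(1)[0][0]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(nn_consistency(pts, [0, 0, 1, 1]), compactness(pts, [0, 0, 1, 1]))
    # One estimate per perturbed copy of the data (values made up here).
    print(majority_vote([3, 2, 3, 4, 3]))
```

In a scheme like the one the abstract outlines, each perturbed copy of the data would contribute one estimate of the number of clusters, and majority_vote would reduce those estimates to a single value. How the perturbation is generated and how the two scores are combined are left open here because the abstract does not specify them.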

 
Adviser: Sigurdur Olafsson
School: IOWA STATE UNIVERSITY
Source: DAI/B 70-08, p. , Oct 2009
Source Type: Dissertation
Subjects: Industrial engineering
Publication Number: 3369852
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3369852