Dimensionality reduction for classification with high-dimensional data
by Tian, T. Siva, Ph.D., UNIVERSITY OF SOUTHERN CALIFORNIA, 2009, 115 pages; 3368663

Abstract:

This thesis addresses dimensionality reduction problems in classification for both high-dimensional multivariate and functional data.

High-dimensional data are data with a large number of variables, often larger than the number of observations. Such data are encountered in a wide range of areas, including engineering, biometrics, psychometrics, and neuroimaging. Classifying these data is difficult because the enormous number of variables poses challenges to conventional classification methods and renders many classical techniques impractical. A natural solution is to add a dimensionality reduction step before a classification technique is applied.

To handle multivariate data, two approaches are proposed: a simulated annealing (SA) based method and a multivariate adaptive stochastic search (MASS) method. Both use stochastic search algorithms to select, in each iteration, a handful of optimal transformation directions from a large number of random directions. One advantage of the proposed methods is that they can accurately project the data onto very low-dimensional non-linear, as well as linear, spaces. The methods are designed to mimic variable selection methods such as the Lasso, variable combination methods such as PCA, or a method that combines the two approaches. In particular, MASS can adaptively adjust the model complexity level, and hence performs well in situations where variable selection or variable combination methods fail. We demonstrate the strengths of SA and MASS on an extensive range of simulation studies and real-data studies by comparing them to many classical and modern classification methods.
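
The idea of searching over random projection directions can be illustrated with a small sketch. This is not the SA or MASS algorithm from the thesis, whose details the abstract does not give. It is a generic simulated-annealing loop written under stated assumptions: the function names (`project`, `centroid_error`, `anneal_directions`), the Gaussian random directions, the nearest-centroid training error used as the search criterion, and the cooling schedule are all hypothetical choices made for illustration.

```python
import math
import random


def project(X, directions):
    """Project each row of X onto every direction (plain dot products)."""
    return [[sum(x * d for x, d in zip(row, direc)) for direc in directions]
            for row in X]


def centroid_error(Z, y):
    """Training error of a nearest-centroid classifier on the reduced data Z."""
    groups = {}
    for z, label in zip(Z, y):
        groups.setdefault(label, []).append(z)
    centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                 for label, rows in groups.items()}
    wrong = 0
    for z, label in zip(Z, y):
        pred = min(centroids, key=lambda c: math.dist(z, centroids[c]))
        wrong += pred != label
    return wrong / len(y)


def anneal_directions(X, y, k=2, pool_size=10, iters=50,
                      t0=0.1, cooling=0.95, seed=0):
    """Assumed, illustrative search: keep k directions; in each iteration draw
    a pool of random directions, try each as a replacement for one slot, and
    accept the best replacement with a Metropolis rule."""
    rng = random.Random(seed)
    p = len(X[0])

    def random_direction():
        return [rng.gauss(0.0, 1.0) for _ in range(p)]

    current = [random_direction() for _ in range(k)]
    cur_err = centroid_error(project(X, current), y)
    best, best_err = list(current), cur_err
    temp = t0
    for _ in range(iters):
        slot = rng.randrange(k)
        cand, cand_err = None, float("inf")
        for _ in range(pool_size):
            trial = list(current)
            trial[slot] = random_direction()
            err = centroid_error(project(X, trial), y)
            if err < cand_err:
                cand, cand_err = trial, err
        # Metropolis rule: always accept improvements, sometimes accept worse.
        accept = cand_err <= cur_err or \
            rng.random() < math.exp(-(cand_err - cur_err) / temp)
        if accept:
            current, cur_err = cand, cand_err
            if cur_err < best_err:
                best, best_err = list(current), cur_err
        temp *= cooling  # geometric cooling (an assumption)
    return best, best_err
```

Calling `anneal_directions(X, y, k=2)` on a list of feature rows `X` and labels `y` returns `k` directions and the training error they achieve. A practical version would score candidates by cross-validated error rather than training error and would control model complexity, which is the role the abstract assigns to MASS.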

Classification problems associated with functional data are also addressed. We propose a functional adaptive classification (FAC) approach, which takes the functional response into consideration and produces highly accurate and interpretable results. FAC is also based on a stochastic search procedure guided by the evaluation of model complexity. This often results in a simple relationship between the functional covariates and the reduced data, which makes the model interpretable. Simulation studies and an fMRI time course study are presented to show the effectiveness of the proposed method.

 
Advisers: Rand R. Wilcox; Gareth M. James
School: UNIVERSITY OF SOUTHERN CALIFORNIA
Source: DAI/B 70-07, p. , Sep 2009
Source Type: Dissertation
Subjects: Statistics
Publication Number: 3368663