Robust Margin Based Classifiers For Small Sample Data
by Gupta, Sidharth, M.S., ARIZONA STATE UNIVERSITY, 2011, 46 pages; 1491825

Abstract:

In many classification problems, data samples cannot be collected easily, for example in drug trials, biological experiments, and studies of cancer patients. In many such situations the data set is small and contains many outliers. When classifying such data, for example cancer versus normal patients, the consequences of misclassification are arguably more serious than for other types of data, because the data point could be a cancer patient, or the classification decision could help determine which gene might be overexpressed and perhaps a cause of cancer. Misclassifications are typically more frequent in the presence of outliers. The aim of this thesis is to develop a maximum margin classifier that addresses the lack of robustness of discriminant-based classifiers (such as the Support Vector Machine (SVM)) to noise and outliers. The underlying notion is to adopt and develop a natural loss function that is more robust to outliers and more representative of the true loss function of the data. It is demonstrated experimentally that SVMs are indeed susceptible to outliers and that the new classifier developed, here called Robust-SVM (RSVM), is superior to all studied classifiers on the synthetic datasets. It is superior to the SVM on both the synthetic data and the experimental data from biomedical studies, and is competitive with a classifier derived along similar lines when real-life data examples are considered.
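As a rough illustration of why the loss function matters for outliers, the sketch below compares the standard SVM hinge loss with a capped (truncated) hinge loss. The abstract does not specify the loss used by RSVM, so the truncated hinge, the function names, the cap value of 2.0, and the sample margins are all assumptions chosen for illustration only.

```python
# Illustrative only: the truncated hinge is NOT claimed to be the RSVM loss.
# "margin" is y * f(x): positive when a point is classified correctly.

def hinge_loss(margin):
    """Standard SVM hinge loss: grows without bound as the margin becomes more negative."""
    return max(0.0, 1.0 - margin)

def truncated_hinge_loss(margin, cap=2.0):
    """Hinge loss capped at `cap` (hypothetical value), so one point cannot dominate."""
    return min(cap, hinge_loss(margin))

# Three ordinary points and one gross outlier (margin = -10), all hypothetical.
margins = [2.0, 0.5, -0.5, -10.0]

hinge_total = sum(hinge_loss(m) for m in margins)
truncated_total = sum(truncated_hinge_loss(m) for m in margins)

print("hinge total:", hinge_total)          # the outlier alone contributes 11.0
print("truncated total:", truncated_total)  # the outlier contributes at most 2.0
```

Under the hinge loss the single outlier accounts for most of the total, which is one way to see how a few bad points can pull a maximum margin boundary; bounding the per-point loss limits that influence.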

 
Adviser: Seungchan Kim
School: ARIZONA STATE UNIVERSITY
Source: MAI/ 49-05, p. , Jun 2011
Source Type: Thesis
Subjects: Statistics; Bioinformatics; Computer science
Publication Number: 1491825
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:1491825