Parallel support vector machines for multi-category classification of large scale data
by Rajendran, Arun Kumar, Ph.D., THE UNIVERSITY OF SOUTHERN MISSISSIPPI, 2007, 79 pages; 3300868

Abstract:

The Support Vector Machine (SVM) is a classification algorithm based on statistical learning theory that has received wide attention because of its accuracy and generalization ability. Many SVM tools are available and perform well for binary classification, but multi-class classification increases the complexity of the problem and makes the computation more expensive, or even prohibitive, for larger datasets.

In this work we propose a method to parallelize the classification of multi-class data, based on the Sequential Minimal Optimization (SMO) SVM algorithm. This parallel implementation allows high performance computing resources to be used to perform multi-class classification efficiently. The SMO algorithm breaks the classification problem down into the smallest possible Quadratic Programming (QP) subproblems, avoiding expensive numerical optimization. In this implementation the SMO algorithm is used to build a series of binary classifiers, which are then combined to perform one-versus-one multi-class classification. The implementation was tested on many publicly available datasets. The variable size of the data for each class results in load balancing problems, which are solved by preprocessing the data set and scheduling the subtasks according to their size to improve throughput.
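The one-versus-one decomposition described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the binary learner here is a nearest-centroid rule standing in for an SMO-trained SVM, and the names `train_centroid_classifier`, `train_one_vs_one`, and `predict_one_vs_one` are hypothetical. The point is only the structure: one binary classifier per pair of classes, combined by majority vote.

```python
from collections import Counter
from itertools import combinations


def train_centroid_classifier(points_a, points_b, label_a, label_b):
    """Stand-in binary learner (nearest centroid); NOT an SMO-trained SVM."""
    def centroid(points):
        n = len(points)
        return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

    ca, cb = centroid(points_a), centroid(points_b)

    def predict(x):
        # Squared Euclidean distance to each class centroid.
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, ca))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, cb))
        return label_a if da <= db else label_b

    return predict


def train_one_vs_one(data_by_class):
    """Train one binary classifier for every unordered pair of classes."""
    classifiers = {}
    for a, b in combinations(sorted(data_by_class), 2):
        classifiers[(a, b)] = train_centroid_classifier(
            data_by_class[a], data_by_class[b], a, b
        )
    return classifiers


def predict_one_vs_one(classifiers, x):
    """Each pairwise classifier casts one vote; the most-voted class wins."""
    votes = Counter(clf(x) for clf in classifiers.values())
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
data = {
    "A": [(0.0, 0.0), (0.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 5.0)],
    "C": [(10.0, 0.0), (10.0, 1.0)],
}
classifiers = train_one_vs_one(data)
```

With k classes this produces k(k-1)/2 independent training tasks, which is what makes the scheme a natural fit for parallel execution: each pairwise problem needs only the data of its two classes.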

The load balancing issue is addressed by developing different mapping schemes to distribute the tasks to the parallel nodes. The parallel algorithm minimizes communication between the nodes by reducing the data transferred between processes. The parallel algorithms developed can therefore be studied on both shared and distributed memory parallel systems.
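One possible size-based mapping scheme is sketched below. The abstract does not specify which schemes were developed, so this is an assumption for illustration: a greedy longest-task-first heuristic that assigns each pairwise training task to the currently least-loaded worker. The cost model (combined sample count of the two classes) is also an assumption, since real SVM training time generally grows faster than linearly with the number of samples. The function name `schedule_pairs` is hypothetical.

```python
import heapq
from itertools import combinations


def schedule_pairs(class_sizes, num_workers):
    """Greedy longest-task-first mapping of pairwise tasks to workers.

    Assumed cost of a task = number of samples in its two classes.
    """
    tasks = [
        ((a, b), class_sizes[a] + class_sizes[b])
        for a, b in combinations(sorted(class_sizes), 2)
    ]
    # Place the largest tasks first so small ones can fill the gaps.
    tasks.sort(key=lambda t: t[1], reverse=True)

    # Min-heap of (current load, worker id).
    heap = [(0, w) for w in range(num_workers)]
    assignment = {w: [] for w in range(num_workers)}
    for pair, cost in tasks:
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        assignment[w].append(pair)
        heapq.heappush(heap, (load + cost, w))

    loads = {w: load for load, w in heap}
    return assignment, loads


# Hypothetical, deliberately imbalanced class sizes.
class_sizes = {"A": 1000, "B": 200, "C": 150, "D": 50}
assignment, loads = schedule_pairs(class_sizes, num_workers=2)
```

Because the schedule is computed once from the class sizes before training starts, the workers need no coordination while running, which is consistent with the goal of keeping inter-node communication low.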

SVMs can be used to classify gene expression data, which helps in analyzing and understanding many aspects of gene function and irregularities. SVMs are also used in face recognition, text categorization, credit card processing, and other applications.

 
Adviser: Chaoyang Zhang
School: THE UNIVERSITY OF SOUTHERN MISSISSIPPI
Source: DAI/B 69-01, p. , Apr 2008
Source Type: Dissertation
Subjects: Artificial intelligence; Computer science
Publication Number: 3300868
