Improving Prediction Accuracy Using Class-specific Ensemble Feature Selection
by Soares, Caio Vinicius, Ph.D., AUBURN UNIVERSITY, 2010, 138 pages; 3446224

Abstract:

As data accumulates at a speed significantly faster than can be processed, data preprocessing techniques such as feature selection become increasingly important and beneficial. Moreover, given the well-known gains of feature selection, any further improvements can positively affect a wide array of fields and applications. So, this research explores a novel feature selection architecture, Class-specific Ensemble Feature Selection (CEFS), which finds class-specific subsets of features optimal to each available classification in the dataset. Each subset is then combined with a classifier to create an ensemble feature selection model which is further used to predict unseen instances. CEFS attempts to provide the diversity and base classifier disagreement sought after in effective ensemble models by providing highly useful, yet highly exclusive feature subsets. CEFS is not a feature selection algorithm, but rather, a unique way of performing feature selection. Hence, it is also algorithm independent, suggesting that various machine learners and feature selection algorithms can benefit from the use of this architecture. To test this architecture, a comprehensive experiment is conducted, implementing the architecture under two different classifiers, three different feature selection algorithms, and under ten different datasets. The results of this experiment shows that the CEFS architecture outperforms the traditional feature selection architecture in every algorithmic combination and for every dataset. Moreover, the presence of high-dimensional datasets suggests that CEFS will scale up. Finally, the feature results obtained from the experiment suggest that vital class-specific information can be lost if feature selection is performed on the entire dataset as a whole, as opposed to a class-specific manner.

 
AdviserJuan E. Gilbert
SchoolAUBURN UNIVERSITY
SourceDAI/B 72-04, p. , Mar 2011
Source TypeDissertation
SubjectsComputer science
Publication Number3446224
Adobe PDF Access the complete dissertation:
 

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3446224
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).

About ProQuest Dissertations & Theses
With over 2.3 million records, the ProQuest Dissertations & Theses (PQDT) database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.

The database includes citations of graduate works ranging from the first U.S. dissertation, accepted in 1861, to those accepted as recently as last semester. Of the 2.3 million graduate works included in the database, ProQuest offers more than 1.9 million in full text formats. Of those, over 860,000 are available in PDF format. More than 60,000 dissertations and theses are added to the database each year.

If you have questions, please feel free to visit the ProQuest Web site - http://www.proquest.com - or call ProQuest Hotline Customer Support at 1-800-521-3042.