Partial Least Squares (PLS) is a class of methods for modeling relations between sets of observed variables by means of latent variables, in settings where the explanatory variables are highly collinear and outnumber the observations. In general, PLS methods aim to derive orthogonal components using the cross-covariance matrix between the response variable(s) and the explanatory variables, a quantity that is known to be affected by unusual observations (outliers) in the data set. In this study, robustified versions of PLS methods, for regression and classification, are introduced.
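To illustrate why the cross-covariance is sensitive to outliers, the following minimal sketch computes the first PLS weight vector for a single response, which is proportional to the cross-covariance between the centered predictors and the centered response. The simulated data, the function names `first_pls_weight` and `dominant_index`, and the contamination scheme are assumptions made for illustration only; they are not taken from the study.

```python
import math
import random

def first_pls_weight(X, y):
    """Unit-norm first PLS weight vector, proportional to Xc^T yc."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Cross-covariance (up to a constant) between each predictor and y
    w = [sum((X[i][j] - x_mean[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def dominant_index(w):
    """Index of the predictor with the largest absolute weight."""
    return max(range(len(w)), key=lambda j: abs(w[j]))

# Hypothetical data: the response depends on predictor 0 only
rng = random.Random(0)
n, p = 200, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] + 0.1 * rng.gauss(0, 1) for row in X]
w_clean = first_pls_weight(X, y)

# Replace one observation with a gross outlier in predictor 1 and in y
X_bad = [row[:] for row in X]
y_bad = y[:]
X_bad[0] = [0.0, 50.0, 0.0, 0.0, 0.0]
y_bad[0] = 50.0
w_bad = first_pls_weight(X_bad, y_bad)

print("dominant predictor, clean data:", dominant_index(w_clean))
print("dominant predictor, one outlier:", dominant_index(w_bad))
```

In this setup a single contaminated observation out of 200 is enough to redirect the first component away from the predictor that actually drives the response.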
For regression with a quantitative response, a robust PLS regression method (RoPLS), based on weights calculated by the BACON or PCOUT algorithm, is proposed. A robust criterion is suggested to determine the optimal number of PLS components, which is an important issue in building a PLS regression model. In addition, diagnostic plots are constructed to visualize and classify outliers. The robustness of the proposed method, RoPLS, is studied in detail. The influence function for the RoPLS estimator is derived for low-dimensional data, and empirical robustness properties are provided for high-dimensional data.
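The abstract does not spell out the RoPLS estimator, so the sketch below shows only the general idea of downweighting observations before forming the cross-covariance. The 0/1 weights come from a simple median/MAD rule on the response, which is a stand-in chosen for brevity and is not the BACON or PCOUT algorithm. The function names `mad_weights` and `weighted_first_pls_weight`, the cutoff of 3, and the simulated data are all hypothetical.

```python
import math
import random
import statistics

def mad_weights(values, cutoff=3.0):
    """0/1 weights from a median/MAD rule (stand-in, NOT BACON or PCOUT)."""
    med = statistics.median(values)
    mad = 1.4826 * statistics.median([abs(v - med) for v in values])
    return [1.0 if abs(v - med) <= cutoff * mad else 0.0 for v in values]

def weighted_first_pls_weight(X, y, obs_w):
    """Unit-norm first PLS weight vector from a weighted cross-covariance."""
    n, p = len(X), len(X[0])
    total = sum(obs_w)
    # Weighted means, so that downweighted rows do not shift the centering
    x_mean = [sum(obs_w[i] * X[i][j] for i in range(n)) / total
              for j in range(p)]
    y_mean = sum(obs_w[i] * y[i] for i in range(n)) / total
    w = [sum(obs_w[i] * (X[i][j] - x_mean[j]) * (y[i] - y_mean)
             for i in range(n))
         for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def dominant_index(w):
    """Index of the predictor with the largest absolute weight."""
    return max(range(len(w)), key=lambda j: abs(w[j]))

# Hypothetical data: y depends on predictor 0; row 0 is a gross outlier
rng = random.Random(0)
n, p = 200, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] + 0.1 * rng.gauss(0, 1) for row in X]
X[0] = [0.0, 50.0, 0.0, 0.0, 0.0]
y[0] = 50.0

obs_w = mad_weights(y)
w_naive = weighted_first_pls_weight(X, y, [1.0] * n)  # equal weights
w_rob = weighted_first_pls_weight(X, y, obs_w)        # outlier downweighted

print("dominant predictor, equal weights:", dominant_index(w_naive))
print("dominant predictor, robust weights:", dominant_index(w_rob))
```

A univariate rule on the response is used here only because it is short. Multivariate outlier detectors such as those named in the abstract also account for unusual observations in the predictor space, which this stand-in would miss.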
PLS was originally designed for regression problems with a quantitative response; however, it is also used as a classification technique where the response variable is qualitative. Although several robust PLS methods have been proposed for regression problems, to our knowledge, there has been no study on the robustness of PLS classification methods. In this study, the effect of outliers on existing PLS classification methods is investigated and a new robust PLS algorithm (RoCPLS) for classification is introduced.
The performance of the proposed methods, RoPLS and RoCPLS, is assessed using several benchmark data sets and extensive simulation experiments.