High-dimensional regression with grouped variables
by Wei, Fengrong, Ph.D., THE UNIVERSITY OF IOWA, 2009, 89 pages; 3383179

Abstract:

In many multiple regression problems the covariates can be naturally grouped, and it is important to take the group structure into account and select groups of variables. Such problems arise in many areas of statistical modeling and application. For example, in multifactor analysis-of-variance problems, each factor may have several levels and can be expressed through a group of dummy variables. The selection of important factors then corresponds to the selection of groups of variables.

There has been much work on the selection of important groups of variables using penalized methods. In our study, we generalize the results on the Lasso obtained in Zhang and Huang (2008) to the group Lasso in high-dimensional cases. We study the selection and estimation properties of the group Lasso and adaptive group Lasso methods. We show that, under appropriate conditions, the group Lasso selects a model of the right order of dimensionality and controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias. In addition, we show that, under a narrow-sense sparsity condition, the adaptive group Lasso possesses an oracle selection property, in the sense that it correctly selects the important groups with probability converging to one. In contrast, the group Lasso does not possess this oracle property.
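For orientation, the two criteria can be written in a commonly used form. This is only a sketch: the notation (response y, design submatrix X_j and coefficient vector \beta_j of size d_j for group j, tuning parameters \lambda and \tilde{\lambda}, initial estimate \tilde{\beta}) is assumed here and may differ from the notation and scaling used in the dissertation.

```latex
% Group Lasso: one Euclidean-norm penalty per group (notation assumed)
\hat{\beta}(\lambda) = \arg\min_{\beta}\;
  \frac{1}{2}\Bigl\| y - \sum_{j=1}^{p} X_j \beta_j \Bigr\|_2^2
  + \lambda \sum_{j=1}^{p} \sqrt{d_j}\,\|\beta_j\|_2

% Adaptive group Lasso: group weights built from an initial estimate \tilde{\beta}
\hat{\beta}^{*}(\tilde{\lambda}) = \arg\min_{\beta}\;
  \frac{1}{2}\Bigl\| y - \sum_{j=1}^{p} X_j \beta_j \Bigr\|_2^2
  + \tilde{\lambda} \sum_{j=1}^{p} w_j \sqrt{d_j}\,\|\beta_j\|_2,
\qquad w_j = \|\tilde{\beta}_j\|_2^{-1}
```

Groups whose initial estimate is small receive large weights and are penalized more heavily, which is the intuition behind the improved selection behavior of the adaptive version.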

Moreover, we apply the idea of the group Lasso to nonparametric varying coefficient models, so that the important variables can be selected and the corresponding coefficient functions estimated simultaneously. We approximate each coefficient function by B-spline basis functions. Thus, the selection of important variables and the estimation of the corresponding coefficient functions amount to the selection of groups of variables and the estimation of the corresponding spline approximation coefficients. We show that, under appropriate conditions, the estimator is consistent in sparsity and converges at the best possible rate.
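The reduction to a grouped selection problem can be sketched as follows. The symbols (index variable t_i, covariates x_{ik}, basis functions B_l, number of basis functions m, spline coefficients \gamma_{kl}) are assumptions made for illustration and are not taken from the dissertation.

```latex
% Varying coefficient model (notation assumed)
y_i = \sum_{k=1}^{p} x_{ik}\,\beta_k(t_i) + \varepsilon_i, \qquad i = 1, \dots, n

% B-spline approximation of each coefficient function
\beta_k(t) \approx \sum_{l=1}^{m} \gamma_{kl}\, B_l(t)

% Substituting gives a linear model in which variable k owns the group
% \gamma_k = (\gamma_{k1}, \dots, \gamma_{km})
y_i \approx \sum_{k=1}^{p} \sum_{l=1}^{m} \gamma_{kl}\,\bigl(x_{ik} B_l(t_i)\bigr) + \varepsilon_i
```

Setting the whole group \gamma_k to zero removes variable k from the model, so a group penalty on the \gamma_k performs variable selection.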

Existing algorithms are adapted to compute the solution paths for both group Lasso and adaptive group Lasso. Tuning parameter selection and initial value selection methods are considered during the implementation of the algorithms. All the methods are illustrated by simulation studies and real examples.
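As a rough illustration of how a group Lasso solution at a single tuning parameter value can be computed, the sketch below uses proximal gradient descent with group-wise soft thresholding. This is a swapped-in technique chosen for brevity, not the solution-path algorithms adapted in the dissertation, and every name in it (group_soft_threshold, group_lasso, groups, weights, lam) is hypothetical. It minimizes 0.5 * ||y - X b||^2 + lam * sum_j w_j * ||b_j||, with w_j defaulting to the square root of the group size.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Shrink the vector v toward zero by t in Euclidean norm."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, weights=None, n_iter=500):
    """Proximal gradient sketch for
    0.5 * ||y - X b||^2 + lam * sum_j w_j * ||b_j||_2.

    groups  : list of index lists, one per group (assumed non-overlapping)
    weights : per-group weights; defaults to sqrt(group size)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    if weights is None:
        weights = [np.sqrt(len(g)) for g in groups]
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)      # gradient of the squared-error term
        z = beta - step * grad           # plain gradient step
        for g, w in zip(groups, weights):
            # Proximal step: shrink each group as a block.
            beta[g] = group_soft_threshold(z[g], step * lam * w)
    return beta

# Small usage example with an orthonormal design (illustrative data only).
X_demo = np.eye(4)
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
beta_demo = group_lasso(X_demo, y_demo, groups=[[0, 1], [2, 3]], lam=3.0)
```

An adaptive variant could be obtained under the same assumptions by passing weights such as sqrt(d_j) / ||b_init_j||, computed from an initial estimate b_init, through the weights argument.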

 
Adviser: Jian Huang
School: THE UNIVERSITY OF IOWA
Source: DAI/B 70-10, p. , Dec 2009
Source Type: Dissertation
Subjects: Biostatistics; Statistics
Publication Number: 3383179
