Methods for large-scale convex optimization problems withl 1 regularization
by Koh, Kwangmoo, Ph.D., STANFORD UNIVERSITY, 2009, 130 pages; 3343647

Abstract:

Much of recent research in signal processing, statistics, and many other fields has focused on ℓ1 regularization based methods for feature selection, sparse signal reconstruction. In this thesis we study optimization problems with ℓ1 regularization, and efficient methods to solve them.

Our topic in §2, is how to solve a famous problem in classification, ℓ 1-regularized logistic regression problem. First, we study the problem, and derive optimality conditions and a dual problem. Then we describe an efficient interior-point method for the problem. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium sized problems with tens of thousand of features and examples, can be solved in tens of seconds (assuming sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute the search direction, can solve very large problems, with millions of features and examples in a few minutes, on a PC. We exploit the fact that PCG converges fast when the eigenvalues of the matrix in linear equations are clustered, and eigenvalues of the Hessian of the primal interior-point method are clustered. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.

In §3, we focus on how to solve a famous problem in regression, ℓ 1-regularized least squares problem, which is also known as Lasso, Basis Pursuit denoising, and Compressed Sensing problem. In parallel with §2, we describe an efficient interior-point method based on the search direction found by the preconditioned conjugate gradient method. The method can solve large sparse problems, with a million variables and observations, in a few tens of minutes on a PC. It can efficiently solve large dense problems, that arise in sparse signal recovery with orthogonal transforms, by exploiting fast algorithms for these transforms. The method is illustrated on a magnetic resonance imaging (MRI) data set.

In §4, we consider another application of ℓ1 regularization to the problem of estimating underlying trends in time series data. We propose a variation on Hodrick-Prescott (H-P) filtering, a widely used method for trend estimation. The proposed ℓ1 trend filtering method substitutes a sum of absolute values (i.e., ℓ1-norm) for the sum of squares used in H-P filtering to penalize variations in the estimated trend. The ℓ1 trend filtering method produces trend estimates that are piecewise linear, and therefore is well suited to analyzing time series with an underlying piecewise linear trend. The kinks, knots, or changes in slope of the estimated trend can be interpreted as abrupt changes or events in the underlying dynamics of the time series. Using specialized interior-point methods, ℓ1 trend filtering can be carried out with not much more effort than H-P filtering; in particular, the number of arithmetic operations required grows linearly with the number of data points. We describe the method and some of its basic properties, and give some illustrative examples. We show how the method is related to ℓ1 regularization based methods in sparse signal recovery and feature selection, and list some extensions of the basic method.

 
AdviserStephen Boyd
SchoolSTANFORD UNIVERSITY
SourceDAI/B 70-01, p. , Mar 2009
Source TypeDissertation
SubjectsStatistics; Electrical engineering; Artificial intelligence; Computer science
Publication Number3343647
Adobe PDF Access the complete dissertation:
 

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3343647
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).

About ProQuest Dissertations & Theses
With over 2.3 million records, the ProQuest Dissertations & Theses (PQDT) database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.

The database includes citations of graduate works ranging from the first U.S. dissertation, accepted in 1861, to those accepted as recently as last semester. Of the 2.3 million graduate works included in the database, ProQuest offers more than 1.9 million in full text formats. Of those, over 860,000 are available in PDF format. More than 60,000 dissertations and theses are added to the database each year.

If you have questions, please feel free to visit the ProQuest Web site - http://www.proquest.com - or call ProQuest Hotline Customer Support at 1-800-521-3042.