Much recent research in signal processing, statistics, and many other fields has focused on ℓ1-regularization-based methods for feature selection and sparse signal reconstruction. In this thesis we study optimization problems with ℓ1 regularization, and efficient methods for solving them.
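The chapters that follow treat special cases of one problem family. A generic statement, with f a convex loss function and λ ≥ 0 a regularization parameter, is:

```latex
\[
  \begin{array}{ll}
    \mbox{minimize} & f(x) + \lambda \|x\|_1,
  \end{array}
\]
% f is the average logistic loss in Section 2 and the least squares
% loss in Section 3; the l1 term promotes sparsity in the solution x.
```

In §4 the ℓ1 penalty is applied to the second-order difference of the trend estimate rather than to the variable itself, but the same sparsity-inducing mechanism is at work.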
In §2, we consider a well-known problem in classification, the ℓ1-regularized logistic regression problem. First, we study the problem and derive optimality conditions and a dual problem. Then we describe an efficient interior-point method for solving it. Small problems, with up to a thousand or so features and examples, can be solved in seconds on a PC; medium-sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, which uses a preconditioned conjugate gradient (PCG) method to compute the search direction, can solve very large problems, with millions of features and examples, in a few minutes on a PC. The method exploits the fact that PCG converges quickly when the eigenvalues of the coefficient matrix are clustered, and that the eigenvalues of the Hessian arising in the primal interior-point method are indeed clustered. Using warm-start techniques, a good approximation of the entire regularization path can be computed far more efficiently than by solving a family of problems independently.
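The interior-point solver itself is developed in the chapter; as a minimal illustration of the underlying optimization problem, the sketch below minimizes the same objective (average logistic loss plus λ‖w‖₁) by proximal gradient descent, a simpler first-order method, not the thesis's algorithm. All names here (`l1_logreg`, `soft_threshold`) are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logreg(A, b, lam, step=0.1, iters=2000):
    """Minimize (1/m) * sum_i log(1 + exp(-b_i * a_i^T w)) + lam * ||w||_1
    by proximal gradient descent (an illustrative stand-in for the
    interior-point method described in the chapter).

    A : (m, n) feature matrix, rows a_i;  b : (m,) labels in {-1, +1}.
    """
    m, n = A.shape
    w = np.zeros(n)
    for _ in range(iters):
        z = b * (A @ w)
        # gradient of the average logistic loss at w
        grad = -(A.T @ (b / (1.0 + np.exp(z)))) / m
        # gradient step on the smooth part, prox step on the l1 part
        w = soft_threshold(w - step * grad, step * lam)
    return w
```

For large λ the ℓ1 penalty dominates and the solution is exactly zero; as λ decreases, features enter the model one by one, tracing out the regularization path mentioned above.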
In §3, we turn to a well-known problem in regression, the ℓ1-regularized least squares problem, also known as the Lasso or basis pursuit denoising, which arises in compressed sensing. In parallel with §2, we describe an efficient interior-point method whose search direction is computed by the preconditioned conjugate gradient method. The method can solve large sparse problems, with a million variables and observations, in a few tens of minutes on a PC. It can also efficiently solve large dense problems that arise in sparse signal recovery with orthogonal transforms, by exploiting fast algorithms for these transforms. The method is illustrated on a magnetic resonance imaging (MRI) data set.
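Again the chapter's interior-point solver is not reproduced here; as a hedged sketch of the problem being solved, the snippet below minimizes the Lasso objective (1/2)‖Ax − b‖₂² + λ‖x‖₁ by iterative soft-thresholding (ISTA), a basic first-order method. The function name `lasso_ista` is illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, b, lam, iters=3000):
    """Minimize (1/2)*||Ax - b||_2^2 + lam*||x||_1 by iterative
    soft-thresholding (a simple alternative to the interior-point
    solver described in the chapter)."""
    # step size 1/L, where L = ||A||_2^2 is the Lipschitz constant
    # of the gradient of the least squares term
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)          # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

On noiseless measurements of a sparse vector with a small λ, the minimizer is close to the true sparse signal, which is the sparse-recovery phenomenon the chapter exploits.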
In §4, we consider another application of ℓ1 regularization: estimating underlying trends in time series data. We propose a variation on Hodrick-Prescott (H-P) filtering, a widely used method for trend estimation. The proposed ℓ1 trend filtering method substitutes a sum of absolute values (i.e., an ℓ1 norm) for the sum of squares used in H-P filtering to penalize variations in the estimated trend. The ℓ1 trend filtering method produces trend estimates that are piecewise linear, and is therefore well suited to analyzing time series with an underlying piecewise linear trend. The kinks, knots, or changes in slope of the estimated trend can be interpreted as abrupt changes or events in the underlying dynamics of the time series. Using specialized interior-point methods, ℓ1 trend filtering can be carried out with little more effort than H-P filtering; in particular, the number of arithmetic operations required grows linearly with the number of data points. We describe the method and some of its basic properties, and give some illustrative examples. We show how the method is related to ℓ1-regularization-based methods in sparse signal recovery and feature selection, and list some extensions of the basic method.
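To make the contrast concrete, the sketch below implements the H-P filter, which has a closed-form solution, together with the second-order difference matrix D that both filters share. The ℓ1 trend filter replaces the quadratic penalty ‖Dx‖₂² with ‖Dx‖₁ and has no closed form; it requires an iterative solver such as the specialized interior-point method described in the chapter. Function names are illustrative.

```python
import numpy as np

def second_difference_matrix(n):
    """(n-2) x n matrix D with (Dx)_i = x_i - 2*x_{i+1} + x_{i+2}."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def hp_filter(y, lam):
    """Hodrick-Prescott trend estimate: the minimizer of
        (1/2)*||y - x||_2^2 + lam*||D x||_2^2,
    obtained in closed form from the normal equations.
    The l1 trend filter instead penalizes lam*||D x||_1, which makes
    the estimated trend piecewise linear but requires an iterative
    (e.g. interior-point) solver.
    """
    n = len(y)
    D = second_difference_matrix(n)
    # solve (I + 2*lam*D^T D) x = y
    return np.linalg.solve(np.eye(n) + 2.0 * lam * D.T @ D, y)
```

Note that an exactly linear series has Dy = 0, so both filters leave it unchanged regardless of λ; they differ only in how they treat changes in slope.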