Intractability results for problems in computational learning and approximation
by Saket, Rishi, Ph.D., GEORGIA INSTITUTE OF TECHNOLOGY, 2009, 175 pages; 3376346

Abstract:

In this thesis we prove intractability results for some well studied problems in computational learning and combinatorial optimization.

Minimizing and learning DNF expressions. We study the problem of finding the minimum size DNF formula for a function f : {0, 1}d [special characters omitted] {0, 1} given its truth table. We show that unless NP ⊆ DTIME([special characters omitted]), there is no polynomial time algorithm that approximates this problem to within factor d1−ϵ where ϵ > 0 is an arbitrarily small constant. Our result essentially matches the known O(d) approximation for the problem.

We also study the learnability of small size DNF formulas. We show that assuming NP [special characters omitted] RP, for arbitrarily small constant ϵ > 0 and any fixed positive integer t, a two term DNF cannot be PAC-learnt in polynomial time by a t term DNF to within ½ + ϵ accuracy. Under the same complexity assumption, we show that for arbitrarily small constants μ, ϵ > 0 and any fixed positive integer t, an AND function (i.e. a single term DNF) cannot be PAC-learnt in polynomial time under adversarial μ-noise (also known as agnostic learning) by a t-CNF to within ½ + ϵ accuracy. Our results improve upon the previously known hardness for these problems [1, 30, 31].

Learning intersection of two halfspaces. We show that unless NP= RP, it is hard to PAC-learn intersection of two halfspaces in [special characters omitted] using a hypothesis which is a function of up to ℓ linear threshold functions for any integer ℓ to within accuracy of ½ + ϵ for any constant ϵ > 0. Specifically, we show that for every integer ℓ and an arbitrarily small constant ϵ > 0, unless NP = RP, no polynomial time algorithm can distinguish whether there is an intersection of two halfspaces that correctly classifies a given set of labeled points in [special characters omitted], or whether any function of ℓ linear threshold functions can correctly classify at most ½ + ϵ fraction of the points. Our result is optimal up to an arbitrarily small constant ϵ > 0, and improves upon the previous NP-hardness for this problem [1].

Reconstructing multivariate polynomials over [special characters omitted][2]. We study the polynomial reconstruction problem for low-degree multivariate polynomials over [special characters omitted][2]. In this problem, we are given a set of points xi ∈ [special characters omitted][2]n and target values ζ i ∈ [special characters omitted][2] for each of these points, with the strong promise promise that there is a linear polynomial over [special characters omitted][2] that agrees with at least 1 − ϵ fraction of the point-value pairs. Our goal is to find a degree d polynomial that has good agreement with the set of point-value pairs. We show that it is NP-hard to find a polynomial that agrees with more than 1 − 2−d + δ fraction of the pairs for any constants ϵ, δ > 0 and positive integer d. This extends the previously known hardness of approximation (or even NP-completeness) for the case when d = 1, which follows from a celebrated result of Håstad [38]. In the setting of Computational Learning, our result shows the hardness of agnostic learning of parities, where the learner is allowed a low-degree polynomial over [special characters omitted][2] as a hypothesis.

SDP integrality gaps with Local ℓ1-embeddability . We construct integrality gap instances for SDP relaxation of the M

AXIMUM

C

UT

and the S

PARSEST

C

UT

problems. If the triangle inequality constraints are added to the SDP, then the SDP vectors naturally define an n-point negative type metric where n is the number of vertices in the problem instance. Our gap-instances satisfy a stronger constraint that every sub-metric on t = O((log log log n)[special characters omitted]) points is isometrically embeddable into ℓ1. The local ℓ 1-embeddability constraints are implied when the basic SDP relaxation is augmented with t rounds of the Sherali-Adams LP-relaxation [75].

For the M

AXIMUM

C

UT

problem, we obtain an optimal gap of [special characters omitted] − ϵ, where αGW is the Goemans-Williamson constant [33] and ϵ > 0 is an arbitrarily small constant. For the S

PARSEST

C

UT

problem, we obtain a gap of Ω((log log log n)[special characters omitted]). The latter result can be rephrased as a construction of an n-point negative type metric such that every t-point sub-metric is isometrically ℓ1-embeddable, but embedding the whole metric into ℓ1 incurs distortion Ω((log log log n)[special characters omitted]).

Integrality gap for U

NIFORM

S

PARSEST

C

UT

. Arora, Rao and Vazirani [7] showed that the standard semi-definite programming relaxation of the U

NIFORM

S

PARSEST

C

UT

problem with the triangle inequality constraints has an integrality gap of O([special characters omitted]). They conjectured that the gap is bounded from above by a constant. In this study, we disprove this conjecture (referred to as the ARV-Conjecture) by constructing an Ω(log log n) integrality gap instance. Khot and Vishnoi [54] had earlier disproved the non-uniform version of the ARV-Conjecture.

 
AdviserSubhash Khot
SchoolGEORGIA INSTITUTE OF TECHNOLOGY
SourceDAI/B 70-11, p. , Dec 2009
Source TypeDissertation
SubjectsMathematics; Computer science
Publication Number3376346
Adobe PDF Access the complete dissertation:
 

» Find an electronic copy at your library.
  Use the link below to access a full citation record of this graduate work:
  http://gateway.proquest.com/openurl%3furl_ver=Z39.88-2004%26res_dat=xri:pqdiss%26rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation%26rft_dat=xri:pqdiss:3376346
  If your library subscribes to the ProQuest Dissertations & Theses (PQDT) database, you may be entitled to a free electronic version of this graduate work. If not, you will have the option to purchase one, and access a 24 page preview for free (if available).

About ProQuest Dissertations & Theses
With over 2.3 million records, the ProQuest Dissertations & Theses (PQDT) database is the most comprehensive collection of dissertations and theses in the world. It is the database of record for graduate research.

The database includes citations of graduate works ranging from the first U.S. dissertation, accepted in 1861, to those accepted as recently as last semester. Of the 2.3 million graduate works included in the database, ProQuest offers more than 1.9 million in full text formats. Of those, over 860,000 are available in PDF format. More than 60,000 dissertations and theses are added to the database each year.

If you have questions, please feel free to visit the ProQuest Web site - http://www.proquest.com - or call ProQuest Hotline Customer Support at 1-800-521-3042.