CMU-CS-10-142
Computer Science Department, School of Computer Science, Carnegie Mellon University

The Approximability of Learning and Constraint Satisfaction Problems
Yi Wu
August 2010
Ph.D. Thesis

Keywords: Complexity Theory, Approximation Algorithms, Computational Learning, Constraint Satisfaction Problems, Hardness of Approximation, Semidefinite Programming

An α-approximation algorithm is an algorithm guaranteed to output a solution within an α ratio of the optimal solution. We are interested in the following question: given an NP-hard optimization problem, what is the best approximation guarantee that any polynomial-time algorithm can achieve? We focus mainly on the approximability of two classes of NP-hard problems: Constraint Satisfaction Problems (CSPs) and computational learning problems.

For CSPs, we mainly study the approximability of MAX CUT, MAX 3-CSP, MAX 2-LINZ, VERTEX-PRICING, and several variants of UNIQUE GAMES.

The MAX CUT problem is to partition the vertices of a graph into two parts so as to maximize the number of edges between the two parts. Assuming the Unique Games Conjecture, we give a complete characterization of the approximation curve of the MAX CUT problem: for every optimum value of the instance, we show that a certain SDP algorithm with RPR2 rounding achieves the optimal approximation curve.

The input to a 3-CSP is a set of Boolean constraints, each containing at most 3 Boolean variables. The goal is to find an assignment to these variables that maximizes the number of satisfied constraints. We are interested in the case when the 3-CSP is satisfiable, i.e., there exists an assignment that satisfies every constraint. Assuming the d-to-1 conjecture (a variant of the Unique Games Conjecture), we prove that it is NP-hard to give a better-than-5/8 approximation for the problem. This result matches an SDP algorithm of Zwick that gives a 5/8-approximation for satisfiable 3-CSPs.
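To make the MAX CUT objective above concrete, here is a minimal, illustrative sketch (not from the thesis): an exhaustive search over all vertex partitions of a small toy graph. The exponential running time is exactly what motivates the approximation question studied here.

```python
from itertools import product

def max_cut_brute_force(n, edges):
    """Exhaustively try all 2^n two-sided partitions of vertices {0..n-1}
    and return the maximum number of edges crossing the cut.
    Exponential time -- for illustration only; the thesis asks what
    polynomial-time algorithms can guarantee relative to this optimum."""
    best = 0
    for side in product((0, 1), repeat=n):
        cut = sum(1 for u, v in edges if side[u] != side[v])
        best = max(best, cut)
    return best

# Toy instance: a 5-cycle. Any cut of an odd cycle misses at least one
# edge, so the optimum here is 4 of the 5 edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(max_cut_brute_force(5, edges))  # 4
```

An α-approximation algorithm for MAX CUT must output, in polynomial time, a cut whose size is at least α times this brute-force optimum.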
In addition, our result conditionally resolves a fundamental open problem in PCP theory: the optimal soundness of a 3-query nonadaptive PCP system for NP with perfect completeness.

The MAX 2-LINZ problem involves a linear system of equations over the integers; the equations are simple in that each contains at most 2 variables. The goal is to find an assignment to the variables that maximizes the number of satisfied equations. The problem is a natural generalization of UNIQUE GAMES, which addresses the hardness of the same equation systems over finite fields. We show that, assuming the Unique Games Conjecture, for a MAX 2-LINZ instance, even if there exists a solution that satisfies a 1 - ε fraction of the equations, it is NP-hard to find one that satisfies an ε fraction, for any ε > 0.

The VERTEX-PRICING problem involves a set of customers, each of which is interested in buying a set of items. VERTEX-PRICINGk is the special case in which each customer is interested in at most k of the items. All buyers are single-minded, meaning that each buyer purchases all the items they are interested in if the total cost of those items is within their budget. The algorithmic task is to price each item so as to maximize the overall profit. When every item must be priced at a positive profit margin, it is known that there is an O(k)-approximation algorithm for the problem. In contrast, we prove that even for the very simple case of VERTEX-PRICING3, when the seller is allowed to price some of the items at a negative profit margin (in which case more profit could possibly be achieved), no polynomial-time algorithm achieves any constant-factor approximation, assuming the Unique Games Conjecture.

For the learning problems, our results mostly show that many basic function classes are hard to learn under the agnostic learning model.
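The MAX 2-LINZ objective described above — maximizing the fraction of satisfied two-variable integer equations — can be sketched as follows. This is a made-up toy instance for illustration, with each equation written in the form x_i - x_j = c:

```python
def satisfied_fraction(equations, assignment):
    """Fraction of two-variable integer equations of the form
    x_i - x_j = c that a given integer assignment satisfies.
    `equations` is a list of (i, j, c); `assignment` maps variables to ints."""
    ok = sum(1 for (i, j, c) in equations
             if assignment[i] - assignment[j] == c)
    return ok / len(equations)

# Hypothetical toy system over variables x, y, z:
eqs = [("x", "y", 1), ("y", "z", 2), ("x", "z", 3), ("z", "x", 0)]
a = {"x": 3, "y": 2, "z": 0}
# x-y=1 holds, y-z=2 holds, x-z=3 holds, z-x=0 fails -> 3/4
print(satisfied_fraction(eqs, a))  # 0.75
```

The hardness result says that even when some assignment achieves a fraction near 1, finding one that achieves even a small constant fraction ε is NP-hard under the Unique Games Conjecture.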
In particular, we prove the following two results on agnostically learning monomials and low-degree polynomial threshold functions.

Our first result concerns the hardness of learning monomials. We prove that, given a set of examples, even if there exists a monomial consistent with a 1 - ε fraction of the examples, it is NP-hard to find a (1/2 + ε)-good hypothesis, even if we are allowed to output a linear threshold function, for any ε > 0. This result rules out the possibility of using linear classifiers such as Winnow and SVMs to agnostically learn monomials.

Our second result concerns the hardness of learning polynomial threshold functions (PTFs). We prove that, assuming the Unique Games Conjecture, given a set of examples, even if there exists a low-degree PTF consistent with a 1 - ε fraction of the examples, it is NP-hard to find one that is (1/2 + ε)-good, for any ε > 0. In the language of learning theory, we show that there is no better-than-trivial proper learning algorithm that agnostically learns low-degree PTFs.

240 pages
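The notion of a "(1/2 + ε)-good" hypothesis used above is simply its agreement with the labeled sample. A minimal sketch, using a made-up monomial and toy data (not from the thesis):

```python
def agreement(hypothesis, examples):
    """Fraction of labeled examples (x, y) on which the hypothesis agrees
    with the label y. A hypothesis is 'c-good' if this fraction is >= c;
    1/2 is the trivial baseline achieved by a constant classifier."""
    return sum(1 for x, y in examples if hypothesis(x) == y) / len(examples)

# Hypothetical monomial (conjunction) over Boolean features: x1 AND x3.
monomial = lambda x: x[0] and x[2]
examples = [((1, 0, 1), 1), ((1, 1, 0), 0), ((0, 1, 1), 0), ((1, 1, 1), 0)]
print(agreement(monomial, examples))  # 0.75
```

The hardness results say that even when some monomial (or low-degree PTF) achieves agreement 1 - ε, no polynomial-time algorithm can find a hypothesis with agreement 1/2 + ε, i.e., noticeably better than the trivial baseline.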