|
Abstract:
We present computational approaches to solve several problems arising in protein structure and function. In the first part of this thesis, we develop a new method for finding the lowest energy positions of side chains when given the backbone of a protein, a widely studied problem that has applications in homology modeling and protein design. We present an integer linear programming formulation of side-chain positioning and relax it to give a polynomial-time linear programming heuristic that allows us to tackle large problems. We test the integer and linear programming approach on native and homologous backbones, where we show that optimal solutions can usually be found using linear programming, and in protein redesign, where we find that instances often cannot be solved using linear programming directly, but where optimal solutions for large instances can be found using the more expensive integer programming procedure. We also present an alternative formulation of the side-chain positioning problem as a semidefinite program, which provides a tighter relaxation than the linear program. We introduce two novel rounding schemes to convert fractional solutions of the semidefinite program into choices of rotamers and provide some theoretical justifications for their effectiveness. We extensively test the semidefinite programming formulation and rounding schemes on simulated data and on the redesign of two naturally occurring protein cores and show that the approach finds good solutions. The second part of this thesis considers the problem of finding transcription factor binding sites by locating a collection of mutually similar subsequences within the upstream DNA sequences of genes. Our approach to side-chain positioning can be recast to solve this problem, and it has previously been shown that this is a promising direction to pursue. We improve the mathematical programming formulation to find binding sites up to 45 times faster. Finally, in the last part of the thesis, we investigate protein function more broadly and give extensions to the popular phylogenetic profile method for predicting shared function from cross-genomic evolutionary history. For many biological functions, our methods are better able to identify functionally linked proteins than previously introduced methods.
|