In a joint work with David Baker (Biochemistry, UW) and his lab, we are studying ways to improve the numerical optimization procedure in their Fortran code Rosetta for predicting the 3-dimensional structure of a folded protein from the amino acid sequence. This is one of the central problems in computational molecular biology. See Baker Lab homepage. for more information on this problem.

A protein may be modelled as a set of points (atoms) in R3, with certain pairs of atoms linked, i.e., the Euclidean distance between each pair is constant and known. The free energy function depends on the pairwise distance between the atoms, and involves highly nonlinear terms such as the Lennard-Jones function. The atoms are further categorized as either backbone atoms or sidechain atoms. The main problem is to find torsion angles between adjacent links of backbone atoms that globally minimize the energy function, thus yielding a prediction of the folded protein. In Rosetta, the global minimization of this energy function is done by using a Monte Carlo method.

In the current implementation of Monte Carlo in the Baker lab's Rosetta code, the torsion angles are updated by moves. Four different types of moves are used to update the torsion angles for the backbone. In each one of these moves, a small subset of torsion angles (e.g., two adjacent angles or five randomly chosen angles) is randomly perturbed according to a Gaussian distribution and then local minimization (Powell's derivative-free method or the BFGS method) is applied to find a local minimum of the energy with respect to backbone angles within, say, 5 residues of the perturbed angles (the other angles are held fixed). The resulting set of angles is accepted if it doesn't increase the energy and is accepted with small probability if it increases the energy. For every, say, 25 of these moves attempted, a fifth move is attempted to minimize the total energy with respect to the sidechain torsion angles. The latter minimization is combinatorial in nature and expensive (which is why it is attempted less frequently than the other moves). These five moves are currently attempted sequentially.

The energy function has many local minima, corresponding to different folding configurations and packing of atoms, and getting near a global minimum from a local minimum usually requires a sequence of moves/perturbations that initially increases the function value. Part of the difficulty in efficiently finding global minimum lies in the dimension of the problem (number of backbone torsion angles), between 100 and 200, which makes searching for a neighborhood of global convergence challenging.

The following examples of native protein 3-D structures and those predicted by the model are borrowed from the Baker lab webpage.