Conformational Energy Searching

Background

Energy is a function of the degrees of freedom in a molecule (i.e. bonds, angles, and dihedrals). Conformational energy searching is used to find all of the energetically preferred conformations of a molecule (especially rotamers), which is mathematically equivalent to locating all of the minima of its energy function.

An energy surface resembles a mountain range, complete with peaks (energy barriers), valleys (energy minima), and passes (saddle points). Molecular mechanics energy approximations are most valid in and around the minima. The lowest energy minima tend to be the most populated (as per the Boltzmann distribution). However, since molecular energies are not accurately computed by molecular mechanics or semiempirical quantum mechanics, and the molecular environment in the model may not be exact (e.g. solvation or intermolecular influences), it is best to consider energy minima within as much as 25 kcal/mol above the lowest (depending on the energy calculation method, etc.).

Conformational searching does not apply to macromolecules because the dimensionality of the problem is untenable. However, small pieces of a macromolecule can be studied, such as some of the side chains of a protein. Highly complex molecules, such as macrocycles (found in many antibiotics), cyclic peptides, and molecules with strained fused ring systems (e.g. baccatin III ring of taxol) can also be difficult to study using conformational searching. Peptides and oligosaccharides containing less than ten residues can be studied, but assistance from experimentally derived information (e.g. NMR) is often essential.

Systematic Energy Sampling

Systematic energy sampling is, in principle, the most thorough method for searching conformational energy space. The energy is sampled over the entire range of each degree of freedom (typically bond rotations) at regularly spaced intervals. The sampled conformations thus lie on an n-dimensional lattice (n being the number of degrees of freedom), as shown in the following plot of energy vs. two bond rotations:

You specify the dihedral angles to include in the search, and for each dihedral, the range of rotations to be sampled, and the rotation step size. The rotations are performed in a clock-like fashion, such that the first dihedral angle rotates "slowest" and the last dihedral rotates "fastest." When the nth dihedral completes its cycle, dihedral n-1 is incremented to the next value, and dihedral n is again stepped through its range. This process is illustrated in the following set of diagrams:

And so on.........

The number of sampled conformations is given by the following equation:

The conformations generated by systematic searching follow a tree-like hierarchical organization, as shown in the following diagram for three bond rotations:

The major problem with systematic searching is that the number of conformations required to adequately sample the entire conformational space of a molecule can become extremely large when the number of rotatable bonds exceeds 5 (6 bonds assume 2,985,984 conformations at 30 degree increments). The number of samples can be reduced by limiting the range of rotation for a symmetric substituent (e.g. 0 to 180 degrees for a phenyl group) and/or increasing the rotation step size (all staggered conformations of a saturated C-C bond can be sampled at 120 degree increments).

Filtering is another important method for reducing computational requirements and data generation. The simplest filter is an energy cutoff. Conformations whose energies exceed the cutoff are discarded. The cutoff method helps to reduce the amount of data generated during a conformational search, but does not reduce the number of calculations. Filtering is best performed at the end of a search, after the highest and lowest energy sampled conformations are known, allowing a relative energy cutoff to be used. Tree "pruning" is a more sophisticated filter used in some conformational searching programs. Pruning takes advantage of the fact that all conformations along a given branch of the search tree that are downstream from a high energy conformation will also have high energies, and need not be calculated. Lastly, filtering can be performed using constraints that must be satisfied for all conformations. Examples of constraints include:

  1. Distance constraints (interatomic distances observed using NMR or suggested by a pharmacophore model).
  2. Conformational constraints (conformations that are known to be shared among a set of compounds, or with a rigid analog).

Energy Minimization

Energy minimization methods can precisely locate minimum energy conformations by mathematically "homing in" on the energy function minima (one at a time). The goal of energy minimization is to find a route (consisting of variation of the intramolecular degrees of freedom) from an initial conformation to the nearest minimum energy conformation using the smallest number of calculations possible. Note that, unlike molecular dynamics, the steps between starting and minimum energy conformations have no physical meaning in energy minimization.

All degrees of freedom in a molecule (3*number of atoms-1) are usually included in the calculations, although most programs allow you to geometrically tether (restrain) or fix (constrain) specific atoms. The relationship between an initial and minimum energy conformation is shown in the following hypothetical energy surface:

The most direct route is a straight line in conformational hyperspace, but minimization is rarely that efficient:

Minimization algorithms are designed to head down-hill toward the nearest minimum. Remote minima (separated from the initial conformation by an energy barrier) are not detected by energy minimization because this would require some period of up-hill movement. Each starting conformation may result in the detection of one minimum energy conformation (barring pathalogical problems), although different starting conformations can lead to the same minimum.

Energy minimization algorithms measure the energy surface along a series of incremental steps to determine a down-hill direction:

The local shape of the energy surface around a given conformation en route to a minimum is often assumed to be quadratic so as to simplify the mathematics. This is illustrated for a one-dimensional slice through an energy surface:

The quadratic energy function approximation is given by a Taylor series:

where "P" is the current point and "x" an arbitrary point on the energy surface, "b" is the gradient at "P", and "A" is the "hessian matrix" (the second partial derivatives) at "P." "A" and "b" can be viewed as parameters that fit the idealized quadratic form to the actual energy surface. The "A" term can improve the fit, allowing greater look-ahead (i.e. larger step sizes) toward a minimum, but at the cost of greatly increased computational requirements.

An energy minimum can be characterized by a small change in energy between steps and/or by a zero gradient of the energy function. The gradient usually provides the best indication for completion of minimization (convergence) because very small changes in the gradient are much more significant than small changes in the energy function itself. Precise convergence can, thus, be reached with a non-zero gradient. Under some pathological situations, the gradient test can fail, and the energy test is then used. Energy minimization programs often limit the calculations to a set computer time and/or a set number of steps/iterations by default. Ordinarily, you should substantially increase any such limits. If computer time and iterations are the "fuel" of energy minimization, an empty tank should not be construed as the end of the trip.

The major difference between energy minimization methods lies in their approach to calculating and applying the "b" term, whether the "A" term is included, and if so, how "A" is calculated. The major minimization techniques can be summarized as follows:

  1. Calculate "b" at each step, but don't keep track of "b's" from previous steps.
  2. Calculate "b" at each step, while keeping track of "b's" from previous steps.
  3. Calculate "b" and "A" (or a first-derivative approximation of "A") at each step, while keeping track of this information from previous steps.

Each of these approaches has tradeoffs between computational requirements/efficiency and avoidance of mathematical probelms associated with an actual energy surface compared to the quadratic model. The primary issues are:

  1. Tolerance to poorly behaved regions of an energy function (e.g. bad interatomic contacts, poorly defined minima).
  2. Stability/reliability/precision.

Problematic kinds of energy surfaces are associated with small changes in energy and gradient en route to a minimum, or even around a minimum itself. The shelf-like surface illustrates how convergence can occur before reaching a minimum energy conformation. There is no obvious down-hill direction in one part of the energy surface (the "doldrums"):

Shallow minima are poorly defined, and present no obvious conformation on which to converge:

Always examine the "log" file written during energy minimization. This file typically indicates the energy and gradient at each iteration, and provides error messages associated with convergence problems. Never assume that convergence has occurred unless so indicated in the "log" file. Verify that the final gradient does not exceed the cutoff value that you specified. The choice of minimization algorithm, gradient cutoff, and the use of multiple starting conformations can help to ensure convergence in most situations.

Some of the more commonly available minimization algorithms are listed below. This is by no means a complete list. Supported minimizers for the various minimization programs are listed in the software documentation.

  1. Simplex

  2. Steepest Descent

  3. Conjugate Gradients or Powell

  4. Newton-Raphson or BFGS

The choice of minimization algorithm should be matched to the problem at hand. A good strategy is to switch minimizers as the calculations progress, so that each algorithm operates in the regime for which it was designed. This can be done by successively running each algorithm for a preset numer of iterations. However, the final minimization session should always be allowed to run to convergence, as previously discussed. Algorithm switching strategies are summarized in the following table:

Search Strategies

Systematic energy sampling can be used to explore the full range of conformational space, but at the expense of finding the actual minima in the energy function. The degree of certainty in the minimum energy conformations found from systematic sampling is determined by the grid resolution (rotation increment). Energy minimization, on the other hand, can precisely locate minimum energy conformations, but cannot be used to explore the full range of conformational space (one staring point --> one minimum).

The number of atoms and rotatable bonds must be considered in devising a search strategy. The number of conformations to be calculated with systematic sampling should always be estimated. The number of conformations that can be reasonably calculated depends on the program and hardware platform (with SYBYL, that number may be several hundred thousand; with QUANTA, it should not exceed 4,000). Strategies for reducing the conformational sampling requirements include:

  1. Increasing the rotation increment as much as possible (e.g. sample only trans, gauche, gauche' states of saturated bonds; cis and trans states of unsaturated bonds).
  2. Limiting the range to the minimum necessary to fully cover conformational space.
  3. Freezing bonds that do not contribute useful information (e.g. methyl groups can be treated as "united atoms" in many force-fields).
  4. Separating bond rotations into two or more non-interacting classes, and sampling conformations separately for each class (4 bonds rotated at 30 degree increments generate approx. 21,000 samples vs. 288 samples for two sets of 2 bonds).
  5. Breaking the molecule into pieces, and sampling conformations separately for each piece (e.g. parent and substituent moieties).
  6. Using constraints to limit the scope of sampling.

The most frequently used strategy in conformational searching is based on a combination of systematic sampling and energy minimization. Systematic sampling is used to generate energetically reasonable starting conformations (distributed throughout all of conformational space) for energy minimization, which fine tunes the locations and energies of the preferred conformations.

Whenever possible, energy minimization should be performed using a semiempirical quantum energy calculation method, such as MOPAC or AMPAC. Molecular mechanics should be used for molecules that are too large for quantum calculations or if too many starting conformations are found from sampling (i.e. more than a few hundred after filtering).

The energy minimized structures should undergo a final filtering process. First, duplicate minimum energy conformations should be discarded (recall that different starting conformations can lead to the same minimum). This can be done by superimposing all conformations and testing the RMS deviation of their corresponding atoms. If the number of unique conformations is large, energy filtering should be considered. This is most productive for quantum mechanically calculated energies which can be subjected to a much smaller cutoff (as small as 5-10 kcal/mol) than molecular mechanics energies.

This "sample-minimize-filter" conformational search strategy can be used with SYBYL and other programs.

Other conformational search strategies include high temperature molecular dynamics/annealing and random sampling (e.g. monte carlo) approaches. These methods should be considered when the number of degrees of freedom is too large for systematic sampling.