Best Practices

We provide an overview of what we believe are the best practices for free energy calculations, with references wherever possible.

This document assumes you already have a basic idea of what these calculations are and what they do (if not, you can learn more from the Free Energy Fundamentals section), and that you already have a working knowledge of molecular simulations and basic terms like convergence and equilibration (if not, start with a textbook like Leach's Molecular Modelling: Principles and Applications). Simulation-package specific issues will not be addressed here.

Feel free to edit this document or the attached discussion pages. We hope this will serve as a place to store the community consensus on these issues. Please back up any edits with appropriate references.

Previous Best Practices

Introduction

Free energy calculations are appealing in that, in principle, they allow calculation of rigorously correct free energies, given a particular set of parameters and physical assumptions. We believe the goal of these calculations should be to obtain these correct free energies given the particular assumptions -- and not necessarily to match experiment. Only then can the underlying parameters and physical assumptions really be tested. In other words, given a particular set of parameters and physical assumptions, there exists a single right answer for a free energy calculation, and our goal is to obtain that free energy with as little statistical error as possible. We will call these free energies "correct" free energies. "Correct", then, implies at the very least that the computed free energies have converged, and that there are no underlying methodological problems.

It is also important to remember in what follows that free energy is a function of state, so there are many possible choices of pathway for a thermodynamic cycle connecting the same two endpoints. Some pathways may be more efficient than others (sometimes by many orders of magnitude), so methods which are in principle correct may not always be practical or useful.

Below, we discuss best practices in the context of two practical examples: solvation free energies and binding free energies. Solvation free energies provide a basic starting point for considering a number of the key issues. For binding free energies, there are many more choices of pathway, which raises additional complications. In both cases, there are also different methods for computing the relevant free energy differences, which may differ in efficiency.

Guidelines for setting up your system files

Guideline 1: Choose your molecular parameters carefully

Guideline 2: Automate when possible

Guidelines for free energy calculation pathways

Alchemical free energy calculations almost always involve some insertion or deletion of atoms -- both in hydration free energy calculations, and in absolute and relative binding free energy calculations. By insertion and deletion, we mean decoupling or annihilation (see Decoupling and annihilation for definitions) of the interactions of the atoms in question. We believe that a basic list of rules and guidelines should be followed in any calculation that involves insertion or deletion:

Guideline 1: Always use soft-core potentials while decoupling or annihilating Lennard-Jones interactions

When removing atoms from a system, the atoms should be removed gradually. Because of how the potentials are defined, what "gradual" removal means can be somewhat subtle. The soft-core potential formalism is a well-tested way to do this (a sketch of one common form is given below). Read more about how to choose the potential to decouple or annihilate Lennard-Jones sites.
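
As a concrete illustration, below is a minimal Python sketch of one common soft-core Lennard-Jones form (in the spirit of Beutler et al., 1994). The parameter values (alpha and p) and the convention that lambda = 1 corresponds to the fully interacting state are assumptions made for illustration only; simulation packages differ in their exact soft-core functional forms and conventions.

import numpy as np

def softcore_lj(r, lam, epsilon=1.0, sigma=1.0, alpha=0.5, p=2):
    """One common soft-core Lennard-Jones form (in the spirit of Beutler
    et al., 1994).  lam = 1 is fully interacting, lam = 0 is fully
    decoupled; alpha and p are tunable soft-core parameters.  This is an
    illustrative sketch, not any particular package's exact form."""
    r6 = (r / sigma) ** 6
    soft = alpha * (1.0 - lam) ** p + r6   # softened (r/sigma)^6 term
    return 4.0 * epsilon * lam * (soft ** -2 - soft ** -1)

# At lam < 1 the potential stays finite even as r -> 0; the standard
# Lennard-Jones potential (recovered at lam = 1) diverges there, which is
# what makes abrupt insertion or deletion numerically unstable.
for lam in (1.0, 0.5, 0.1):
    print(lam, softcore_lj(0.01, lam))

The essential feature is that, at intermediate lambda values, configurations in which another atom overlaps the partially deleted site no longer produce divergent energies or forces.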

Guideline 2: Never leave a partial atomic charge on an atom while its Lennard-Jones interactions are being removed

A partial atomic charge should never be allowed to remain on an atom while its Lennard-Jones interactions are being removed: once the repulsive core vanishes, the bare charge can approach other charges arbitrarily closely, producing singularities in the energies and forces. Read more about how to handle changes to both charges and Lennard-Jones sites.

Guideline 3: Use few insertions/deletions

Electrostatics transformations are usually smooth functions of lambda (even without soft-core potentials), and require few lambda values, while Lennard-Jones transformations – especially those involving insertions and deletions – are difficult transformations which require substantially more lambda values (even when using soft-core potentials) to obtain good phase-space overlap and accurate free energy differences (Shirts et al., 2005; Mobley et al., 2007; Mobley et al., 2007b; and others). Thus, insertions and deletions of particles can be thought of as "difficult" transformations (e.g., Jarzynski, 2006). Consequently, it is far more computationally efficient to modify existing particles (atoms) than to insert or delete new atoms; this should be kept in mind when constructing mutation pathways for relative free energy calculations, since multiple choices of mutation pathways between a set of molecules are typically possible.

This guideline is not at all helpful for absolute free energy calculations, since these by design involve inserting or deleting entire molecules.
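
To make Guidelines 2 and 3 concrete, here is an illustrative, package-agnostic sketch of a staged lambda schedule for annihilating a small molecule: the charges are turned off first over a few evenly spaced windows, and only then are the Lennard-Jones sites removed (with soft-core potentials) over a denser set of windows. The specific window counts and spacings are assumptions for illustration only and should be validated against phase-space overlap for the actual system; in this sketch lambda denotes the fraction of the interaction that has been turned off.

# Illustrative staged schedule: remove charges first (no soft core needed),
# then remove Lennard-Jones sites with soft-core potentials and more windows.
coulomb_lambdas = [0.00, 0.25, 0.50, 0.75, 1.00]           # charges: on -> off
vdw_lambdas = [0.00, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60,
               0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00]  # LJ: on -> off

# Lay the two stages out as a single list of (coulomb, vdw) states; the
# Lennard-Jones stage only begins once the charges are completely off,
# in keeping with Guideline 2.
states = [(lam, 0.00) for lam in coulomb_lambdas] + \
         [(1.00, lam) for lam in vdw_lambdas[1:]]
for i, (c, v) in enumerate(states):
    print(f"state {i:2d}: coulomb lambda = {c:.2f}, vdw lambda = {v:.2f}")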

Guideline 4: Think about whether your intermediate states are likely to converge or not

Given that many choices of pathway are possible, it can often be helpful to think about whether a particular choice of pathway makes convergence easier or more difficult.

For example, in absolute binding free energies, one can incorporate the standard state using either simple distance restraints between the ligand and the protein, or by restraining the ligand orientation as well. At the fully noninteracting state, the amount of configuration space the ligand will need to sample is dictated by this choice. Hence, a ligand with only a single reference distance restrained relative to the protein will need to sample a spherical shell in configuration space, while a ligand with all six relative degrees of freedom restrained would need to sample only a very small region of configuration space. Sampling these two regions can take drastically different amounts of simulation time, so in fact it can be much more efficient, at least in some cases, to use the additional restraints (Mobley et al., 2006).
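
The following back-of-the-envelope Python sketch illustrates the size of this difference. Every number in it (temperature, reference distance, force constants) is an assumption chosen only to show the orders-of-magnitude gap in how much configuration space the noninteracting ligand must sample; it is not a standard-state correction calculation, and the two quantities printed carry different (mixed) units, so only their relative magnitudes are meaningful.

import numpy as np

kT = 0.593        # kcal/mol near 300 K
r0 = 10.0         # assumed reference distance from the protein, in Angstroms

def width(k):
    """Approximate width sampled along one harmonically restrained
    degree of freedom with force constant k (Gaussian integral width)."""
    return np.sqrt(2.0 * np.pi * kT / k)

# (a) Single distance restraint: the decoupled ligand samples a thin
#     spherical shell of radius r0 in translation, and its orientation
#     (8*pi^2 of Euler-angle space) is completely unrestrained.
k_dist = 10.0     # kcal/mol/A^2, assumed
vol_a = 4.0 * np.pi * r0**2 * width(k_dist) * 8.0 * np.pi**2

# (b) All six relative degrees of freedom restrained: one narrow Gaussian
#     per degree of freedom (one distance plus five angles/dihedrals).
k_ang = 100.0     # kcal/mol/rad^2, assumed
vol_b = width(k_dist) * width(k_ang)**5

print(f"distance-only restraint: ~{vol_a:.1e} (A^3 rad^3) to sample")
print(f"six-DOF restraints:      ~{vol_b:.1e} (A rad^5) to sample")

With these assumed numbers the two quantities differ by roughly eight orders of magnitude, which is the sampling burden the extra restraints remove (at the cost of a restraint contribution to the free energy that must then be accounted for).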

Phase space overlap

Phase space overlap is an additional consideration in performing free energy calculations. Essentially all of the equilibrium approaches for estimating free energy differences rely on having simulations run at separate lambda values that are sufficiently closely spaced that adjoining lambda values sample similar configurations. There are three basic ways to attempt to assess whether this requirement is met, listed in order of ascending sophistication:

  1. Trial and error: For a particular system, begin with very closely spaced lambda values, and then gradually increase the separation until results begin to deteriorate.
  2. Error analysis:
    1. Using autocorrelation analysis (e.g., Chodera et al., 2006); a minimal sketch is given after this list
    2. If necessary, using block bootstrap approaches (e.g., Mobley et al., 2006)
  3. Phase space overlap measures (e.g., Wu and Kofke, 2005).
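
For item 2.1, a minimal, self-contained sketch of the autocorrelation analysis is shown below. It estimates the statistical inefficiency g of a timeseries (for example, dV/dlambda or a potential energy difference collected at one lambda window), so that the number of effectively uncorrelated samples is roughly N/g. The placeholder data is random and stands in for real simulation output; for production work, a maintained implementation (such as the timeseries module distributed with pymbar) is preferable.

import numpy as np

def statistical_inefficiency(x):
    """Estimate the statistical inefficiency g = 1 + 2*tau of a timeseries
    from its normalized autocorrelation function, truncating the sum at the
    first non-positive value (a minimal re-implementation in the spirit of
    the analysis described by Chodera et al., 2006)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = x - x.mean()
    var = dx.var()
    if var == 0.0:
        return 1.0
    g = 1.0
    for t in range(1, n - 1):
        c = np.dot(dx[:n - t], dx[t:]) / ((n - t) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)

# Placeholder timeseries standing in for per-snapshot dV/dlambda values:
du_dlambda = np.random.normal(size=5000)
g = statistical_inefficiency(du_dlambda)
print(f"g = {g:.1f}; ~{len(du_dlambda) / g:.0f} effectively uncorrelated samples")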

Different types of transformations may have different overlap properties. For example, above, we noted that electrostatics transformations are usually smooth, which, more rigorously, means that relatively good phase space overlap is maintained even with fairly coarsely spaced lambda values. Insertion and deletion of particles, on the other hand, require more closely spaced lambda values to ensure good overlap.
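
One quick, model-free way to check overlap between two neighboring lambda windows is to compare the distributions of the same potential energy difference, evaluated on configurations sampled at each of the two windows; if the histograms barely overlap, the spacing is too coarse. The sketch below implements this simple diagnostic with random placeholder data; it is not the more rigorous overlap measure of Wu and Kofke (2005), nor an MBAR overlap matrix, but it is often enough to flag badly spaced windows.

import numpy as np

def histogram_overlap(du_sampled_at_i, du_sampled_at_j, bins=50):
    """Fraction of shared area between two normalized histograms
    (0 = no overlap, 1 = identical distributions)."""
    lo = min(du_sampled_at_i.min(), du_sampled_at_j.min())
    hi = max(du_sampled_at_i.max(), du_sampled_at_j.max())
    h_i, edges = np.histogram(du_sampled_at_i, bins=bins, range=(lo, hi), density=True)
    h_j, _ = np.histogram(du_sampled_at_j, bins=bins, range=(lo, hi), density=True)
    bin_width = edges[1] - edges[0]
    return float(np.minimum(h_i, h_j).sum() * bin_width)

# Placeholder data standing in for Delta-U timeseries from two simulations:
du_at_i = np.random.normal(2.0, 1.0, size=2000)      # sampled at lambda_i
du_at_j = np.random.normal(1.0, 1.0, size=2000)      # sampled at lambda_(i+1)
print(f"overlap ~ {histogram_overlap(du_at_i, du_at_j):.2f}")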

Guidelines for performing free energy calculations

Here, we attempt to separate methods for analyzing the results of free energy calculations from methods for performing free energy calculations. For example, thermodynamic integration and exponential averaging (among others) are methods for analyzing free energy calculations, while equilibrium, slow-growth, and fast-growth methods can be used for performing free energy calculations. In this section, we focus on methods for performing free energy calculations, and address analysis methods below.

Guideline 1: Equilibrium methods are easier to use than nonequilibrium or mixed-ensemble methods

There are several basic groups of methods for performing free energy calculations: Slow-growth, fast-growth, and equilibrium (or instantaneous growth) free energy methods. Slow-growth and equilibrium methods are more traditional, while fast-growth (non-equilibrium) methods have gained considerable recent interest. In slow-growth methods, the coupling parameter, [math]\displaystyle{ \lambda }[/math], is slowly varied over the course of one or several simulations to take a system gradually from [math]\displaystyle{ \lambda=0 }[/math] to [math]\displaystyle{ \lambda=1 }[/math]; from this, the free energy difference between endpoints is estimated. In equilibrium methods, on the other hand, separate simulations are run at multiple different lambda values and information from the individual simulations is used to estimate free energy differences between adjoining lambda values. Fast-growth methods are based on the demonstration by Jarzynski in 1997 that the equilibrium free energy difference associated with a particular process can be computed by taking an appropriate average of the irreversible work done in performing many (potentially rapid) non-equilibrium trials of the same process. When applied to alchemical free energy calculations, this simply amounts to estimating free energy differences by performing many different rapid versions of a slow-growth trial – that is, rapidly changing lambda between two values (i.e. 0 and 1) in many separate trials, and monitoring the work done each time. Equilibrium free energy calculations can be thought of as "instantaneous growth" as they rely on estimating the free energy difference between neighboring [math]\displaystyle{ \lambda }[/math] values based only on instantaneous evaluations of potential energy differences between [math]\displaystyle{ \lambda }[/math] values (or by evaluation and extrapolation of [math]\displaystyle{ dV/d\lambda }[/math] at a particular [math]\displaystyle{ \lambda }[/math] value).
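
For concreteness, the sketch below shows the exponential-average estimator that underlies both the fast-growth approach (where w is the non-equilibrium work collected from many trials, per Jarzynski, 1997) and the simplest "instantaneous growth" equilibrium estimator (where w is the instantaneous potential energy difference between neighboring lambda states, evaluated on configurations sampled at one of them). The kT value and the placeholder work values are assumptions for illustration; in practice, estimators with better overlap properties, such as BAR or MBAR, are generally preferable.

import numpy as np

def exp_average_free_energy(w, kT=0.593):
    """Free energy difference from the exponential average
    DeltaF = -kT * ln< exp(-w/kT) >, written in a log-sum-exp form for
    numerical stability.  w may be non-equilibrium work values (Jarzynski)
    or instantaneous potential energy differences (Zwanzig/EXP).
    This is a minimal illustrative sketch."""
    w = np.asarray(w, dtype=float)
    w_min = w.min()
    return w_min - kT * np.log(np.mean(np.exp(-(w - w_min) / kT)))

# Placeholder work values (kcal/mol) standing in for many independent trials:
work = np.random.normal(3.0, 1.0, size=1000)
print(f"DeltaF ~ {exp_average_free_energy(work):.2f} kcal/mol")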

Which method should be used for alchemical free energy calculations? At this point, we believe the evidence is in favor of equilibrium methods. Slow-growth methods have a number of well-documented problems, such as Hamiltonian lag and hysteresis (add references). Additionally, if a slow-growth calculation does not lead to a sufficiently precise free energy estimate, the only way to improve the free energy estimate is to repeat the calculation with a slower rate of change in lambda – the simulation cannot be extended to gain additional precision, meaning that significant data can be wasted. Fast-growth methods, while conceptually appealing, do not appear to offer substantial advantages over equilibrium methods (Jarzynski, 2006; Oostenbrink and van Gunsteren, 2006).

Guideline 2: Run independent equilibrium simulations at each lambda value

In view of these facts, our recommendation is to use equilibrium simulations at a set of separate [math]\displaystyle{ \lambda }[/math] values to estimate free energy differences. But how should these simulations be performed? Should a [math]\displaystyle{ \lambda=0 }[/math] simulation be performed first, for example, and then the ending conformation used to seed a [math]\displaystyle{ \lambda=0.1 }[/math] simulation? Or should they be independent? There is no reason in principle that simulations cannot be performed serially – but this leads to several potential pitfalls and disadvantages. First, if simulations are performed serially, and it is later concluded that some of the intermediate simulations were not long enough, the entire set of calculations performed after those intermediate simulations must be repeated. Second, results at one [math]\displaystyle{ \lambda }[/math] value become coupled to those at previous lambda values, potentially leading to hysteresis (add references) and difficulties in troubleshooting. Independent simulations, on the other hand, provide the great practical advantage that if, at the analysis stage, it appears that data is insufficient at some range of [math]\displaystyle{ \lambda }[/math] values, these particular simulations can simply be extended without requiring repetition of other component simulations. Additionally, with large computer clusters now becoming relatively common, independent simulations can often be performed in parallel, leading to significant real-time savings over running the simulations serially.

One requirement for running equilibrium simulations at independent lambda values is that simulations must separately be equilibrated at each lambda value. Significant literature exists on assessing equilibration in molecular dynamics simulations (add references), so we refer the reader there. Suffice it to say that meaningful results depend crucially on having adequate equilibration at each lambda value.
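
One crude but useful check at each lambda value (a sketch only, using random placeholder data; see the equilibration literature for more rigorous approaches) is to verify that the quantity being averaged no longer drifts as increasing amounts of the beginning of the trajectory are discarded. If the estimate keeps shifting, the window has not equilibrated and should be extended or re-equilibrated.

import numpy as np

def drift_check(x, fractions=(0.0, 0.1, 0.2, 0.4)):
    """Mean of the timeseries x after discarding the given initial fractions;
    a stable mean across fractions is (weak) evidence of equilibration."""
    x = np.asarray(x, dtype=float)
    return {f: float(x[int(f * len(x)):].mean()) for f in fractions}

# Placeholder timeseries standing in for dV/dlambda at one lambda window:
du_dlambda = np.random.normal(size=5000)
for frac, mean in drift_check(du_dlambda).items():
    print(f"discard first {frac:.0%}: <dV/dlambda> = {mean:+.3f}")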

Some workers have attempted to address the need for equilibration by performing a long (many nanosecond) pre-equilibration at [math]\displaystyle{ \lambda=0 }[/math] before beginning free energy calculations at a variety of lambda values (Fujitani et al.). It is possible that this approach is a mistake. For example, if this pre-equilibration is substantially longer than the simulations at the individual lambda values, it may allow conformational changes in the system at [math]\displaystyle{ \lambda=0 }[/math] which then cannot be re-sampled on the timescale of those simulations, trapping the system in a configuration that (while appropriate at [math]\displaystyle{ \lambda=0 }[/math]) may not be appropriate at other lambda values. For an example of what could happen, see (Mobley et al., 2007b), where the protein remains kinetically trapped in its starting conformational state throughout all component free energy calculations. Our recommendation is that, if long equilibration is deemed necessary, equilibration periods should be equally long at each lambda value.

Guideline 3: Monitor whether your free energy calculations are actually statistically converging
