Difference between revisions of "Best Practices"

From AlchemistryWiki
Jump to navigation Jump to search
m (→‎Rule 2: Turn off partial charges: - Fixing math markup)
 
(127 intermediate revisions by 4 users not shown)
Line 1: Line 1:
= Best Practices =
+
This is intended as an overview of best practices for free energy calculations, with references, when possible.
Here, we provide an overview of some of what we believe are the best practices for free energy calculations, with references, wherever possible. This document assumes you already have a basic idea of what these calculations are and what they do (if not, start with our FreeEnergyIntroduction), and that you already have a working knowledge of molecular simulations and basic terms like convergence and equilibration (if not, start with a textbook like Leach's [http://www.google.com/search?hl=en&lr=&client=safari&rls=en&q=Leach+Molecular+Modelling:+Principles+and+Applications&btnG=Search Molecular Modelling: Principles and Applications]). Simulation-package specific issues will not be addressed here.
 
  
Feel free to edit this document or the attached discussion pages. We hope this to be a place storing the community consensus on these issues. Do back up any edits with appropriate references.
+
This document assumes you already have a basic idea of what these calculations are and what they do (if not, you can learn more from the [[Free Energy Fundamentals]] section and introductory-level reviews{{Cite|Michel2010}}, {{Cite|Christ2010}}, {{Cite|Shirts2013}}), and that you already have a working knowledge of molecular simulation, convergence and equilibration. If not, start with a textbook like Leach's [http://www.google.com/search?hl=en&lr=&client=safari&rls=en&q=Leach+Molecular+Modelling:+Principles+and+Applications&btnG=Search Molecular Modelling: Principles and Applications]{{cite|Leach2001}}). The discussion is simulation-package agnostic, focusing on the important ideas.
  
== Introduction ==
+
This document is open for editing. We hope this to be a place to build and store the community consensus on these issues. Do back up any edits with appropriate references. If you are not sure about changes, post questions on the discussion pages.
Free energy calculations are appealing, in that, in principle, they allow calculation of rigorously correct free energies, '''given a particular set of parameters and physical assumptions'''. We firmly believe that the goal of these calculations, then, should be to obtain these correct free energies given the particular assumptions -- and ''not'' necessarily to match experiment. Only then can the underlying parameters and physical assumptions really be tested. In other words, there exists a single ''right'' answer for a free energy calculation, given this particular set of parameters and physical assumptions, and our goal is to obtain it. Here, we will call these free energies "correct" free energies. "Correct", then, implies at the very least that the computed free energies have converged, and that there are no underlying methodological problems.
 
  
It is also important to remember in what follows that free energy is a function of state, so there are many possible choices of pathway for a thermodynamic cycle connecting the same two endpoints. Some pathways may be more efficient than others (sometimes by many orders of magnitude) so methods which are in principle correct may not always be practical.
+
= Introduction =
 +
Free energies calculated from molecular simulation are appealing, because, in principle, they are rigorously correct, '''given a particular set of parameters and physical assumptions'''. Given this particular set of parameters and physical assumptionstarting point, there exists a single ''right'' answer for a free energy calculation. The goal is to obtain that free energy with as little statistical error as possible  -- and ''not'' necessarily to match experiment. Without convergence, agreement with experiment may be due only to fortuitous cancellation of error. Only after free energy estimates have converged can the underlying parameters and physical assumptions really be tested.
  
Here, we discuss best practices in the context of several practical examples: Solvation free energies, and binding free energies. Solvation free energies provide a basic starting point for considering a number of the key issues, and, for binding free energies, there are many more choices of pathway, which raises additional complications. In both cases, there are also different methods for computing the relevant free energy differences, which may differ in efficiency.
+
Free energy is a function of state, so there are many possible choices of pathway for a thermodynamic cycle connecting the same two endpoints. Some pathways may be more efficient than others (sometimes by many orders of magnitude). Methods which are in principle correct may not always be practical or useful.  A large part of the following discussion will discuss exactly how to construct useful and efficient pathways connecting the end states.
 +
 
 +
While free energy calculations may be done for many different situations, here we focus on best practices for binding and solvation free energy calculations. Solvation free energies provide a basic starting point for considering a number of the key issues, and are used frequently here because of this and because of their simplicity.  Often, additional complications make it harder to track down problems when considering binding free energy calculations.
 +
 
 +
= Guidelines for setting up your system files =
 +
 
 +
== Guideline 1: Automate when possible ==
 +
 
 +
If you manually set up your systems for simulation, we would urge you not to. It is extremely easy to make a mistake that affects the entire subsequent set of simulations. Additionally, if your mistake is a result of mistyping something or similar, you may end up with no record of what you did wrong. Instead, when possible, automate simulation preparation tasks so that they do not need to be done manually. Periodically validate the results of automated scripts, especially when you change them. Not only can scripts help mistakes, but when they do have bugs, the script itself provides a permanent record of exactly what you did. Version control your scripts (using SVN or Git) so you can go back check this record more easily.
 +
 
 +
= Guidelines for free energy calculation pathways =
  
== Guidelines for free energy calculation pathways ==
 
 
Alchemical free energies almost involve some insertion or deletion of atoms -- both in hydration free energy calculations, and in absolute and relative binding free energy calculations. By insertion and deletion, we mean decoupling or annihilation (see [[Decoupling and annihilation]] for definitions) of the interactions of the atoms in question. We believe that a basic list of rules and guidelines should be followed in any calculation that involves insertion or deletion:
 
Alchemical free energies almost involve some insertion or deletion of atoms -- both in hydration free energy calculations, and in absolute and relative binding free energy calculations. By insertion and deletion, we mean decoupling or annihilation (see [[Decoupling and annihilation]] for definitions) of the interactions of the atoms in question. We believe that a basic list of rules and guidelines should be followed in any calculation that involves insertion or deletion:
  
* '''Rule 1''': Always use soft-core potentials while decoupling or annihilating Lennard-Jones interactions
+
==Guideline 1: Always use soft-core potentials while decoupling or annihilating Lennard-Jones interactions==
* '''Rule 2: '''Never leave a partial atomic charge on an atom while its Lennard-Jones interactions are being removed
+
 
* '''Guideline 3: '''It is usually more efficient to perform electrostatic and Lennard-Jones transformations separately
+
When removing atoms from a system, the atoms should be removed gradually.  Because of how potentials are defined, what gradual removal means can be somewhat subtle.  The soft-core potential formalism is a well-tested way to do this. [[Intermediate States | Read more about how to choose the potential to decouple or annihilate Lennard-Jones sites]].
* '''Guideline 4: '''Inserting or deleting atoms is usually less efficient than mutating them, so transformations should involve as few insertions and deletions as possible.
+
 
It is worth looking at each of these in more detail.
+
==Guideline 2: Never leave a partial atomic charge on an atom while its Lennard-Jones interactions are being removed==
 +
 
 +
Partial atomic charge should never be allowed to remain on an atom while its Lennard-Jones interactions are being removed, because it can result in extremely strong electrostatic interactions in the absence of steric interactions. [[Constructing_a_Pathway_of_Intermediate_States#Avoid_turning_off_charges_and_Lennard-Jones_sites_at_the_same_time |  Read more about how to handle changes to both charges and Lennard-Jones sites]].
 +
 
 +
==Guideline 3: Use few insertions/deletions==
 +
 
 +
Electrostatics transformations are usually smooth functions of lambda, and require relatively few lambda values, while Lennard-Jones transformations – especially those involving insertion and deletions – can require substantially more lambda values (even when using soft core potentials) to obtain good phase-space overlap and therefore accurate free energy differences.{{cite|Shirts2005}}{{cite|Mobley2007}}{{cite|Mobley2007b}} Insertions and deletions of particles can be thought of as “difficult” transformations.{{cite|Jarzynski2006}}
 +
 
 +
Consequently, it is far more computationally efficient to modify existing particles (atoms) than to insert or delete new atoms.  Construct mutation pathways for relative free energy calculations that minimize the number of times atoms, especially large atoms, need to be inserted and deletet, since multiple choices of mutation pathways between a set of molecules are typically possible.
 +
 
 +
This guideline is unfortunately not useful for absolute free energy calculations, since these involve inserting or deleting entire molecules.
 +
 
 +
==Guideline 4: Think about whether your intermediate states are likely to converge or not==
 +
 
 +
Many choices of pathway are possible. It can often be helpful to think about whether a particular choice of pathway makes convergence easier or more difficult at each lambda, thus reducing the amount of simulation time required.
 +
 
 +
Visualize what happens to a ligand when it is partially noninteracting with the protein, especially as interactions become quite weak during an absolute binding free energy simulation.  It will have significantly more freedom to move around the pocket than when fully interacting- but the kinetics may still be slow, since it will get trapped in free energy minima.  It then may take quite a while to sample these states.
 +
 
 +
One can introduce distance or orientation restraints between the ligand and the protein.  These reduce the space the ligand must explore, thus speeding convergence, at the expense of an additional restraining calculation.  In the noninteracting state, the interaction potential can often be removed analytically. In the interacting state, the free energy of applying these restraints needs to be included.  The free energy of applying restraints can be non-trivial, since the ligand will be pulled away from alternative minima, but convergence issues at other intermediates will be greatly reduced.
 +
 
 +
At the fully noninteracting state, the amount of configurational sampling the ligand undergoes is dictated by this choice. A ligand with a single reference distance restrained relative to the protein will need to sample a spherical shell in configuration space, and a ligand with all six relative degrees of freedom restrained would need to sample only a very small region of configuration space.
 +
These two can take drastically different amounts of time, so in fact it can be much more efficient, at least in some cases, to use the additional restraints.{{cite|Mobley2006}}
 +
 
 +
=Guidelines for carrying out free energy calculations=
 +
 
 +
==Guideline 1: Equilibrium methods are easier to use for beginners than nonequilibrium or mixed-ensemble methods==
 +
 
 +
There are a very large number of basic methods for performing free energy calculation: [[Simulation_Acceleration#Other_Methods|Slow-growth, fast-growth (Jarzynski-style)]], and equilibrium (or instantaneous growth) free energy methods. For advanced users, there are significant efficiency advantages with careful use of these methods.
 +
 
 +
However, for beginners, and for standard simulations, we believe the evidence is in favor of equilibrium methods: Run simulations at fixed values of lambda, collect equilibrium data from these simulations, and calculate the free energies from these observations. There are far fewer things that can go wrong.  If it's a problem that really requires significantly more efficiency, it's probably worth doing a warmup problem for practice first!
 +
 
 +
[[Simulating_States_of_Interest|Read more about running equilibrium methods at the states of interest]].
 +
 
 +
If you are interested it digging deeper, [[Simulation_Acceleration|read about some more advanced sampling methods]] that use exchanges between equilibrated sites to improve efficiency.
 +
 
 +
==Guideline 2: Use equilibrated data==
 +
 
 +
For free energy methods to work, the simulations must sample from the correct Boltzmann distribution.  Make sure that your simulation is well-equilibrated, and that you are not in the transient region.  Simulations must separately be equilibrated at each lambda value.
 +
 
 +
Equilibration at only <math>\lambda=0</math>, followed by sequential short simulations at other lambda values may not be sufficient.  For example in a study by Mobley et al.{{cite|Mobley2007c}} the protein remains kinetically trapped in its starting <math>\lambda=0</math> equilibrium conformational state in all calculations. If long equilibration is deemed necessary, equilibration periods should be equally long at each lambda value.
 +
 
 +
==Guideline 3: Use sufficiently rigorous underlying simulation techniques==
 +
 
 +
Sometimes simulation approximations that work for single end point simulations break down for free energy calculations. Make sure you are simulating.  Particularly watch out for simulation parameters that are a function of the intermediate state!  For example, a shifted potential will have a different shift constant at each intermediate.  If the ''same'' error is being made at each state, then it will partially cancel out of the free energy calculation. If a ''different'' error is made at each intermediate, it can drastically affect the answer.
 +
 
 +
[[Simulating_States_of_Interest#Choosing_Simulation_Parameters| Read more about properly setting simulation parameters]] in the context of free energy simulations.
 +
 
 +
= Guidelines for analyzing simulations =
 +
 
 +
See [[Draft Standards for Post-Calculation Health Reports]] for a discussion on the proposed standard practices for diagnosing typical simulation issues after an alchemical free energy calculation.
 +
 
 +
== Guideline 1: Use [[Thermodynamic_Integration|TI]], [[Bennett_Acceptance_Ratio|BAR]], or [[Multistate_Bennett_Acceptance_Ratio|MBAR]] to calculate free energies ==
 +
 
 +
These methods are by far the most robust.{{cite|Paliwal2011}}  BAR (the Bennett acceptance ratio) is almost as good as MBAR (the multistate Bennett's acceptance ration) in most standard situations. If a sufficiently large number of intermediate states are used, which is usually not significantly more than the number required for good BAR and MBAR estimates, TI (thermodynamic integration) gives good results.{{cite|Paliwal2011}} [[Analyzing_Simulation_Results|Read more here about simulation analysis]].
 +
 
 +
== Guideline 2: Be careful and rigorous when calculating statistical uncertainties  ==
 +
 
 +
Just taking the difference between two simulations is not the same thing as computing. The easiest person to fool about the true statistical error in your calculation is yourself; you have more invested in a certain answer than your readers do! The "gold standard" is to run a sufficiently large number of copies and compute the standard error in the measurement. Sufficiently large is a bit subjective, but it's certainly more than two!  However this can be inefficient.  Many methods have built in error estimates, though these estimates usually assume uncorrelated data, so don't trust those errors unless you can show the data is uncorrelated.
 +
 
 +
Read more about:
 +
 
 +
* [[Simulation_Information_Gathering#Correlation| Calculating the degree of correlation in simulation data]].
 +
* [[Analyzing_Simulation_Results#Bootstrap_Sampling|Using bootstrap sampling to calculate uncertainties]].
 +
* [[Thermodynamic_Integration#Variance_of_TI|Some subtleties in propagating uncertainty in TI]].
 +
 
 +
== Guideline 3: Correct for possible artifacts resulting from periodicity  ==
 +
 
 +
Most calculations will be performed using periodic boundary conditions.  If you are calculating free energies that involve charged species, such as the binding of a charged ligand, the protonation of a protein side chain, or the solvation free energy of an ion, be aware that periodic boundary conditions can introduce large artifacts that can be corrected in the final data analysis.  For more detail, see [[Charged binding calculations]].
 +
 
 +
=Final Guideline: Verify, Verify, Verify! =
 +
 
 +
As you can tell there are a number of ways that the simulations could fail.  Try to determine as many ways as you can to make sure that the simulations are working correctly: for example:
  
=== Rule 1: Soft core potentials ===
+
* Do alternate thermodynamic cycles give the same result?
Rule 1 -- that soft core potentials should always be used when turning on or off Lennard-Jones interactions -- really should be a rule observed by all free energy calculations, we believe. Lennard-Jones interactions between particles have a really steep (<math>1/r^{12}</math>) rise in potential energy. This prevents particles from overlapping. However, to delete particles (atoms or molecules) these interactions need to be turned off somehow, and this is not as straightforward as it might seem.
+
* Are results sufficiently similar to previous results?
 +
* If the same calculation is repeated multiple times, is the difference between simulations consistent with the estimated uncertainties?
 +
* Are there any weird observations in the data as a results of errors in the scripts (like minuses being cut off?)
  
One relatively-commonly used choice ([http://dx.doi.org/10.1098/rsta.2005.1625 Fowler et al., 2005], [http://dx.doi.org/10.1007/s10822-005-9021-3 Chipot, Rozanska and Dixit, 2005], [http://amber.scripps.edu/doc9/index.html AMBER], and others) for turning these off is simple linear-scaling of that term in the potential energy or Hamiltonian: That is,  <math>V(\lambda) = (1 - \lambda) V_0 + \lambda V_1</math>, where <math>V_0</math> is the potential energy with full Lennard-Jones interactions, and <math>V_1</math> is the potential energy where the Lennard-Jones interactions have been turned off for the atoms which are being deleted. This means that, for the atoms being deleted, Lennard-Jones interactions scale at small <math>r</math> as <math>(1-\lambda)/r^{12}</math>. This has two unfortunate and interconnected consequences. First, there is a discontinuous change in the form of the interaction potential when going from <math>\lambda=1-\epsilon</math> (where <math>\epsilon</math> is a very small number) to <math>\lambda=1</math>, as the <math>1/r^{12}</math> term still is fairly important even at <math>\lambda</math> very near 1, but is entirely turned off at <math>\lambda=1</math>. Secondly, it leads to large forces, numerical instabilities, and other problems in simulations near <math>\lambda=1</math>. Formally, it has been shown that this leads to a integrable singularity in <math>dV/(d\lambda)</math>, which means that computing correct free energies with this scheme using thermodynamic integration is impossible using numerical techniques ([http://dx.doi.org/10.1063/1.432264 Mruzik et al., 1976],[http://fulcrum.physbio.mssm.edu/~mezei/ms/cv54mmc.pdf Mezei and Beveridge, 1986], [http://fulcrum.physbio.mssm.edu/~mezei/ms/cv85.pdf Resat and Mezei, 1993] and especially [http://dx.doi.org/10.1016/0009-2614(94)00397-1 Beutler et al., 1994], [http://dx.doi.org/10.1080/08927020211973 Pitera and van Gunsteren, 2002] and [http://www.dillgroup.ucsf.edu/~dmobley/papers/steinbrecher.pdf Steinbrecher et al., 2007] and references therein) and similar problems plague free energy perturbation schemes.
+
<!--
 +
In view of these facts, our recommendation is to use equilibrium simulations at a set of separate <math>\lambda</math> values to estimate free energy differences. But how should these simulations be performed? Should a <math>\lambda=0</math> simulation be performed first, for example, and then the ending conformation used to seed a <math>\lambda=0.1</math> simulation? Or should they be independent? There is no reason in principle that simulations cannot be performed serially – but this leads to several potential pitfalls and disadvantages. First, if simulations are performed serially, and it is later concluded that some of the intermediate simulations were not long enough, the entire set of calculations performed after those intermediate simulations must be repeated. Second, results at one <math>\lambda</math> value become coupled to those at previous lambda values, potentially leading to hysteresis (''add references'') and difficulties in troubleshooting. Independent simulations, on the other hand, provide the great practical advantage that if, at the analysis stage, it appears that data is insufficient at some range of <math>\lambda</math> values, these particular simulations can simply be extended without requiring repetition of other component simulations. Additionally, with large computer clusters now becoming relatively common, independent simulations can often be performed in parallel, leading to significant real-time savings over running the simulations serially. -->
  
In an attempt to get around this, some have suggested scaling the potential energies with <math>(1-\lambda)^k</math>, where <math>k</math> is an integer greater than 1. It can be shown that, for <math>k >= 4</math>, this leads to an integrable singularity in <math>dV/d\lambda</math>, so thermodynamic integration can in principle be done ([http://fulcrum.physbio.mssm.edu/~mezei/ms/cv54mmc.pdf Mezei and Beveridge, 1986], [http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TFN-44WCYB3-T8&_user=4430&_coverDate=06/03/1994&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000059594&_version=1&_urlVersion=0&_userid=4430&md5=43ac38d23e3e595dc97c273e012406e6 Beutler et al., 1994]). But integrable singularities still pose very substantial problems for molecular simulation, and this approach can still lead to large forces, numerical instabilities and energy conservation problems ([http://dx.doi.org/10.1016/0009-2614(94)00397-1 Beutler et al., 1994] and [http://www.dillgroup.ucsf.edu/~dmobley/papers/steinbrecher.pdf Steinbrecher et al., 2007]) and make free energy differences extremely difficult to converge ([D. Mobley, unpub. data]).
+
= Old materials =
 +
Our original version of this page is under [[Previous Best Practices]].
  
Since free energies are path-independent, an elegant solution to this problem was developed ([http://dx.doi.org/10.1016/0009-2614(94)00397-1 Beutler et al., 1994]) – to modify the Lennard-Jones functional form to gradually smooth out the <math>1/r^{12}</math> term as a function of <math>\lambda</math>, rather than simply multiplying it by a prefactor. This removes problems with numerical instabilities and singularities, and improves convergence properties ([http://dx.doi.org/10.1016/0009-2614(94)00397-1 Beutler et al., 1994], [http://dx.doi.org/10.1063/1.466707 Zacharias et al., 1994], [http://dx.doi.org/10.1080/08927020211973 Pitera and van Gunsteren, 2002]). The basic idea is that it allows particles to gradually begin to overlap as <math>\lambda</math> is changed, rather than saving a drastic change in interactions for the point going from <math>\lambda=1-epsilon<math> to <math>\lambda=1</math>. This approach is known as soft core potentials (and, alternately, "separation-shifted scaling"), and has subsequently been shown to be a nearly optimal path for modifying Lennard-Jones interactions ([http://dx.doi.org/10.1002/jcc.20025 Blondel, 2004]. In some work, several groups have further tested this approach and found a slightly modified functional form and set of parameters from that originally proposed ([http://dx.doi.org/10.1016/0009-2614(94)00397-1 Beutler et al., 1994]) which leads to improved efficiency for free energy calculations ([http://dx.doi.org/10.1063/1.1877132 Shirts and Pande, 2005], [D. Mobley, unpublished data]); we recommend that the soft core potentials and parameters from that work be employed in all free energy calculations involving insertion or deletion of particles.
+
= References =
 +
<references>
 +
{{Cite|Michel2010|Michel, J.; Essex, J. W. J Comp. Aided Mol. Des. 2010, 24:649.|http://www.citeulike.org/group/14929/article/9052428}}
 +
{{Cite|Christ2010|Christ, C. D.; Mark, A. E.; van Gunsteren, W. F. J. Comp. Chem. 2010, 31:1569|http://www.citeulike.org/group/14929/article/9052143}}
 +
{{Cite|Shirts2013|Shirts, M. R.; Mobley, D. L. Biomolecular Simulations, 2013, 24:271-311|http://www.citeulike.org/group/14929/article/12477253}}
  
Some testing has suggested that the <math>(1-\lambda)^k</math> scaling approach may be essentially adequate for hydration free energy calculations ([D. Mobley, unpublished data],[http://www.dillgroup.ucsf.edu/~dmobley/papers/steinbrecher.pdf Steinbrecher et al., 2007]) but it still less efficient there than soft-core potentials, so this does not affect our recommendation.
+
{{cite|Leach2001|Leach, A. Molecular Modelling: Principles and Applications (2nd Edition); Prentice Hall: 2 ed.; 2001.}}
  
In summary: Linearly scaling Lennard-Jones interactions back as a function of <math>\lambda</math> for insertion/deletion of particles is formally incorrect for numerical integration and leads to wrong estimates of free energy differences. While more complicated schemes involving <math>\lambda^k</math> scaling can be formally correct, there are serious concerns regarding their accuracy. Soft-core potentials provide a rigorously correct, efficient alternative to these and should be used whenever particles are inserted or deleted, preferably with the functional form and parameters of (([http://dx.doi.org/10.1063/1.1877132 Shirts and Pande, 2005]), unless future work finds a still more efficient set of parameters.
+
{{cite|Shirts2005|Shirts, M. R.;  Pande, V. S. J. Chem. Phys. 2005, 122, 134508.|http://www.citeulike.org/group/14929/article/9052562}}
  
===Rule 2: Turn off partial charges===
+
{{cite|Mobley2007|Mobley, D. L.;  Dumont, E.;  Chodera, J. D.;  Dill, K. A. Journal of Physical Chemistry B 2007, 111.|http://www.citeulike.org/group/14929/article/9052441}}
  
Rule 2 states that a partial atomic charge should never be allowed to remain on an atom while its Lennard-Jones interactions are being removed. To understand the reason for this, consider two atoms of opposite charge, A and B. Lennard-Jones interactions of atom A are being scaled back. Regardless of the scaling scheme used, at some lambda value, atoms A and B will begin to overlap occasionally, since the final state allows A and B to overlap totally. If A has a remaining partial atomic charge when these overlaps become possible, the two point charges assigned to A and B can actually overlap as well. Since the potential energy of Coulomb interactions between point charges scales as <math>q_{A} q_{B}/r_{AB}</math>, where <math>r_{AB}</math> is the distance between A and B, this presents significant problems when <math>q_A</math> and <math>q_B</math> have opposite signs. In particular, there is an infinite energy minimum at <math>r_{AB}=0</math>, so the two particles would in principle get trapped on top of one another.
+
{{cite|Mobley2007b|Mobley, D. L.;  Graves, A. P.;  Chodera, J. D.;  McReynolds, A. C.;  Shoichet, B. K.;  Dill, K. A. Journal of Molecular Biology 2007, 371,.|http://www.citeulike.org/group/14929/article/9052443}}
  
In practice, what usually happens in molecular dynamics simulations in these circumstances is that the forces get extremely large as A and B begin to overlap, and the simulation will crash. Constraint algorithms are often the first to fail, so this may lead to a warning about constraints (i.e. LINCS or SHAKE) and then a crash. This issue is discussed briefly by [http://dx.doi.org/10.1080/08927020211973 Pitera and van Gunsteren] and in more detail by [http://dx.doi.org/10.1063/1.1924449 Anwar and Heyes].
+
{{cite|Mobley2006|Mobley, D. L.;  Chodera, J. D.;  Dill, K. A. Journal of Chemical Physics 2006, 125, 084902.|http://www.citeulike.org/group/14929/article/9052440}}
  
In view of this problem, we recommend always turning off partial charges for any atoms for which Lennard-Jones interactions are being removed before doing the Lennard-Jones transformation. Additionally, when Lennard-Jones parameters for an atom are being substantially modified during a free energy calculation (i.e. for relative free energy calculations involving mutation of an atom) and soft-core potentials are employed, similar problems may arise, so it may be useful to remove partial charges on atoms which are being mutated, as well.  
+
<!--{{cite|Chodera2006|Chodera, J. D.;  Swope, W. C.;  Pitera, J. W.;  Seok, C.;  Dill, K. A. Journal of Chemical Theory and Computation 2007, 3.|http://www.citeulike.org/group/14929/article/9052139}}-->
  
Several groups have developed modified electrostatics scaling methods in an attempt to bypass this problem and allow electrostatics interactions and Lennard-Jones interactions to be turned off in only one set of calculations (for example, [http://dx.doi.org/10.1063/1.1924449 Anwar and Heyes]), but since electrostatics transformations are usually so smooth a function of <math>\lambda</math> and need only few <math>\lambda</math> values for good overlap ([http://dx.doi.org/10.1063/1.1877132 Shirts et al., 2005]; [http://dx.doi.org/10.1021/jp0667442 Mobley et al., 2007], and others) it is unclear that this results in any significant efficiency gain over performing the transformations separately.
+
<!--{{cite|Wu2005|Wu, D.;  Kofke, D. A. Journal of Chemical Physics 2005, 123, 054103.|http://www.citeulike.org/group/14929/article/9052669}}-->
  
In view of this, our recommendation is that either (a) partial charges on any particles being inserted or deleted be turned off prior to the use of soft core potentials for those particles, or (b) a soft core scheme for electrostatics be implemented to allow simultaneous changes.
+
{{cite|Jarzynski2006|Jarzynski, C. Phys. Rev. E 2006, 73, 046105.|http://www.citeulike.org/group/14929/article/9052297}}
  
===Guideline 3: Perform electrostatics transformations separately from Lennard-Jones ===
+
<!--{{cite|Fujitani2005|Fujitani, H.;  Tanida, Y.;  Ito, M.;  Shirts, M. R.;  Jayachandran, G.;  Snow, C. D.;  Sorin, E. J.;  Pande, V. S. Journal of Chemical Physics 2005, 123, 084108.|http://www.citeulike.org/group/14929/article/9052204}} -->
  
As noted in Rule 2, above, electrostatics transformations are typically smooth functions of lambda with good phase-space overlap between even coarsely-spaced lambda values([http://dx.doi.org/10.1063/1.1877132 Shirts et al., 2005]; [http://dx.doi.org/10.1021/jp0667442 Mobley et al., 2007], and others). As a consequence, these are quite efficient compared to Lennard-Jones calculations. As established above, when particles are being inserted or deleted, the electrostatic interactions of these particles should be set to zero before turning off their Lennard-Jones interactions. But what about electrostatic interactions on atoms which are merely being mutated (i.e. a change of partial charge and Lennard-Jones radius), as in relative free energy calculations?
+
{{cite|Mobley2007c|Mobley, D. L.;  Chodera, J. D.;  Dill, K. A. Journal of Chemical Theory and Computation 2007, 3.|http://www.citeulike.org/group/14929/article/9052442}}
  
We are not aware of any study which has looked at this in detail, but given the efficiency of free energy calculations modifying electrostatics interactions relative to those significantly modifying Lennard-Jones interactions, we believe it makes sense to perform the two sets of calculations separately. Given that the two transformations have different lambda-dependences, it might actually be less efficient to perform them together than separately. Performing them separately has an additional advantage, as well: Uncertainties in the two components can be assessed separately, and computational effort focused on reducing the largest uncertainty (i.e. by extending some simulations to get additional sampling).  
+
{{cite|Paliwal2011|Paliwal, H.;  Shirts M.R.;  Journal of Chemical Theory and Computation 2011, 7, 4115-4134,|http://www.citeulike.org/group/14929/article/10029023}}
  
Further testing should be focused in this area, to determine whether alternative scaling approaches which can modify Lennard-Jones and electrostatic interactions simultaneously ([http://dx.doi.org/10.1063/1.1924449 Anwar and Heyes]) are actually more efficient than the approach of separate modification that we propose.
+
</references>

Latest revision as of 11:41, 2 July 2014

This is intended as an overview of best practices for free energy calculations, with references, when possible.

This document assumes you already have a basic idea of what these calculations are and what they do (if not, you can learn more from the Free Energy Fundamentals section and introductory-level reviews[1], [2], [3]), and that you already have a working knowledge of molecular simulation, convergence and equilibration. If not, start with a textbook like Leach's Molecular Modelling: Principles and Applications[4]). The discussion is simulation-package agnostic, focusing on the important ideas.

This document is open for editing. We hope this to be a place to build and store the community consensus on these issues. Do back up any edits with appropriate references. If you are not sure about changes, post questions on the discussion pages.

Introduction

Free energies calculated from molecular simulation are appealing, because, in principle, they are rigorously correct, given a particular set of parameters and physical assumptions. Given this particular set of parameters and physical assumptionstarting point, there exists a single right answer for a free energy calculation. The goal is to obtain that free energy with as little statistical error as possible -- and not necessarily to match experiment. Without convergence, agreement with experiment may be due only to fortuitous cancellation of error. Only after free energy estimates have converged can the underlying parameters and physical assumptions really be tested.

Free energy is a function of state, so there are many possible choices of pathway for a thermodynamic cycle connecting the same two endpoints. Some pathways may be more efficient than others (sometimes by many orders of magnitude). Methods which are in principle correct may not always be practical or useful. A large part of the following discussion will discuss exactly how to construct useful and efficient pathways connecting the end states.

While free energy calculations may be done for many different situations, here we focus on best practices for binding and solvation free energy calculations. Solvation free energies provide a basic starting point for considering a number of the key issues, and are used frequently here because of this and because of their simplicity. Often, additional complications make it harder to track down problems when considering binding free energy calculations.

Guidelines for setting up your system files

Guideline 1: Automate when possible

If you manually set up your systems for simulation, we would urge you not to. It is extremely easy to make a mistake that affects the entire subsequent set of simulations. Additionally, if your mistake is a result of mistyping something or similar, you may end up with no record of what you did wrong. Instead, when possible, automate simulation preparation tasks so that they do not need to be done manually. Periodically validate the results of automated scripts, especially when you change them. Not only can scripts help mistakes, but when they do have bugs, the script itself provides a permanent record of exactly what you did. Version control your scripts (using SVN or Git) so you can go back check this record more easily.

Guidelines for free energy calculation pathways

Alchemical free energies almost involve some insertion or deletion of atoms -- both in hydration free energy calculations, and in absolute and relative binding free energy calculations. By insertion and deletion, we mean decoupling or annihilation (see Decoupling and annihilation for definitions) of the interactions of the atoms in question. We believe that a basic list of rules and guidelines should be followed in any calculation that involves insertion or deletion:

Guideline 1: Always use soft-core potentials while decoupling or annihilating Lennard-Jones interactions

When removing atoms from a system, the atoms should be removed gradually. Because of how potentials are defined, what gradual removal means can be somewhat subtle. The soft-core potential formalism is a well-tested way to do this. Read more about how to choose the potential to decouple or annihilate Lennard-Jones sites.

Guideline 2: Never leave a partial atomic charge on an atom while its Lennard-Jones interactions are being removed

Partial atomic charge should never be allowed to remain on an atom while its Lennard-Jones interactions are being removed, because it can result in extremely strong electrostatic interactions in the absence of steric interactions. Read more about how to handle changes to both charges and Lennard-Jones sites.

Guideline 3: Use few insertions/deletions

Electrostatics transformations are usually smooth functions of lambda, and require relatively few lambda values, while Lennard-Jones transformations – especially those involving insertion and deletions – can require substantially more lambda values (even when using soft core potentials) to obtain good phase-space overlap and therefore accurate free energy differences.[5][6][7] Insertions and deletions of particles can be thought of as “difficult” transformations.[8]

Consequently, it is far more computationally efficient to modify existing particles (atoms) than to insert or delete new atoms. Construct mutation pathways for relative free energy calculations that minimize the number of times atoms, especially large atoms, need to be inserted and deletet, since multiple choices of mutation pathways between a set of molecules are typically possible.

This guideline is unfortunately not useful for absolute free energy calculations, since these involve inserting or deleting entire molecules.

Guideline 4: Think about whether your intermediate states are likely to converge or not

Many choices of pathway are possible. It can often be helpful to think about whether a particular choice of pathway makes convergence easier or more difficult at each lambda, thus reducing the amount of simulation time required.

Visualize what happens to a ligand when it is partially noninteracting with the protein, especially as interactions become quite weak during an absolute binding free energy simulation. It will have significantly more freedom to move around the pocket than when fully interacting- but the kinetics may still be slow, since it will get trapped in free energy minima. It then may take quite a while to sample these states.

One can introduce distance or orientation restraints between the ligand and the protein. These reduce the space the ligand must explore, thus speeding convergence, at the expense of an additional restraining calculation. In the noninteracting state, the interaction potential can often be removed analytically. In the interacting state, the free energy of applying these restraints needs to be included. The free energy of applying restraints can be non-trivial, since the ligand will be pulled away from alternative minima, but convergence issues at other intermediates will be greatly reduced.

At the fully noninteracting state, the amount of configurational sampling the ligand undergoes is dictated by this choice. A ligand with a single reference distance restrained relative to the protein will need to sample a spherical shell in configuration space, and a ligand with all six relative degrees of freedom restrained would need to sample only a very small region of configuration space. These two can take drastically different amounts of time, so in fact it can be much more efficient, at least in some cases, to use the additional restraints.[9]

Guidelines for carrying out free energy calculations

Guideline 1: Equilibrium methods are easier to use for beginners than nonequilibrium or mixed-ensemble methods

There are a very large number of basic methods for performing free energy calculation: Slow-growth, fast-growth (Jarzynski-style), and equilibrium (or instantaneous growth) free energy methods. For advanced users, there are significant efficiency advantages with careful use of these methods.

However, for beginners, and for standard simulations, we believe the evidence is in favor of equilibrium methods: Run simulations at fixed values of lambda, collect equilibrium data from these simulations, and calculate the free energies from these observations. There are far fewer things that can go wrong. If it's a problem that really requires significantly more efficiency, it's probably worth doing a warmup problem for practice first!

Read more about running equilibrium methods at the states of interest.

If you are interested it digging deeper, read about some more advanced sampling methods that use exchanges between equilibrated sites to improve efficiency.

Guideline 2: Use equilibrated data

For free energy methods to work, the simulations must sample from the correct Boltzmann distribution. Make sure that your simulation is well-equilibrated, and that you are not in the transient region. Simulations must separately be equilibrated at each lambda value.

Equilibration at only [math]\displaystyle{ \lambda=0 }[/math], followed by sequential short simulations at other lambda values may not be sufficient. For example in a study by Mobley et al.[10] the protein remains kinetically trapped in its starting [math]\displaystyle{ \lambda=0 }[/math] equilibrium conformational state in all calculations. If long equilibration is deemed necessary, equilibration periods should be equally long at each lambda value.

Guideline 3: Use sufficiently rigorous underlying simulation techniques

Sometimes simulation approximations that work for single end point simulations break down for free energy calculations. Make sure you are simulating. Particularly watch out for simulation parameters that are a function of the intermediate state! For example, a shifted potential will have a different shift constant at each intermediate. If the same error is being made at each state, then it will partially cancel out of the free energy calculation. If a different error is made at each intermediate, it can drastically affect the answer.

Read more about properly setting simulation parameters in the context of free energy simulations.

Guidelines for analyzing simulations

See Draft Standards for Post-Calculation Health Reports for a discussion on the proposed standard practices for diagnosing typical simulation issues after an alchemical free energy calculation.

Guideline 1: Use TI, BAR, or MBAR to calculate free energies

These methods are by far the most robust.[11] BAR (the Bennett acceptance ratio) is almost as good as MBAR (the multistate Bennett's acceptance ration) in most standard situations. If a sufficiently large number of intermediate states are used, which is usually not significantly more than the number required for good BAR and MBAR estimates, TI (thermodynamic integration) gives good results.[11] Read more here about simulation analysis.

Guideline 2: Be careful and rigorous when calculating statistical uncertainties

Just taking the difference between two simulations is not the same thing as computing. The easiest person to fool about the true statistical error in your calculation is yourself; you have more invested in a certain answer than your readers do! The "gold standard" is to run a sufficiently large number of copies and compute the standard error in the measurement. Sufficiently large is a bit subjective, but it's certainly more than two! However this can be inefficient. Many methods have built in error estimates, though these estimates usually assume uncorrelated data, so don't trust those errors unless you can show the data is uncorrelated.

Read more about:

Guideline 3: Correct for possible artifacts resulting from periodicity

Most calculations will be performed using periodic boundary conditions. If you are calculating free energies that involve charged species, such as the binding of a charged ligand, the protonation of a protein side chain, or the solvation free energy of an ion, be aware that periodic boundary conditions can introduce large artifacts that can be corrected in the final data analysis. For more detail, see Charged binding calculations.

Final Guideline: Verify, Verify, Verify!

As you can tell there are a number of ways that the simulations could fail. Try to determine as many ways as you can to make sure that the simulations are working correctly: for example:

  • Do alternate thermodynamic cycles give the same result?
  • Are results sufficiently similar to previous results?
  • If the same calculation is repeated multiple times, is the difference between simulations consistent with the estimated uncertainties?
  • Are there any weird observations in the data as a results of errors in the scripts (like minuses being cut off?)


Old materials

Our original version of this page is under Previous Best Practices.

References

  1. Michel, J.; Essex, J. W. J Comp. Aided Mol. Des. 2010, 24:649. - Find at Cite-U-Like
  2. Christ, C. D.; Mark, A. E.; van Gunsteren, W. F. J. Comp. Chem. 2010, 31:1569 - Find at Cite-U-Like
  3. Shirts, M. R.; Mobley, D. L. Biomolecular Simulations, 2013, 24:271-311 - Find at Cite-U-Like
  4. Leach, A. Molecular Modelling: Principles and Applications (2nd Edition); Prentice Hall: 2 ed.; 2001.
  5. Shirts, M. R.; Pande, V. S. J. Chem. Phys. 2005, 122, 134508. - Find at Cite-U-Like
  6. Mobley, D. L.; Dumont, E.; Chodera, J. D.; Dill, K. A. Journal of Physical Chemistry B 2007, 111. - Find at Cite-U-Like
  7. Mobley, D. L.; Graves, A. P.; Chodera, J. D.; McReynolds, A. C.; Shoichet, B. K.; Dill, K. A. Journal of Molecular Biology 2007, 371,. - Find at Cite-U-Like
  8. Jarzynski, C. Phys. Rev. E 2006, 73, 046105. - Find at Cite-U-Like
  9. Mobley, D. L.; Chodera, J. D.; Dill, K. A. Journal of Chemical Physics 2006, 125, 084902. - Find at Cite-U-Like
  10. Mobley, D. L.; Chodera, J. D.; Dill, K. A. Journal of Chemical Theory and Computation 2007, 3. - Find at Cite-U-Like
  11. 11.0 11.1 Paliwal, H.; Shirts M.R.; Journal of Chemical Theory and Computation 2011, 7, 4115-4134, - Find at Cite-U-Like