Difference between revisions of "Best Practices"

From AlchemistryWiki
Jump to navigation Jump to search
 
(94 intermediate revisions by 4 users not shown)
Line 1: Line 1:
{{Fundamentals | cTopic=Theory}}
+
This is intended as an overview of best practices for free energy calculations, with references, when possible.
  
See the previous [[Best Practices Previous Version|Best Practices page]]  
+
This document assumes you already have a basic idea of what these calculations are and what they do (if not, you can learn more from the [[Free Energy Fundamentals]] section and introductory-level reviews{{Cite|Michel2010}}, {{Cite|Christ2010}}, {{Cite|Shirts2013}}), and that you already have a working knowledge of molecular simulation, convergence and equilibration. If not, start with a textbook like Leach's [http://www.google.com/search?hl=en&lr=&client=safari&rls=en&q=Leach+Molecular+Modelling:+Principles+and+Applications&btnG=Search Molecular Modelling: Principles and Applications]{{cite|Leach2001}}). The discussion is simulation-package agnostic, focusing on the important ideas.
  
 +
This document is open for editing.  We hope this to be a place to build and store the community consensus on these issues. Do back up any edits with appropriate references. If you are not sure about changes, post questions on the discussion pages. 
  
To understand the advantages and disadvantages of different free energy methods, it is important to begin with a review of the underlying principles. This page is dedicated to the most fundamental concepts of free energy calculations and is designed to give an in-depth view of the approaches, starting from the basics. This page also contains some of the common nomenclature and symbols that are seen throughout the rest of the [[:Category:Free Energy Fundamentals|free energy fundamentals]] pages and in the literature as a whole.
+
= Introduction =
 +
Free energies calculated from molecular simulation are appealing, because, in principle, they are rigorously correct, '''given a particular set of parameters and physical assumptions'''. Given this particular set of parameters and physical assumptionstarting point, there exists a single ''right'' answer for a free energy calculation. The goal is to obtain that free energy with as little statistical error as possible  -- and ''not'' necessarily to match experiment. Without convergence, agreement with experiment may be due only to fortuitous cancellation of error. Only after free energy estimates have converged can the underlying parameters and physical assumptions really be tested.
  
This page is not meant to be an end-all repository of the background mathematics and principals required for free energy calculations, but it will serve as a good start point and hopefully a quick reference.
+
Free energy is a function of state, so there are many possible choices of pathway for a thermodynamic cycle connecting the same two endpoints. Some pathways may be more efficient than others (sometimes by many orders of magnitude). Methods which are in principle correct may not always be practical or useful.  A large part of the following discussion will discuss exactly how to construct useful and efficient pathways connecting the end states.
  
=Why the name "Alchemical"?=
+
While free energy calculations may be done for many different situations, here we focus on best practices for binding and solvation free energy calculations. Solvation free energies provide a basic starting point for considering a number of the key issues, and are used frequently here because of this and because of their simplicity.  Often, additional complications make it harder to track down problems when considering binding free energy calculations.
One of the first questions, and often most confusing points, about a number of free energy calculations is why we refer to them as "alchemical" changes in a large number of computational methods. When people first hear the word "alchemy," they usually think of the medieval alchemists who were trying to turn lead into gold, or other such transformations. How computational free energy adopted the name takes a bit of understanding of simulation limitations and the properties of what we are calculating.
 
  
Considering that the natural evolution of some of the processes we try to model are well beyond reasonable simulation time scales, we must come up with very efficient ways to compute the free energy differences. Because free energy is a state variable (path independent), we can design simulations that provide a convenient pathway to computing free energy. Furthermore, since we are doing simulations, we are not limited by experimentally observable conditions so long as we are meticulous with our calculations.
+
= Guidelines for setting up your system files =
  
With these observations in mind, it was found that we can simulate modifications to atoms which can change their properties to reflect non-physical or entirely different species. This is roughly the same definition of alchemy from old in that we are changing the properties of the atoms to be something else, although we do it in a mathematically sound and rigorous manner; hence, the term "alchemical" was coined as many free energy calculations are "like alchemy" in their pathways and methods.
+
== Guideline 1: Automate when possible ==
  
There are of course many factors and checks that must be done to ensure accuracy and robustness of the calculations, many of which will be shown in the rest of the [[:Category:Free Energy Fundamentals|free energy fundamentals pages]].
+
If you manually set up your systems for simulation, we would urge you not to. It is extremely easy to make a mistake that affects the entire subsequent set of simulations. Additionally, if your mistake is a result of mistyping something or similar, you may end up with no record of what you did wrong. Instead, when possible, automate simulation preparation tasks so that they do not need to be done manually. Periodically validate the results of automated scripts, especially when you change them. Not only can scripts help mistakes, but when they do have bugs, the script itself provides a permanent record of exactly what you did. Version control your scripts (using SVN or Git) so you can go back check this record more easily.
  
=Assumptions for the Fundamentals=
+
= Guidelines for free energy calculation pathways =
The assumptions listed here are carried out through the rest of the fundamentals sections. Free energy calculations can usually be set up without these assumptions, but we'll make these assumptions to simplify the explanations.
 
  
* '''A standard molecular mechanics model will be assumed'''; this includes:
+
Alchemical free energies almost involve some insertion or deletion of atoms -- both in hydration free energy calculations, and in absolute and relative binding free energy calculations. By insertion and deletion, we mean decoupling or annihilation (see [[Decoupling and annihilation]] for definitions) of the interactions of the atoms in question. We believe that a basic list of rules and guidelines should be followed in any calculation that involves insertion or deletion:
** Harmonic bond angle terms
 
** Periodic dihedral terms
 
** Non-bonded terms made up of point charges and Lennard-Jones repulsion/dispersion terms
 
* constant temperature is maintained:  (discussed [[#Facts from the Free Energy Difference Definition | below]])
 
* '''Masses do not alchemically change.''' If one wishes to do this, substitute all potential energies, <math>U</math>, with the more general Hamiltonian, <math>\mathcal{H}</math>.
 
* '''QM/MM will ''not'' be considered''' for fundamentals because of the field has yet to be sufficiently developed.{{Cite|Woods2008|>Woods, C. J., Manby, F. R., and Mulholland, A. J. (2008) An efficient method for the calculation of quantum mechanics/molecular mechanics free energies. J. Chem. Phys. 128, 014109.|http://www.citeulike.org/group/14929/article/9052667}}
 
  
=Free Energy Difference Equation=
+
==Guideline 1: Always use soft-core potentials while decoupling or annihilating Lennard-Jones interactions==
Most free energy calculations and methods started from a single core equation derived from statistical mechanics. The free energy difference between two states is directly related to the ratio of probabilities of those states through the partition functions. This free energy difference for an NVT ensemble is
 
  
:<math>\displaystyle \Delta A_{ij} = -k_B T \ln \frac{Q_j}{Q_i} = -k_B T \ln \frac{\int_{\Gamma_j} e^{-\frac{U_j(\vec{q})}{k_B T}} d\vec{q}}{\int_{\Gamma_i} e^{-\frac{U_i(\vec{q})}{k_B T}}d\vec{q}}</math>
+
When removing atoms from a system, the atoms should be removed gradually.  Because of how potentials are defined, what gradual removal means can be somewhat subtle.  The soft-core potential formalism is a well-tested way to do this. [[Intermediate States | Read more about how to choose the potential to decouple or annihilate Lennard-Jones sites]].
  
where <math>\Delta A_{ij}</math> is the Helmholtz free energy difference between state <math>j</math> and state <math>i</math>, <math>k_B</math> the Boltzmann constant, <math>Q</math> the canonical partition function, <math>T</math> is the temperature, <math>U_i</math> and <math>U_j</math> are the potential energies as a function of the coordinates and momenta <math>\vec{q}</math> for two states, and <math>\Gamma_i</math> and <math>\Gamma_j</math> are the ''phase space volumes'' of <math>\vec{q}</math> over which we sample, or the total set of all allowed positions and momenta of the system.
+
==Guideline 2: Never leave a partial atomic charge on an atom while its Lennard-Jones interactions are being removed==
  
From this equation, all free energy calculations are derived.
+
Partial atomic charge should never be allowed to remain on an atom while its Lennard-Jones interactions are being removed, because it can result in extremely strong electrostatic interactions in the absence of steric interactions. [[Constructing_a_Pathway_of_Intermediate_States#Avoid_turning_off_charges_and_Lennard-Jones_sites_at_the_same_time |  Read more about how to handle changes to both charges and Lennard-Jones sites]].
  
=Nomenclature and Variables=
+
==Guideline 3: Use few insertions/deletions==
In an effort to keep things uniform, this section contains all the common constants and variables that are seen throughout the [[:Category:Free Energy Fundamentals|fundamentals sections.]] Although the Table itself is not in any particular order, the sorting buttons should help you find what you are looking for. The alternate listings of some variables are things one may encounter in literature, although this site will strive to be uniform in its naming process.
 
  
{|class="wikitable sortable" style="text-align: center"
+
Electrostatics transformations are usually smooth functions of lambda, and require relatively few lambda values, while Lennard-Jones transformations – especially those involving insertion and deletions – can require substantially more lambda values (even when using soft core potentials) to obtain good phase-space overlap and therefore accurate free energy differences.{{cite|Shirts2005}}{{cite|Mobley2007}}{{cite|Mobley2007b}} Insertions and deletions of particles can be thought of as “difficult” transformations.{{cite|Jarzynski2006}}
|-
 
! scope="col" | Variable, Acronym, Term
 
! scope="col" | Definition
 
! scope="col" class="unsortable" |Notes
 
|-
 
| <math>k_B</math>
 
| [http://en.wikipedia.org/wiki/Boltzmann_constant Boltzmann's Constant]
 
|
 
|-
 
| <math>U</math>
 
| Potential Energy
 
| This is the total internal energy  calculated a classical simulation, the sum of the potential and kinetic energies.
 
|
 
|-
 
| <math>T</math>
 
| Temperature
 
| This is often shown with <math>k_B</math> as well, and so common in fact that there is a common variable affiliated with it: <math>\beta</math>.
 
|-
 
| <math>\beta</math>
 
| Inverse Boltzmann Temperature <math>\beta=(k_BT)^{-1}</math>
 
| Because it is so common in most the equations, one will see <math>\beta^{-1}</math> more often than <math>k_BT</math>
 
|-
 
| <math>P</math>
 
| Pressure
 
|
 
|-
 
| <math>u</math>
 
| Reduced potential, expressed as <math>u=\beta(U+PV-\sum \mu N_i)</math>. For many equations, it is simpler to work with the reduced potential.
 
| In the case of NVT ensembles, the PV and <math>\mu N_i</math> terms are zero.  For NPT ensembles, the <math>\sum \mu N_i</math> is zero, and for <math>NV\mu</math> ensembles, the PV term is zero.
 
|-
 
  
| <math>V</math>
+
Consequently, it is far more computationally efficient to modify existing particles (atoms) than to insert or delete new atomsConstruct mutation pathways for relative free energy calculations that minimize the number of times atoms, especially large atoms, need to be inserted and deletet, since multiple choices of mutation pathways between a set of molecules are typically possible.
| Volume (traditional)
 
| '''CAUTION:''' Volume in this sense is the kind you think of with box size. Not to be confused with the ''Phase Space Volume.'' This particular volume is not often considered by itself and should only show up on this site within <math>PV</math> terms, or ''NVT''.
 
|-
 
| <math>\Gamma</math>
 
| Phase Space Volume: Total set of coordinates and momenta where the system has a nonzero probability of being found
 
| '''CAUTION:''' This is not volume in standard physical sense, meaning the volume of the box, but the volume space times the momentum space.
 
|-
 
| <math>q</math> or <math>\vec{q}</math>
 
| Collective variable for coordinates and momentum.
 
| Can also be seen as <math>\mathbf{x}</math>, <math>\mathbf{\vec{x}}</math>, and a number of other terms in literature.
 
|-
 
| <math>\epsilon</math> and <math>\sigma</math>
 
| Lennard-Jones parameters for the well depth and the radius of the point particle respectively.
 
|
 
|-
 
| <math>Q</math>
 
| Partition function for the canonical or NVT ensemble, but sometimes also signifies an arbitrary partition function of any ensemble. <br/><math>Q=\int_{\Gamma}e^{-\beta U (\vec{q})}\,\mathrm{d}\vec{q}</math>
 
| In the examples here, this will represent the canonical partition function although most of the theory shown here works for an arbitrary partition function as well, though sometimes with slight modifications, where <math>Q=\int_{\Gamma}e^{-u(\vec{q})}\,\mathrm{d}\vec{q}</math>
 
|-
 
| <math>\tau</math>
 
| [[Simulation_Information_Gathering#Correlation|Autocorrelation time]]
 
|
 
|-
 
| ''NVT''
 
| System held at constant '''N'''umber of particles, '''V'''olume, and '''T'''emperature. Free energy calculations performed in this ensemble yield Helmholtz free energies.
 
|
 
|-
 
| ''NPT''
 
| System held at constant '''N'''umber of particles, '''P'''ressue, and '''T'''emperature. Free energy calculations performed in this ensemble yield Gibbs free energies.
 
|
 
|-
 
| ''TI''
 
| [[Thermodynamic Integration]]
 
|
 
|-
 
| ''EXP''
 
| [[Exponential Averaging]]
 
| Also called "free energy perturbation" and the "Zwanzig relationship."
 
|-
 
| ''BAR''
 
| [[Bennett Acceptance Ratio]]
 
|
 
|-
 
| ''MBAR''
 
| [[Multistate Bennett Acceptance Ratio]]
 
|
 
|-
 
| Boltzmann Weight
 
| The probability of a microstate <math>\vec{q}</math>, defined <math>e^{-u(\vec{q})}/Q</math>, where <math>u(\vec{q})</math> is the reduced potential.  It can also refer to simply <math>e^{-u(\vec{q})}</math>, although this is not a full weight because it is not normalized by the partition function.   
 
|-
 
| State (variable <math>K</math>)
 
| A unique set of conditions and parameters that completely describe a thermodynamic system.
 
| For all examples here, the upper case variable <math>K</math> will denote all states (or the count thereof), and the lower case variable <math>k</math> will refer to a specific state.
 
|-
 
| <math>\lambda</math>
 
| Alchemical variable, thermodynamic path coordinate
 
| The most common symbol for denoting/tracking the progress along some alchemical pathway. Since most free energy calculations involve an [[Intermediate States | alchemical thermodynamic path]], [[Main Page| this site]] was named accordingly.
 
|-
 
| Annihilating/Annihilated
 
| The act of removing ALL of an atoms interactions with the system, this includes the molecule's interactions with itself.
 
| Intermolecular force = OFF, Intramolecular forces = OFF. Also known as disappeared or destroyed. Not to be confused with "decoupling."
 
|-
 
| Decoupling
 
| The act of removing an atom or molecules interactions with its surroundings ONLY; the interactions between the molecule itself remain active
 
| Intermolecular force = OFF, Intramolecular forces = ON. Not to be confused with "annihilation."
 
|}
 
  
==Facts from the Free Energy Difference Definition==
+
This guideline is unfortunately not useful for absolute free energy calculations, since these involve inserting or deleting entire molecules.
There are a number of key notes we can learn from the definition of free energy differences. Each of these can be important  in interpreting simulation results.
 
  
# '''Only free energy ''differences'' are ever calculated.''' There is never a calculation where absolute free energies are needed (and rarely can they be calculated at all) as all of the biological or thermodynamical quantities of interest are based on a free energy difference. As such, there must always be a minimum of two defined thermodynamic states.
+
==Guideline 4: Think about whether your intermediate states are likely to converge or not==
#* Even ''absolute'' free energies of binding are still free energy differences between two states: the ligand restricted to the binding site, and the ligand free to explore all other configurations.
 
# '''Free energy differences between states at different temperatures are usually not what you want to be calculating''' for problems of interest.  If it did, you would get <math>\Delta A_{ij} = -k_B T_j \ln Q_j + k_B T_i \ln Q_i</math>, which is no longer a ratio calculation and not needed for biological systems of interest. Temperature dependence on free energy is more likely to be "what is <math> \Delta A_{ij} </math> change at two different temperatures?"
 
# '''There are two different phase space volumes.''' <math>\Gamma_i</math> and <math>\Gamma_j</math> are often the same, but they are not required to be. The methods presented here almost always assumes that the two phase spaces overlap. However, when the spaces do not overlap, these methods break down and it is difficult to identify this problem without in depth knowledge of your system. Consider the example of a hard sphere solute with radius <math>\sigma</math> at state <math>i</math> and a Lennard Jones repulsion/dispersion potential, with the same <math>\sigma</math> at state <math>j</math>. Since <math>\Gamma _i</math> will not have molecules at a distance less than <math>\sigma</math>, but <math>\Gamma_j</math> will, the two phase spaces are not the same and these methods will either break down or return very error-prone results.
 
#* The degree to which the phase spaces are shared is called the "phase space overlap". Efficient free energy calculations require significant phase space overlap. There are  [[Intermediate_States | a number of strategies to address]] lack of overlap between target spaces, but determining the best way for any given situation is still a research question.
 
#* It should also be noted that "near zero probability" and "always zero probability" are two distinct things when considering phase space. So long as there is a chance for an observation to be made, no matter how small, it is considered part of the phase space.
 
  
=References=
+
Many choices of pathway are possible. It can often be helpful to think about whether a particular choice of pathway makes convergence easier or more difficult at each lambda, thus reducing the amount of simulation time required.
<References />
 
  
[[Category:Free Energy Fundamentals]]
+
Visualize what happens to a ligand when it is partially noninteracting with the protein, especially as interactions become quite weak during an absolute binding free energy simulation.  It will have significantly more freedom to move around the pocket than when fully interacting- but the kinetics may still be slow, since it will get trapped in free energy minima.  It then may take quite a while to sample these states.
 +
 
 +
One can introduce distance or orientation restraints between the ligand and the protein.  These reduce the space the ligand must explore, thus speeding convergence, at the expense of an additional restraining calculation.  In the noninteracting state, the interaction potential can often be removed analytically. In the interacting state, the free energy of applying these restraints needs to be included.  The free energy of applying restraints can be non-trivial, since the ligand will be pulled away from alternative minima, but convergence issues at other intermediates will be greatly reduced.
 +
 
 +
At the fully noninteracting state, the amount of configurational sampling the ligand undergoes is dictated by this choice. A ligand with a single reference distance restrained relative to the protein will need to sample a spherical shell in configuration space, and a ligand with all six relative degrees of freedom restrained would need to sample only a very small region of configuration space.
 +
These two can take drastically different amounts of time, so in fact it can be much more efficient, at least in some cases, to use the additional restraints.{{cite|Mobley2006}}
 +
 
 +
=Guidelines for carrying out free energy calculations=
 +
 
 +
==Guideline 1: Equilibrium methods are easier to use for beginners than nonequilibrium or mixed-ensemble methods==
 +
 
 +
There are a very large number of basic methods for performing free energy calculation: [[Simulation_Acceleration#Other_Methods|Slow-growth, fast-growth (Jarzynski-style)]], and equilibrium (or instantaneous growth) free energy methods. For advanced users, there are significant efficiency advantages with careful use of these methods.
 +
 
 +
However, for beginners, and for standard simulations, we believe the evidence is in favor of equilibrium methods: Run simulations at fixed values of lambda, collect equilibrium data from these simulations, and calculate the free energies from these observations. There are far fewer things that can go wrong.  If it's a problem that really requires significantly more efficiency, it's probably worth doing a warmup problem for practice first!
 +
 
 +
[[Simulating_States_of_Interest|Read more about running equilibrium methods at the states of interest]].
 +
 
 +
If you are interested it digging deeper, [[Simulation_Acceleration|read about some more advanced sampling methods]] that use exchanges between equilibrated sites to improve efficiency.
 +
 
 +
==Guideline 2: Use equilibrated data==
 +
 
 +
For free energy methods to work, the simulations must sample from the correct Boltzmann distribution.  Make sure that your simulation is well-equilibrated, and that you are not in the transient region.  Simulations must separately be equilibrated at each lambda value.
 +
 
 +
Equilibration at only <math>\lambda=0</math>, followed by sequential short simulations at other lambda values may not be sufficient.  For example in a study by Mobley et al.{{cite|Mobley2007c}} the protein remains kinetically trapped in its starting <math>\lambda=0</math> equilibrium conformational state in all calculations. If long equilibration is deemed necessary, equilibration periods should be equally long at each lambda value.
 +
 
 +
==Guideline 3: Use sufficiently rigorous underlying simulation techniques==
 +
 
 +
Sometimes simulation approximations that work for single end point simulations break down for free energy calculations. Make sure you are simulating.  Particularly watch out for simulation parameters that are a function of the intermediate state!  For example, a shifted potential will have a different shift constant at each intermediate.  If the ''same'' error is being made at each state, then it will partially cancel out of the free energy calculation. If a ''different'' error is made at each intermediate, it can drastically affect the answer.
 +
 
 +
[[Simulating_States_of_Interest#Choosing_Simulation_Parameters| Read more about properly setting simulation parameters]] in the context of free energy simulations.
 +
 
 +
= Guidelines for analyzing simulations =
 +
 
 +
See [[Draft Standards for Post-Calculation Health Reports]] for a discussion on the proposed standard practices for diagnosing typical simulation issues after an alchemical free energy calculation.
 +
 
 +
== Guideline 1: Use [[Thermodynamic_Integration|TI]], [[Bennett_Acceptance_Ratio|BAR]], or [[Multistate_Bennett_Acceptance_Ratio|MBAR]] to calculate free energies ==
 +
 
 +
These methods are by far the most robust.{{cite|Paliwal2011}}  BAR (the Bennett acceptance ratio) is almost as good as MBAR (the multistate Bennett's acceptance ration) in most standard situations. If a sufficiently large number of intermediate states are used, which is usually not significantly more than the number required for good BAR and MBAR estimates, TI (thermodynamic integration) gives good results.{{cite|Paliwal2011}} [[Analyzing_Simulation_Results|Read more here about simulation analysis]].
 +
 
 +
== Guideline 2: Be careful and rigorous when calculating statistical uncertainties  ==
 +
 
 +
Just taking the difference between two simulations is not the same thing as computing.  The easiest person to fool about the true statistical error in your calculation is yourself; you have more invested in a certain answer than your readers do! The "gold standard" is to run a sufficiently large number of copies and compute the standard error in the measurement. Sufficiently large is a bit subjective, but it's certainly more than two!  However this can be inefficient.  Many methods have built in error estimates, though these estimates usually assume uncorrelated data, so don't trust those errors unless you can show the data is uncorrelated.
 +
 
 +
Read more about:
 +
 
 +
* [[Simulation_Information_Gathering#Correlation| Calculating the degree of correlation in simulation data]].
 +
* [[Analyzing_Simulation_Results#Bootstrap_Sampling|Using bootstrap sampling to calculate uncertainties]].
 +
* [[Thermodynamic_Integration#Variance_of_TI|Some subtleties in propagating uncertainty in TI]].
 +
 
 +
== Guideline 3: Correct for possible artifacts resulting from periodicity  ==
 +
 
 +
Most calculations will be performed using periodic boundary conditions.  If you are calculating free energies that involve charged species, such as the binding of a charged ligand, the protonation of a protein side chain, or the solvation free energy of an ion, be aware that periodic boundary conditions can introduce large artifacts that can be corrected in the final data analysis.  For more detail, see [[Charged binding calculations]].
 +
 
 +
=Final Guideline: Verify, Verify, Verify! =
 +
 
 +
As you can tell there are a number of ways that the simulations could fail.  Try to determine as many ways as you can to make sure that the simulations are working correctly: for example:
 +
 
 +
* Do alternate thermodynamic cycles give the same result?
 +
* Are results sufficiently similar to previous results?
 +
* If the same calculation is repeated multiple times, is the difference between simulations consistent with the estimated uncertainties?
 +
* Are there any weird observations in the data as a results of errors in the scripts (like minuses being cut off?)
 +
 
 +
<!--
 +
In view of these facts, our recommendation is to use equilibrium simulations at a set of separate <math>\lambda</math> values to estimate free energy differences. But how should these simulations be performed? Should a <math>\lambda=0</math> simulation be performed first, for example, and then the ending conformation used to seed a <math>\lambda=0.1</math> simulation? Or should they be independent? There is no reason in principle that simulations cannot be performed serially – but this leads to several potential pitfalls and disadvantages. First, if simulations are performed serially, and it is later concluded that some of the intermediate simulations were not long enough, the entire set of calculations performed after those intermediate simulations must be repeated. Second, results at one <math>\lambda</math> value become coupled to those at previous lambda values, potentially leading to hysteresis (''add references'') and difficulties in troubleshooting. Independent simulations, on the other hand, provide the great practical advantage that if, at the analysis stage, it appears that data is insufficient at some range of <math>\lambda</math> values, these particular simulations can simply be extended without requiring repetition of other component simulations. Additionally, with large computer clusters now becoming relatively common, independent simulations can often be performed in parallel, leading to significant real-time savings over running the simulations serially. -->
 +
 
 +
= Old materials =
 +
Our original version of this page is under [[Previous Best Practices]].
 +
 
 +
= References =
 +
<references>
 +
{{Cite|Michel2010|Michel, J.; Essex, J. W. J Comp. Aided Mol. Des. 2010, 24:649.|http://www.citeulike.org/group/14929/article/9052428}}
 +
{{Cite|Christ2010|Christ, C. D.; Mark, A. E.; van Gunsteren, W. F. J. Comp. Chem. 2010, 31:1569|http://www.citeulike.org/group/14929/article/9052143}}
 +
{{Cite|Shirts2013|Shirts, M. R.; Mobley, D. L. Biomolecular Simulations, 2013, 24:271-311|http://www.citeulike.org/group/14929/article/12477253}}
 +
 
 +
{{cite|Leach2001|Leach, A. Molecular Modelling: Principles and Applications (2nd Edition); Prentice Hall: 2 ed.; 2001.}}
 +
 
 +
{{cite|Shirts2005|Shirts, M. R.;  Pande, V. S. J. Chem. Phys. 2005, 122, 134508.|http://www.citeulike.org/group/14929/article/9052562}}
 +
 
 +
{{cite|Mobley2007|Mobley, D. L.;  Dumont, E.;  Chodera, J. D.;  Dill, K. A. Journal of Physical Chemistry B 2007, 111.|http://www.citeulike.org/group/14929/article/9052441}}
 +
 
 +
{{cite|Mobley2007b|Mobley, D. L.;  Graves, A. P.;  Chodera, J. D.;  McReynolds, A. C.;  Shoichet, B. K.;  Dill, K. A. Journal of Molecular Biology 2007, 371,.|http://www.citeulike.org/group/14929/article/9052443}}
 +
 
 +
{{cite|Mobley2006|Mobley, D. L.;  Chodera, J. D.;  Dill, K. A. Journal of Chemical Physics 2006, 125, 084902.|http://www.citeulike.org/group/14929/article/9052440}}
 +
 
 +
<!--{{cite|Chodera2006|Chodera, J. D.;  Swope, W. C.;  Pitera, J. W.;  Seok, C.;  Dill, K. A. Journal of Chemical Theory and Computation 2007, 3.|http://www.citeulike.org/group/14929/article/9052139}}-->
 +
 
 +
<!--{{cite|Wu2005|Wu, D.;  Kofke, D. A. Journal of Chemical Physics 2005, 123, 054103.|http://www.citeulike.org/group/14929/article/9052669}}-->
 +
 
 +
{{cite|Jarzynski2006|Jarzynski, C. Phys. Rev. E 2006, 73, 046105.|http://www.citeulike.org/group/14929/article/9052297}}
 +
 
 +
<!--{{cite|Fujitani2005|Fujitani, H.;  Tanida, Y.;  Ito, M.;  Shirts, M. R.;  Jayachandran, G.;  Snow, C. D.;  Sorin, E. J.;  Pande, V. S. Journal of Chemical Physics 2005, 123, 084108.|http://www.citeulike.org/group/14929/article/9052204}} -->
 +
 
 +
{{cite|Mobley2007c|Mobley, D. L.;  Chodera, J. D.;  Dill, K. A. Journal of Chemical Theory and Computation 2007, 3.|http://www.citeulike.org/group/14929/article/9052442}}
 +
 
 +
{{cite|Paliwal2011|Paliwal, H.;  Shirts M.R.;  Journal of Chemical Theory and Computation 2011, 7, 4115-4134,|http://www.citeulike.org/group/14929/article/10029023}}
 +
 
 +
</references>

Latest revision as of 11:41, 2 July 2014

This is intended as an overview of best practices for free energy calculations, with references, when possible.

This document assumes you already have a basic idea of what these calculations are and what they do (if not, you can learn more from the Free Energy Fundamentals section and introductory-level reviews[1], [2], [3]), and that you already have a working knowledge of molecular simulation, convergence and equilibration. If not, start with a textbook like Leach's Molecular Modelling: Principles and Applications[4]). The discussion is simulation-package agnostic, focusing on the important ideas.

This document is open for editing. We hope this to be a place to build and store the community consensus on these issues. Do back up any edits with appropriate references. If you are not sure about changes, post questions on the discussion pages.

Introduction

Free energies calculated from molecular simulation are appealing, because, in principle, they are rigorously correct, given a particular set of parameters and physical assumptions. Given this particular set of parameters and physical assumptionstarting point, there exists a single right answer for a free energy calculation. The goal is to obtain that free energy with as little statistical error as possible -- and not necessarily to match experiment. Without convergence, agreement with experiment may be due only to fortuitous cancellation of error. Only after free energy estimates have converged can the underlying parameters and physical assumptions really be tested.

Free energy is a function of state, so there are many possible choices of pathway for a thermodynamic cycle connecting the same two endpoints. Some pathways may be more efficient than others (sometimes by many orders of magnitude). Methods which are in principle correct may not always be practical or useful. A large part of the following discussion will discuss exactly how to construct useful and efficient pathways connecting the end states.

While free energy calculations may be done for many different situations, here we focus on best practices for binding and solvation free energy calculations. Solvation free energies provide a basic starting point for considering a number of the key issues, and are used frequently here because of this and because of their simplicity. Often, additional complications make it harder to track down problems when considering binding free energy calculations.

Guidelines for setting up your system files

Guideline 1: Automate when possible

If you manually set up your systems for simulation, we would urge you not to. It is extremely easy to make a mistake that affects the entire subsequent set of simulations. Additionally, if your mistake is a result of mistyping something or similar, you may end up with no record of what you did wrong. Instead, when possible, automate simulation preparation tasks so that they do not need to be done manually. Periodically validate the results of automated scripts, especially when you change them. Not only can scripts help mistakes, but when they do have bugs, the script itself provides a permanent record of exactly what you did. Version control your scripts (using SVN or Git) so you can go back check this record more easily.

Guidelines for free energy calculation pathways

Alchemical free energies almost involve some insertion or deletion of atoms -- both in hydration free energy calculations, and in absolute and relative binding free energy calculations. By insertion and deletion, we mean decoupling or annihilation (see Decoupling and annihilation for definitions) of the interactions of the atoms in question. We believe that a basic list of rules and guidelines should be followed in any calculation that involves insertion or deletion:

Guideline 1: Always use soft-core potentials while decoupling or annihilating Lennard-Jones interactions

When removing atoms from a system, the atoms should be removed gradually. Because of how potentials are defined, what gradual removal means can be somewhat subtle. The soft-core potential formalism is a well-tested way to do this. Read more about how to choose the potential to decouple or annihilate Lennard-Jones sites.

Guideline 2: Never leave a partial atomic charge on an atom while its Lennard-Jones interactions are being removed

Partial atomic charge should never be allowed to remain on an atom while its Lennard-Jones interactions are being removed, because it can result in extremely strong electrostatic interactions in the absence of steric interactions. Read more about how to handle changes to both charges and Lennard-Jones sites.

Guideline 3: Use few insertions/deletions

Electrostatics transformations are usually smooth functions of lambda, and require relatively few lambda values, while Lennard-Jones transformations – especially those involving insertion and deletions – can require substantially more lambda values (even when using soft core potentials) to obtain good phase-space overlap and therefore accurate free energy differences.[5][6][7] Insertions and deletions of particles can be thought of as “difficult” transformations.[8]

Consequently, it is far more computationally efficient to modify existing particles (atoms) than to insert or delete new atoms. Construct mutation pathways for relative free energy calculations that minimize the number of times atoms, especially large atoms, need to be inserted and deletet, since multiple choices of mutation pathways between a set of molecules are typically possible.

This guideline is unfortunately not useful for absolute free energy calculations, since these involve inserting or deleting entire molecules.

Guideline 4: Think about whether your intermediate states are likely to converge or not

Many choices of pathway are possible. It can often be helpful to think about whether a particular choice of pathway makes convergence easier or more difficult at each lambda, thus reducing the amount of simulation time required.

Visualize what happens to a ligand when it is partially noninteracting with the protein, especially as interactions become quite weak during an absolute binding free energy simulation. It will have significantly more freedom to move around the pocket than when fully interacting- but the kinetics may still be slow, since it will get trapped in free energy minima. It then may take quite a while to sample these states.

One can introduce distance or orientation restraints between the ligand and the protein. These reduce the space the ligand must explore, thus speeding convergence, at the expense of an additional restraining calculation. In the noninteracting state, the interaction potential can often be removed analytically. In the interacting state, the free energy of applying these restraints needs to be included. The free energy of applying restraints can be non-trivial, since the ligand will be pulled away from alternative minima, but convergence issues at other intermediates will be greatly reduced.

At the fully noninteracting state, the amount of configurational sampling the ligand undergoes is dictated by this choice. A ligand with a single reference distance restrained relative to the protein will need to sample a spherical shell in configuration space, and a ligand with all six relative degrees of freedom restrained would need to sample only a very small region of configuration space. These two can take drastically different amounts of time, so in fact it can be much more efficient, at least in some cases, to use the additional restraints.[9]

Guidelines for carrying out free energy calculations

Guideline 1: Equilibrium methods are easier to use for beginners than nonequilibrium or mixed-ensemble methods

There are a very large number of basic methods for performing free energy calculation: Slow-growth, fast-growth (Jarzynski-style), and equilibrium (or instantaneous growth) free energy methods. For advanced users, there are significant efficiency advantages with careful use of these methods.

However, for beginners, and for standard simulations, we believe the evidence is in favor of equilibrium methods: Run simulations at fixed values of lambda, collect equilibrium data from these simulations, and calculate the free energies from these observations. There are far fewer things that can go wrong. If it's a problem that really requires significantly more efficiency, it's probably worth doing a warmup problem for practice first!

Read more about running equilibrium methods at the states of interest.

If you are interested it digging deeper, read about some more advanced sampling methods that use exchanges between equilibrated sites to improve efficiency.

Guideline 2: Use equilibrated data

For free energy methods to work, the simulations must sample from the correct Boltzmann distribution. Make sure that your simulation is well-equilibrated, and that you are not in the transient region. Simulations must separately be equilibrated at each lambda value.

Equilibration at only [math]\displaystyle{ \lambda=0 }[/math], followed by sequential short simulations at other lambda values may not be sufficient. For example in a study by Mobley et al.[10] the protein remains kinetically trapped in its starting [math]\displaystyle{ \lambda=0 }[/math] equilibrium conformational state in all calculations. If long equilibration is deemed necessary, equilibration periods should be equally long at each lambda value.

Guideline 3: Use sufficiently rigorous underlying simulation techniques

Sometimes simulation approximations that work for single end point simulations break down for free energy calculations. Make sure you are simulating. Particularly watch out for simulation parameters that are a function of the intermediate state! For example, a shifted potential will have a different shift constant at each intermediate. If the same error is being made at each state, then it will partially cancel out of the free energy calculation. If a different error is made at each intermediate, it can drastically affect the answer.

Read more about properly setting simulation parameters in the context of free energy simulations.

Guidelines for analyzing simulations

See Draft Standards for Post-Calculation Health Reports for a discussion on the proposed standard practices for diagnosing typical simulation issues after an alchemical free energy calculation.

Guideline 1: Use TI, BAR, or MBAR to calculate free energies

These methods are by far the most robust.[11] BAR (the Bennett acceptance ratio) is almost as good as MBAR (the multistate Bennett's acceptance ration) in most standard situations. If a sufficiently large number of intermediate states are used, which is usually not significantly more than the number required for good BAR and MBAR estimates, TI (thermodynamic integration) gives good results.[11] Read more here about simulation analysis.

Guideline 2: Be careful and rigorous when calculating statistical uncertainties

Just taking the difference between two simulations is not the same thing as computing. The easiest person to fool about the true statistical error in your calculation is yourself; you have more invested in a certain answer than your readers do! The "gold standard" is to run a sufficiently large number of copies and compute the standard error in the measurement. Sufficiently large is a bit subjective, but it's certainly more than two! However this can be inefficient. Many methods have built in error estimates, though these estimates usually assume uncorrelated data, so don't trust those errors unless you can show the data is uncorrelated.

Read more about:

Guideline 3: Correct for possible artifacts resulting from periodicity

Most calculations will be performed using periodic boundary conditions. If you are calculating free energies that involve charged species, such as the binding of a charged ligand, the protonation of a protein side chain, or the solvation free energy of an ion, be aware that periodic boundary conditions can introduce large artifacts that can be corrected in the final data analysis. For more detail, see Charged binding calculations.

Final Guideline: Verify, Verify, Verify!

As you can tell there are a number of ways that the simulations could fail. Try to determine as many ways as you can to make sure that the simulations are working correctly: for example:

  • Do alternate thermodynamic cycles give the same result?
  • Are results sufficiently similar to previous results?
  • If the same calculation is repeated multiple times, is the difference between simulations consistent with the estimated uncertainties?
  • Are there any weird observations in the data as a results of errors in the scripts (like minuses being cut off?)


Old materials

Our original version of this page is under Previous Best Practices.

References

  1. Michel, J.; Essex, J. W. J Comp. Aided Mol. Des. 2010, 24:649. - Find at Cite-U-Like
  2. Christ, C. D.; Mark, A. E.; van Gunsteren, W. F. J. Comp. Chem. 2010, 31:1569 - Find at Cite-U-Like
  3. Shirts, M. R.; Mobley, D. L. Biomolecular Simulations, 2013, 24:271-311 - Find at Cite-U-Like
  4. Leach, A. Molecular Modelling: Principles and Applications (2nd Edition); Prentice Hall: 2 ed.; 2001.
  5. Shirts, M. R.; Pande, V. S. J. Chem. Phys. 2005, 122, 134508. - Find at Cite-U-Like
  6. Mobley, D. L.; Dumont, E.; Chodera, J. D.; Dill, K. A. Journal of Physical Chemistry B 2007, 111. - Find at Cite-U-Like
  7. Mobley, D. L.; Graves, A. P.; Chodera, J. D.; McReynolds, A. C.; Shoichet, B. K.; Dill, K. A. Journal of Molecular Biology 2007, 371,. - Find at Cite-U-Like
  8. Jarzynski, C. Phys. Rev. E 2006, 73, 046105. - Find at Cite-U-Like
  9. Mobley, D. L.; Chodera, J. D.; Dill, K. A. Journal of Chemical Physics 2006, 125, 084902. - Find at Cite-U-Like
  10. Mobley, D. L.; Chodera, J. D.; Dill, K. A. Journal of Chemical Theory and Computation 2007, 3. - Find at Cite-U-Like
  11. 11.0 11.1 Paliwal, H.; Shirts M.R.; Journal of Chemical Theory and Computation 2011, 7, 4115-4134, - Find at Cite-U-Like