# Simulating States of Interest

This section should serve as a guide for the preparation of the simulation as a whole and data collection, not as an answer to the question of "which simulation software should I use?" So long as the software can handle free energy calculations (or you can make it), and collects the information the analysis method that you choose needs, then any software is acceptable. Some software may be more suited to certain methods than others, but analysis of this is beyond the scope of this page.

There are three core things you need when running simulations (although more will be discussed below); those items are:

1. Simulations must be run at equilibrium. Special rules apply when running nonequilibrium simulations.
2. All samples must be collected from states of interest.
3. Samples used in the analysis must be uncorrelated and independent.

# Running Simulations at Equilibrium

All simulations used for analysis with any of these methods should be done at equilibrium. This is true not only for the end points, but also for every intermediate state you have. Because rare events are often given such a large importance in most simulation packages, running nonequilibrium simulations will give you the wrong results. There are some methods which require nonequilibrium conditions, but even those must start at equilibrium before perturbing the system. There are no hard and fast rules for determining exactly how long one needs to run before equilibration time, so you must use your expertise to ensure that the system is correctly equilibrated.

It is actually rather difficult for your simulation to start in equilibrium, especially for explicit solvent systems where the solvent is automatically generated. Many simulation software packages actually run or recommend you run an energy minimization step prior to beginning to correct for possible overlapping molecules. These alone are not enough to sufficiently bring the system into equilibrium. The best way to do this is to actually run the simulation for a period of time and then simply exclude those samples from analysis. This equilibration must be done at every intermediate too.

One efficient scheme to get close to equilibration at each lambda is to run short (10-100 ps) simulations at each state, then restart/begin collecting data with the configuration at the end of this time window. This will not guarantee a full relaxation of the system, but it does significantly help reduce potential instabilities in your simulation; full relaxation can take 100s to 1000s of ps or longer in some cases.[1] As $\lambda$ changes, the volume should be allowed to change as well so the solvent density of the system does not change as a result of state. Liquids are incompressible, which means that any small change in volume can have stark changes in the thermodynamic properties. Even when running at NVT conditions, the average volume from an NPT equilibration should be used as box fluctuations can cause free energy differences of 0.1-0.3 kcal/mol for typical small molecule solvation.

Solvating small molecules may take 100-500ps, but this is a best case scenario. Large protein-ligand systems started out of equilibrium have very lengthy correlation times and may take hundreds of nanoseconds to equilibrate. Things to watch for when checking equilibration are the system energy ($U$), $\left\langle \mathrm{d} U / \mathrm{d} \lambda \right\rangle$, and the number of hydrogen bonds (for small molecules) for convergence; the hydrogen bonds may be slow to converge.[2]

## Choosing Simulation Parameters

Free energy calculations typically involve calculating free energy differences that are relatively small compared to the total potential energy of a system. As a consequence, certain simulation parameters which may be unimportant for “typical” molecular dynamics simulations (because they change the total potential energy by such a small percentage, for example) can be tremendously important in free energy calculations. Thus, it is important to think carefully about simulation settings used for free energy calculations. Here are some examples of issues we have found to be important:

• PME parameters: When using lattice-sums (Ewald, PME, PPME, and variants) to handle long range electrostatics, setting details for these can substantially affect computed free energies. In particular, one needs to ensure that the settings chosen give electrostatic interaction energies accurate to very small fractions of a kcal/mol. If this condition is not met, computed free energies can be wrong – sometimes even by several kcal/mol. To test this, one can compute accurate electrostatics energies for a set of simulation snapshots by evaluating their energies using very long electrostatics cutoffs, then re-evaluate energies using shorter cutoffs and lattice-sum electrostatics. Settings for lattice-sum electrostatics and electrostatic cutoffs should be chosen so that the total potential energy from lattice sum is accurate (compared to that evaluated with a very long cutoff) to a very small fraction of a kcal/mol (less than the desired uncertainty in the computed free energies). Some work has found that default settings for some simulation packages may lead to errors in free energies of up to several kcal/mol [Mobley].
• Thermostat choice: The choice of thermostat (for constant temperature simulations) can be quite important, especially in absolute free energy calculations. Many thermostats perform well when a system has a sufficiently large number of degrees of freedom, but this is not an adequate criteria for absolute free energy calculations, as at the end state in these calculations, a portion of the system typically does not interact with its environment. This portion – i.e. a small molecule – may have comparatively few degrees of freedom. Hence, simulations done with thermostats that do not sample from the correct distribution of kinetic energies even in the limit of few degrees of freedom may exhibit problems such as non-ergodicity near these end states (Shirts, Mobley unpub, Villa and Mark 2001). Thus it is important to use thermostats (like Andersen’s thermostat or dynamics (like Langevin dynamics) which are known to sample from the correct ensemble under such circumstances.

Of course, this is by no means an exhaustive list, and these issues may be simulation-package dependent. We encourage further exploration in this area, and suggest that future work with free energy calculations should at minimum perform the above checks. Preferably, users should get used to testing different settings in their simulation packages to ensure that these are set to obtain sufficient accuracy for free energy calculations.

In addition to these issues which can be particularly important for free energy calculations, there are a lot of choices to be made in setting up your system for simulation that also need to be made in a sensible way. Other issues to consider include such as protonation states, parameters for proteins and small molecules, missing residues in protein structures, simulation box sizes, and so on.

## Running Nonequilibrium Simulations

Although all the methods discussed here require systems to be at equilibrium, there are free energy methods where some thermodynamic variables change over time, and some amount of work, W, is required to make this change. In accordance with basic thermodynamic principals, if the change is done infinitely slowly, the process is considered reversible and the work will be equal to the free energy difference. Since we can't run for an infinite amount of time, W will not equal the free energy change and the process will not be reversible. Despite this, it is still possible to write out a free energy change in a formalism developed by Jarzynski[3] where the average over the nonequilibrium trajectories which started from an equilibrium ensemble can give you the free energy; The equation is,

$\Delta G = -\beta^{-1}\ln\left\langle\exp\left(-\beta W\right)\right\rangle_0$.

If the switch is instantaneous, then this equation is reduced to EXP as W reduces to $\Delta U_{ij}$. Although this equation is rooted in EXP, it is possible to construct a variant of BAR to do nonequilibrium work, but not MBAR.[4][5] There are a number of studies available for nonequalibrium methods, and it looks as though starting from an equilibrium ensemble is just as if not slightly more efficient than starting from nonequilibrium ensembles.[6][7]

In general, it is not recommended for beginners to work with nonequilibrium simulations as there are a number of intricacies involved with setting them up. Even so, there is substantial work in developing this field as it may help improve the efficiency of free energy calculations in general.[8]

# Collecting from States of Interest

This may seem like a unnecessary point, but it is worth mentioning. The main purpose of stating this is to draw attention to the sensitivity of parameters you choose to define your "end state" at. If a given set of parameters is $\lambda$ dependent, then the total change in free energy could well be very large. As such, one needs to ensure that the changes they are inducing to a molecule are as few as possible, or at the very least as efficient as possible.

Take for instance treating your simulation's long range electrostatics with particle mesh Ewald (PME). Under non-alchemical transformations, there are several "standard" choices for the parameters which are perfectly acceptable for most simulations. However, if you use this "standard" set when modifying partial charges, you can get significant energy differences up to 4 kcal/mol on some molecules. This just shows one example of how parameters that are considered less important in regular simulations are now important in free energy simulations. The general rule to keep in mind for this is: if the potential energy is changed by the change of a parameter, then dependence of free energy on this parameter should be checked.

# Independent and Uncorrelated Samples

The samples must be independent, meaning they are uncorrelated in time; this is a basic assumption of all the theories presented here in the fundamentals section. However, for all but the simplest of systems, completely independent samples can be very difficult to generate. For protein-ligand binding affinities, the time scale for some motions may be 100s of ns, meaning truly uncorrelated samples may be impossible to generate in a reasonable amount of time with today's simulation technology. In this case, free energy calculations might provide some useful information, but will only be approximations to the correct free energy for that model and cannot be considered reliable.

There are methods for subsampling simulated states and finding correlation times to work around the independent sampling problem, but those have been relegated to their own section.

# References

1. Fujitani, H., Tanida, Y., Ito, M., Shirts, M. R., Jayachandran, G., Snow, C. D., Sorin, E. J., and Pande, V. S. (2005) Direct calculation of the binding free energies of FKBP ligands. J. Chem. Phys. 123, 084108. - Find at Cite-U-Like
2. Smith, L. J., Daura, X., and van Gunsteren, W. F. (2002) Assessing equilibration and convergence in biomolecular simulations. Proteins: Struct., Funct., Bioinf. 48, 487–496. - Find at Cite-U-Like
3. Jarzynski, C. (1997) Nonequilibrium equality for free energy differences. Phys. Rev. Lett 78, 2690–2693. - Find at Cite-U-Like
4. Crooks, G. E. (2000) Path-ensemble averages in systems driven far from equilibrium. Phys. Rev. E 61, 2361–2366. - Find at Cite-U-Like
5. Shirts, M. R., Bair, E., Hooker, G., and Pande, V. S. (2003) Equilibrium free energies from nonequilibrium measurements using maximum-likelihood methods. Phys. Rev. Lett 91, 140601. - Find at Cite-U-Like
6. Oostenbrink, C., and van Gunsteren,W. F. (2006) Calculating zeros: Non-equilibrium free energy calculations. Chem. Phys. 323, 102–108. - Find at Cite-U-Like
7. Oberhofer, H., Dellago, C., and Geissler, P. L. (2005) Biased Sampling of Nonequilibrium Trajectories: Can Fast Switching Simulations Outperform Conventional Free Energy Calculation Methods? J. Phys. Chem. B 109, 6902–6915. - Find at Cite-U-Like
8. Pohorille, A., Jarzynski, C., and Chipot, C. (2010) Good Practices in Free-Energy Calculations. J. Phys. Chem. B 114, 10235–10253. - Find at Cite-U-Like