Weighted Histogram Analysis Method

From AlchemistryWiki
Jump to navigation Jump to search



The Weighted Histogram Analysis Method (WHAM) is one of the earliest methods that take into account information from all Intermediate States. By analyzing all the information at once, we can reduce the number of cycles and loops me must run through, improving efficiency. The precursor to WHAM and first version of multiple histogram relighting techniques came from Ferrenberg and Swendsen;[1] WHAM was developed later for alchemical simulations[2].

WHAM works on the principal that if you have a discrete number of states, you can create a histogram with discrete bins that provide you a relative probability of observing the states of interest, assuming you create the bins along whatever reaction path you have selected. In our cases, the reaction path is the alchemical variable. From these probabilities, you can calculate free energies and other observables.

Derivation

WHAM's derivation[2] is best done by starting from visualizing the problem. If you are given [math]\displaystyle{ K }[/math] number of states, a number of run simulations, [math]\displaystyle{ S }[/math], [math]\displaystyle{ N_i }[/math] samples from each simulation, and then [math]\displaystyle{ K }[/math] free energies to calculate: [math]\displaystyle{ A_1,\,A_2,\,\cdots A_K }[/math]; we now wish to assign the reweighed potentials into bins, [math]\displaystyle{ B }[/math]. Terms common to this site's definitions page are preserved and while new terms are explained in context. Consider the core free energy difference equation between two states [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] of

[math]\displaystyle{ \displaystyle \Delta A_{ij} = -k_B T \ln \frac{Q_j}{Q_i} }[/math]

Instead of solving this directly, we will look at the the probability density, which is the ratio between the two partition functions (i.e. ignore the [math]\displaystyle{ -k_B T \ln }[/math] part). Recall that the partition function of any state is often written as

[math]\displaystyle{ \displaystyle Q_i = \int_{V_i} \Omega_i(U) \exp\left(-\beta U \right) dU }[/math],
[math]\displaystyle{ \displaystyle Q_j = \int_{V_j} \Omega_j(U) \exp\left(-\beta U \right) dU }[/math]

Where we have defined an unknown density of states for each of the thermodynamic states we are simulating, [math]\displaystyle{ \Omega_i }[/math].

However, we could generalize to arbitrary integration coordinates, not just energy, which gives us:

[math]\displaystyle{ \displaystyle Q_i = \int_{V_i} \Omega_i(\vec{q}) \exp\left(-\beta U(\vec{q}) \right) d\vec{q} }[/math],
[math]\displaystyle{ \displaystyle Q_j = \int_{V_j} \Omega_j(\vec{q}) \exp\left(-\beta U(\vec{q}) \right) d\vec{q} }[/math]

[math]\displaystyle{ \vec{q} }[/math] could be coordinates, in which case [math]\displaystyle{ \Omega_i(\vec{q}) = 1 }[/math], but more usually it has only one or two dimensions.

If we assume discrete states, with counts in each bin [math]\displaystyle{ B }[/math], we can write the density of states out for a given simulation, [math]\displaystyle{ i }[/math] as:

[math]\displaystyle{ \displaystyle \Omega_i(\vec{q},\lambda_i) = B_i(\vec{q},\lambda_i)\exp\left[\left(\sum_{j=0}^K \beta_i \lambda_{j,i} U(\vec{q}_j) \right)-\beta_i A_i\right] }[/math]

where the set of states available during simulation [math]\displaystyle{ i }[/math] is [math]\displaystyle{ \left\{\lambda\right\}_i = \left\{\lambda_1, \lambda_2, \dots \lambda_K \right\}_i = \left\{\lambda_{1,i}, \lambda_{2,i}, \dots \lambda_{K,i} \right\} }[/math] and [math]\displaystyle{ B_i }[/math] is the value of the histogram bin [math]\displaystyle{ i }[/math] evaluated at [math]\displaystyle{ \vec{q} }[/math] and [math]\displaystyle{ \lambda_i }[/math]. The best estimate for the density of states is then

[math]\displaystyle{ \displaystyle \sum_{i=1}^S \omega_i(\vec{q})\Omega_i(\vec{q},\lambda_i) }[/math]
[math]\displaystyle{ \displaystyle \sum_{i=1}^S \omega_i(\vec{q})=1 }[/math]

The best estimators for [math]\displaystyle{ \omega_i }[/math] are the ones that minimize the statistical noise. It turns out then, that the best estimator is the one that minimize the error of [math]\displaystyle{ B_i }[/math] for all [math]\displaystyle{ i }[/math]. The error of any individual [math]\displaystyle{ B_i }[/math] is then

[math]\displaystyle{ \delta^2 B_i = g_i\left\langle B_i \right\rangle }[/math]

where [math]\displaystyle{ g_i = 1+2\tau_i }[/math] and [math]\displaystyle{ \tau_i }[/math] is the correlation time of a given simulation. The best estimator for the expectation of the bin value is then

[math]\displaystyle{ \displaystyle \hat{\left\langle B_i \right\rangle} = N_i\Omega\exp\left(\beta_i A_i - \beta\sum_{j=0}^K \lambda_{j,i} U(\vec{q}_j) \right) }[/math].

Please note that [math]\displaystyle{ \Omega }[/math] does not have a subscript in the previous equation, as it is now the density of states determined from all of the simulations.

From here you can just substitute back in to estimate [math]\displaystyle{ /Q_i }[/math] to get the final WHAM equation of

[math]\displaystyle{ \displaystyle \hat{Q_i} = \sum_{j=1}^K \frac{\sum\limits_{x=1}^S g_x^{-1} B_x(\vec{q},\lambda_j) \exp\left[-\beta\lambda_0\vec{q}_0 - \beta\sum\limits_{l=1}^K \lambda_l U_l(\vec{q}) \right]}{\sum\limits_{y=1}^S g_y^{-1} N_y \exp\left[\beta A_y - \beta\lambda_0\vec{q}_0 - \beta\sum\limits_{m=1}^K \lambda_{m,y} U_m(\vec{q}) \right]} }[/math]

where the nought subscript indicates the conditions at the unmodified state. It is then trivial to get a free energy of this state. This is only a relative free energy though as the system must be self-consistently solved. It is common to set one of the states to a free energy of zero so you can calculate differences in free energy between states.

Zero Width Bins

It is possible to take the bin width to zero in the WHAM formula. Although this is a limiting case, it does clean up the equation a bit and looks like

[math]\displaystyle{ \displaystyle \hat{A_i} = - \beta^{-1}\ln \sum_{k=1}^K \sum_{n=1}^{N_k} \frac{ \exp[-\beta U_i(\vec{q}_{kn})]}{\sum\limits_{k'=1}^K N_{k'} \, \exp[\beta A_{k'} - \beta U_{k'}(\vec{q}_{kn})]} }[/math]

which turns out to be the exact equation for MBAR.[3]

Usage of WHAM

Given a particular implementation of WHAM (see below), one may be tempted just to let the analysis give you a black-box result. You should always understand what is happening and so the results should not be taken on blind faith alone.

Since WHAM collects data from all states, you will need to calculate [math]\displaystyle{ \Delta U_{k,j}(\vec{q}) }[/math] for all [math]\displaystyle{ j=1,2,\cdots,K }[/math] states. Even though [math]\displaystyle{ \Delta U_{k,k}(\vec{q}) }[/math] does not need to be calculated since it should be zero, it is highly recommended that you do. This lets you check to see if the re-evaluation function is working as intended, and help you identify possible errors. Although this does not tell you what the error is, it will at least tell you there is one. One possible source of error is the output of your coordinate files should be higher accuracy than a standard PDB files, [math]\displaystyle{ 10^5 }[/math]Å may be sufficient but binary format coordinates are preferred.

Because you are applying a discrete set and finite number of bins, there is a bias introduced since all variables must fall into the bins. This is the predominate problem with WHAM and everything that comes with it, so exercise caution. Error estimates are not directly available for WHAM, and so methods such as bootstrap sampling are required.

Downloading WHAM

Because WHAM requires solving sets of non-linear equations, it is not recommended for beginners to write their own. Fortunately, many simulation packages, such as GROMACS and CHARMM already include WHAM driven free energy calculations.[4][5][6] There are also | also other standalone packages available and so there should not be a need to write your own.

References

  1. Ferrenberg, A. M., and Swendsen, R. H. (1989) Optimized Monte Carlo Data Analysis. Phys. Rev. Lett 63, 1195–1198. - Find at Cite-U-Like
  2. 2.0 2.1 Kumar, S., Bouzida, D., Swendsen, R. H., Kollman, P. A., and Rosenberg, J. M. (1992) The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. J. Comput. Chem. 13, 1011–1021. - Find at Cite-U-Like
  3. Shirts, M. R., and Chodera, J. D. (2008) Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129, 129105. - Find at Cite-U-Like
  4. Hub, J. S., de Groot, B. L., and van der Spoel, D. (2010) g_wham–A Free Weighted Histogram Analysis Implementation Including Robust Error and Autocorrelation Estimates. J. Chem. Theo. Comput. 6, 3713–3720. - Find at Cite-U-Like
  5. Souaille, M., and Roux, B. (2001) Extension to the weighted histogram analysis method: combining umbrella sampling with free energy calculations. Comput. Phys. Commun. 135, 40–57. - Find at Cite-U-Like
  6. Wang, J., Deng, Y., and Roux, B. (2006) Absolute Binding Free Energy Calculations Using Molecular Dynamics Simulations with Restraining Potentials. Biophys. J. 91, 2798–2814. - Find at Cite-U-Like