GROMACS 4.6 example: Direct ethanol solvation free energy

From AlchemistryWiki
Revision as of 23:40, 2 January 2013 by Michael Shirts (talk | contribs) (Created page with "{{fundamentals | cTopic=Examples}} ==Setting up the calculation with Gromacs== There are nine intermediate states defined to perform the calculation of the solvation of ethanol...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search



Setting up the calculation with Gromacs

There are nine intermediate states defined to perform the calculation of the solvation of ethanol. You will need to run nine different simulations. The will use the SAME .gro file (File:Ethanol.gro) , the SAME .top file, (File:Ethanol.mdp) and nine different .mdp (File:Ethanol.mdp) files. But the .mdp files will only be different by one line,

  init_fep_state = X

Where X is 0 through 8, inclusive, because there are 9 states.

Running the calculation with Gromacs

Run grompp and mdrun as normal. Specifically:


  grompp -f ethanol.X.mdp -c ethanol.gro -p ethanol.top -o ethanol.X.tpr -p $top -o $tpr -maxwarn 10

There may be some warnings, and you'll need to override, hence -maxwarn. X runs from 0 to 8.

The number of threads can be anything you want (less than the number run on the cluster) If running on the cluster, you will also want to set the -nopin flag. If you don't set the dhdl file independently, it will be saved to ethanol.X.xvg, and might get written over accidentally if you run g_energy. X again runs 0-8

    mdrun -nt 1 -deffnm ethanol.X -dhdl ethanol.X.dhdl.xvg

Strictly, you will only need the dhdl files, but looking at the others output files can be useful to determine


Analyze the calculation with Gromacs

After you have generated the dhdl files, you can then analyze them using pymbar with the dhdl output.

The correct invocation is:

  python alchemical-gromacs.py -f directory/prefix -t 300 -p 1 -n 2500 -v  > outpufile

-t is temperature -p is pressure -f is (prefix of the files, including directory)


It will use all files it finds with the given prefix, and it will assume they are numbered in order. We omit the first 500 ps (0.2 ps per sample) for equilibration (likely overkill for this problem, but better save than sorry).


You may see the following warnings:

  Warning on use of the timeseries module: If the inherent timescales of the system are long compared to those being analyzed, 
  this statistical inefficiency may be an underestimate.  The estimate presumes the use of many statistically independent samples
  Tests should be performed to assess whether this condition is satisfied.   Be cautious in the interpretation of the data


This is a standard warning -- the correlation time code will be an underestimate if there are correlation times that are long with respect to the simulation time. In this case, there aren't. With proteins, there probably are.

  /Users/mrshirts/work/papers/MBARFORDUMMIES/pymbar/trunk/pymbar/pymbar.py:308: RuntimeWarning: overflow encountered in exp
   log_f_R = - max_arg_R - numpy.log( numpy.exp(-max_arg_R) + numpy.exp(exp_arg_R - max_arg_R) ) - w_R
   /Users/mrshirts/work/papers/MBARFORDUMMIES/pymbar/trunk/pymbar/pymbar.py:494: RuntimeWarning: overflow encountered in exp
    fR =  1/(1+numpy.exp(w_R - C))

These warnings can pop up because we are using BAR initialization in the algorithm, which is a bit faster (not much). It will default to zeros, which is fine. As long as the MBAR calculation converges OK, these warnings can be ignored.

Understanding the analysis