Difference between revisions of "Test System Repository"

From AlchemistryWiki
 
(40 intermediate revisions by 4 users not shown)
Line 4: Line 4:
 
To join a mailing list for a discussion of protein-ligand binding benchmarks, email michael.shirts at virginia.edu.  If you have signed up previously, you can log into the discussion (password protected) at https://collab.itc.virginia.edu/portal/xlogin
 
To join a mailing list for a discussion of protein-ligand binding benchmarks, email michael.shirts at virginia.edu.  If you have signed up previously, you can log into the discussion (password protected) at https://collab.itc.virginia.edu/portal/xlogin
  
= Minimum Content of a Test Set Depositions =
+
= Specifications of the content of binding benchmark tests =
 +
 
 +
''Current standards version is 0.5, dated Sept 27, 2013''
  
 
There will be three types of depositions for the binding benchmark test sets:
 
There will be three types of depositions for the binding benchmark test sets:
  
* system specifications
+
* [[System specifications]]
* potential energy results
+
* [[Potential energy results]]
* free energy results
+
* [[Free energy results]]
 
 
 
 
All tests must consist of a system specification, and at least one potential energy result from a specified software version.  After that, multiple people can contribute free energy results for the same system specification and potential energy result, or contribute potential energy results of the system for different simulation codes. They also might propose a new potential energy result based on their own preferences for simulations of the system (different cutoffs, etc).  Importantly, the "free energy results" should be an attempt to be independent of any such nonphysical approximations.
 
 
==System specification==
 
 
 
This should be enough to define the end states of the calculation.
 
It should consist of the specific input files for a standard molecular simulation package.  The specific files required will depend on the software used:
 
 
 
* GROMACS: a .gro file and a .top file
 
* AMBER: a .prmtop file and a .crd file
 
* DESMOND: a .cms file
 
* CHARMM: ???
 
* LAMMPS: ???
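
A minimal sketch of how a deposition could be checked against the list above. The function name and the lookup table are hypothetical; only the packages with known requirements are covered, and the AMBER topology is assumed to carry the conventional .prmtop extension.

```python
from pathlib import PurePath

# Required file extensions per package, taken from the list above.
# CHARMM and LAMMPS are omitted because their requirements are not yet defined.
# Assumption: AMBER topologies use the conventional ".prmtop" extension.
REQUIRED_EXTENSIONS = {
    "GROMACS": {".gro", ".top"},
    "AMBER": {".prmtop", ".crd"},
    "DESMOND": {".cms"},
}


def missing_extensions(filenames, package):
    """Return the required extensions for `package` that no file in `filenames` has."""
    present = {PurePath(name).suffix.lower() for name in filenames}
    return sorted(REQUIRED_EXTENSIONS[package] - present)


# Hypothetical file names, for illustration only.
print(missing_extensions(["complex.gro", "complex.top"], "GROMACS"))
print(missing_extensions(["complex.crd"], "AMBER"))
```

A check like this only confirms that files with the expected extensions are present; it says nothing about whether their contents are valid.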
 
 
All files in a system specification must use the same force field.  Thus, to test different force fields, separate but related system specifications are required.  Ideally, they would share the same coordinates, but that might not be possible if one were looking at changes in protonation state, etc.
 
 
 
System specifications should also contain a README.txt file which provides information on how the files were generated from raw input.
 
 
 
The system specification should include two sets of coordinate files: complex and solute files.  The topologies could be provided as two independent files, or as some combination of files (for example, in GROMACS, one can #include files, so the two different .top files could consist almost entirely of #includes of the same ligand and complex files).  The key point is that both the initial coordinates and all molecular simulation parameters for both calculations are provided. Velocities may be included, but shouldn't matter.  The preference is to leave them out (I'm not sure if this is possible for all codes) to avoid clutter that is not directly relevant.
 
 
Solvent molecules can either be included or not included.  If they are not included, it is assumed that the system is to be used only for implicit solvent calculations.
 
 
Files should be equilibrated at the experimental temperature and 1 atm. However, it's fine if they are equilibrated at any nearby temperature at 1 atm, as long as they can run at experimental conditions without crashing and without any additional minimization or equilibration.
 
 
 
We will eventually want to have all test cases in a number of file formats.  However, the lack of input files in every format should not limit benchmark sets being contributed.  Conversion can be performed manually or semimanually (using acpype, etc.).  It may be necessary to provide more digits of precision in parameters or configurations to make it possible to convert between simulation codes.
 
 
System specifications from the same molecular system but including different variations--for example, with different protonation states, salt concentrations, etc., will be grouped together.
 
 
 
==Potential Energy Results==
 
 
 
A "potential energy results" submission consists of enough information to completely specify the energy of the two end states in a system specification, given a specific file format version of the input files and the version of the code used.  It should be NVE, and only the potential energy is reported (velocities are not reported).  Components of the potential energy should be reported whenever possible.  It should be linked to a specific system configuration.
 
 
 
* GROMACS: a .mdp file, and the zero-time .edr output converted to human-readable format.
 
* DESMOND: a .cfg file, and the .ene output
 
* AMBER: ???
 
* CHARMM: ???
 
* LAMMPS: ???
 
 
By providing full input files, any ambiguity as to how the energies were actually generated is removed.
 
 
 
A README.txt accompanying these files should also include
 
 
 
* The exact command line options used to generate the energies, so that the process is completely repeatable
 
* Code version (if a non-standard release, should include commit version and date from the version control system)
 
* Compiler options used to compile the code (double vs. single, optimization, MPI, etc.)
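
As a rough illustration, the three items above could be written to README.txt in a fixed layout so that every submission records them the same way. The function name, field labels, and placeholder values are hypothetical and are not part of the standard.

```python
def readme_text(command_line, code_version, compiler_options):
    """Format the three provenance items listed above as README.txt text."""
    lines = [
        "Command line: " + command_line,
        "Code version: " + code_version,
        "Compiler options: " + compiler_options,
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only; replace them with the real ones.
example = readme_text(
    command_line="<exact command used to generate the energies>",
    code_version="<release number, or commit version and date>",
    compiler_options="<precision, optimization level, MPI, etc.>",
)
print(example)
```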
 
 
 
Eventually, when we compare results between different file-format versions of the same system specification, the results should be consistent.
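
One way such a consistency check might look is sketched below. It compares potential energy components reported by two codes for the same system and flags those that differ by more than a tolerance. The component names, the energy values, and the tolerances are all assumptions made for illustration; the standard does not define acceptable tolerances.

```python
import math


def inconsistent_components(energies_a, energies_b, rel_tol=1e-5, abs_tol=1e-3):
    """Return the shared components whose values differ beyond the tolerances.

    Both arguments map a component name to an energy in the same units.
    The default tolerances are illustrative assumptions, not agreed values.
    """
    mismatches = {}
    for name in sorted(set(energies_a) & set(energies_b)):
        a, b = energies_a[name], energies_b[name]
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches[name] = (a, b)
    return mismatches


# Hypothetical component energies from two codes, for illustration only.
code_a = {"bond": 1234.567, "angle": 2345.678, "lj": -5000.0}
code_b = {"bond": 1234.568, "angle": 2345.678, "lj": -5010.0}
print(inconsistent_components(code_a, code_b))
```

Only components reported by both codes are compared, since different packages may split the potential energy into different terms.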
 
 
==Free Energy Results==
 
 
 
A "free energy results" data set contains all the information used to calculate the free energy of binding.  Unlike the potential energy results, the goal should be a free energy that is independent of _all_ possible nonphysical parameters (vdW/Ewald cutoffs, barostat or thermostat time constant, multistep integrators, etc.).  So these nonphysical terms should be corrected for (or it should at least be specifically mentioned that they are NOT corrected for).
 
 
Such a data set would include:
 
  
* All the inputs for the potential energy result (including the energies).  This could be done by specifically linking to a "potential energy result", but perhaps would be better just including a separate set of run files.
+
All tests consist of a system specification and at least one potential energy result from a specified software version.  After that, multiple people can contribute free energy results for the same system specification and potential energy result, or contribute potential energy results of the system for different simulation codes. They also might propose a new potential energy result based on their own preferences for simulations of the system (different cutoffs, etc).   Importantly, the "free energy results" should be an attempt to be independent of any such nonphysical approximations.
* Program name and version used for both simulations
 
* Dynamical information such as timestep, barostat and thermostat information. Again, this can best be provided by submitting the actual run files.
 
* Ensemble used: NPT, NVT, or NVT with implicit solvent
 
* Method of analysis used to calculate free energy from simulation (TI, FEP, BAR/MBAR, WHAM) described, including program name and version if applicable.
 
* Length and number of simulations run
 
* Pathway used from initial to final state (alchemical (mathematical pathway described), PMF (in what variable), etc.)
 
* Final state used to connect free energies between solute and complex state (double decoupling/annihilation, restrained end states)
 
* Any analytical correction used to correct the free energies.
 
* How statistical error bars are estimated.
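
The checklist above could be captured as a simple metadata record that is validated before deposition. This is only a sketch: the field names below are hypothetical labels for the listed items and are not defined by the standard.

```python
# Hypothetical field names, one per item in the list above.
REQUIRED_FIELDS = [
    "potential_energy_inputs",
    "program_and_version",
    "dynamics_settings",
    "ensemble",
    "analysis_method",
    "simulation_length_and_count",
    "pathway",
    "end_state_treatment",
    "analytical_corrections",
    "error_estimation",
]


def missing_fields(record):
    """Return the required fields that are absent or blank in `record`."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(record.get(field, "")).strip()
    ]


# A partially filled record, for illustration only.
record = {"ensemble": "NPT", "analysis_method": "MBAR", "pathway": ""}
print(missing_fields(record))
```

For "analytical_corrections", an explicit statement that no corrections were applied would count as a filled field, in keeping with the requirement that uncorrected terms be mentioned.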
 
 
Including this information would allow testing sampling methods and formalism, post-analysis methods, cutoffs, timesteps. Initially, these files will not be placed under explicit version control, though the wiki provides some level of version control.  Eventually, for example, for potential energy results, we would like to keep versions for multiple versions of code.
 
  
 
= Test Sets =
 
= Test Sets =
== [[ The Simple Small Molecule Solvation Benchmark Test Set ]] ==
+
== Small Molecule Solvation Benchmark Sets ==
  
* Small Molecule Hydration Benchmark Set 1: This test set was designed to test methods for computing hydration free energies of small molecules. It comprises a series of small molecules, parameter sets for three different software codes, and reference energies {{Cite|Paliwal2011}}.
+
* [[The Simple Small Molecule Solvation Benchmark Test Set]]: This test set was designed to test methods for computing hydration free energies of small molecules. It comprises a series of small molecules, parameter sets for three different software codes, and reference energies {{Cite|Paliwal2011}}.
 +
* [http://www.escholarship.org/uc/item/6sd403pz  FreeSolv (Mobley) Hydration Set]: This is an extensive (640+) molecule database of experimental and calculated hydration free energies for small neutral molecules. It includes GROMACS topology and coordinate files as well.
  
 
== Host-Guest Binding ==
 
== Host-Guest Binding ==
Perhaps three test cases. 
 
 
* Cucurbit[7]uril with benzene (partial charges artificially set to zero).  This tests binding of a nonpolar guest that encounters little barrier to exiting a rigid host.
 
* Cucurbit[7]uril with benzene (partial charges artificially set to zero).  This tests binding of a nonpolar guest that encounters little barrier to exiting a rigid host.
 
* Cucurbit[7]uril with guest B5 {{Cite|Moghaddam2011}}. This tests binding of a bulky cationic guest that encounters a substantial energy barrier to exiting a rigid host.
 
* Cucurbit[7]uril with guest B5 {{Cite|Moghaddam2011}}. This tests binding of a bulky cationic guest that encounters a substantial energy barrier to exiting a rigid host.
 
* Some guest binding beta-cyclodextrin.  This would test binding to a much more flexible host.
 
* Some guest binding beta-cyclodextrin.  This would test binding to a much more flexible host.
 
+
* Octa-acid with benzoic acid guest derivatives (from SAMPL4 and SAMPL5 blind prediction challenge){{Cite|Olsson2016}}.
  
 
== Protein-Ligand Binding ==
 
== Protein-Ligand Binding ==
  
The following test systems were proposed at the [[2012_Workshop_on_Free_Energy_Methods_in_Drug_Design| 2012 Workshop on Free Energy Methods in Drug Design]]. One proposal would be to include 5-10 ligands. However, we should discuss whether this many ligands are needed for numerical evaluation of methods.
+
The following test systems were proposed at the [[2012_Workshop_on_Free_Energy_Methods_in_Drug_Design| 2012 Workshop on Free Energy Methods in Drug Design]]. One proposal would be to include 5-10 ligands. However, we should discuss how many ligands are needed for numerical evaluation of methods.
  
* T4 Lysozyme, polar and apolar sites (methods should be able to get this)
+
* T4 Lysozyme, polar and apolar sites (methods should be able to get this). [[Media:Minimal.tar.gz|GROMACS format minimal set of input files]]{{Cite|Boyce2009}}. (A full set of topology/coordinate files for this set is also available, though the minimal set is probably adequate for most purposes. If desired the full set is available [https://dl.dropboxusercontent.com/u/3409095/paper_support/fullL99AM102Q.tar.gz here (511MB)])
* FKBP (rock solid, well-studied). [[Media:FKBP_AMBER_GAFF.tgz|AMBER parameterized input files in GROMACS format]]
+
* FKBP (rock solid, well-studied). Files in both GROMACS format and DESMOND-compatible .cms files, validated to give equivalent energies (up to energy calculation method differences)
 +
**  [[Media:FKBP_AMBER_GAFF.tgz|AMBER parameterized input files in GROMACS format]]
 +
**  [[Media:FKBP_desmond.tgz|The same input parameters converted into DESMOND format]]
 
* Trypsin (well studied, potential issues with sampling and charges it would be good for people to swing at)
 
* Trypsin (well studied, potential issues with sampling and charges it would be good for people to swing at)
 
* DNA gyrase (from Vertex's data collection curated by Richard Dixon).
 
* DNA gyrase (from Vertex's data collection curated by Richard Dixon).
* CCP model binding site
+
* CCP model binding site{{Cite|Rocklin2013}}. [[Media:CCP.zip|GROMACS format minimal set of input files]].
 +
* Absolute free energies - Diverse-ligands to bromodomain BRD4{{Cite|Aldeghi2016}}.  Download a complete zip from: [http://dx.doi.org/10.5281/zenodo.57131 http://dx.doi.org/10.5281/zenodo.57131].
  
 
=References=
 
=References=
Line 110: Line 45:
 
{{Cite|Paliwal2011|Paliwal, H and Shirts, M. R. (2011) A benchmark test set for alchemical free energy transformations and its use to quantify error in common free energy methods. J. Chem. Theory Comput. 7(12): 4115-4134.|http://www.citeulike.org/group/14929/article/10029023}}
 
{{Cite|Paliwal2011|Paliwal, H and Shirts, M. R. (2011) A benchmark test set for alchemical free energy transformations and its use to quantify error in common free energy methods. J. Chem. Theory Comput. 7(12): 4115-4134.|http://www.citeulike.org/group/14929/article/10029023}}
 
{{Cite|Moghaddam2011|Moghaddam,S., Yang,C., Rekharsky,M., Ko,Y.H., Kim,K., Inoue,Y., and Gilson,M.K. (2011) New Ultrahigh Affinity Host - Guest Complexes of Cucurbit[7]uril with Bicyclo[2.2.2]octane and Adamantane Guests: Thermodynamic Analysis and Evaluation of M2 Affinity Calculations. J.Am.Chem.Soc. 133:3570-3581.}}
 
{{Cite|Moghaddam2011|Moghaddam,S., Yang,C., Rekharsky,M., Ko,Y.H., Kim,K., Inoue,Y., and Gilson,M.K. (2011) New Ultrahigh Affinity Host - Guest Complexes of Cucurbit[7]uril with Bicyclo[2.2.2]octane and Adamantane Guests: Thermodynamic Analysis and Evaluation of M2 Affinity Calculations. J.Am.Chem.Soc. 133:3570-3581.}}
 +
{{Cite|Boyce2009|Boyce, S. E., Mobley, D. L., Rocklin, G. J., Graves, A. P., Dill, K. A. and Shoichet, B. K. (2009) Predicting ligand binding affinity with alchemical free energy methods in a polar model binding site. J. Mol. Biol. 394:747-763.}}
 +
{{Cite|Rocklin2013|Rocklin, G. J., Boyce, S. E., Fischer, M., Fish, I, Mobley, D. L., Shoichet, B. K., Dill, K. A. (2013) Blind prediction of charged ligand binding affinities in a model binding site. J. Mol. Biol. 425:4569-4583.}}
 +
{{Cite|Aldeghi2016|Aldeghi, M., Heifetz, A., Bodkin, M. J., Knapp, S., Biggin, P.C (2016).  Accurate calculation of the absolute free energy of binding for drug molecules.  Chem Sci. 7:207-218.}}
 +
{{Cite|Olsson2016| Olsson, M. A., Söderhjelm, P., Ryde U. (2016). J. Comp. Chem. 37:1589-1600.}}
 
</references>
 
</references>

Latest revision as of 10:55, 10 August 2016

Purpose of Test Sets

One of the biggest challenges to carefully validating and comparing free energy methods is defining and sharing well-defined test cases (molecular systems and force field parameters) with reliably known numerical results. If one is not sure of the value of the free energy dictated by the energy model and other physical parameters, it is impossible to make fine comparisons among methods. Additionally, different programs with different bookkeeping, or parameters that have been rounded in some way, can cause legitimate small differences between computed free energies, obscuring differences in the methods. The goal of this Repository is to help define and disseminate a stable set of test systems of varied nature and complexity for use by the free energy simulation community. Note that the free energies provided by these systems may not agree particularly well with experiment, but this is not necessary, because the purpose here is to test the numerical performance of the methods.

To join a mailing list for a discussion of protein-ligand binding benchmarks, email michael.shirts at virginia.edu. If you have signed up previously, you can log into the discussion (password protected) at https://collab.itc.virginia.edu/portal/xlogin

Specifications of the content of binding benchmark tests

Current standards version is 0.5, dated Sept 27, 2013

There will be three types of depositions for the binding benchmark test sets:

  • System specifications
  • Potential energy results
  • Free energy results

All tests consist of a system specification and at least one potential energy result from a specified software version. After that, multiple people can contribute free energy results for the same system specification and potential energy result, or contribute potential energy results of the system for different simulation codes. They also might propose a new potential energy result based on their own preferences for simulations of the system (different cutoffs, etc). Importantly, the "free energy results" should be an attempt to be independent of any such nonphysical approximations.

Test Sets

Small Molecule Solvation Benchmark Sets

  • The Simple Small Molecule Solvation Benchmark Test Set: This test set was designed to test methods for computing hydration free energies of small molecules. It comprises a series of small molecules, parameter sets for three different software codes, and reference energies [1].
  • FreeSolv (Mobley) Hydration Set: This is an extensive (640+) molecule database of experimental and calculated hydration free energies for small neutral molecules. It includes GROMACS topology and coordinate files as well.

Host-Guest Binding

  • Cucurbit[7]uril with benzene (partial charges artificially set to zero). This tests binding of a nonpolar guest that encounters little barrier to exiting a rigid host.
  • Cucurbit[7]uril with guest B5 [2]. This tests binding of a bulky cationic guest that encounters a substantial energy barrier to exiting a rigid host.
  • Some guest binding beta-cyclodextrin. This would test binding to a much more flexible host.
  • Octa-acid with benzoic acid guest derivatives (from SAMPL4 and SAMPL5 blind prediction challenge)[3].

Protein-Ligand Binding

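The following test systems were proposed at the 2012 Workshop on Free Energy Methods in Drug Design. One proposal would be to include 5-10 ligands. However, we should discuss how many ligands are needed for numerical evaluation of methods.

  • T4 Lysozyme, polar and apolar sites (methods should be able to get this). GROMACS format minimal set of input files [4]. (A full set of topology/coordinate files for this set is also available (511MB), though the minimal set is probably adequate for most purposes.)
  • FKBP (rock solid, well-studied). Files in both GROMACS format and DESMOND-compatible .cms files, validated to give equivalent energies (up to energy calculation method differences).
  • Trypsin (well studied, potential issues with sampling and charges it would be good for people to swing at)
  • DNA gyrase (from Vertex's data collection curated by Richard Dixon).
  • CCP model binding site [5]. GROMACS format minimal set of input files.
  • Absolute free energies - Diverse-ligands to bromodomain BRD4 [6]. Download a complete zip from: http://dx.doi.org/10.5281/zenodo.57131.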

References

  1. Paliwal, H and Shirts, M. R. (2011) A benchmark test set for alchemical free energy transformations and its use to quantify error in common free energy methods. J. Chem. Theory Comput. 7(12): 4115-4134.
  2. Moghaddam,S., Yang,C., Rekharsky,M., Ko,Y.H., Kim,K., Inoue,Y., and Gilson,M.K. (2011) New Ultrahigh Affinity Host - Guest Complexes of Cucurbit[7]uril with Bicyclo[2.2.2]octane and Adamantane Guests: Thermodynamic Analysis and Evaluation of M2 Affinity Calculations. J.Am.Chem.Soc. 133:3570-3581.
  3. Olsson, M. A., Söderhjelm, P., Ryde U. (2016). J. Comp. Chem. 37:1589-1600.
  4. Boyce, S. E., Mobley, D. L., Rocklin, G. J., Graves, A. P., Dill, K. A. and Shoichet, B. K. (2009) Predicting ligand binding affinity with alchemical free energy methods in a polar model binding site. J. Mol. Biol. 394:747-763.
  5. Rocklin, G. J., Boyce, S. E., Fischer, M., Fish, I, Mobley, D. L., Shoichet, B. K., Dill, K. A. (2013) Blind prediction of charged ligand binding affinities in a model binding site. J. Mol. Biol. 425:4569-4583.
  6. Aldeghi, M., Heifetz, A., Bodkin, M. J., Knapp, S., Biggin, P.C (2016). Accurate calculation of the absolute free energy of binding for drug molecules. Chem Sci. 7:207-218.