2014 Workshop Monday Afternoon Discussion

From AlchemistryWiki


Topic: Pipelines and workflow

Christopher Bayly and Thomas Woolf moderating

Bayly anecdote:

  • had designed a workflow that used MD and tried to deploy it across sites
  • there was rebellion!
  • a key step was inspecting trajectories by eye, which proved hard to encode into an automated workflow
  • there was a desire to add all sorts of other checks

"Things should be made as simple as possible, but no simpler"

We want to hear from both method developers and users.

Mobley: Another key factor is not giving people enough rope to hang themselves if they're not ready for it. We also need to be able to automatically detect particular failure modes, so that the workflow stops short of giving wrong answers.

Bayly: Need to protect users from themselves.

Chodera: Are there steps that cannot easily be automated, or is everything automatable via some specified best practices?

Teng Lin: We need more data. For example, ring growth is probably much harder than atom addition. This could come from a corporate-provided or community dataset.

Mobley: Three categories:

  1. things that will never work
  2. things that will work if you try hard enough
  3. things that will pretty much always work.

We ought to be able to codify things that the tool could check for: missing residues, missing atoms, reasonable ionic strength. Other problems, such as having the wrong ligand, we can't fix. The middle case is where calculations are more or less efficient, but we don't have a lot of data on that. Still, we should avoid things that will never work; efficiency is a separate issue.
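
The automatable checks Mobley lists (missing residues, missing atoms, a reasonable ionic strength) can be illustrated with a minimal Python sketch. This is not any tool mentioned in the discussion: the Residue class, the tiny HEAVY_ATOM_TEMPLATES table, and the 0.5 M ionic-strength ceiling are hypothetical choices made only for illustration. The one standard ingredient is the ionic strength formula I = 1/2 * sum(c_i * z_i^2).

```python
from dataclasses import dataclass, field

# Hypothetical heavy-atom templates for a few residue types (illustration only;
# a real tool would use complete residue templates from a force field).
HEAVY_ATOM_TEMPLATES = {
    "GLY": {"N", "CA", "C", "O"},
    "ALA": {"N", "CA", "C", "O", "CB"},
    "SER": {"N", "CA", "C", "O", "CB", "OG"},
}


@dataclass
class Residue:
    chain: str
    number: int
    name: str
    atoms: set = field(default_factory=set)


def find_missing_residues(residues):
    """Report gaps in residue numbering within each chain.

    Assumes sequential numbering; insertion codes or renumbered
    structures would need extra handling.
    """
    by_chain = {}
    for res in residues:
        by_chain.setdefault(res.chain, []).append(res.number)
    gaps = []
    for chain, numbers in by_chain.items():
        numbers = sorted(numbers)
        for prev, cur in zip(numbers, numbers[1:]):
            if cur - prev > 1:
                gaps.append((chain, prev + 1, cur - 1))
    return gaps


def find_missing_atoms(residues):
    """Report heavy atoms absent relative to the hypothetical templates."""
    missing = []
    for res in residues:
        template = HEAVY_ATOM_TEMPLATES.get(res.name)
        if template is None:
            continue  # unknown residue type: skip rather than guess
        absent = template - res.atoms
        if absent:
            missing.append((res.chain, res.number, res.name, sorted(absent)))
    return missing


def ionic_strength(ions):
    """I = 1/2 * sum(c_i * z_i**2), with c_i in mol/L and z_i the ion charge."""
    return 0.5 * sum(conc * charge**2 for conc, charge in ions)


def preflight(residues, ions, max_ionic_strength=0.5):
    """Collect problems before a run; max_ionic_strength (M) is an assumed limit."""
    problems = []
    for chain, start, end in find_missing_residues(residues):
        problems.append(f"chain {chain}: residues {start}-{end} missing")
    for chain, num, name, atoms in find_missing_atoms(residues):
        problems.append(f"chain {chain} {name}{num}: missing atoms {atoms}")
    strength = ionic_strength(ions)
    if not 0.0 < strength <= max_ionic_strength:
        problems.append(
            f"ionic strength {strength:.3f} M outside (0, {max_ionic_strength}] M"
        )
    return problems


if __name__ == "__main__":
    residues = [
        Residue("A", 1, "GLY", {"N", "CA", "C", "O"}),
        Residue("A", 2, "ALA", {"N", "CA", "C", "O"}),  # CB missing
        Residue("A", 5, "SER", {"N", "CA", "C", "O", "CB", "OG"}),  # 3-4 missing
    ]
    ions = [(0.15, +1), (0.15, -1)]  # 150 mM NaCl gives I = 0.15 M
    for problem in preflight(residues, ions):
        print(problem)
    # chain A: residues 3-4 missing
    # chain A ALA2: missing atoms ['CB']
```

The design choice here is to collect problems and stop before simulating rather than silently patching the structure, in line with the goal of stopping short of giving wrong answers; problems the tool cannot detect, such as having the wrong ligand, remain outside its scope.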

Patrick Grinaway: A grid search could be done by methods developers to try various things on a common test set to see if things will or will not fail with automated parameters.

Enrico Purisima: A well-defined checklist is more immediately doable. Any mission-critical task should have a well-defined checklist.

JW: This has to be an iterative process, where developers work closely with end users to see what kinds of difficulties end users are having. Rapid cycles of iterative optimization will help.

David: How about Schrödinger? Can we have you compile a checklist of common problems to share with the community?

Robert: Yes. The checklist needs to be based on the needs of the people running the calculations.

Neale: Everybody has their own preferred way of doing things.

Ross: People might want to share their pipelines.

Michael: We've been trying to pull together Best Practices on the alchemistry.org website. There will be differences, but if we can get this all in one central place, we can hash out the issues. I'm not sure everyone has enough resources to build a complete pipeline. Schrödinger has had enough resources in recent years to make great inroads here; academic groups don't have enough critical mass.

?: Are FEP methods robust enough to use? We need sensitivity tests to see how much the results change when we deviate from the checklist. We need the iPhone of free energy methods: something that takes decisions out of the hands of novice users.

Pat Walters: If we get something that works, we're happy to share. We need some place with worked examples, with all the files for something that works.

Robert Abel: It's not uncommon to see people do things like protonate basic amines, since in non-physics-based methods this may not matter. But for physics-based methods, it may make a huge difference.

Patrick Grinaway: We need to come up with a good test set of challenging systems, since there are challenges that may be unique to certain kinds of systems.

Benoît Roux: Some of our assumptions about where ions are, binding mode, etc., if incorrect, can be overcome with sufficient sampling. For example, in kinases, the most important loop in the binding site is unresolved. How do you automate this? Otherwise, you could just start from the amino acid sequence. Where do we draw the line?

?: We should try to distinguish issues that are universal to structure-based methods from problems specifically associated with FEP. For any structure-based method, we need to be careful to connect the system being modeled to the experimental system. With FEP, we have the additional issue of how best to implement the calculations, given the right starting conditions and structures.

Clara Christ: What are the benefits of automation? Manual work is error-prone. Automation avoids errors that creep in by hand.

David Mobley: We should mention some pipeline efforts already in progress: Schrödinger's workflow for FEP Mapper; the Mobley lab's internal GROMACS tool for doing FEP Mapper-style calculations; and a similar effort in Julien Michel's group for OpenMM and AMBER. These would take a simulation you have prepared for MD and automatically generate the files for relative free energy calculations.

Benoît Roux: CHARMM-GUI has been doing this for at least two years.

?: Coming back to the checklist idea: we want to make sure that we're using the right PDB structure (wild-type vs. mutant). A decision-making tree would help: starting with a PDB structure, decide whether we can use FEP or some other method, and what decisions we need to make to get to a result.

?: For a pipeline to be sustainable, there has to be demand for it at one end. Does the demand come from the computational chemists, or from the chemists themselves? One simple approach we've used for collaborations is to log every single idea that has ever been generated for a project. On one project, we have logged 10,000 compounds! This is very informative, since it gives us an idea of what kinds of questions chemists are asking, what challenges need to be overcome, etc.

Clara Christ: I very much doubt that the end user would be a chemist.

JW: Chemists don't want to run programs; they want to make compounds.

Pat: Consider the word processor analogy: it's like handing me a word processor that misspells half the words! We're nowhere close to having something that you'd want to give to chemists right away.

Paul Czodrowski: I would love to have a virtual machine with everything already set up. "I don't want to compile anymore."

J-F Truchon: Because we were eyeballing trajectories for analysis, it was mostly our brains doing the work. Now there are better tools to help prepare these systems. Getting systems set up to run calculations on a day-to-day basis is a big hurdle. Expert knowledge, such as how long a calculation needs to run for a phenyl change or how to decide which type of calculation to use, might be automatable.

Michel Cuendet: Methods of the sort that Gianni de Fabritiis described will be more amenable to large user bases in a few years: just toss the ligand in the box and simulate!

Christopher Bayly summary:

  • avoid things that we know will fail
  • different subproblems
  • test programs / tutorials we can run right away
  • detailed checklists