May 11, 2020 Open Force Field Initiative Advisory Board Meeting
Posted on 11 May 2020 by Karmen Condic-Jurkic
The first meeting of the OpenFF Initiative Advisory Board took place on May 11, 2020. The meeting notes are summarized below:
Data collection and benchmarking
- As we push into biopolymer FFs, what data should we be using to fit force fields?
- We are in the process of porting AMBER ff14sb into SMIRNOFF format.
- We have a plan for making an “AMBER-like” protein force field using the current small molecule bond/angle/vdw, with an AMBER protein charge model, then fitting torsions.
- What data should we use to evaluate the quality of proteins/nucleic acids? Data for other molecules (lipids/carbohydrates - though those will likely be tackled next year)?
- High-quality protein-ligand binding affinity datasets with rigid ligands
- Conformational decoys? (What experimental datasets inform this?)
- Similar sequences but different folds? (e.g. “Paracelsus Challenge”)
- Peptide data: NMR data on cyclic peptides
- Scott Lokey: logD and NMR data for cyclic peptides
- Check with Alan Mark on automated protein benchmark set
What to consider with respect to water models?
- TIP3P is likely to cause problems. OPC or OPC3 might be good candidates. PC3/TIP4P-Ew performed well, according to Onufriev’s study.
- Import data used in fitting TIP3P-FB; ultimately co-optimize water model
- Important properties:
- Hydration free energies insensitive to water model (but entropy/enthalpy decomposition is); polarization issues
- Diffusion coefficient would likely be more important, water model dependent
- Enthalpy of vaporization: Difficult property to reproduce; unclear where discrepancy arises from (e.g. quantum dynamical effect)
- Mixing properties: Enthalpies of mixing; partial molar volumes
- Partition coefficients: OCHEM (large database of octanol-water partition coefficients and other properties) ~ 20K [follow up with Sereina Riniker]; however, significant cancellation. Also OCHEM has a lot of license terms (OK for non-commercial, but otherwise a lot of restrictions: https://ochem.eu/home/show.do)
- Solvation free energies in solvents other than water
- Don’t neglect charged molecules (relative values, not absolute)
Interactions with the biomolecular force field community
- How can we best collaborate with the biopolymer FF development communities and provide/share resources (software, data, infrastructure)?
- Making it easy for novice Python programmers to extend the OFF toolkit – at the moment, understanding how various bits and pieces of OpenFF work together is a steep learning curve, although modifying code seems to be quite straightforward.
- Support for new parameter types / functional forms
- How best can we share curated standard train/test datasets? We want to make it programmatically accessible, but what other formats are good too?
- Flexibility – everyone has their own favourite way of dealing with data
- Data accessible with an API to grab data in batches
- Offer bulk download
- Identifiers within dataset are critical (names, SMILES, CAS, InChI, etc.)
- How can we best communicate with biopolymer FF development communities?
- Try attending community user group meetings
- Paris CHARMM / Tinker-HP/AMOEBA meeting is a great model
- Support for different functional forms?
- e.g. site-specific multipoles
What functional forms and other new FF components should we aim to support to enable future accurate FF development?
- Is there a way to work with MD package developers to get support for these into their packages?
- Show package developers persuasive data that the force field is valuable to motivate implementation of new functional forms (especially protein-ligand binding energies) – demonstrate that it works
- What about a library that provides advanced potentials until they can be implemented internally, like OpenKIM?
- This could make it easy for developers to try new forces with minimal effort
- How can we best support Quantum-trained ML potentials? Other ML potentials?
- TensorFlow support (for ANI) already integrated into gromos
How can we best streamline the deployment of our force fields into simulation packages?
- Should we focus on existing converters (e.g. ParmEd/InterMl), object models (an OFF/MolSSI common system object model?), libraries to read common object model serialization that other codes can plug into?
- Low-level converters for parameterized systems (like ParmEd) are likely to be the most successful
- New system construction tools (especially if broadly deployed, e.g. via CHARMM-GUI) could have impactful
- CanDo - Chris Schafmeister - “this is what people are going to use”
- Currently, we’re aiming to build a common biopolymer + small molecule Topology object model that will ideally replace/augment OpenMM or other force field representations, Topology objects.
- Probably out of scope of what OpenFF is developing are general trajectory representations, but we plan to work with other developers on these to help provide input/assistance.
- Ditto with any replacements for Molecule SDF/mol2/PDB files: out of scope for our effort, but we would like to be involved in larger efforts.