open force field

An open and collaborative approach to better force fields

May 11, 2020 Open Force Field Initiative Advisory Board Meeting

Posted on 11 May 2020 by Karmen Condic-Jurkic

The first meeting of the OpenFF Initiative Advisory Board took place on May 11, 2020. The meeting notes are summarized below:

Data collection and benchmarking

  • As we push into biopolymer FFs, what data should we be using to fit force fields?
    • We are in the process of porting AMBER ff14sb into SMIRNOFF format.
    • We have a plan for making an “AMBER-like” protein force field using the current small molecule bond/angle/vdw, with an AMBER protein charge model, then fitting torsions.
  • What data should we use to evaluate the quality of proteins/nucleic acids? Data for other molecules (lipids/carbohydrates - though those will likely be tackled next year)?
    • High-quality protein-ligand binding affinity datasets with rigid ligands
    • Conformational decoys? (What experimental datasets inform this?)
      • Similar sequences but different folds? (e.g. “Paracelsus Challenge”)
    • Peptide data: NMR data on cyclic peptides
      • Scott Lokey: logD and NMR data for cyclic peptides
    • Check with Alan Mark on automated protein benchmark set

What to consider with respect to water models?

  • TIP3P is likely to cause problems. OPC or OPC3 might be good candidates. PC3/TIP4P-Ew performed well, according to Onufriev’s study.
  • Import data used in fitting TIP3P-FB; ultimately co-optimize water model
  • Important properties:
    • Hydration free energies insensitive to water model (but entropy/enthalpy decomposition is); polarization issues
    • Diffusion coefficient would likely be more important, water model dependent
    • Enthalpy of vaporization: Difficult property to reproduce; unclear where discrepancy arises from (e.g. quantum dynamical effect)
    • Mixing properties: Enthalpies of mixing; partial molar volumes
    • Partition coefficients: OCHEM (large database of octanol-water partition coefficients and other properties) ~ 20K [follow up with Sereina Riniker]; however, significant cancellation. Also OCHEM has a lot of license terms (OK for non-commercial, but otherwise a lot of restrictions:
    • Solvation free energies in solvents other than water
    • Don’t neglect charged molecules (relative values, not absolute)

Interactions with the biomolecular force field community

  • How can we best collaborate with the biopolymer FF development communities and provide/share resources (software, data, infrastructure)?
    • Making it easy for novice Python programmers to extend the OFF toolkit – at the moment, understanding how various bits and pieces of OpenFF work together is a steep learning curve, although modifying code seems to be quite straightforward.
    • Support for new parameter types / functional forms
  • How best can we share curated standard train/test datasets? We want to make it programmatically accessible, but what other formats are good too?
    • Flexibility – everyone has their own favourite way of dealing with data
    • Data accessible with an API to grab data in batches
    • Offer bulk download
    • Identifiers within dataset are critical (names, SMILES, CAS, InChI, etc.)
  • How can we best communicate with biopolymer FF development communities?
    • Try attending community user group meetings
    • Paris CHARMM / Tinker-HP/AMOEBA meeting is a great model
  • Support for different functional forms?
    • e.g. site-specific multipoles

What functional forms and other new FF components should we aim to support to enable future accurate FF development?

  • Is there a way to work with MD package developers to get support for these into their packages?
    • Show package developers persuasive data that the force field is valuable to motivate implementation of new functional forms (especially protein-ligand binding energies) – demonstrate that it works
  • What about a library that provides advanced potentials until they can be implemented internally, like OpenKIM?
    • This could make it easy for developers to try new forces with minimal effort
  • How can we best support Quantum-trained ML potentials? Other ML potentials?
    • TensorFlow support (for ANI) already integrated into gromos

How can we best streamline the deployment of our force fields into simulation packages?

  • Should we focus on existing converters (e.g. ParmEd/InterMl), object models (an OFF/MolSSI common system object model?), libraries to read common object model serialization that other codes can plug into?
    • Low-level converters for parameterized systems (like ParmEd) are likely to be the most successful
    • New system construction tools (especially if broadly deployed, e.g. via CHARMM-GUI) could have impactful
    • CanDo - Chris Schafmeister - “this is what people are going to use”
  • Currently, we’re aiming to build a common biopolymer + small molecule Topology object model that will ideally replace/augment OpenMM or other force field representations, Topology objects.
  • Probably out of scope of what OpenFF is developing are general trajectory representations, but we plan to work with other developers on these to help provide input/assistance.
  • Ditto with any replacements for Molecule SDF/mol2/PDB files: out of scope for our effort, but we would like to be involved in larger efforts.