Support for open source cheminformatics toolkits is coming!
Posted on 30 Sep 2018 by John Chodera
We’ve been hard at work with a large-scale infrastructure overhaul of our main openforcefield toolkit following the Open Force Field Toolkit 0.1.0 prerelease, which provided an initial reference implementation of the SMIRNOFF 0.1 specification for direct chemical perception. Many thanks to all the users who provided helpful feedback in how we could improve overall usability!
While we expect the updated toolkit to be released very soon, we wanted to provide a summary of the coming changes to give you an idea of what we’ve been up to:
Support for fully open source cheminformatics toolkits
While the Toolkit 0.1.0 prerelease required the OpenEye Toolkits to provide facilities for molecule reading/writing, chemical perception via SMARTS, conformer generation, and partial charge assignment, we have been working hard to allow fully open cheminformatics toolkits (such as The RDKit) to be easily used instead.
We are introducing Toolkit Wrappers, which provide a simple, unified API to multiple toolkits to expose the cheminformatics features required by the openforcefield toolkit.
These features include:
- Reading and writing molecules from standard file formats (Tripos mol2, SDF, PDB, etc.)
- Conversion to/from SMILES strings
- Enumeration of tagged-atom SMARTS (SMIRKS) matches within chemical structures
- Partial charge assignment with corrected semiempirical charge assignments (e.g. AM1-BCC)
Initially, we are adding support for the following toolkits:
- The OpenEye Toolkit will continue to be supported, though a license is required for their use
- The RDKit, which provides support for molecule reading/writing, conversion to/from SMILES, and SMARTS matching, but does not provide AM1-BCC charge assignment capabilities
- The AmberTools for assigning AM1 and AM1-BCC charges via
Initially, this means that the RDKit and AmberTools together can provide complete functionality for assignment of SMIRNOFF parameters for small molecules.
We will include automatic installation of both of these toolkits as part of the next
conda release of the openforcefield toolkit.
A new draft SMIRNOFF 1.0 specification
While the SMIRNOFF 0.1 specification was a significant step forward in automated chemical perception based force fields, it was incomplete and inconsistent in some ways that would hinder its growth. This release will include a new backwards-incompatible draft SMIRNOFF 1.0 specification that will support rapid improvements in force field accuracy over the next year, featuring multiple improvements:
- Aromaticity model now defaults to
OEAroModel_MDL, and aromaticity model names drop OpenEye-specific prefixes
- Top-level tags are now required to specify units for any unit-bearing quantities to avoid the potential for mistakes from implied units.
- Potential energy component definitions were renamed to be more general:
<NonbondedForce>was renamed to
<HarmonicBondForce>was renamed to
<HarmonicAngleForce>was renamed to
<BondChargeCorrections>was renamed to
<ChargeIncrementModel>and generalized to accommodate an arbitrary number of tagged atoms
<GBSAForce>was renamed to
<PeriodicTorsionForce>was split into
<vdW>now specifies 1-2, 1-3, and 1-4 scaling factors via
scale13(default: 0), and
scale14(default: 0.5) attributes. Coulomb scaling parameters have been removed from
- Added the
<Electrostatics>tag to separately specify 1-2, 1-3, and 1-4 scaling factors for electrostatics, as well as the method used to compute electrostatics (
Coulomb) since this has a huge effect on the energetics of the system.
- Made it clear that
<Constraint>entries do not have to be between bonded atoms.
<VirtualSites>has been added, and the specification of charge increments harmonized with
potentialattribute was added to most forces to allow flexibility in extending forces to additional functional forms (or algebraic expressions) in the future.
potentialdefaults to the current recommended scheme if omitted.
<GBSA>now has defaults specified for
- Changes to how fractional bond orders are handled:
- Use of fractional bond order is now are specified at the force tag level, rather than the root level
- The fractional bond order method is specified via the
- The fractional bond order interpolation scheme is specified via the
- Section heading names were cleaned up.
- Example was updated to reflect use of the new
- Eliminated “Requirements” section, since it specified requirements for the software, rather than described an aspect of the SMIRNOFF specification
- Fractional bond orders are described in
<Bonds>, since they currently only apply to this term.
The current draft SMIRNOFF 1.0 specification can be found here: Draft SMIRNOFF 1.0 specification
Specifying the system to parameterize with SMIRNOFF previously required passing along a set of OpenEye Toolkit
To support the migration toward broader support for fully open source toolkits, we have introduced two new classes intended to aid the reading and manipulation of biomolecular systems for parameterization:
Molecule: A simple pure Python class for manipulating molecules and interconverting with different toolkits
Topology: A simple pure Python class for manipulating collections of molecules to simplify application of parameters
These classes aim to simplify reading chemical descriptions of systems from several popular formats to generate parameterized molecular systems.
For example, to read small molecule from an RDKit
# Create an openforcefield Molecule object from an RDKit Mol molecule molecule = Molecule.from_rdkit(rdmol)
or from a file
# Create an openforcefield Molecule object from a Tripos mol2 file using whatever toolkit(s) are available from openforcefield.topology import Molecule molecule = Molecule.from_file(mol2_filename)
Topology object from one or more
Molecule objects is easy:
# Create an openforcefield Topology object consisting of a single openforcefield Molecule from openforcefield.topology import Topology topology = Topology.from_molecules(molecule)
While we have not yet added biomolecular force fields in SMIRNOFF format, reading of PDB files containing small molecules will also be straightforward if your small molecules is also available (with any atom ordering) in some format that can be read from a file (such as a Tripos mol2 file):
# Create an openforcefield Topology object from a PDB file and Tripos mol2 files defining each component of the PDB file from simtk.openmm import app pdbfile = app.PDBFile(pdb_filename) unique_molecules = [ Molecule.from_file(mol2_filename) for mol2_filename in mol2_filenames ] topology = Topology.from_openmm(pdbfile.topology, unique_molecules=unique_molecules)
Topology will also support serialization to a standard that will make data interchange for biomolecular systems easier.
A new SMIRNOFF reference implementation (
The original SMIRNOFF 0.1 reference implementation (
openforcefield.typing.engines.ForceField) followed the OpenMM
simtk.openmm.app.ForceField API very closely.
As we seek to make the toolkit less OpenMM-centric and easier to use with various molecular simulation packages directly, we have refined its API to make this process easier, in addition to conforming with Python PEP8 naming standards.
Generation of parameterized OpenMM
System objects is still easy:
# Create an OpenMM System object from the openforcefield Topology object from openforcefield.typing.engines import ForceField forcefield = ForceField('smirnoff99frosst') openmm_system = forcefield.create_openmm_system(topology)
but now you will also be able to create input files for other packages easily via ParmEd (which will also be automatically installed with the
# Generate a ParmEd Structure object from the openforcefield Topology object, attaching positions parmed_structure = forcefield.create_parmed_structure(topology, positions) # Write AMBER input files parmed_structure.save('amber.inpcrd') parmed_structure.save('amber.prmtop') # Write gromacs input files parmed_structure.save('gromacs.gro') parmed_structure.save('gromacs.top') # Write CHARMM input files parmed_structure.save('charmm.psf') parmed_structure.save('charmm.crd')
Improved toolkit documentation and examples
A preview of the updated toolkit documentation can be found here. Throughout the documentation, numerous examples are provided to make it easy for users to simply copy-and-paste simple solutions to common tasks.