Interchange project update
Posted on 10 Nov 2021 by Matt Thompson
Thus far, the only simulation engine natively supported by export from the OpenFF software stack has been OpenMM. OpenMM offers great performance on GPUs and excellent flexibility via its API, but many scientists wish to use tools that are only compatible with other molecular simulation engines, like AMBER and GROMACS. There are tools that can convert OpenMM objects to file formats understood by these other engines, but these steps can have rough edges and add complexity to users’ workflows. For this reason, OpenFF has started the Interchange project. The objective of the Interchange project is to provide a first-class support for these operations with the OpenFF software stack. This enabled by a Python package of the same name that is described below and in its documentation.
This post aims to provide a high-level overview of the aims of the Interchange project, its current status, present capabilities, and some future directions.
Code in this will run if copy-pasted into a Jupyter notebook or Python interpreter. This is being
written in November 2021 and code snippets intended to be used with Interchange v0.1.3. The API is
not yet stable; the names, import paths, and structure of key components will likely be updated in
the future in accordance with user feedback. If you’d like to follow along, see installation
instructions at the bottom of the page or use this file to
create the same environment used while writing this post. Interchange
is distributed under the
MIT license and all functionality is accessible using only free and open-source tools.
Contents of an Interchange object
To emphasize a detail above, the key objective of the Interchange project at present is to make it easier to get OpenFF force fields in use in non-OpenMM workflows. The scope of this task is large, but it is finite; the SMIRNOFF specification used by OpenFF force fields provides clarity to many questions of information content in force fields, which is key for developing an interface to it. The OpenFF Toolkit also already provides convenient representations of molecular topologies and functions for parsing existing data into its models. Note that this does not close the door on interfacing with non-OpenFF models, it simply provides a starting point. Expanding this work outside of the OpenFF stack is discussed below (“Future work”).
An Interchange
object stores all the information known about a system; this includes its
chemistry, how that chemistry is represented by a force field, and how the system is organised in 3D
space. An Interchange
object has four components, each of which is also an attribute of the core
Python class: a chemical topology, handlers storing the physical interactions, positions
of all particles, and box vectors, if the system is periodic.
The topology
attribute stores chemical information in a graph-like structure, without specific
information about forces from the force field. Currently, this attribute is literally a
Topology
object from the OpenFF Toolkit. The OpenFF Toolkit Topology object includes necessary information
for molecular mechanics representations (masses, elements, and basic bond information) but also
richer chemical information used by cheminformatics toolkits and some portions of the SMIRNOFF
specification (bond orders, bond and atom stereochemistry, aromaticity). While this may change in
the future, this does mean that coarse-grained representations are not supported in any explicit
way.
The handlers
attribute is a mapping between chemical information and force field parameters as
they are applied to the system (i.e. post-parametrization). A unique identifier of a segment of the
chemical graph - for example, the four atoms participating in a dihedral - is mapped to a unique
identifier of the physical parameters assigned to them - in this example, the force constant, phase,
and periodicity of the dihedral force. This allows for some de-duplication of parameters; by
pointing to a representation of the force, each unique parameter is only stored once. The mapping
between chemical topology and force parameters serves as a way of connecting the forces actually
present in a simulation to how they were applied by the force field.
Particle positions are stored as a unit-tagged Nx3
matrix, where N
is the number of particles in
the topology.
Box vectors are as a unit-tagged 3x3
matrix, following conventional periodic box vectors and the
implementation in
OpenMM.
Positions and box vectors are stored internally as nanometers using a custom
Pint registry. Attempts are made to automatically convert from
other unit represtnations, including
openmm.unit
and unyt
representations. For each attribute, if
unitless arrays are passed to a setter, nanometers are assumed. If an array bearing different (but
compatible) units is passed, the conversion is done automatically.
It’s worth briefly mentioning some things that are not within the scope of what is stored here (at least now):
- Trajectory information, and other results of simulation runs, is left for post-processing libraries such as MDAnalysis and MDTraj.
- Runtime details, such as initial velocities, the choice of integrator, and how often to write to trajectory files, are left to the engine
- Hardware-specific details (i.e. which operations should be run on CPU vs. GPU cores) are also left to the engine
- Other parameters specific to free energy methods (or other schemes such as non-equilibrium dynamics) are not currently in scope, but this may change in the future.
For more on the information content of the Interchange
object, see a corresponding
section in the
documentation.
Preparing data for parametrization
The high-level entry point for using Interchange with the OpenFF stack is Interchange.from_smirnoff. The two required arguments are the key inputs for the construction of any molecular mechanics system: a chemical topology and a force field, with information about how the latter should be “applied.” This post focuses on use with the OpenFF stack, but one could imagine replacements for reach that are not part of OpenFF software.
Within the OpenFF stack, the two object models are provided by the toolkit: its
Topology
and
ForceField
classes. This post will not go into detail about the different ways that input data can be prepared;
there are many
ways to get
chemical input data into this object model! The chemical topology will simply be a caffeine molecule
and OpenFF 2.0.0, the latest mainline force field release from the Open Force Field Initiative.
from openff.toolkit.topology import Molecule, Topology
from openff.toolkit.typing.engines.smirnoff import ForceField
caffeine = Molecule.from_smiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")
caffeine.generate_conformers(n_conformers=1)
topology = caffeine.to_topology()
sage = ForceField("openff_unconstrained-2.0.0.offxml")
To belabor the point about where exactly in a workflow we are focused, one can load in another available force field or generate a prepare a topology with an arbitrary number of different molecules.
Using Interchange
Now that everything is prepared, we simply construct an Interchnage
object with two lines of
code:
from openff.interchange.components.interchange import Interchange
interchange = Interchange.from_smirnoff(force_field=sage, topology=topology)
interchange
Interchange with 24 atoms, non-periodic topology
This object is currently missing positions and box information. Let’s use the conformer we generated earlier and assign box vectors to something more than twice the cutoff that Sage uses (0.9 nanometers).
interchange.box = [4, 4, 4]
interchange.positions = caffeine.conformers[0]
interchange.box
array([[4., 0., 0.],
[0., 4., 0.],
[0., 0., 4.]]) <Unit('nanometer')>
(This step may be more streamlined in the future, possibly with changes directly in the toolkit!)
For more complex systems, there may be other steps involved at different points in a workflow, such
as adding solvent and counter-ions or combining multiple Interchange
objects generated from
different force fields. But for this example, we’re (partially) done! The Interchange
object is
fully constructed and we can export its contents to objects understood by simulations engines.
As a brief aside,
migrating
existing code from using the OpenFF Toolkit’s OpenMM interface to using Interchange directly is
straightforward. Wherever
ForceField.create_openmm_system
is was used, simply replace it with Interchange.from_smirnoff
like so:
# Existing code generating an OpenMM System
openmm_system = sage.create_openmm_system(topology)
# New code generating an Interchange object
interchange_ = Interchange.from_smirnoff(sage, topology)
The functions have different names and outputs, but their inputs are identical!
Exporting to simulation engines
There are high-level API points for exporting to particular file formats. The focus of this project, for the most part, does not cover run-time options such as the choice of integreator, thermostat, and barostat. Discussions are underway on the possibility of providing partial input scripts that require some manual edits but include important information contained within the force field (cutoff methods and distances, in particular).
# GROMACS
interchange.to_gro("out.gro")
interchange.to_top("out.top")
# Amber
interchange.to_inpcrd("out.inpcrd")
interchange.to_prmtop("out.prmtop")
# LAMMPS
interchange.to_lammps("out.lmp")
# OpenMM (`openmm.System`)
openmm_system = interchange.to_openmm()
# The OpenFF Toolkit provides direct exports to `openmm.app.Topology`!
openmm_topology = topology.to_openmm()
Internally, each of these methods call a function somewhere in the openff.interchange.interop
module; accessing them as methods on the core object is meant to make it easier to actually do each
operation.
For more on the export layer, see a corresponding section in the documentation.
Validating exports
There is, unfortunately, not much of a standard in place for ensuring that parametrizing a chemical
topology with a given force field produces the expected results. For the purposes of this work, we
will focus on validating the results of exporting Interchange
objects to files and objects
understood by supported simulation engines. To this end, there are several functions in the
openff.interchange.drivers
module that each take in an Interchange
object, call out to an
engine a single-point energy calculation, and store the result.
This snippet continues from above, since we have conveniently already done the work of constructing
our object. Note that these functions are relatively unintelligent at the moment and simply look to
see if each engine is installed (i.e. if gmx
, sander
, etc. are in $PATH
). These might not run
on a machine that does not have them installed - feel free to skip lines if you’re following along.
from openff.interchange.drivers.amber import get_amber_energies
from openff.interchange.drivers.openmm import get_openmm_energies
from openff.interchange.drivers.gromacs import get_gromacs_energies
from openff.interchange.drivers.lammps import get_lammps_energies
print(get_gromacs_energies(interchange))
Energies:
Bond: 17.31004524230957 kJ / mol
Angle: 195.2955780029297 kJ / mol
Torsion: 13.520307540893555 kJ / mol
Nonbonded: None
vdW: 43.82065216824412 kJ / mol
Electrostatics: -707.1108989715576 kJ / mol
Note that all of these outputs will have a None
for their nonbonded energy, since the underlying
engines have separated this out into vdW
and Electrostatics
components already.
print(get_amber_energies(interchange))
Energies:
Bond: 4.1372 kcal / mol
Angle: 46.6767 kcal / mol
Torsion: 3.2314 kcal / mol
Nonbonded: None
vdW: 43.8144296 kJ / mol
Electrostatics: -707.0098095999999 kJ / mol
print(get_openmm_energies(interchange))
Energies:
Bond: 17.309860229492188 kJ / mol
Angle: 195.29534912109375 kJ / mol
Torsion: 13.520315170288086 kJ / mol
Nonbonded: None
vdW: 43.807859356853584 kJ / mol
Electrostatics: -707.037225348007 kJ / mol
print(get_lammps_energies(interchange))
Energies:
Bond: 4.1371621 kcal / mol
Angle: 46.67671 kcal / mol
Torsion: 3.2314347718102 kcal / mol
Nonbonded: None
vdW: 10.4644777915 kcal / mol
Electrostatics: -168.98040100000003 kcal / mol
There is also, for convenience of comparing these outputs, a high-level function that calls each of
the above functions and prepares the input to a single Pandas DataFrame
:
from openff.interchange.drivers.all import get_summary_data
summary_data = get_summary_data(interchange)
summary_data
Bond Angle Torsion vdW Electrostatics
OpenMM 17.309860 195.295349 13.520315 43.807859 -707.037225
Amber 17.310045 195.295313 13.520178 43.814430 -707.009810
GROMACS 17.310045 195.295578 13.520308 43.820652 -707.110899
LAMMPS 17.309886 195.295355 13.520323 43.783375 -707.013998
This allows us to quickly look see how much each engine tends to differ in its computation of each component of the potential energy function:
summary_data.std() # or .describe(), etc.
Bond 0.000100
Angle 0.000121
Torsion 0.000069
vdW 0.016327
Electrostatics 0.046856
dtype: float64
Are these errors on the order of 10-4 - 10-2 kJ/mol between engines'
evaluation of supposedly-equivalent inputs acceptable? That’s an open
question - and in this case, it’s likely that the
Interchange
’s exports to each have some small discrepancies and how each engine is called by its
driver is not entirely precise. Regardless, this is a useful framework for ensuring outputs to these
engines, and engines that may be supported in the future, are consistent with each other. This also
serves as a template for how reference energies could be distributed by force field developers:
given a molecule topology prepared in a prescribed way, force field XYZ should provide these
particular energies. At present, we have a decent idea of what engines think these energies are
but not what the original developers intended them to be; reference energies would enable a more
direct route to testing engines and perhaps uncover bugs (relatively minor, but perhaps
non-negligible) in engines themselves.
Python package
This project is hosted in a Python package (openff-interchange
) which stores the actual
Interchange
class. Development occurs in public on a GitHub
repository. The package itself is made up of
pieces that have looked in the above sections. To a rough approximation, let’s just look at some of
the files and folders in the module:
$ tree -L 1 openff/interchange
openff/interchange
├── __init__.py
├── _version.py
├── components # Core internal modules
├── conftest.py
├── data
├── drivers
├── energy_tests
├── exceptions.py
├── interop
├── interoperability_tests
├── models.py
├── py.typed
├── testing
├── types.py
├── unit_tests
└── utils.py
Many of these files and folers are for testing and/or Python-specific configuration, but some are
the source of what we’ve looked at today. components/
stores the key models used internally and
the Interchange
class itself, /drivers/
stores the functions used in energy-based tests, and
interop/
stores the code used to actually do each export.
Future work
Development is currently underway on features that will enable the typing and parametrization of
biopolymers with SMIRNOFF-style force fields. This work is progressing alongside a significant
refactor to the OpenFF toolkit to support biopolymers in its Molecule
and Topology
models. The
refactor is undergoing extensive internal testing as of November 2021 and we hope to have it in a
pre-release build of the OpenFF Toolkit in early 2021. This will enable Interchange to work with the
rest of the OpenFF stack on biopolymers, supporting our scientists currently working toward the
first protein force field produced by the initiative and external scientists who will use it in the
future.
Surveys of OpenFF’s user base indicate that most people use OpenMM, Amber, and/or GROMACS as their engine of choice, so compatibility with those engines has been the focus of development. (Support for LAMMPS was added on the basis of my experience with it.) In general, the current focus is on improving the reliability and feature parity of the engines that are currently support. On this basis, support for other exports is not highly-prioritized at the moment. If support for other engines would enable the use of OpenFF force fields in your workflows, we’d be happy to hear from you! (See “Feedback” at the end of this post).
The SMIRNOFF specification is not fully supported but we aim to fully support it in the future. Virtual sites and bond order-based parameter interpolation are supported on an experimental basis but the GBSA section is not currently prioritized due to relatively little scientific interest at the moment.
Lastly, support for importing from other object models is planned in general terms but is not a
topic of active development at this moment. As a proof of concept that this is possible - that
Interchange
is not wholly specific to SMIRNOFF force fields and the OpenFF stack - we worked
with the MoSDeF team at Vanderbilt University to add
Interchange.from_foyer
,
which, as the name implies, creates an Interchange
object from a
Foyer force field and a minimal chemical graph. Note that the structure
of the function is similar (force field plus chemical topology) but with different object swapped
in! We hope that this can work be extended to support some amount of importing from existing
force field infrastructure, such as using ligands parametrized with OpenFF force fields with
proteins parametrized with Amber force fields. We hope that Interchange
has been engineered in a
way that makes swapping out components feasible not only for us but also other developers.
Again, if there are other features that would enable Interchange
to get OpenFF force fields into
your workflows, please drop us a line.
Installation
Interchange can be installed via conda
(or mamba
) on conda-forge
:
$ conda install openff-interchange -c conda-forge
See here for more details.
Further reading
For more details, please consult the full documentation.
Examples using Interchange are available via binder, which runs Jupyter notebooks in a web browser without the need to install anything locally.
Feedback
For help with specific issues with the software or to request features, please visit the issue tracker. For more general feedback or discussions, please reach out to support@openforcefield.org