Scientific publications from the Open Force Field Initiative.
Improving force field accuracy by training against condensed phase mixture properties
Simon Boothroyd, Owen C. Madin, David L. Mobley, Lee-Ping Wang, John D. Chodera and Michael R. Shirts
Code accompanying the publication: SimonBoothroyd/binary-mixture-publication
Developing a sufficiently accurate classical force field representation of molecules is key to realizing the full potential of molecular simulation as a route to gaining fundamental insight into a broad spectrum of chemical and biological phenomena. This is only possible, however, if the many complex interactions between molecules of different species in the system are accurately captured by the model. Historically, the intermolecular van der Waals (vdW) interactions have primarily been trained against densities and enthalpies of vaporization of pure (single-component) systems, with occasional usage of hydration free energies. In this study, we demonstrate how including physical property data of binary mixtures can better inform these parameters, encoding more information about the underlying physics of the system in complex chemical mixtures. To demonstrate this, we re-train a select number of the Lennard-Jones parameters describing the vdW interactions of the OpenFF 1.0.0 (Parsley) fixed charge force field against training sets composed of densities and enthalpies of mixing for binary liquid mixtures as well as densities and enthalpies of vaporization of pure liquid systems, and assess the performance of each of these combinations. We show that retraining against the mixture data almost universally improves the force field’s ability to reproduce both pure and mixture properties, reducing some systematic errors that exist when training vdW interactions against properties of pure systems only.
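The two quantities at the center of this abstract can be sketched in a few lines of Python. This is a minimal illustration only, assuming the standard 12-6 Lennard-Jones functional form and the usual definition of the enthalpy of mixing as the mixture enthalpy minus the mole-fraction-weighted pure-component enthalpies. The function names and any numbers passed to them are hypothetical and are not taken from the publication or its accompanying code.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy at separation r (r and sigma in the same units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def enthalpy_of_mixing(h_mixture, mole_fractions, h_pure):
    """Molar enthalpy of mixing: H_mixture - sum_i x_i * H_pure_i.

    All enthalpies must be molar quantities in the same units.
    """
    if len(mole_fractions) != len(h_pure):
        raise ValueError("need one pure-component enthalpy per mole fraction")
    if abs(sum(mole_fractions) - 1.0) > 1e-9:
        raise ValueError("mole fractions must sum to 1")
    ideal = sum(x * h for x, h in zip(mole_fractions, h_pure))
    return h_mixture - ideal
```

Because the enthalpy of mixing is a difference between a mixture simulation and pure-component simulations, it depends directly on how well the unlike-species interactions are described, which is one way to read the abstract's claim that mixture data carry information that pure-liquid data do not.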
The Open Force Field Evaluator: An automated, efficient, and scalable framework for the estimation of physical properties from molecular simulation
Simon Boothroyd, Lee-Ping Wang, David Mobley, John Chodera, and Michael Shirts
Code accompanying the publication: openforcefield/openff-evaluator
Parameterization and assessment of force fields against high-quality, condensed phase physical property data is an integral aspect of force field development. Here we present an entirely automated, highly scalable framework for evaluating physical properties and their gradients in terms of force field parameters. It is written as a modular and extensible Python framework, which employs an intelligent multiscale estimation approach that allows for the automated estimation of properties from simulation and cached simulation data, and a pluggable API for estimation of new properties. In this study we demonstrate the utility of the framework by benchmarking the OpenFF 1.0.0 small molecule force field and the GAFF 1.8 and GAFF 2.1 force fields against a test set of binary density and enthalpy of mixing measurements curated using the framework’s utilities. Further, we demonstrate the framework’s utility as part of force field optimization by using it alongside ForceBalance, a framework for systematic force field optimization, to retrain a set of non-bonded van der Waals parameters against a training set of density and enthalpy of vaporization measurements.
Bayesian inference-driven model parameterization and model selection for 2CLJQ fluid models
Owen C. Madin, Simon Boothroyd, Richard A. Messerly, John D. Chodera, Josh Fass, and Michael R. Shirts
Code accompanying the publication: SimonBoothroyd/bayesiantesting
A high level of physical detail in a molecular model improves its ability to perform high accuracy simulations, but can also significantly affect its complexity and computational cost. In some situations, it is worthwhile to add additional complexity to a model to capture properties of interest; in others, additional complexity is unnecessary and can make simulations computationally infeasible. In this work we demonstrate the use of Bayes factors for molecular model selection, using Monte Carlo sampling techniques to evaluate the evidence for different levels of complexity in the two-centered Lennard-Jones + quadrupole (2CLJQ) fluid model. Examining three levels of nested model complexity, we demonstrate that the use of variable quadrupole and bond length parameters in this model framework is justified only sometimes. We also explore the effect of the Bayesian prior distribution on the Bayes factors, as well as ways to propose meaningful prior distributions. This Bayesian Markov Chain Monte Carlo (MCMC) process is enabled by the use of analytical surrogate models that accurately approximate the physical properties of interest. This work paves the way for further atomistic model selection work via Bayesian inference and surrogate modeling.
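The idea of a Bayes factor between nested models can be illustrated with a toy problem. The sketch below is not the 2CLJQ workflow or the MCMC scheme used in the publication: it assumes Gaussian data with unit variance, compares a model with the mean fixed at zero against a model with a free mean under a uniform prior, and estimates the evidence of the larger model by plain Monte Carlo averaging of the likelihood over prior samples. All names, the prior range, and the sample count are assumptions made for illustration.

```python
import math
import random


def log_likelihood(data, mu, sigma=1.0):
    """Gaussian log-likelihood of the data for a given mean."""
    norm = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(norm - 0.5 * ((x - mu) / sigma) ** 2 for x in data)


def log_evidence_mc(data, prior_sampler, n_samples=20000, seed=0):
    """Estimate log p(data | model) by averaging the likelihood over prior draws."""
    rng = random.Random(seed)
    logs = [log_likelihood(data, prior_sampler(rng)) for _ in range(n_samples)]
    peak = max(logs)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(v - peak) for v in logs) / n_samples)


def log_bayes_factor(data, prior_half_width=5.0):
    """log(Z_free / Z_fixed): positive values favor the model with a free mean."""
    # The fixed model has no free parameters, so its evidence is its likelihood.
    log_z_fixed = log_likelihood(data, 0.0)
    log_z_free = log_evidence_mc(
        data, lambda rng: rng.uniform(-prior_half_width, prior_half_width)
    )
    return log_z_free - log_z_fixed
```

For data centered near zero the extra parameter is penalized and the log Bayes factor is negative; for data centered well away from zero it becomes positive. Widening the prior strengthens the penalty, which is a small-scale version of the prior sensitivity the abstract mentions.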
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks
David F. Hahn, Christopher I. Bayly, Hannah E. Bruce Macdonald, John D. Chodera, Antonia S. J. S. Mey, David L. Mobley, Laura Perez Benito, Christina E.M. Schindler, Gary Tresadern, Gregory L. Warren
Preprint ahead of publication: arXiv
Code accompanying the publication: openforcefield/FE-Benchmarks-Best-Practices
As new methods, force fields, and implementations are developed, assessing the expected accuracy of free energy calculations on real-world systems (benchmarking) becomes critical to provide users with an assessment of the accuracy expected when these methods are applied within their domain of applicability, and developers with a way to assess the expected impact of new methodologies. Here, we present guidelines for (1) curating experimental data to develop meaningful benchmark sets, (2) preparing benchmark inputs according to best practices to facilitate widespread adoption, and (3) analysis of the resulting predictions to enable statistically meaningful comparisons among methods and force fields.
Development and Benchmarking of Open Force Field v1.0.0, the Parsley Small Molecule Force Field
Yudong Qiu, Daniel Smith, Simon Boothroyd, Hyesu Jang, Jeffrey Wagner, Caitlin C. Bannan, Trevor Gokey, Victoria T. Lim, Chaya Stern, Andrea Rizzi, Xavier Lucas, Bryon Tjanaka, Michael R. Shirts, Michael Gilson, John Chodera, Christopher I. Bayly, David Mobley, Lee-Ping Wang
Published: Journal of Chemical Theory and Computation 17:10:6262-6280, 2021 [DOI]
Code accompanying the publication: openforcefield/openforcefield-forcebalance/tree/v1.0.0-RC2
We describe the structure and optimization of the Open Force Field 1.0.0 small molecule force field, code-named Parsley. Parsley uses the SMIRKS-native Open Force Field (SMIRNOFF) parameter assignment formalism in which parameter types are assigned directly by chemical perception, in contrast to traditional atom type-based approaches. This method provides a natural means to incorporate increasingly diverse chemistry without needlessly increasing force field complexity. In this work, we present essentially a full optimization of the valence parameters in the force field. The optimization was carried out with the ForceBalance tool and was informed by reference quantum chemical data that include torsion potential energy profiles, optimized gas-phase structures, and vibrational frequencies. These data were computed and are maintained with QCArchive, an open-source and freely available distributed computing and database software ecosystem. Tests of the resulting force field against compounds and data types outside the training set show improvements in optimized geometries and conformational energetics and demonstrate that Parsley’s accuracy for liquid properties is similar to that of other general force fields.
End-to-end differentiable molecular mechanics force field construction
Yuanqing Wang, Josh Fass, and John D. Chodera
Code accompanying the publication: choderalab/espaloma
Molecular mechanics force fields have been a workhorse for computational chemistry and drug discovery. Here, we propose a new approach to force field parameterization in which graph convolutional networks are used to perceive chemical environments and assign molecular mechanics (MM) force field parameters. The entire process of chemical perception and parameter assignment is differentiable end-to-end with respect to model parameters, allowing new force fields to be easily constructed from MM or QM force fields, extended, and applied to arbitrary biomolecules.
Capturing non-local through-bond effects when fragmenting molecules for quantum chemical torsion scans
Chaya D. Stern, Christopher I. Bayly, Daniel G. A. Smith, Josh Fass, Lee-Ping Wang, David L. Mobley, and John D. Chodera
Code accompanying the publication: openforcefield/fragmenter
We show how the Wiberg Bond Order (WBO) can be used to construct small molecule fragmentation schemes that will avoid disrupting the chemical environment around torsions. The resulting fragmentation scheme powers the QCSubmit tool used to fragment and inject small molecule datasets into the QCFractal computation pipeline for deposition into the QCArchive quantum chemistry archive the Open Force Field Initiative uses for constructing force fields, as well as powering bespoke torsion refitting for individual molecules.
Improving Small Molecule Force Fields by Identifying and Characterizing Small Molecules with Inconsistent Parameters
Jordan Ehrman, Victoria T. Lim, Caitlin C. Bannan, Nam Thi, Daisy Kyu, and David Mobley
Published: Journal of Computer-Aided Molecular Design 35:271-284, 2021 [DOI]
Code accompanying the publication: mobleylab/off-ffcompare
We present a pipeline for comparing the geometries of small molecule conformers optimized with different force fields. We aimed to identify molecules or chemistries that are particularly informative for future force field development because they display inconsistencies between force fields. We applied our pipeline to a subset of the eMolecules database, and highlighted molecules that appear to be parameterized inconsistently across different force fields. We then identified over-represented functional groups in these molecule sets. The molecules and moieties identified by this pipeline may be particularly helpful for future force field parameterization.
Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning/molecular mechanics potentials
Dominic Rufa, Hannah Bruce Macdonald, Josh Fass, Marcus Wieder, Patrick Grinaway, Adrian Roitberg, Olexandr Isayev and John Chodera
Code accompanying the publication: choderalab/perses
This study combines a new generation of hybrid ML/MM potentials and a nonequilibrium perturbation approach to predict protein-ligand binding affinities. With this approach, a standard, GPU-accelerated MM alchemical free energy calculation can be corrected in a simple post-processing step to efficiently recover ML/MM free energies, while delivering a significant accuracy improvement with small additional computational effort. The authors show that it is possible to significantly reduce the error in absolute binding free energies with this new hybrid ML/MM approach (ANI2x/AMBER14SB/TIP3P) on the Tyk2 benchmarking system. The same set of FE calculations performed with OpenFF-1.0.0 instead of ANI2x to model ligands achieves an RMSE statistically indistinguishable from the Schrodinger JACS result for the tested system, which implies that we should expect even better results with the latest Parsley update (OpenFF-1.2.0).
Benchmark Assessment of Molecular Geometries and Energies from Small Molecule Force Fields
Victoria T. Lim and David L. Mobley
Published: F1000Research 9:1390, 2020 [DOI]
Code accompanying the publication: mobleylab/benchmarkff
In this work, we aim to compare six force fields: GAFF, GAFF2, MMFF94, MMFF94S, SMIRNOFF99Frosst, and the openff-1.0.0 (Parsley) force field by focusing on small molecules (< 50 heavy atoms). On a dataset comprising over 26,000 molecular structures, we analyzed their force field-optimized geometries and conformer energies compared to reference quantum mechanical (QM) data. We show that most of these force fields are comparable in accuracy at reproducing gas-phase QM geometries and energetics, but that GAFF/GAFF2/Parsley do slightly better in reproducing QM energies and that MMFF94/MMFF94S perform slightly better in geometries. Parsley version OpenFF-1.0.0 shows considerable improvement over its predecessor SMIRNOFF99Frosst, while OpenFF-1.2.0 performs even better with accuracy comparable to other available general force fields. We identify particular outlying chemical groups for further force field improvement.
Driving torsion scans with wavefront propagation
Yudong Qiu, Daniel G. A. Smith, Chaya D. Stern, Mudong Feng, Hyesu Jang, and Lee-Ping Wang
Published: The Journal of Chemical Physics 152:244116, 2020 [DOI]
Code accompanying the publication: lpwgroup/torsiondrive/
In this paper, we propose a systematic and versatile workflow called TorsionDrive to generate energy-minimized structures on a grid of torsion constraints by means of a recursive wavefront propagation algorithm, which resolves the deficiencies of conventional scanning approaches and generates higher quality QM data for force field development. The method is implemented in an open-source software package that is compatible with many QM software packages and energy minimization codes. The paper also describes integration with the MolSSI QCArchive distributed computing ecosystem.
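The wavefront propagation idea can be sketched on a toy problem. The code below is not TorsionDrive's API and performs no quantum chemistry: it assumes a made-up energy function with one hidden coordinate `y` standing in for the rest of the molecular geometry, a simple gradient descent standing in for the constrained optimizer, and a periodic grid of torsion angles. Each time a grid point finds a lower energy, its relaxed structure is passed to both neighbors as a new starting point, and the process repeats until nothing improves.

```python
import math
from collections import deque


def energy(phi_deg, y):
    """Toy surface: a double well in y whose tilt depends on the torsion angle."""
    return (y * y - 1.0) ** 2 + 0.3 * y * math.cos(math.radians(phi_deg))


def relax(phi_deg, y_start, step=0.05, n_steps=500):
    """Stand-in for a constrained optimization: gradient descent in y at fixed phi."""
    tilt = 0.3 * math.cos(math.radians(phi_deg))
    y = y_start
    for _ in range(n_steps):
        y -= step * (4.0 * y * (y * y - 1.0) + tilt)
    return energy(phi_deg, y), y


def wavefront_scan(grid, seeds, tol=1e-6):
    """Return {phi: (lowest energy found, relaxed y)} for every grid point reached."""
    best = {}
    queue = deque(seeds)  # pending (phi, starting y) optimizations
    while queue:
        phi, y_start = queue.popleft()
        e, y = relax(phi, y_start)
        if phi in best and e >= best[phi][0] - tol:
            continue  # no improvement, so this front stops here
        best[phi] = (e, y)
        i = grid.index(phi)
        for j in (i - 1, i + 1):  # propagate to both periodic neighbors
            queue.append((grid[j % len(grid)], y))
    return best
```

Calling `wavefront_scan(list(range(-180, 180, 30)), [(-180, 1.0), (0, -1.0)])` fills the whole grid from two seed structures, with each grid point keeping whichever well is lower at that angle. A conventional sequential scan started from a single structure would stay in one well all the way around, which is the kind of deficiency the abstract refers to.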
Binding thermodynamics of host-guest systems with SMIRNOFF99Frosst 1.0.5 from the Open Force Field Initiative
David R. Slochower, Niel Henriksen, Lee-Ping Wang, John D. Chodera, David L. Mobley, and Michael K. Gilson
Published: Journal of Chemical Theory and Computation 15:6225, 2019 [DOI]
Code accompanying the publication: slochower/smirnoff-host-guest-manuscript
We evaluate the accuracy of SMIRNOFF99Frosst, using free energy calculations of 43 α and β-cyclodextrin host-guest pairs and compare with matched calculations using two versions of GAFF. These results suggest that SMIRNOFF99Frosst performs competitively with existing small molecule force fields and is a parsimonious starting point for optimization.
Graph Nets for Partial Charge Prediction
Yuanqing Wang, Josh Fass, Chaya D. Stern, and John Chodera
Code accompanying the publication: choderalab/gimlet
Graph convolutional and message-passing networks can be a powerful tool for predicting physical properties of small molecules when coupled to a simple physical model that encodes the relevant invariances. Here, we show the ability of graph nets to predict partial atomic charges for use in molecular dynamics simulations and physical docking.
ChemPer: An Open Source Tool for Automatically Generating SMIRKS Patterns
Caitlin C. Bannan, David Mobley
Code accompanying the publication: MobleyLab/chemper
In this work, we present ChemPer – a new tool for generating SMIRKS patterns based on clustered fragments (i.e. bonds, angles, or torsions) which should be assigned the same force field parameter. We demonstrate the utility of ChemPer by clustering fragments based on a reference force field, and then regenerating those parameters starting with a simple set of alkanes, ethers, and alcohols.
Systematic Optimization of Water Models Using Liquid/Vapor Surface Tension Data
Yudong Qiu, Paul S. Nerenberg, Teresa Head-Gordon, Lee-Ping Wang
Published: The Journal of Physical Chemistry B 123:7061, 2019 [DOI]
Code accompanying the publication: leeping/forcebalance
This work investigates whether experimental surface tension measurements, which are less sensitive to quantum and self-polarization corrections, are able to replace the usual reliance on the heat of vaporization as experimental reference data for fitting force field models of molecular liquids.
Uncertainty quantification confirms unreliable extrapolation toward high pressures for united-atom Mie λ-6 force field
Richard A. Messerly, Michael R. Shirts, and Andrei F. Kazakov
Published: The Journal of Chemical Physics 149:114109, 2018 [DOI]
We demonstrate how Bayesian approaches can be used to estimate the reliability of predictions made with molecular mechanics force fields.
Toward learned chemical perception of force field typing rules
Camila Zanette, Caitlin C. Bannan, Christopher I. Bayly, Josh Fass, Michael K. Gilson, Michael R. Shirts, John Chodera, and David L. Mobley
Published: Journal of Chemical Theory and Computation 15:402, 2019 [DOI]
Code accompanying the publication: openforcefield/smarty
Here, we introduce a proof of principle for automatically sampling chemical perception, and compare the results to traditional atom-typed force fields and our SMIRNOFF format.
Facile Synthesis of a Diverse Library of Mono-3-substituted β-Cyclodextrin Analogues
Kathryn Kellett, Brendan M. Duggan and Michael K. Gilson
Published: Supramolecular Chemistry 31:251, 2019 [DOI]
We show the facile synthesis of a library of diverse mono-3-substituted β-cyclodextrin analogues that have the potential to be used to collect host-guest binding data to test and improve simulation force fields.
Escaping atom types using direct chemical perception
David Mobley, Caitlin C. Bannan, Andrea Rizzi, Christopher I. Bayly, John D. Chodera, Victoria T Lim, Nathan M. Lim, Kyle A. Beauchamp, Michael R. Shirts, Michael K. Gilson, and Peter K. Eastman
This paper introduces the SMIRNOFF format in the context of traditional force fields, explains the development and validation of our new small molecule force field smirnoff99Frosst, and highlights some directions the initiative is headed.
Toward Expanded Diversity of Host–Guest Interactions via Synthesis and Characterization of Cyclodextrin Derivatives
Kathryn Kellett, S. A. Kantonen, Brendan M. Duggan and Michael K. Gilson
Published: The Journal of Solution Chemistry 47:1597, 2018 [DOI]
This paper shows the synthesis of a mono-3-functionalized cyclodextrin and, using 1D/2D NMR and ITC experiments, how cyclodextrin derivatives can affect the binding of guest molecules.
Approaches for Calculating Solvation Free Energies and Enthalpies Demonstrated with an Update of the FreeSolv Database
Guilherme Duarte Ramos Matos, Daisy Y. Kyu, Hannes H. Loeffler, John D. Chodera, Michael R. Shirts, and David L. Mobley
Code accompanying the publication: mobleylab/freesolv
We review alchemical methods for computing solvation free energies and present an update (version 0.5) to the FreeSolv database of experimental and calculated hydration free energies of neutral compounds.
Towards Automated Benchmarking of Atomistic Forcefields: Neat Liquid Densities and Static Dielectric Constants from the ThermoML Data Archive
Kyle A. Beauchamp, Julie M. Behr, Ariën S. Rustenburg, Christopher I. Bayly, Kenneth Kroenlein, and John D. Chodera
Code accompanying the publication: choderalab/LiquidBenchmark
Progress in forcefield validation and parameterization has been hindered by the limited availability of high-quality machine-readable physical property data for small organic molecules. We show how the NIST ThermoML dataset provides a solution to this problem, and demonstrate its utility in benchmarking the GAFF/AM1-BCC small molecule forcefield on neat liquid densities and static dielectric constants to uncover problems in the representation of low-dielectric environments.