Data
All our datasets are available on GitHub and on QCArchive; those used in force field optimization and benchmarking are also available on Zenodo. Feel free to contact us if you have any questions!
Quantum chemistry data
The Open Force Field Initiative uses QCArchive infrastructure to compute, store and access quantum chemistry data. Our data generation and submission scripts for each dataset are available in our OpenFF QCArchive Dataset Submission repository.
A select number of datasets, namely those used to train or benchmark our flagship force field or those routinely leveraged by our collaborators, are also available as Zenodo records. These records contain “dataset views” and Docker images equipped with Jupyter notebook entry points containing examples of how to access the data we use to fit our force fields. The dataset views are SQLite files exported from QCArchive, with calculation records serialized with msgpack and compressed with zstandard, a combination that reduces file size without losing any data.
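Because a dataset view is an ordinary SQLite file, it can be explored with standard tooling before reaching for the provided notebooks. The following is a minimal sketch, not the official access route: the function names `describe_sqlite` and `decode_blob` are hypothetical, the table and column layout of a real view is not assumed (the first helper discovers it), and the second helper assumes each stored blob is a single zstandard frame wrapping one msgpack document. It relies on the third-party `msgpack` and `zstandard` Python packages.

```python
import sqlite3
from pathlib import Path


def describe_sqlite(path):
    """Return {table_name: [column_name, ...]} for every table in a SQLite file."""
    # Open read-only so that inspecting a downloaded view can never modify it.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()


def decode_blob(blob):
    """Decompress a zstandard frame, then unpack the msgpack payload inside it.

    Assumption: one blob holds one compressed msgpack document.
    """
    # Third-party packages, imported lazily so describe_sqlite works without them.
    import msgpack
    import zstandard

    raw = zstandard.ZstdDecompressor().decompressobj().decompress(blob)
    return msgpack.unpackb(raw, raw=False)
```

Calling `describe_sqlite` on a downloaded view file shows which tables and columns it actually contains; a blob column read from one of those tables could then be passed to `decode_blob`. Nested values such as arrays may need further decoding beyond what plain msgpack provides, so the notebooks shipped in the Zenodo records remain the reference for working with the data.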
Datasets available on Zenodo as dataset views
Flagship Force Field Datasets
- QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.0.0
- QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.1.0
- QC Fitting Datasets for OpenFF SMIRNOFF Sage 2.2.0
Benchmarking and Other Datasets
- QC Optimization Dataset: OpenFF Industry Benchmark Season 1 v1.2
- QC Singlepoint Dataset: MLPepper RECAP Optimized Fragments v1.1
- QC Singlepoint Dataset: TorsionNet500 Single Points Dataset v1.0
- QC Torsiondrive Dataset: OpenFF Rowley Biaryl v1.0
- QC Singlepoint Dataset: OpenFF ESP Fragment Conformers v1.0
Physical properties
We use the NIST ThermoML Archive to access condensed-phase physical properties of the compounds included in our force field optimization and benchmarking. The utilities for automated selection and curation of these datasets are available as part of OpenFF Evaluator, developed by Simon Boothroyd.
An older version of selected physical properties datasets can be found in our Open Forcefield Data repository.
Protein-ligand free energies
Our protein-ligand benchmarking dataset for calculating binding free energies can be accessed in our ProteinLigandBenchmarks repository.
MiniDrugBank
Our MiniDrugBank repository tracks the creation and evolution of the MiniDrugBank Molecule set, filtered from DrugBank Release Version 5.0.1.