This repo has a new location here:
MIT License
README last updated: Nov 28 2018
Quanformer is a Python-based pipeline for generating conformers, preparing quantum mechanical (QM) calculations, and processing QM results for a set of input molecules. This pipeline is robust enough to use with hundreds of conformers per molecule and tens or hundreds of molecules. You will need access to either Psi4 or Turbomole for running QM calculations.
For each molecule, conformers are generated and optimized with the MMFF94S force field. Then input files for QM calculations are prepared for geometry optimizations, single point energy (SPE) calculations, or Hessian calculations. The user can specify any QM method and basis set that is supported by the QM software package. After the calculations have finished, this pipeline extracts final energies and geometries and collects job-related details such as calculation time and number of optimization steps. Analysis scripts are provided for comparing conformer energies from different QM methods, comparing calculation times from different methods, and generating nicely formatted plots.
Example application:
In concept, this example would look like:
initialize_confs.py
→ confs_to_psi.py
→ filter_confs.py
→ [QM jobs] → filter_confs.py
→ analysis
In practice, the executor.py code provides the interface for the various stages and components.
That said, each component was written to run independently of the others, so variations of this pipeline can be carried out.
Instructions are provided below for following this example workflow.
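If you prefer to drive the stages from a script rather than typing each command, the executor.py calls in the instructions below can be assembled as argument lists. The following is a minimal sketch, not part of Quanformer: the helpers executor_cmd and run_stage are hypothetical, and they only reuse the flags shown in this README (-f, --setup/--results, -t, -m, -b). Passing a list to subprocess.run also sidesteps shell quoting for basis names such as def2-sv(p).

```python
import subprocess
import sys


def executor_cmd(infile, stage, qm_type=None, method=None, basis=None):
    """Build an executor.py command line (hypothetical helper).

    stage is "setup" or "results"; qm_type, method, and basis map to the
    -t, -m, and -b flags used in the README instructions.
    """
    if stage not in ("setup", "results"):
        raise ValueError(f"unknown stage: {stage!r}")
    cmd = [sys.executable, "executor.py", "-f", infile, f"--{stage}"]
    # Only pass the optional flags that were actually given.
    for flag, value in (("-t", qm_type), ("-m", method), ("-b", basis)):
        if value is not None:
            cmd += [flag, value]
    return cmd


def run_stage(cmd, workdir="."):
    """Run one command in workdir; raise CalledProcessError if it fails."""
    subprocess.run(cmd, cwd=workdir, check=True)
```

For example, run_stage(executor_cmd("file.smi", "setup", method="mp2", basis="def2-sv(p)")) mirrors the first setup command. The QM jobs themselves still have to be run between the setup and results stages.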
Pipeline components and description:
| Script | Stage | Brief description |
|---|---|---|
| avgTimeEne.py | analysis | analyze calculation stats and relative energies for a single batch of molecules |
| confs_to_psi.py | setup | generate Psi4 input files for each conformer/molecule |
| confs2turb.py | setup | generate Turbomole input files for each conformer/molecule |
| opt_vs_spe.py | analysis | compare how much the OPT energy differs from the pre-OPT single point energy |
| executor.py | N/A | main interface connecting "setup" and "results" scripts for Psi4 |
| filter_confs.py | setup/results | remove conformers of a molecule that may be the same structure |
| get_psi_results.py | results | get job results from Psi4 |
| getTurbResults.py | results | get job results from Turbomole |
| match_minima.py | analysis | match conformers from sets of different optimizations |
| match_plot.py | analysis | additional plots that can be made from match_minima.py results |
| plotTimes.py | analysis | plot calculation time averaged over the conformers for each molecule |
| proc_tags.py | results | store QM energies & conformer details as data tags in SDF molecule files |
| quan2modsem.py | analysis | interface with the modified Seminario Python code |
| initialize_confs.py | setup | generate molecular structures and conformers for input SMILES strings |
| stitchSpe.py | analysis | calculate relative conformer energies from sets of different SPEs |
There are other scripts in this repository that are not integral to the pipeline. These are found in the tools directory; see the README file there.
The input (SMILES or SDF) file must be in the main directory.
The layout is mainDirectory/moleculeName/conformerNumber/[qm_job_here].
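Because every QM job lives in its own mainDirectory/moleculeName/conformerNumber/ folder, job progress can be tallied by walking that tree. This is a hypothetical stand-in for the tools/jobcount.sh script, not its actual logic. It assumes each job writes an output file named output.dat (the name used in the xyzByStep.sh example in the instructions) and treats a job as finished when that file contains Psi4's end-of-run message "Psi4 exiting successfully".

```python
from pathlib import Path

# Text Psi4 prints at the end of a normal run (assumed completion marker).
PSI4_DONE = "Psi4 exiting successfully"


def count_jobs(main_dir, out_name="output.dat"):
    """Tally QM jobs laid out as mainDirectory/moleculeName/conformerNumber/.

    Returns (total, finished). A job counts as finished when its output file
    exists and contains the Psi4 completion marker.
    """
    total = finished = 0
    for mol_dir in sorted(Path(main_dir).iterdir()):
        if not mol_dir.is_dir():
            continue  # skip plain files such as the input .smi/.sdf
        for conf_dir in sorted(mol_dir.iterdir()):
            # Conformer directories are assumed to be named by number only.
            if not (conf_dir.is_dir() and conf_dir.name.isdigit()):
                continue
            total += 1
            out_file = conf_dir / out_name
            if out_file.is_file() and PSI4_DONE in out_file.read_text(errors="ignore"):
                finished += 1
    return total, finished
```

Usage: total, done = count_jobs(".") from the main directory, then report done out of total.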
SDF files are numbered with the following code system. Say the pipeline starts with a file called basename.smi, which contains the list of SMILES strings. The pipeline then produces:
- basename.sdf, which contains all molecules and all conformers of each molecule.
- basename-100.sdf, where -100 means all molecules have been MM-optimized.
- basename-200.sdf, in which the MM-optimized molecules are filtered to remove any redundant structures (i.e., duplicate minima).
- basename-210.sdf, which contains the QM-calculated molecules of the -200 file.
- basename-220.sdf, in which the QM-calculated molecules of the -210 file are filtered.

This process can go through a second round of QM calculations. QM calculations can be either geometry optimizations or single point energy calculations. If basename-200.sdf is fed into both routes, then each route will have its own basename-210.sdf file. Don't do this in the same directory, or one file will be overwritten. The endmost product will be basename-222.sdf, though one could certainly stop before QM stage 2.
Why keep the -221 files? They can be used to compare relative energies across single point energy calculations or geometry optimizations, since a given conformer (e.g., mol1, confA) starts from the same structure in each of the compared files. After filtering, the number of conformers may be reduced, which makes one-to-one comparison harder.
An -f prefix means that the Omega-generated conformers were filtered based on their structures but have not been MM-optimized. For example, basename-f020.sdf means: filtered from OpenEye Omega, no MM opt/filter, QM opt with filter, no QM stage 2.
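The naming scheme is regular enough to decode programmatically. A minimal sketch with a hypothetical function: the three digits describe the MM stage, QM stage 1, and QM stage 2, where 1 means done without filtering, 2 means done and filtered, and 0 means the stage was not done (inferred from the -f020 example). The uncoded basename.sdf is deliberately rejected, since it carries no stage information.

```python
import re
from pathlib import Path

# Meaning of each digit in the three-digit code.
STATUS = {"0": "not done", "1": "done, not filtered", "2": "done and filtered"}

# base name (no dash or pound sign), optional "f" prefix, three digits, .sdf
SUFFIX_RE = re.compile(r"([^-#]+)-(f?)([0-2])([0-2])([0-2])\.sdf")


def decode_sdf_name(filename):
    """Decode a pipeline SDF name such as basename-221.sdf or basename-f020.sdf.

    Returns the base name, whether the Omega conformers were structure-filtered
    before MM (the -f prefix), and the status of the MM stage ("mm"),
    QM stage 1 ("qm1"), and QM stage 2 ("qm2").
    """
    match = SUFFIX_RE.fullmatch(Path(filename).name)
    if match is None:
        raise ValueError(f"not a coded pipeline SDF name: {filename!r}")
    base, f_prefix, mm, qm1, qm2 = match.groups()
    return {
        "basename": base,
        "omega_filtered": f_prefix == "f",
        "mm": STATUS[mm],
        "qm1": STATUS[qm1],
        "qm2": STATUS[qm2],
    }
```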
In summary:
- 1xx = MM opt but no filter
- 2xx = MM opt and filter
- x1x = QM opt but no filter
- x2x = QM opt and filter
- xx1 = either QM second opt or SPE, and no filter
- xx2 = either QM second opt or SPE, and filter

The instructions below describe how to take a set of molecules from their starting SMILES strings to:
- file-200.sdf
- file-210.sdf
- file-220.sdf
- file-221.sdf
- file-222.sdf
1. Create an input file with SMILES strings and names for each molecule. See the subsections below on "Naming molecules in the input SMILES file" and "File name limitations".
2. Generate conformers, perform a quick MM optimization, and create Psi4 input files.
python executor.py -f file.smi --setup -m 'mp2' -b 'def2-sv(p)'
3. Run Psi4 QM calculations.
The jobcount.sh script in the tools directory can be helpful for counting the number of total/remaining jobs. The xyzByStep.sh script in the tools directory can be used to write geometries from a Psi4 output file to an XYZ file for viewing, e.g.:
xyzByStep.sh 10 output.dat view.xyz
4. Get Psi4 results.
python executor.py -f file-200.sdf --results
5. In a different directory (e.g., a subdirectory), set up the second-level Psi4 calculations (OPT2 or SPE) from the last results.
python executor.py -f file-220.sdf --setup -t 'opt' -m 'b3lyp-d3mbj' -b 'def2-tzvp'
python executor.py -f file-220.sdf --setup -t 'spe' -m 'b3lyp-d3mbj' -b 'def2-tzvp'
6. Run Psi4 jobs. (See notes on step 3.)
7. Get Psi4 results from the second-level calculations.
python executor.py -f file-220.sdf --results -t 'opt'
python executor.py -f file-220.sdf --results -t 'spe'
8. (opt.) Get wall clock times, number of optimization steps, and relative energies.
python avgTimeEne.py --relene -f file.sdf -m 'b3lyp-d3mbj' -b 'def2-tzvp'
9. Combine results from various job types to calculate model uncertainty; see analysis.md. [TODO recheck]
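Relative conformer energies, as requested with the --relene option of avgTimeEne.py, are commonly taken with respect to the lowest-energy conformer of each molecule. The sketch below uses that convention and assumes absolute QM energies in Hartree, converted to kcal/mol. The function name is hypothetical, and avgTimeEne.py may use a different reference or units.

```python
HARTREE_TO_KCAL_MOL = 627.5095  # 1 Hartree expressed in kcal/mol


def relative_energies(energies_hartree):
    """Return conformer energies relative to the lowest one, in kcal/mol.

    energies_hartree maps a conformer label to its absolute QM energy
    in Hartree for a single molecule.
    """
    if not energies_hartree:
        return {}
    e_min = min(energies_hartree.values())
    return {
        conf: (e - e_min) * HARTREE_TO_KCAL_MOL
        for conf, e in energies_hartree.items()
    }
```

Computing the minimum per molecule keeps every value non-negative, with the global minimum conformer at exactly zero.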
File name limitations:
Base names (e.g., basename.smi, basename.sdf) can contain underscores but no dash (-) and no pound sign (#). For example:
- basename-set1.smi (not allowed)
- basename#set1.smi (not allowed)
- basename_set1.smi (allowed)
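This naming rule is easy to check before starting a run. A minimal sketch with a hypothetical helper that is not part of Quanformer; presumably the dash is reserved because the pipeline uses it to separate the base name from suffix codes such as -200.

```python
from pathlib import Path


def is_valid_basename(filename):
    """Check the file-name rule: underscores are fine, but no '-' or '#'.

    Only the base name is checked; the extension (e.g. .smi) is ignored.
    """
    stem = Path(filename).stem
    return bool(stem) and "-" not in stem and "#" not in stem
```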
Naming molecules in the input SMILES file:
The SMILES file should contain one molecule per line in the format SMILES_STRING molecule_title and be named in the format basename.smi. For example:
CC(C(C(C)O)O)O AlkEthOH_c42
CCCC AlkEthOH_c1008
CCOC(C)(C)C(C)(C)O AlkEthOH_c1178
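A file in this format can be read with a few lines of Python. This is a minimal sketch with a hypothetical function, assuming whitespace-separated fields, exactly one title per line, and no spaces inside titles; blank lines are skipped and malformed lines raise an error so naming problems surface before any conformers are generated.

```python
def read_smiles_file(path):
    """Parse a basename.smi file with one 'SMILES_STRING molecule_title' per line.

    Returns a list of (smiles, title) tuples in file order.
    """
    entries = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) != 2:
                raise ValueError(f"line {lineno}: expected 'SMILES title', got {line.strip()!r}")
            entries.append((fields[0], fields[1]))
    return entries
```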
This pipeline is meant to be used with SDF files because they can store multiple molecules as well as data tags associated with each molecule. That said, it has been applied in a few scenarios with MOL2 files (one molecule and all its conformers). If you use a non-SDF file, check that the molecule name and the total charge are listed correctly in the Psi4 input files.
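One way to follow that advice is to pull the molecule name, charge, and multiplicity out of each generated input and compare them with what you expect. This sketch assumes the inputs use Psi4's standard molecule block, a "molecule name {" line followed by a "charge multiplicity" line; the helper name is hypothetical.

```python
import re

# "molecule [name] {" followed by a line holding only "charge multiplicity".
MOLECULE_RE = re.compile(
    r"^[ \t]*molecule\s*([^\s{]*)\s*\{\s*(-?\d+)\s+(\d+)[ \t]*$", re.MULTILINE
)


def psi4_name_and_charge(input_text):
    """Return (name, charge, multiplicity) from the first molecule block.

    Returns None when no block with an explicit charge/multiplicity line is
    found; the name is "" for an unnamed block.
    """
    match = MOLECULE_RE.search(input_text)
    if match is None:
        return None
    name, charge, multiplicity = match.groups()
    return name, int(charge), int(multiplicity)
```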
This pipeline uses some preset parameters, which can be modified in the function calls of executor.py or in the parent code.
Descriptions coming soon. [TODO]
- initialize_confs.py: resolve_clash=True, for resolving steric clashes
- initialize_confs.py: do_opt=True, for performing a quick steepest descent optimization

Pertaining to software packages:
Pertaining to files and formatting:
Pertaining to QM method:
- MP2 - second-order Møller–Plesset perturbation theory (adds electron correlation effects on top of Hartree–Fock)
- B3LYP - DFT hybrid exchange-correlation functional (Becke, three-parameter, Lee–Yang–Parr)
- PBE0 - DFT hybrid functional (Perdew–Burke–Ernzerhof)
- D3 - Grimme et al. dispersion correction method (ref)
- D3BJ - D3 with Becke–Johnson damping (ref)
- D3MBJ - Sherrill et al. modifications to the D3BJ approach (ref)

Pertaining to basis set:
- def2 - 'default' basis sets with additional polarization functions compared to 'def-'
- SV(P) - double zeta valence with polarization on all non-hydrogen atoms
- TZVP - triple zeta valence with polarization on all atoms