Projecting Molecules into Synthesizable Chemical Spaces (ICML 2024)
MIT License
Please clone the repository with the --recurse-submodules flag to include the third-party submodules.
git clone --recurse-submodules https://github.com/luost26/ChemProjector.git
# Install conda environment
conda env create -f env.yml -n chemprojector
conda activate chemprojector
# Install ChemProjector package
pip install -e .
The default CUDA version is 11.8. If you need a different version, modify the env.yml file accordingly.
We provide preprocessed building block data. You can download it from here and put it in the data directory.
However, the data is derived from Enamine's building block catalog, which is available only upon request.
Therefore, you should first request the data from Enamine here and download the US Stock catalog into the data directory.
Then run the following script, which checks whether you have a copy of Enamine's catalog and unarchives the preprocessed data for you:
python unarchive_wizard.py
You may also process the building block data yourself. Please refer to the scripts/preprocess_data directory for more details.
You can download the trained weights from here and put them in the data/trained_weights directory.
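Before running inference, it can help to confirm that the checkpoint sits where the commands in this README expect it. The following is a minimal sketch and not part of the repository: the helper name `find_checkpoints` is hypothetical, and the `.ckpt` extension is assumed from the `original_default.ckpt` path used in the sampling command.

```python
from pathlib import Path


def find_checkpoints(weights_dir="data/trained_weights"):
    """Return the sorted names of .ckpt files in weights_dir.

    Returns an empty list if the directory does not exist.
    (Hypothetical helper; not provided by ChemProjector.)
    """
    root = Path(weights_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.ckpt"))


if __name__ == "__main__":
    found = find_checkpoints()
    if found:
        print("Found checkpoints:", ", ".join(found))
    else:
        print("No checkpoints found in data/trained_weights")
```

An empty result means the weights still need to be downloaded or were placed in a different directory.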
You can create a list of SMILES strings in CSV format (example: data/example.csv) and run the following command to project them into the synthesizable chemical space.
python sample.py \
--model-path data/trained_weights/original_default.ckpt \
--input data/example.csv \
--output results/example.csv
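As an illustration of preparing the input file, here is a minimal sketch that writes SMILES strings to a CSV using only the Python standard library. The `smiles` column header and the helper name `write_smiles_csv` are assumptions, not taken from the repository; check data/example.csv for the header that sample.py actually expects.

```python
import csv
from pathlib import Path


def write_smiles_csv(smiles_list, path, column="smiles"):
    """Write one SMILES string per row under a single header column.

    Hypothetical helper; the default column name is an assumption.
    """
    path = Path(path)
    # Create the parent directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([column])  # assumed header; compare with data/example.csv
        for smi in smiles_list:
            writer.writerow([smi])
    return path
```

For example, `write_smiles_csv(["CC(=O)Oc1ccccc1C(=O)O"], "data/my_molecules.csv")` writes a one-molecule file (aspirin) whose path can then be passed to sample.py via --input; the file name here is only an example.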
Using the test split:
./scripts/synthesis_planning_test_split.sh
or using the ChEMBL dataset:
./scripts/synthesis_planning_chembl.sh
Please refer to the scripts/sbdd directory for details.
Please refer to the scripts/goal_directed directory for details.
python train.py ./configs/original_default.yml
@inproceedings{luo2024chemprojector,
title={Projecting Molecules into Synthesizable Chemical Spaces},
author={Shitong Luo and Wenhao Gao and Zuofan Wu and Jian Peng and Connor W. Coley and Jianzhu Ma},
booktitle={Forty-first International Conference on Machine Learning},
year={2024}
}