Cost-Efficient Repurposing of a Monolingual SMILES-Based Chemical Transformer to SELFIES
Source: DataCite Commons | Updated: 2025-05-06 | Indexed: 2025-05-17
Download link:
https://data.mendeley.com/datasets/27j2zg6f5x/1
Description:
This repository supports the manuscript “Cost-Efficient Repurposing of a Monolingual SMILES-Based Chemical Transformer to SELFIES” and provides all the data, models, and code needed to reproduce the reported experiments and figures. It includes two core datasets, SMILES_to_SELFIES.csv and Filtered_QM9.csv, used for SELFIES-based finetuning and QM9 regression respectively, along with a zip archive (selfies_finetuned_model.zip) containing the final ChemBERTa model finetuned on SELFIES. Four Jupyter notebooks are also provided: Finetuning and Figures.ipynb, QM9 regression: SELFIES FT model.ipynb, QM9 regression: ChemBERTa-77M-MLM model.ipynb, and QM9 Regression: ChemBERTa-zinc-base-v1 model.ipynb. Together they generate all reported analyses, plots, and performance metrics, and each notebook includes code and outputs showing the end-to-end methodology, from data preparation through model evaluation.
Provided by:
Mendeley Data
Created:
2025-01-27



