Cost-Efficient Repurposing of a Monolingual SMILES-Based Chemical Transformer to SELFIES

DataCite Commons | Updated: 2025-05-06 | Indexed: 2025-05-17
Download link:
https://data.mendeley.com/datasets/27j2zg6f5x/1
Description:
This repository supports the manuscript “Cost-Efficient Repurposing of a Monolingual SMILES-Based Chemical Transformer to SELFIES,” providing the data, models, and code needed to reproduce the reported experiments and figures. It includes two core datasets (SMILES_to_SELFIES.csv and Filtered_QM9.csv) for SELFIES-based finetuning and QM9 regression, along with a zip archive (selfies_finetuned_model.zip) containing the final ChemBERTa model finetuned on SELFIES. Also provided are four Jupyter notebooks (Finetuning and Figures.ipynb, QM9 regression: SELFIES FT model.ipynb, QM9 regression: ChemBERTa-77M-MLM model.ipynb, and QM9 Regression: ChemBERTa-zinc-base-v1 model.ipynb), which illustrate the steps to generate all analyses, plots, and performance metrics. Each notebook includes code and outputs showing the end-to-end methodology, from data preparation through model evaluation.
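As a rough illustration of how a SMILES/SELFIES pairing file such as SMILES_to_SELFIES.csv might be consumed before finetuning, the sketch below reads paired rows and splits each SELFIES string into its bracketed symbols. The column names `SMILES` and `SELFIES`, the helper names, and the two sample rows are assumptions made for illustration; the actual file layout is not described on this page and should be checked against the notebooks.

```python
import csv
import io
import re

# Hypothetical two-row sample; the real SMILES_to_SELFIES.csv may use
# different column names and contains far more rows.
SAMPLE = """SMILES,SELFIES
CCO,[C][C][O]
C=O,[C][=O]
"""


def split_selfies_tokens(selfies_string):
    """Split a SELFIES string into its bracket-delimited symbols."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def load_pairs(handle, smiles_col="SMILES", selfies_col="SELFIES"):
    """Read (SMILES, SELFIES) pairs from an open CSV file handle.

    The column names are assumptions, not confirmed by the dataset page.
    """
    reader = csv.DictReader(handle)
    return [(row[smiles_col], row[selfies_col]) for row in reader]


pairs = load_pairs(io.StringIO(SAMPLE))
for smiles, selfies in pairs:
    print(smiles, split_selfies_tokens(selfies))
```

To try this on the downloaded file, the in-memory sample could be replaced with `open("SMILES_to_SELFIES.csv", newline="")`, adjusting the column names to whatever the file actually uses.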
Provider:
Mendeley Data
Created:
2025-01-27