Derify/pubchem_10m_genmol_similarity
Source: Hugging Face | Updated: 2025-09-09 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/Derify/pubchem_10m_genmol_similarity
Description:
The PubChem 10M GenMol Fingerprint Similarity Dataset is an augmented version of the PubChem 10M dataset, enhanced with molecular similarity data generated using GenMol, as described in the paper "GenMol: A Drug Discovery Generalist with Discrete Diffusion". The dataset contains molecular structures represented as SMILES strings, along with their corresponding molecular fingerprints, similarity scores, and various molecular properties. It is used for training the Chem-MRL model. Three quality controls apply: generated molecules pass RDKit validity checks, similarity scores ensure structural relevance to the reference molecules, and duplicate molecules are filtered out.
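The similarity and duplicate filtering described above can be illustrated with a minimal sketch. It is not the dataset's actual pipeline: the listing does not name the similarity metric, the fingerprint type, or any threshold, so Tanimoto similarity over sets of on-bit indices, the `min_similarity` value, and the helper names `tanimoto` and `filter_candidates` are all assumptions made for illustration. The RDKit validity check is left out so the example runs with the standard library alone.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def filter_candidates(reference_bits, candidates, min_similarity=0.3):
    """Keep unique candidates that are similar enough to the reference.

    candidates: iterable of (smiles, on_bits) pairs.
    min_similarity: hypothetical threshold, not taken from the dataset.
    Returns a list of (smiles, similarity) pairs in first-seen order.
    """
    seen = set()
    kept = []
    for smiles, bits in candidates:
        # Duplicate removal by exact string match (an assumption; a real
        # pipeline would likely compare canonical SMILES instead).
        if smiles in seen:
            continue
        seen.add(smiles)
        score = tanimoto(reference_bits, frozenset(bits))
        if score >= min_similarity:
            kept.append((smiles, score))
    return kept


# Toy fingerprints: the bit indices are made up for the example.
reference = frozenset({1, 2, 3, 4})
generated = [
    ("CCO", {1, 2, 3}),
    ("CCO", {1, 2, 3}),       # duplicate, dropped
    ("CCN", {1, 2, 5, 6}),
    ("c1ccccc1", {7, 8}),     # no shared bits, dropped
]
for smiles, score in filter_candidates(reference, generated):
    print(smiles, round(score, 3))
```

In practice the bit sets would come from a fingerprinting library rather than hand-written indices, and the threshold would be tuned to the task.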
Provided by:
Derify



