haydn-jones/PubChem
Source: Hugging Face | Updated: 2024-12-12 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/haydn-jones/PubChem
Description:
This dataset contains information on chemical compounds. Its main fields are CID (the PubChem Compound Identifier, a unique ID for each compound), SMILES (Simplified Molecular Input Line Entry System), and SELFIES (Self-Referencing Embedded Strings). The dataset is divided into training, validation, and test sets containing 95,207,924, 11,900,990, and 11,900,991 examples respectively, for 119,009,905 examples in total. The dataset size is 36,600,584,436 bytes (about 36.6 GB), and the download size is 12,629,892,833 bytes (about 12.6 GB). The dataset is relevant to chemistry, biology, and medicine, and its size falls between 100 million and 1 billion examples.
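The split sizes quoted above can be checked with a little arithmetic, and the data can probably be sampled without downloading all of it. The following is a minimal sketch, not the provider's own code: the counts and byte sizes are copied from the description, while the `stream_examples` helper is hypothetical. It assumes the third-party `datasets` package is installed, that the Hub repository ID is `haydn-jones/PubChem`, and that the training split is named `train`. The exact column names are not confirmed by this page, so inspect the first row before relying on them.

```python
# Split sizes and byte counts copied from the dataset description.
SPLITS = {"train": 95_207_924, "validation": 11_900_990, "test": 11_900_991}
DATASET_BYTES = 36_600_584_436
DOWNLOAD_BYTES = 12_629_892_833

# Derived figures: total example count, share of each split, and how the
# download size compares with the full dataset size.
total = sum(SPLITS.values())
fractions = {name: count / total for name, count in SPLITS.items()}
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES


def stream_examples(limit=3):
    """Yield the first few training rows without a full download.

    Hypothetical helper. Assumes the `datasets` package is installed, the
    repository ID is "haydn-jones/PubChem", and the split is named "train".
    """
    from itertools import islice
    from datasets import load_dataset  # imported lazily; needs network access

    rows = load_dataset("haydn-jones/PubChem", split="train", streaming=True)
    yield from islice(rows, limit)


if __name__ == "__main__":
    print(f"total examples: {total:,}")
    for name, frac in fractions.items():
        print(f"{name}: {frac:.4%}")
    print(f"download / dataset size: {download_ratio:.3f}")
```

The counts correspond to an 80/10/10 split. Streaming mode is used in the helper because the full download is about 12.6 GB; calling `next(stream_examples(1))` should be enough to see the actual field names.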
Provider:
haydn-jones



