Reconnected InChI-IUPAC Dataset for Metal-Containing Compounds
Source: DataCite Commons. Updated: 2026-05-03. Indexed: 2026-05-04.
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/KDDCKZ
Description:
This dataset contains 1 million records for metal-containing compounds, each pairing a standard InChI and a reconnected InChI with an IUPAC systematic name, intended to support research on InChI-to-IUPAC translation. Each record carries category and metal annotations (category, has_metal, primary_metal) and derived length fields (standard_inchi_len, reconnected_inchi_len, iupac_len). The dataset was cleaned by removing invalid or placeholder names, enforcing InChI and IUPAC format checks, applying length constraints, normalising whitespace, and deduplicating records on unique structure-name pairs.
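The cleaning steps listed in the description can be sketched roughly as follows. This is an illustration, not the dataset authors' actual pipeline: the column names standard_inchi, reconnected_inchi and iupac_name, the placeholder list, the length limit, and the prefix-only format check are all assumptions, since the description does not specify them. Only the three `*_len` field names come from the description.

```python
import re

# Hypothetical values: the dataset description does not state them.
PLACEHOLDER_NAMES = {"", "n/a", "na", "none", "unknown"}
MAX_LEN = 2000


def normalise(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def clean_records(records):
    """Filter and deduplicate records given as dicts (assumed column names)."""
    seen = set()
    cleaned = []
    for rec in records:
        std = normalise(rec.get("standard_inchi", ""))
        recon = normalise(rec.get("reconnected_inchi", ""))
        name = normalise(rec.get("iupac_name", ""))

        # Drop invalid or placeholder names.
        if name.lower() in PLACEHOLDER_NAMES:
            continue
        # Minimal format check (assumption): standard InChI begins with
        # "InChI=1S/"; the reconnected form is only required to begin with
        # "InChI=1".
        if not std.startswith("InChI=1S/") or not recon.startswith("InChI=1"):
            continue
        # Apply the (assumed) length constraint to every text field.
        if max(len(std), len(recon), len(name)) > MAX_LEN:
            continue
        # Deduplicate on the structure-name combination.
        key = (std, recon, name)
        if key in seen:
            continue
        seen.add(key)

        cleaned.append({
            **rec,
            "standard_inchi": std,
            "reconnected_inchi": recon,
            "iupac_name": name,
            # Derived length fields named in the dataset description.
            "standard_inchi_len": len(std),
            "reconnected_inchi_len": len(recon),
            "iupac_len": len(name),
        })
    return cleaned
```

Whitespace is normalised before the other checks so that records differing only in spacing collapse into one entry during deduplication.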
Provider:
Harvard Dataverse
Created:
2026-02-22



