Yield curation USPTO rsmi/csv datasets

DataCite Commons | Updated: 2025-06-01 | Indexed: 2024-08-18
Download link:
https://figshare.com/articles/dataset/Yield_curation_USPTO_rsmi_csv_datasets/14414039/1
Description:
In 2017, Lowe shared curated USPTO-based chemical reaction datasets in CSV format. Building on these, Schwaller et al. published curated reaction SMILES (they, in turn, used the curated set disclosed by Jin and coworkers). Both versions have the drawback that their yields are only partially curated.

In those datasets, two yield columns are available: TextMinedYield and CalculatedYield. Many entries contain no number at all, only a partial number, or an incorrect one. For certain kinds of reaction analysis in which yield is the only available quantity to correlate against, this information becomes essentially useless, since the datasets provide no link to reaction conditions (unless one data-mines the CML files or the original XML).

Correcting the yields, merging them into a new column, and then eliminating faulty entries reduces the noise in the dataset. The new datasets are reduced in size by nearly 50%.

Two kinds of dataset are attached, each provided for both the Lowe and the Schwaller data:
- A "cropped" version, containing only the reaction SMILES, the curated yield and an added ID, restricted to entries with valid yields. Everything else was filtered out.
- A "full" version, containing the curated yields together with all original input columns and entries (no filtering). This version may be useful for other applications, for users who disagree with the removal of invalid entries, or for applying further curation.

More details, including the Python scripts used to produce the attached datasets and a Readme file, can be found on GitHub.
For less experienced programmers, a graphical workflow based on the open-source data analysis platform KNIME® is also available. The workflow additionally contains a proof-of-concept reaction splitter (data not included here).
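The yield-curation step described above (merging the two yield columns into one curated value and dropping entries without a valid yield) can be illustrated with a short Python sketch. It is not the authors' script from GitHub. The parsing pattern, the preference for the text-mined value over the calculated one, the (0, 100] validity range, the tab-separated layout with a header row, and the column names ReactionSmiles and CalculatedYield are all assumptions made for illustration, as are the hypothetical helper names parse_yield, merge_yields, crop and crop_file. The demo rows are toy data.

```python
import csv
import re
from typing import Iterable, Iterator, List, Optional

# Hypothetical pattern: optional qualifier (>, <, ~, ≥, ≤), a number and an
# optional percent sign, e.g. "85%", ">98" or "~40 %". Anything else is rejected.
_YIELD_RE = re.compile(r"^[<>~≥≤]?\s*(\d+(?:\.\d+)?)\s*%?$")


def parse_yield(raw: Optional[str]) -> Optional[float]:
    """Return the yield as a float in (0, 100], or None if missing or invalid."""
    if raw is None:
        return None
    match = _YIELD_RE.match(raw.strip())
    if match is None:
        return None
    value = float(match.group(1))
    # Assumption: yields of 0 or above 100 % are treated as faulty entries.
    return value if 0 < value <= 100 else None


def merge_yields(text_mined: Optional[str], calculated: Optional[str]) -> Optional[float]:
    """Assumed rule: prefer the text-mined yield, fall back to the calculated one."""
    mined = parse_yield(text_mined)
    return mined if mined is not None else parse_yield(calculated)


def crop(rows: Iterable[dict]) -> Iterator[dict]:
    """Keep only rows with a valid merged yield, as ID, reaction SMILES and yield.

    The ID is the row's position in the input, so it can be traced back later.
    """
    for idx, row in enumerate(rows):
        curated = merge_yields(row.get("TextMinedYield"), row.get("CalculatedYield"))
        if curated is not None:
            yield {"ID": idx, "ReactionSmiles": row["ReactionSmiles"], "CuratedYield": curated}


def crop_file(path: str) -> List[dict]:
    """Read a tab-separated .rsmi file with a header row (assumed layout)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(crop(csv.DictReader(handle, delimiter="\t")))


if __name__ == "__main__":
    # Toy rows: the first has a valid text-mined yield, the second only an invalid one.
    demo = [
        {"ReactionSmiles": "CCO>>CC=O", "TextMinedYield": "85%", "CalculatedYield": ""},
        {"ReactionSmiles": "CC>>C=C", "TextMinedYield": "", "CalculatedYield": "120%"},
    ]
    print(list(crop(demo)))
    # [{'ID': 0, 'ReactionSmiles': 'CCO>>CC=O', 'CuratedYield': 85.0}]
```

This mirrors only the "cropped" variant. A "full" variant would instead keep every row and simply add the merged value as an extra column, leaving invalid entries in place for later curation.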
Provided by:
figshare
Created:
2021-04-16