Datasets: Molecular Entities as Structured Data on the Web
Source: Mendeley Data. Indexed: 2026-04-18
Download link:
https://data.mendeley.com/datasets/n9xwfs5fcj
Description:
Internet search engines have reshaped how the internet is used, making it easy to find the content we are interested in. The Web was originally designed for exchanging natural-language documents, which are difficult for machines to interpret. Structured data embedded in websites addresses this problem by allowing search engines to "understand" the content better. The same approach can be applied to chemical data.
We have developed three tools to convert chemical data into structured data: SDFEater converts SDF files, Molstruct converts CSV files, and MEgen is a web application for entering data through a form. Using these tools, we generated 10 datasets: 5 main datasets (DS1, DS2, DS3, DS4, and DS5) and 5 small datasets (DS1s, DS2s, DS3s, DS4s, and DS5s), each small dataset consisting of 10 files with one molecule each. They are based on well-known chemical databases (ChEBI, DrugBank, PubChem) as well as other data (WikiData). We make them available in four structured data formats: JSON-LD HTML, JSON-LD, RDFa, and Microdata.
More details about the inputs and outputs as well as how the data is generated can be found in README.txt.
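To make the idea of converting tabular chemical data into structured data concrete, here is a minimal Python sketch. It is not the code of SDFEater, Molstruct, or MEgen. It assumes a hypothetical CSV whose column names already match properties of the schema.org MolecularEntity type (name, smiles, molecularFormula, inChIKey), maps one row to a JSON-LD object, and wraps that object in the script tag commonly used to embed JSON-LD in an HTML page. The function names and the sample row are illustrative assumptions.

```python
import csv
import io
import json

# Hypothetical CSV input; the column names are assumptions chosen to match
# schema.org MolecularEntity property names.
CSV_TEXT = """name,smiles,molecularFormula,inChIKey
water,O,H2O,XLYOFNOQVPJJNP-UHFFFAOYSA-N
"""


def row_to_jsonld(row):
    """Map one CSV row to a MolecularEntity JSON-LD object (illustrative)."""
    entity = {"@context": "https://schema.org", "@type": "MolecularEntity"}
    # Copy only non-empty cells so missing values do not become empty strings.
    entity.update({key: value for key, value in row.items() if value})
    return entity


def to_html_script(entity):
    """Wrap a JSON-LD object in the script tag used to embed it in HTML."""
    body = json.dumps(entity, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
entities = [row_to_jsonld(row) for row in rows]
print(to_html_script(entities[0]))
```

The bare dictionary corresponds to the plain JSON-LD form, and the script-wrapped string to JSON-LD embedded in HTML. RDFa and Microdata would instead express the same properties as attributes on HTML elements.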
Created:
2021-04-21



