SourceData-NLP
Source: arXiv, 2023-10-31 · Updated 2024-06-21
Download link:
https://huggingface.co/datasets/EMBO/SourceData
Summary:
The SourceData-NLP dataset was created by the European Molecular Biology Organization (EMBO) from scientific publications in cell and molecular biology. It contains annotations for more than 620,000 biological entities, drawn from 18,689 figures in 3,223 research papers. The annotations focus on biological entities mentioned in figure legends and cover several levels of biological organization, from small molecules to species. The data were curated as part of the routine publication workflow, which helps ensure their quality and accuracy. The dataset is intended to support automated knowledge extraction, helping researchers keep up with the literature and quickly grasp the key concepts and experimental designs it reports, and thereby to advance biomedical research.
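To give a rough sense of scale, the headline counts in the summary can be turned into per-figure and per-paper averages. This is a minimal sketch that uses only the numbers stated in the summary; the function name `per_unit_averages` is hypothetical and not part of any SourceData tooling. Because "more than 620,000" is a lower bound, the entity averages are lower bounds too, and averages say nothing about how evenly entities are spread across figures.

```python
# Headline counts taken from the dataset summary.
# "More than 620,000" is a lower bound, so entity averages are lower bounds as well.
ENTITIES = 620_000
FIGURES = 18_689
PAPERS = 3_223


def per_unit_averages(entities: int, figures: int, papers: int) -> dict[str, float]:
    """Hypothetical helper: rough averages implied by the corpus-level counts."""
    return {
        "entities_per_figure": entities / figures,
        "figures_per_paper": figures / papers,
        "entities_per_paper": entities / papers,
    }


if __name__ == "__main__":
    # Print each average rounded to one decimal place.
    for name, value in per_unit_averages(ENTITIES, FIGURES, PAPERS).items():
        print(f"{name}: {value:.1f}")
```

Run as a script, it prints roughly 33.2 entities per figure, 5.8 figures per paper and 192.4 entities per paper.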
Provided by:
European Molecular Biology Organization (EMBO)
Created:
2023-10-31



