
bigbio/flambe

Source: Hugging Face | Updated: 2024-07-18 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/bigbio/flambe
Resource overview:
FlaMBe is a dataset for procedural knowledge extraction from biomedical texts, focusing on single-cell research methodologies described in academic papers. It includes annotations from 55 full-text articles and 1,195 abstracts, covering nearly 710,000 tokens, and is distinguished by its comprehensive named entity recognition (NER) and disambiguation (NED) for tissue/cell types, software tools, and computational methods. To its authors' knowledge, it is the largest dataset of its kind for tissue/cell types. It also links entities to identifiers in relevant knowledge bases and annotates nearly 400 workflow relations between tool-context pairs.
Provider:
bigbio
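Summary of original information

Dataset overview

Basic information

Dataset description

  • Goal: extract procedural knowledge from biomedical texts, with a particular focus on single-cell research methodologies
  • Content: annotations from 55 full-text articles and 1,195 abstracts, covering nearly 710,000 tokens
  • Features: comprehensive named entity recognition (NER) and disambiguation (NED) covering tissue/cell types, software tools, and computational methods
  • Scale: to its authors' knowledge, the largest dataset of its kind for tissue/cell types
  • Linking: entities are linked to identifiers in relevant knowledge bases, and nearly 400 workflow relations between tool-context pairs are annotated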

Task types

  • Named entity recognition (NER)
  • Named entity disambiguation (NED)
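
To make the two task types concrete, here is a minimal Python sketch of how a token-level NER annotation with an attached NED identifier might be represented and converted to BIO tags. It is an illustration only, not the dataset's actual schema: the `Entity` class, the `to_bio` helper, the labels `CELL_TYPE` and `TOOL`, the example sentence, and the `KB:PLACEHOLDER-*` identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    """A hypothetical annotated span over a token list."""
    start: int                   # index of the first token (inclusive)
    end: int                     # index one past the last token (exclusive)
    label: str                   # NER label, e.g. "TOOL" (assumed label set)
    kb_id: Optional[str] = None  # NED target identifier (placeholder values here)


def to_bio(tokens: List[str], entities: List[Entity]) -> List[str]:
    """Convert span annotations to one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for ent in entities:
        if not (0 <= ent.start < ent.end <= len(tokens)):
            raise ValueError(f"span out of range: {ent}")
        tags[ent.start] = "B-" + ent.label
        for i in range(ent.start + 1, ent.end):
            tags[i] = "I-" + ent.label
    return tags


# Made-up example sentence; the identifiers are placeholders, not real KB entries.
tokens = ["We", "clustered", "T", "cells", "with", "Seurat", "."]
entities = [
    Entity(2, 4, "CELL_TYPE", kb_id="KB:PLACEHOLDER-1"),
    Entity(5, 6, "TOOL", kb_id="KB:PLACEHOLDER-2"),
]
bio_tags = to_bio(tokens, entities)
for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

In this sketch NER corresponds to the `start`, `end`, and `label` fields, while NED corresponds to the `kb_id` field that ties a recognized span to a knowledge-base entry.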

Citation

@inproceedings{
  author    = {Dannenfelser, Ruth and Zhong, Jeffrey and Zhang, Ran and Yao, Vicky},
  title     = {Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts},
  publisher = {Advances in Neural Information Processing Systems},
  volume    = {36},
  year      = {2024},
  url       = {https://proceedings.neurips.cc/paper_files/paper/2023/file/23e3d86c9a19d0caf2ec997e73dfcfbd-Paper-Datasets_and_Benchmarks.pdf},
}
