bigbio/mirna

Source: Hugging Face · Updated: 2022-12-22 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/bigbio/mirna
Summary:
This dataset comprises 301 Medline citations whose abstracts mention miRNAs. Gene, disease, and miRNA entities were manually annotated. The dataset is split into a training set of 201 documents and a test set of 100 documents. Its core tasks are Named Entity Recognition (NER) and Named Entity Disambiguation (NED). The dataset is monolingual (English only) and publicly available, and its source documents can be found on PubMed. It is licensed under CC BY-NC 3.0.
Provided by: bigbio
Summary of Original Information

Dataset Overview

Basic Information

  • Name: miRNA
  • Language: English
  • License: CC BY-NC 3.0
  • Multilinguality: monolingual (English)
  • Available on PubMed: yes
  • Publicly available: yes

Dataset Contents

  • Number of documents: 301 Medline citations
  • Splits: training set (201 documents) and test set (100 documents)
  • Annotations: manually annotated gene, disease, and miRNA entities

Task Types

  • Named Entity Recognition (NER)
  • Named Entity Disambiguation (NED)
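Both tasks start from annotated entity mentions. NER models are commonly trained on token-level BIO tags, so span annotations like the gene, disease, and miRNA mentions described above are often converted first. The following is a minimal Python sketch under stated assumptions: annotations are given as character-offset `(start, end, label)` spans, the labels `MIRNA`, `GENE`, and `DISEASE` and the function `spans_to_bio` are hypothetical, and the regex tokenizer is a simplification rather than the dataset's or BigBIO's own preprocessing. NED, which would additionally link each span to a database identifier, is not shown.

```python
import re
from typing import List, Tuple

# (start, end, label) character-offset spans. The labels MIRNA, GENE and
# DISEASE are hypothetical placeholders, not the dataset's actual type names.
Span = Tuple[int, int, str]

# Simplified tokenizer (an assumption): words with internal hyphens kept,
# so "miR-21" stays one token, or single punctuation characters.
TOKEN_PATTERN = re.compile(r"\w[\w-]*|[^\w\s]")


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Assign a BIO tag to each token based on the entity span containing it."""
    tagged = []
    for match in TOKEN_PATTERN.finditer(text):
        start, end = match.span()
        tag = "O"  # tokens outside (or only partly inside) every span stay "O"
        for span_start, span_end, label in spans:
            if span_start <= start and end <= span_end:
                # A token starting exactly at the span start gets "B-", later ones "I-".
                tag = ("B-" if start == span_start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


if __name__ == "__main__":
    text = "miR-21 targets PTEN in hepatocellular carcinoma."
    spans = [(0, 6, "MIRNA"), (15, 19, "GENE"), (23, 47, "DISEASE")]
    for token, tag in spans_to_bio(text, spans):
        print(token, tag)
```

Tokens that only partly overlap an annotated span are tagged `O` here; a real pipeline would need a policy for such tokenization mismatches.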

Citation

@Article{Bagewadi2014,
  author={Bagewadi, Shweta and Bobi{\'{c}}, Tamara and Hofmann-Apitius, Martin and Fluck, Juliane and Klinger, Roman},
  title={Detecting miRNA Mentions and Relations in Biomedical Literature},
  journal={F1000Research},
  year={2014},
  month={Aug},
  day={28},
  publisher={F1000Research},
  volume={3},
  pages={205-205},
  keywords={MicroRNAs; corpus; prediction algorithms},
  abstract={INTRODUCTION: MicroRNAs (miRNAs) have demonstrated their potential as post-transcriptional gene expression regulators, participating in a wide spectrum of regulatory events such as apoptosis, differentiation, and stress response. Apart from the role of miRNAs in normal physiology, their dysregulation is implicated in a vast array of diseases. Dissection of miRNA-related associations are valuable for contemplating their mechanism in diseases, leading to the discovery of novel miRNAs for disease prognosis, diagnosis, and therapy. MOTIVATION: Apart from databases and prediction tools, miRNA-related information is largely available as unstructured text. Manual retrieval of these associations can be labor-intensive due to steadily growing number of publications. Additionally, most of the published miRNA entity recognition methods are keyword based, further subjected to manual inspection for retrieval of relations. Despite the fact that several databases host miRNA-associations derived from text, lower sensitivity and lack of published details for miRNA entity recognition and associated relations identification has motivated the need for developing comprehensive methods that are freely available for the scientific community. Additionally, the lack of a standard corpus for miRNA-relations has caused difficulty in evaluating the available systems. We propose methods to automatically extract mentions of miRNAs, species, genes/proteins, disease, and relations from scientific literature. Our generated corpora, along with dictionaries, and miRNA regular expression are freely available for academic purposes. To our knowledge, these resources are the most comprehensive developed so far. RESULTS: The identification of specific miRNA mentions reaches a recall of 0.94 and precision of 0.93. Extraction of miRNA-disease and miRNA-gene relations lead to an F1 score of up to 0.76. A comparison of the information extracted by our approach to the databases miR2Disease and miRSel for the extraction of Alzheimer's disease related relations shows the capability of our proposed methods in identifying correct relations with improved sensitivity. The published resources and described methods can help the researchers for maximal retrieval of miRNA-relations and generation of miRNA-regulatory networks. AVAILABILITY: The training and test corpora, annotation guidelines, developed dictionaries, and supplementary files are available at http://www.scai.fraunhofer.de/mirna-corpora.html.},
  note={PMID: 26535109; PMCID: PMC4602280},
  issn={2046-1402},
  url={https://pubmed.ncbi.nlm.nih.gov/26535109},
  language={eng}
}
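The cited paper states that its miRNA regular expression is freely available alongside the corpora. As a rough illustration of regex-based miRNA mention detection, here is a minimal Python sketch. The pattern `MIRNA_PATTERN` and the function `find_mirna_mentions` are hypothetical and are not the authors' released expression, which may differ considerably; this simplified pattern only covers a few common naming forms (optional three-letter species prefix, `miR`/`mir`/`miRNA`/`microRNA`/`let` stems, a number with optional letter, locus, and `-3p`/`-5p` arm suffixes).

```python
import re
from typing import List, Tuple

# Hypothetical, simplified pattern for illustration only. It is NOT the
# regular expression released by Bagewadi et al. (2014).
MIRNA_PATTERN = re.compile(
    r"\b(?:[a-z]{3}-)?"                # optional species prefix, e.g. "hsa-"
    r"(?:microRNA|miRNA|miR|mir|let)"  # miRNA stem (case-sensitive)
    r"-?\d+[a-z]?"                     # number with optional letter, e.g. "21", "7a"
    r"(?:-\d+)?"                       # optional locus number, e.g. "-1"
    r"(?:-[35]p)?\b"                   # optional arm suffix, e.g. "-5p"
)


def find_mirna_mentions(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, surface text) for each pattern match in `text`."""
    return [(m.start(), m.end(), m.group()) for m in MIRNA_PATTERN.finditer(text)]


if __name__ == "__main__":
    sentence = "Expression of hsa-miR-21 and let-7a was reduced, while miR-155-5p increased."
    for start, end, mention in find_mirna_mentions(sentence):
        print(start, end, mention)
```

Returning character offsets rather than bare strings keeps the output compatible with span-based NER annotations.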
