araag2/EBM_NLP
Source: Hugging Face. Updated 2026-04-21; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/araag2/EBM_NLP
---
dataset_info:
- config_name: conversational
  features:
  - name: id
    dtype: string
  - name: prompt
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: completion
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: Label
    dtype: string
  splits:
  - name: train
    num_bytes: 36626198
    num_examples: 13371
  - name: test
    num_bytes: 1520013
    num_examples: 552
  download_size: 8896008
  dataset_size: 38146211
- config_name: processed
  features:
  - name: id
    dtype: string
  - name: og_id
    dtype: string
  - name: PMID
    dtype: string
  - name: Category
    dtype: string
  - name: Instruction
    dtype: string
  - name: Context
    dtype: string
  - name: Label
    dtype: string
  - name: Spans
    sequence: string
  - name: Tagged_Spans
    list:
    - name: end_token
      dtype: int64
    - name: label_id
      dtype: int64
    - name: start_token
      dtype: int64
    - name: text
      dtype: string
  splits:
  - name: train
    num_bytes: 31213858
    num_examples: 13371
  - name: test
    num_bytes: 1309571
    num_examples: 552
  download_size: 8093947
  dataset_size: 32523429
- config_name: source
  features:
  - name: id
    dtype: string
  - name: PMID
    dtype: string
  - name: Split
    dtype: string
  - name: Quality
    dtype: string
  - name: Abstract
    dtype: string
  - name: Tokens
    sequence: string
  - name: Participants_Labels
    sequence: int64
  - name: Interventions_Labels
    sequence: int64
  - name: Outcomes_Labels
    sequence: int64
  splits:
  - name: train
    num_bytes: 47544758
    num_examples: 4457
  - name: test
    num_bytes: 1961875
    num_examples: 184
  download_size: 7561664
  dataset_size: 49506633
configs:
- config_name: conversational
  data_files:
  - split: train
    path: conversational/train-*
  - split: test
    path: conversational/test-*
- config_name: processed
  data_files:
  - split: train
    path: processed/train-*
  - split: test
    path: processed/test-*
- config_name: source
  data_files:
  - split: train
    path: source/train-*
  - split: test
    path: source/test-*
license: cc-by-sa-4.0
task_categories:
- token-classification
language:
- en
tags:
- medical
pretty_name: EBM_NLP
size_categories:
- 10K<n<100K
---
# EBM-NLP
## Dataset Description
| | Links |
|:-------------------------------:|:-------------:|
| **Homepage:** | [Hugging Face](https://huggingface.co/datasets/araag2/EBM_NLP) |
| **Original Repository:** | [GitHub](https://github.com/bepnye/EBM-NLP) |
| **Paper:** | [arXiv](https://arxiv.org/abs/1806.04185) |
| **Contact (Main Original Author):** | Benjamin Nye (nye.b@husky.neu.edu) |
| **Contact (Curator):** | Artur Guimarães (artur.guimas@gmail.com) |
### Dataset Summary
`We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the ‘PICO’ elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations from a diverse set of workers with varying levels of expertise and cost. We describe our data collection process and the corpus itself in detail. We then outline a set of challenging NLP tasks that would aid searching of the medical literature and the practice of evidence-based medicine.`
### Data Instances
```
{
'TO:DO': ...,
...
}
```
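Until a concrete instance is documented here, one way to inspect a record is to load a split with the `datasets` library. The snippet below is a minimal sketch, not an official loader: the repository id `araag2/EBM_NLP`, the configuration names, and the row counts are copied from this card's metadata, while the helper name `load_ebm_nlp` and the table `EXPECTED_ROWS` are hypothetical names introduced for illustration.

```python
# Configuration names and row counts, copied from the card's front matter.
EXPECTED_ROWS = {
    "conversational": {"train": 13371, "test": 552},
    "processed": {"train": 13371, "test": 552},
    "source": {"train": 4457, "test": 184},
}


def load_ebm_nlp(config: str = "processed", split: str = "test"):
    """Load one split of one configuration from the Hugging Face Hub.

    Hypothetical helper: it only validates its arguments and then
    delegates to `datasets.load_dataset`.
    """
    if config not in EXPECTED_ROWS:
        raise ValueError(
            f"unknown config {config!r}; expected one of {sorted(EXPECTED_ROWS)}"
        )
    if split not in EXPECTED_ROWS[config]:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(EXPECTED_ROWS[config])}"
        )
    # Imported lazily so the validation above works even when the
    # `datasets` package is not installed.
    from datasets import load_dataset

    return load_dataset("araag2/EBM_NLP", config, split=split)


# Example (needs network access and `pip install datasets`):
#   ds = load_ebm_nlp("processed", "test")
#   print(ds[0])                                        # first record as a dict
#   print(len(ds), EXPECTED_ROWS["processed"]["test"])  # compare row counts
```

Comparing `len(ds)` with the count listed in the metadata is a cheap check that the expected configuration and split were downloaded.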
### Data Fields
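The field names and types below are taken from the dataset metadata. Per-field descriptions are still TO:DO.

#### `conversational`
- `id` (string)
- `prompt` (list of records, each with string fields `role` and `content`)
- `completion` (list of records, each with string fields `role` and `content`)
- `Label` (string)

#### `processed`
- `id`, `og_id`, `PMID`, `Category`, `Instruction`, `Context`, `Label` (string)
- `Spans` (sequence of string)
- `Tagged_Spans` (list of records, each with int64 fields `start_token`, `end_token`, `label_id` and a string field `text`)

#### `source`
- `id`, `PMID`, `Split`, `Quality`, `Abstract` (string)
- `Tokens` (sequence of string)
- `Participants_Labels`, `Interventions_Labels`, `Outcomes_Labels` (sequence of int64)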
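The `source` configuration stores one integer label per token (`Participants_Labels`, `Interventions_Labels`, `Outcomes_Labels`), while `processed` stores span records with `start_token`, `end_token`, `label_id`, and `text`. The sketch below shows one plausible way to go from the first representation to the second. It rests on assumptions that this card does not confirm: label `0` is treated as "outside any span", a run of identical non-zero labels is treated as one span, and `end_token` is treated as exclusive. The function name `labels_to_spans` and the toy sentence are made up for illustration and are not drawn from the dataset.

```python
from typing import Dict, List, Optional, Union

Span = Dict[str, Union[int, str]]


def labels_to_spans(tokens: List[str], labels: List[int]) -> List[Span]:
    """Group runs of identical non-zero token labels into span records.

    Assumptions (not confirmed by the dataset card): 0 means "no label",
    and `end_token` is an exclusive index.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans: List[Span] = []
    start: Optional[int] = None
    n = len(labels)
    # A trailing 0 acts as a sentinel so a span ending at the last token is closed.
    for i, label in enumerate(list(labels) + [0]):
        # Close the open span at the sentinel or when the label changes.
        if start is not None and (i == n or label != labels[start]):
            spans.append(
                {
                    "start_token": start,
                    "end_token": i,
                    "label_id": labels[start],
                    "text": " ".join(tokens[start:i]),
                }
            )
            start = None
        # Open a new span on any non-zero label.
        if start is None and i < n and label != 0:
            start = i
    return spans


# Toy input, invented for this example.
toy_tokens = ["Adults", "with", "asthma", "received", "placebo", "."]
toy_labels = [1, 1, 1, 0, 0, 0]
print(labels_to_spans(toy_tokens, toy_labels))
```

Joining tokens with single spaces is a simplification; the `text` values in `Tagged_Spans` may follow a different detokenization.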
## Additional Information
### Dataset Curators
#### Original Paper
- Benjamin Nye (nye.b@husky.neu.edu) - Northeastern University
- Junyi Jessy Li (jessy@austin.utexas.edu) - UT Austin
- Roma Patel (romapatel996@gmail.com) - Rutgers University
- Yinfei Yang (yangyin7@gmail.com) - No affiliation
- Iain J. Marshall (iain.marshall@kcl.ac.uk) - King's College London
- Ani Nenkova (nenkova@seas.upenn.edu) - UPenn
- Byron C. Wallace (b.wallace@northeastern.edu) - Northeastern University
#### Hugging Face Curator
- [Artur Guimarães](https://araag2.netlify.app/) (artur.guimas@gmail.com) - INESC-ID / University of Lisbon - Instituto Superior Técnico
### Licensing Information
[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
### Citation Information
```bibtex
@inproceedings{nye2018corpus,
title={A corpus with multi-level annotations of patients, interventions and outcomes to support language processing for medical literature},
author={Nye, Benjamin and Li, Junyi Jessy and Patel, Roma and Yang, Yinfei and Marshall, Iain and Nenkova, Ani and Wallace, Byron C},
booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={197--207},
year={2018}
}
```
### Contributions
Thanks to [araag2](https://github.com/araag2) for adding this dataset.
Provided by:
araag2



