araag2/EBM_NLP

Name: araag2/EBM_NLP
Creator: araag2
Published: 2026-04-21 16:39:23
License: CC BY-SA 4.0 (per the dataset card)

Hugging Face · Updated 2026-04-21 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/araag2/EBM_NLP

Resource description:

---
dataset_info:
- config_name: conversational
  features:
  - name: id
    dtype: string
  - name: prompt
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: completion
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: Label
    dtype: string
  splits:
  - name: train
    num_bytes: 36626198
    num_examples: 13371
  - name: test
    num_bytes: 1520013
    num_examples: 552
  download_size: 8896008
  dataset_size: 38146211
- config_name: processed
  features:
  - name: id
    dtype: string
  - name: og_id
    dtype: string
  - name: PMID
    dtype: string
  - name: Category
    dtype: string
  - name: Instruction
    dtype: string
  - name: Context
    dtype: string
  - name: Label
    dtype: string
  - name: Spans
    sequence: string
  - name: Tagged_Spans
    list:
    - name: end_token
      dtype: int64
    - name: label_id
      dtype: int64
    - name: start_token
      dtype: int64
    - name: text
      dtype: string
  splits:
  - name: train
    num_bytes: 31213858
    num_examples: 13371
  - name: test
    num_bytes: 1309571
    num_examples: 552
  download_size: 8093947
  dataset_size: 32523429
- config_name: source
  features:
  - name: id
    dtype: string
  - name: PMID
    dtype: string
  - name: Split
    dtype: string
  - name: Quality
    dtype: string
  - name: Abstract
    dtype: string
  - name: Tokens
    sequence: string
  - name: Participants_Labels
    sequence: int64
  - name: Interventions_Labels
    sequence: int64
  - name: Outcomes_Labels
    sequence: int64
  splits:
  - name: train
    num_bytes: 47544758
    num_examples: 4457
  - name: test
    num_bytes: 1961875
    num_examples: 184
  download_size: 7561664
  dataset_size: 49506633
configs:
- config_name: conversational
  data_files:
  - split: train
    path: conversational/train-*
  - split: test
    path: conversational/test-*
- config_name: processed
  data_files:
  - split: train
    path: processed/train-*
  - split: test
    path: processed/test-*
- config_name: source
  data_files:
  - split: train
    path: source/train-*
  - split: test
    path: source/test-*
license: cc-by-sa-4.0
task_categories:
- token-classification
language:
- en
tags:
- medical
pretty_name: EBM_NLP
size_categories:
- 10K<n<100K
---

# EBM-NLP

## Dataset Description

| | Links |
|:-------------------------------:|:-------------:|
| **Homepage:** | [Huggingface](https://github.com/bepnye/EBM-NLP) |
| **Original Repository:** | [Github](https://github.com/bepnye/EBM-NLP) |
| **Paper:** | [arXiv](https://arxiv.org/abs/1806.04185) |
| **Contact (Main Original Author):** | Benjamin Nye (nye.b@husky.neu.edu) |
| **Contact (Curator):** | Artur Guimarães (artur.guimas@gmail.com) |

### Dataset Summary

`We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the ‘PICO’ elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations from a diverse set of workers with varying levels of expertise and cost. We describe our data collection process and the corpus itself in detail. We then outline a set of challenging NLP tasks that would aid searching of the medical literature and the practice of evidence-based medicine.`

### Data Instances

```
{
  'TO:DO': ...,
  ...
}
```

### Data Fields

TO:DO

## Additional Information

### Dataset Curators

#### Original Paper

- Benjamin Nye (nye.b@husky.neu.edu) - Northeastern University
- Junyi Jessy Li (jessy@austin.utexas.edu) - UT Austin
- Roma Patel (romapatel996@gmail.com) - Rutgers University
- Yinfei Yang (yangyin7@gmail.com) - No affiliation
- Iain J. Marshall (iain.marshall@kcl.ac.uk) - King's College London
- Ani Nenkova (nenkova@seas.upenn.edu) - UPenn
- Byron C. Wallace (b.wallace@northeastern.edu) - Northeastern University

#### Huggingface Curator

- [Artur Guimarães](https://araag2.netlify.app/) (artur.guimas@gmail.com) - INESC-ID / University of Lisbon - Instituto Superior Técnico

### Licensing Information

[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en)

### Citation Information

```bibtex
@inproceedings{nye2018corpus,
  title={A corpus with multi-level annotations of patients, interventions and outcomes to support language processing for medical literature},
  author={Nye, Benjamin and Li, Junyi Jessy and Patel, Roma and Yang, Yinfei and Marshall, Iain and Nenkova, Ani and Wallace, Byron C},
  booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={197--207},
  year={2018}
}
```

### Contributions

Thanks to [araag2](https://github.com/araag2) for adding this dataset.
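The card metadata lists a `source` configuration with parallel `Tokens` and per-token label sequences (for example `Participants_Labels`), and a `processed` configuration whose `Tagged_Spans` entries carry `start_token`, `end_token`, `label_id`, and `text`. The card does not document how one is derived from the other, so the following is only a minimal sketch of one plausible conversion. It assumes that label `0` means "outside any span", that a run of identical non-zero labels forms one span, and that `end_token` is exclusive. The function names `labels_to_spans` and `_make_span` and the toy row are hypothetical and do not come from the dataset.

```python
from typing import Dict, List, Sequence, Union

Span = Dict[str, Union[int, str]]


def _make_span(tokens: Sequence[str], labels: Sequence[int], start: int, end: int) -> Span:
    """Build one span record using the Tagged_Spans field names from the card."""
    return {
        "start_token": start,
        "end_token": end,  # ASSUMPTION: exclusive end index
        "label_id": labels[start],
        "text": " ".join(tokens[start:end]),
    }


def labels_to_spans(tokens: Sequence[str], labels: Sequence[int]) -> List[Span]:
    """Group runs of identical non-zero labels into spans.

    ASSUMPTION (not confirmed by the dataset card): label 0 marks tokens
    outside any span, and each maximal run of one non-zero label is a span.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans: List[Span] = []
    start = None  # index where the currently open span began, if any
    for i, label in enumerate(labels):
        # Close the open span as soon as the label changes.
        if start is not None and label != labels[start]:
            spans.append(_make_span(tokens, labels, start, i))
            start = None
        # Open a new span on any non-zero label.
        if start is None and label != 0:
            start = i
    # Close a span that runs to the end of the sequence.
    if start is not None:
        spans.append(_make_span(tokens, labels, start, len(tokens)))
    return spans


# Toy row shaped like the `source` configuration (invented for illustration).
toy_tokens = ["Adults", "with", "asthma", "received", "placebo", "."]
toy_participants_labels = [1, 1, 1, 0, 0, 0]
toy_spans = labels_to_spans(toy_tokens, toy_participants_labels)
```

With real data, the same function could be applied to rows obtained through the `datasets` library, for instance `load_dataset("araag2/EBM_NLP", "source")`, after checking the actual label scheme against the original EBM-NLP repository.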

Provider:

araag2
