bigbio/neurotrial_ner

Name: bigbio/neurotrial_ner
Creator: bigbio
Published: 2024-12-06 09:46:22
License: No description available

Source: Hugging Face. Updated 2024-12-06; indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/bigbio/neurotrial_ner

Description:

---
language:
- en
bigbio_language:
- English
license: cc0-1.0
bigbio_license_shortname: CC0_1p0
multilinguality: monolingual
pretty_name: NeuroTrialNer
homepage: https://github.com/Ineichen-Group/NeuroTrialNER/tree/main
bigbio_pubmed: false
bigbio_public: true
bigbio_tasks:
- NAMED_ENTITY_RECOGNITION
---

# Dataset Card for NeuroTrialNer

## Dataset Description

- **Homepage:** https://github.com/Ineichen-Group/NeuroTrialNER/tree/main
- **Pubmed:** False
- **Public:** True
- **Tasks:** NER

NeuroTrialNER is an annotated dataset for named entities in clinical trial registry data in the domain of neurology/psychiatry. The corpus comprises 1093 clinical trial titles and brief summaries from ClinicalTrials.gov. Two to three annotators labeled each record for key trial characteristics: condition (e.g., Alzheimer's disease), therapeutic intervention (e.g., aspirin), and control arms (e.g., placebo).

## Citation Information

```
@inproceedings{doneva-etal-2024-neurotrialner,
    title = "{N}euro{T}rial{NER}: An Annotated Corpus for Neurological Diseases and Therapies in Clinical Trial Registries",
    author = "Doneva, Simona Emilova and Ellendorff, Tilia and Sick, Beate and Goldman, Jean-Philippe and Cannon, Amelia Elaine and Schneider, Gerold and Ineichen, Benjamin Victor",
    editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1050",
    pages = "18868--18890",
    abstract = "Extracting and aggregating information from clinical trial registries could provide invaluable insights into the drug development landscape and advance the treatment of neurologic diseases. However, achieving this at scale is hampered by the volume of available data and the lack of an annotated corpus to assist in the development of automation tools. Thus, we introduce NeuroTrialNER, a new and fully open corpus for named entity recognition (NER). It comprises 1093 clinical trial summaries sourced from ClinicalTrials.gov, annotated for neurological diseases, therapeutic interventions, and control treatments. We describe our data collection process and the corpus in detail. We demonstrate its utility for NER using large language models and achieve a close-to-human performance. By bridging the gap in data resources, we hope to foster the development of text processing tools that help researchers navigate clinical trials data more easily.",
}
```
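
To make the three annotated entity types in the description (condition, therapeutic intervention, control) more concrete, here is a minimal sketch of how character-level entity spans can be turned into token-level BIO tags, a common input format for NER models. Everything in it is an assumption for illustration: the label names `CONDITION`, `INTERVENTION`, and `CONTROL`, the helper functions, the whitespace tokenization, and the example sentence are hypothetical and are not taken from the dataset's actual schema or label set.

```python
import re
from typing import List, Tuple

# (start, end, label) with an exclusive end offset; hypothetical span format.
Span = Tuple[int, int, str]


def find_span(text: str, phrase: str, label: str) -> Span:
    """Locate the first occurrence of `phrase` and return it as a labeled span."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Assign a BIO tag to each whitespace-delimited token."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end, label in spans:
            # A token belongs to a span if their character ranges overlap.
            if tok_start < end and tok_end > start:
                # The first token of a span gets B-, later tokens get I-.
                prefix = "B" if tok_start <= start else "I"
                tag = f"{prefix}-{label}"
                break
        tagged.append((match.group(), tag))
    return tagged


# Hypothetical trial title, built from the examples named in the description.
text = "Aspirin versus placebo in Alzheimer's disease"
spans = [
    find_span(text, "Aspirin", "INTERVENTION"),
    find_span(text, "placebo", "CONTROL"),
    find_span(text, "Alzheimer's disease", "CONDITION"),
]

for token, tag in spans_to_bio(text, spans):
    print(f"{token}\t{tag}")
```

Whitespace tokenization keeps the sketch short; a real pipeline would more likely align spans to the tokenizer of the model being trained.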

Provider:

bigbio
