TigerResearch/MedCT-clinical-notes
Source: Hugging Face. Updated 2024-12-03; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/TigerResearch/MedCT-clinical-notes
Resource overview:
---
license: apache-2.0
---
A real-world Chinese clinical dataset for a variety of LLM-based applications:
* For the named entity recognition (NER) and named entity linking (NEL) tasks: 7.4K real-world clinical notes in Chinese (medct_ner_notes.csv) and 61K entity mention annotations based on the MedCT graph (medct_ner_annotations.csv).
* For the search task: 20 clinical queries (medct_search_queries.csv) and 2K discharge notes with relevance annotations (medct_search_notes.csv).
* For the clinical note summarization task: 91 raw discharge notes, each with a human-written summary, an LLM-generated summary, and a MedCT-augmented summary (medct_summary_notes.csv), plus Likert-scale preference scores annotated by physicians (medct_summary_scores.csv).
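The file layout above can be sketched as a small loader. This is a minimal sketch, assuming the six CSV files have already been downloaded into one local directory and are UTF-8 encoded. The task keys (`ner_nel`, `search`, `summarization`) and the `load_task` helper are hypothetical names introduced for illustration. The column names are not documented here, so the code makes no assumption about them.

```python
import csv
from pathlib import Path

# File names as listed in the dataset description, grouped by task.
# The task keys are hypothetical labels, not part of the dataset.
TASK_FILES = {
    "ner_nel": ["medct_ner_notes.csv", "medct_ner_annotations.csv"],
    "search": ["medct_search_queries.csv", "medct_search_notes.csv"],
    "summarization": ["medct_summary_notes.csv", "medct_summary_scores.csv"],
}


def load_task(data_dir, task):
    """Read every CSV of one task into a list of row dicts, keyed by file name."""
    tables = {}
    for name in TASK_FILES[task]:
        path = Path(data_dir) / name
        # utf-8-sig tolerates an optional byte-order mark; the encoding is an assumption.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            tables[name] = list(csv.DictReader(handle))
    return tables
```

Calling `load_task("path/to/download", "search")` would return one list of rows per file. Inspecting the keys of the first row is a reasonable way to discover the actual column names before writing task-specific code.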
License: Apache-2.0
Provider:
TigerResearch



