
TsinghuaC3I/UltraMedical

Source: Hugging Face. Updated 2024-04-28; indexed 2024-06-15.
Download link:
https://hf-mirror.com/datasets/TsinghuaC3I/UltraMedical
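Summary:
The UltraMedical dataset is a large-scale, high-quality biomedical instruction dataset containing 410,000 synthetic and manually curated samples. Its construction follows the principles of diversity and complexity. Each record includes the fields id, type, conversations, answer, and score. The example shows one sample from the dataset, illustrating how the question and the response are generated and how the score is provided.
Provided by:
TsinghuaC3I
Summary of the original information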

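UltraMedical Dataset Overview

Dataset Description

UltraMedical is a large-scale, high-quality biomedical instruction dataset comprising 410,000 synthetic and manually curated samples. It was constructed following the principles of diversity and complexity.

Dataset Statistics

The table below reports statistics for the datasets that make up UltraMedical. Datasets marked with ★ are customized synthetic data; the others are drawn from publicly available data. # Instructions is the original size of each dataset, and # Filtered is its size after filtering by model-based scores.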

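Category     | Synthetic | Dataset                    | # Instructions | Avg. instruction length | Avg. instruction score | # Filtered
-------------|-----------|----------------------------|----------------|-------------------------|------------------------|-----------
Medical exam |           | MedQA                      | 10.2k          | 128.94 ± 44.4           | 7.35 ± 0.98            | 9.3k
             |           | MedMCQA                    | 183k           | 23.12 ± 15.44           | 4.73 ± 2.14            | 59k
             | ✔︎         | ★ MedQA-Evol               | 51.8k          | 76.52 ± 24.97           | 8.07 ± 0.9             | 51.8k
             | ✔︎         | ★ TextBookQA               | 91.7k          | 75.92 ± 25.77           | 7.72 ± 0.79            | 91.7k
Literature   |           | PubMedQA                   | 211k           | 218.2 ± 51.01           | 7.95 ± 1.08            | 88.7k
Open-ended   |           | ChatDoctor                 | 100k           | 98.93 ± 50.81           | 6.83 ± 2.16            | 31.1k
             |           | MedQuad                    | 47k            | 8.21 ± 2.38             | 4.54 ± 2.43            | 6k
             | ✔︎         | MedInstruct-52k            | 52k            | 36.05 ± 22.96           | 5.25 ± 2.16            | 23k
             | ✔︎         | Medical-Instruction-120k   | 120k           | 84.93 ± 50.85           | 5.36 ± 3.18            | 25k
             | ✔︎         | ★ WikiInstruct             | 23k            | 46.73 ± 11.1            | 8.8 ± 0.52             | 23k
Mixed        | Mixed     | ☆ UltraMedical             | 410k           | 101.63 ± 79.39          | 8.2 ± 0.96             | 410k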
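
As a quick check on the table above, the share of each source that survives score-based filtering can be computed directly from the # Instructions and # Filtered columns. The sketch below is illustrative only: the names `STATS_K` and `retention` are made up for this example, and the figures are the rounded values from the table (in thousands), so the results are approximate.

```python
# (# Instructions, # Filtered) in thousands, copied from the statistics table.
# The values are rounded in the source, so derived ratios are approximate.
STATS_K = {
    "MedQA": (10.2, 9.3),
    "MedMCQA": (183.0, 59.0),
    "MedQA-Evol": (51.8, 51.8),
    "TextBookQA": (91.7, 91.7),
    "PubMedQA": (211.0, 88.7),
    "ChatDoctor": (100.0, 31.1),
    "MedQuad": (47.0, 6.0),
    "MedInstruct-52k": (52.0, 23.0),
    "Medical-Instruction-120k": (120.0, 25.0),
    "WikiInstruct": (23.0, 23.0),
}


def retention(name: str) -> float:
    """Fraction of a source dataset kept after score-based filtering."""
    original, filtered = STATS_K[name]
    return filtered / original


# Sum of the per-source filtered sizes, in thousands.
total_filtered_k = sum(filtered for _, filtered in STATS_K.values())

if __name__ == "__main__":
    for name in STATS_K:
        print(f"{name:26s} kept {retention(name):6.1%}")
    print(f"total filtered: about {total_filtered_k:.1f}k")
```

With these rounded figures, the per-source # Filtered values sum to about 408.6k, which is consistent with the 410k reported for the mixed UltraMedical row.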

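Dataset Structure

Each record in the UltraMedical dataset has the following fields:

  • id: combines the dataset name with the sequence number in the original dataset.
  • conversations: the first element contains the question (manually curated or synthesized by GPT-4), and the second element contains the response generated by GPT-4.
  • answer: adapted from the original data.
  • score: the evaluation produced by gpt-3.5-turbo based on the input instruction.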
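
The field list above can be sketched as a small Python record type. This is a minimal illustration, not the dataset's official schema: the class `Sample` and the helpers `split_id` and `to_sample` are hypothetical names, and the comma separator in `id` is an assumption based on the single example shown on this page ("TextBookQA,Anatomy_Gray_28").

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class Sample:
    source: str    # dataset name, taken from the part of id before the comma
    local_id: str  # identifier within the original dataset
    question: str  # first element of conversations
    response: str  # second element of conversations
    answer: str    # adapted from the original data
    score: str     # free-text evaluation from gpt-3.5-turbo


def split_id(sample_id: str) -> Tuple[str, str]:
    """Split an id into (dataset name, local identifier).

    Assumes the two parts are joined by the first comma, as in the example.
    """
    source, _, local_id = sample_id.partition(",")
    return source, local_id


def to_sample(record: Dict[str, Any]) -> Sample:
    """Convert one raw record (a dict) into a Sample."""
    source, local_id = split_id(record["id"])
    # Per the field description: element 0 is the question, element 1 the response.
    question_turn, response_turn = record["conversations"][:2]
    return Sample(
        source=source,
        local_id=local_id,
        question=question_turn["value"],
        response=response_turn["value"],
        answer=record["answer"],
        score=record["score"],
    )
```

For instance, `split_id("TextBookQA,Anatomy_Gray_28")` separates the source dataset from the identifier, which makes it easy to group samples by their origin.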

Example

{
  "id": "TextBookQA,Anatomy_Gray_28",
  "type": "mc",
  "conversations": [
    {
      "value": "For a 55-year-old male with chronic obstructive pulmonary disease in acute distress and unable to maintain an upright position, which radiographic technique is optimal, and how should the potential magnification of the cardiac silhouette be factored into the interpretation?\n\nA. AP chest radiograph, without differentiating it from PA radiograph in assessing heart size B. Lateral chest radiograph, ignoring the significance of the patient's posture during film placement C. AP chest radiograph, considering potential cardiac silhouette magnification in analysis D. PA chest radiograph, with no adjustment for the patient's inability to stand",
      "from": "human"
    },
    {
      "value": "For a patient in acute distress who is unable to maintain an upright position, the most practical and commonly used radiographic technique is the anteroposterior (AP) chest radiograph. This is because the patient may not be able to stand for a posteroanterior (PA) chest radiograph, which is the standard view for chest imaging when the patient is able to stand upright and take a deep breath.\n\nThe PA view is preferred under normal circumstances because it minimizes the magnification of the heart. In a PA chest radiograph, the heart is closer to the image receptor, which reduces the enlargement of the cardiac silhouette that can occur due to the divergent nature of x-rays. However, in an AP chest radiograph, which is taken from the front, the heart is further away from the image receptor, and the x-rays have to travel a greater distance from the source to the film, which results in some magnification of the heart.\n\nWhen interpreting an AP chest radiograph, especially in the context of a patient who is unable to sit or stand upright, it is essential to factor in the potential magnification of the cardiac silhouette. This magnification can lead to the appearance of an enlarged heart, which could be misinterpreted as cardiomegaly if the radiograph is not carefully analyzed with this consideration in mind.\n\nTherefore, the optimal choice for a 55-year-old male with chronic obstructive pulmonary disease (COPD) in acute distress who cannot maintain an upright position is:\n\nC. AP chest radiograph, considering potential cardiac silhouette magnification in analysis\n\nThis choice acknowledges the practical limitations of the patient's condition and the importance of interpreting the radiograph with an understanding of the potential for magnification of the heart.\n\nSo, the answer is C.",
      "from": "gpt"
    }
  ],
  "answer": "C",
  "score": "This question would be rated as a 9. It requires a deep understanding of radiographic techniques, the impact of patient positioning on imaging, and the interpretation of cardiac silhouette magnification. Additionally, it involves technical terminology related to radiography and medical conditions."
}
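
In the example above, the score field is free text rather than a number, and the response ends with a sentence naming the chosen option. The sketch below shows one way to pull out both values. It rests on two assumptions drawn only from this single sample: that the score text contains a phrase like "rated as a 9", and that multiple-choice responses end with "the answer is X". The function names `parse_rating` and `final_choice` are hypothetical, and records that do not follow these patterns simply yield `None`.

```python
import re
from typing import Optional

# Assumed phrasing, based on the one example shown: "... rated as a 9."
_RATING = re.compile(r"rated as an?\s+(\d+(?:\.\d+)?)")
# Assumed phrasing for multiple-choice responses: "So, the answer is C."
_FINAL = re.compile(r"answer is\s+([A-Z])\b")


def parse_rating(score_text: str) -> Optional[float]:
    """Return the numeric rating found in the score text, or None."""
    match = _RATING.search(score_text)
    return float(match.group(1)) if match else None


def final_choice(response: str) -> Optional[str]:
    """Return the last option letter stated in the response, or None."""
    matches = _FINAL.findall(response)
    return matches[-1] if matches else None


if __name__ == "__main__":
    score = "This question would be rated as a 9. It requires a deep understanding."
    response = "Therefore the optimal choice is C.\n\nSo, the answer is C."
    print(parse_rating(score), final_choice(response))
```

Comparing `final_choice(response)` with the record's answer field gives a simple consistency check for multiple-choice samples; how the dataset authors applied the scores when filtering is not stated here, so no threshold is assumed.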
