TsinghuaC3I/UltraMedical
收藏UltraMedical 数据集概述
数据集描述
UltraMedical 是一个大规模、高质量的生物医学指令数据集,包含 410,000 个合成和人工精选样本。该数据集的构建遵循多样性和复杂性的原则。
数据集统计信息
以下表格展示了 UltraMedical 数据集中各个数据集的统计信息,其中标记为 ★ 的数据集代表定制的合成数据,其他数据集则来自公开可用的数据。# Filtered 表示经过模型评分过滤后的数据大小,# Instructions 表示数据集的原始大小。
| 类别 | 合成数据 | 数据集名称 | # Instructions | 指令平均长度 | 指令平均评分 | # Filtered |
|---|---|---|---|---|---|---|
| 医学考试 | ✘ | MedQA | 10.2k | 128.94 ± 44.4 | 7.35 ± 0.98 | 9.3k |
| ✘ | MedMCQA | 183k | 23.12 ± 15.44 | 4.73 ± 2.14 | 59k | |
| ✔︎ | ★ MedQA-Evol | 51.8k | 76.52 ± 24.97 | 8.07 ± 0.9 | 51.8k | |
| ✔︎ | ★ TextBookQA | 91.7k | 75.92 ± 25.77 | 7.72 ± 0.79 | 91.7k | |
| 文献 | ✘ | PubMedQA | 211k | 218.2 ± 51.01 | 7.95 ± 1.08 | 88.7k |
| 开放式 | ✘ | ChatDoctor | 100k | 98.93 ± 50.81 | 6.83 ± 2.16 | 31.1k |
| ✘ | MedQuad | 47k | 8.21 ± 2.38 | 4.54 ± 2.43 | 6k | |
| ✔︎ | MedInstruct-52k | 52k | 36.05 ± 22.96 | 5.25 ± 2.16 | 23k | |
| ✔︎ | Medical-Instruction-120k | 120k | 84.93 ± 50.85 | 5.36 ± 3.18 | 25k | |
| ✔︎ | ★ WikiInstruct | 23k | 46.73 ± 11.1 | 8.8 ± 0.52 | 23k | |
| 混合 | Mixed | ☆ UltraMedical | 410k | 101.63 ± 79.39 | 8.2 ± 0.96 | 410k |
数据集结构
UltraMedical 数据集的结构如下:
id:结合数据集名称和原始数据集中的顺序号。conversations:第一个元素包含由GPT-4生成的人工或合成问题,第二个元素包含由GPT-4回答的响应。answer:从原始数据改编而来。score:由gpt-3.5-turbo根据输入指令进行的评估。
示例
json { "id": "TextBookQA,Anatomy_Gray_28", "type": "mc", "conversations": [ { "value": "For a 55-year-old male with chronic obstructive pulmonary disease in acute distress and unable to maintain an upright position, which radiographic technique is optimal, and how should the potential magnification of the cardiac silhouette be factored into the interpretation?
A. AP chest radiograph, without differentiating it from PA radiograph in assessing heart size B. Lateral chest radiograph, ignoring the significance of the patients posture during film placement C. AP chest radiograph, considering potential cardiac silhouette magnification in analysis D. PA chest radiograph, with no adjustment for the patients inability to stand", "from": "human" }, { "value": "For a patient in acute distress who is unable to maintain an upright position, the most practical and commonly used radiographic technique is the anteroposterior (AP) chest radiograph. This is because the patient may not be able to stand for a posteroanterior (PA) chest radiograph, which is the standard view for chest imaging when the patient is able to stand upright and take a deep breath.
The PA view is preferred under normal circumstances because it minimizes the magnification of the heart. In a PA chest radiograph, the heart is closer to the image receptor, which reduces the enlargement of the cardiac silhouette that can occur due to the divergent nature of x-rays. However, in an AP chest radiograph, which is taken from the front, the heart is further away from the image receptor, and the x-rays have to travel a greater distance from the source to the film, which results in some magnification of the heart.
When interpreting an AP chest radiograph, especially in the context of a patient who is unable to sit or stand upright, it is essential to factor in the potential magnification of the cardiac silhouette. This magnification can lead to the appearance of an enlarged heart, which could be misinterpreted as cardiomegaly if the radiograph is not carefully analyzed with this consideration in mind.
Therefore, the optimal choice for a 55-year-old male with chronic obstructive pulmonary disease (COPD) in acute distress who cannot maintain an upright position is:
C. AP chest radiograph, considering potential cardiac silhouette magnification in analysis
This choice acknowledges the practical limitations of the patients condition and the importance of interpreting the radiograph with an understanding of the potential for magnification of the heart.
So, the answer is C.", "from": "gpt" } ], "answer": "C", "score": "This question would be rated as a 9. It requires a deep understanding of radiographic techniques, the impact of patient positioning on imaging, and the interpretation of cardiac silhouette magnification. Additionally, it involves technical terminology related to radiography and medical conditions." }



