five

med-synth-questions-qwen3-235b-a22b-2507

收藏
魔搭社区2025-12-05 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/openmed-community/med-synth-questions-qwen3-235b-a22b-2507
下载链接
链接失效反馈
官方服务:
资源简介:
# openmed-community/med-synth-questions-qwen3-235b-a22b-2507 ## What is this? **Med Synth Questions — Qwen3-235B-A22B-2507** is an instruction-only dataset of **104,335 English medical questions** generated with [Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507) via OpenRouter. Questions were created from **proprietary medical documents authored by physicians** in the **Medcases** application. The dataset contains **questions only**-no source passages or proprietary text are included. - **Split:** `train` (104,335 rows) - **Schema:** `{'input': str, 'generation_settings': dict, 'timestamp': str}` - **License:** **CC0-1.0** (public-domain dedication) - **Provenance note:** Source materials are proprietary to MedIT Solutions / Medcases; they are *not* redistributed here. --- ## Dataset structure ```json DatasetDict({ train: Dataset({ features: ['input', 'generation_settings', 'timestamp'], num_rows: 104335 }) }) ```` **Features** - `input` *(string)* — a single, self-contained medical question. - `generation_settings` *(dict)* — structured metadata typically including: - `model` (e.g., `"qwen/qwen3-235b-a22b-2507"`), - `provider` (e.g., `"openrouter"`), - request parameters (e.g., `max_tokens`, `num_questions_requested`, `num_questions_generated`). - `timestamp` *(string)* — ISO-8601 creation time. **Example** ```json { "input": "Hey, can you walk me through how the patient’s smoking history played into the diagnosis of a palate tumor?", "generation_settings": { "max_tokens": 4096, "model": "qwen/qwen3-235b-a22b-2507", "num_questions_generated": 5, "num_questions_requested": 5, "provider": "openrouter" }, "timestamp": "2025-08-17T18:43:27.659300" } ```` --- ## Intended uses * **Instruction-only fine-tuning scaffolds** (pair with your own answer-generation pipeline). * **RAG/eval** — as a bank of domain-specific queries for retrieval and QA evaluation. * **Question-generation research** — analyze prompt styles, difficulty, and topic coverage. ### Out-of-scope / caveats * **No answers** are provided; downstream users should generate or annotate answers. * Questions are derived from clinician-authored materials but may reflect **biases, gaps, or outdated info**; validate before use. * **Not medical advice.** Do not use for clinical decision-making. --- ## How to load ```python from datasets import load_dataset ds = load_dataset("openmed-community/med-synth-questions-qwen3-235b-a22b-2507", split="train") row = ds[0] print(row["input"]) print(row["generation_settings"]) print(row["timestamp"]) ``` --- ## Licensing & responsible use * **Dataset license:** **CC0-1.0** (public-domain dedication). Downstream users may copy, modify, and redistribute. Please acknowledge the source when feasible. * **Provenance:** Underlying *source* documents are proprietary to MedIT Solutions / Medcases and are **not** included. * **Model & provider terms:** Questions were generated with **Qwen3** served via **OpenRouter**. This dataset itself does not grant additional rights to model weights or hosted endpoints. --- ## Provenance & credit * **Source environment:** [Medcases.io](https://medcases.io) (virtual-patient / medical-education platform) by [MedIT Solutions](https://meditsolutions.pl). * **Generator model:** `Qwen/Qwen3-235B-A22B-Instruct-2507` via OpenRouter. * **Curation:** openmed-community. --- ## Changelog * **2025-08-17** — Initial release (`train`, 104,335 questions). --- ## Disclaimer This resource is provided **for research and educational use**. It is **not** a source of medical advice. Always follow relevant laws, ethics, platform/model terms, and institutional review requirements. Use responsibly. --- ## Reproduce To reproduce or adapt the pipeline, see our open-source [Synthetic Questions Generation tool](https://github.com/mkurman/synthetic-questions-generation)

# openmed-community/med-synth-questions-qwen3-235b-a22b-2507 ## 本数据集是什么? **Med Synth Questions — Qwen3-235B-A22B-2507** 是一个仅包含指令的数据集,涵盖104335条英文医疗问题,由[Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)通过OpenRouter平台生成。 本数据集的问题源自Medcases应用中由医师撰写的专有医疗文档,且仅包含问题本身,未附带源文本段落或专有内容。 - **数据集拆分:** `train`(共104335条数据行) - **数据结构(Schema):** `{'input': str, 'generation_settings': dict, 'timestamp': str}` - **授权协议:** **CC0-1.0**(公共领域贡献协议) - **来源说明:** 源素材归MedIT Solutions / Medcases所有,本数据集未附带此类源素材。 --- ## 数据集结构 json DatasetDict({ train: Dataset({ features: ['input', 'generation_settings', 'timestamp'], num_rows: 104335 }) }) ## 字段说明 - `input`(字符串类型):单条独立完整的医疗问题。 - `generation_settings`(字典类型):结构化元数据,通常包含以下内容: - `model`(例如:`"qwen/qwen3-235b-a22b-2507"`):生成所用的模型 - `provider`(例如:`"openrouter"`):模型服务提供商 - 请求参数(例如:`max_tokens`、`num_questions_requested`、`num_questions_generated`) - `timestamp`(字符串类型):ISO-8601格式的创建时间戳。 ## 示例 json { "input": "Hey, can you walk me through how the patient’s smoking history played into the diagnosis of a palate tumor?", "generation_settings": { "max_tokens": 4096, "model": "qwen/qwen3-235b-a22b-2507", "num_questions_generated": 5, "num_questions_requested": 5, "provider": "openrouter" }, "timestamp": "2025-08-17T18:43:27.659300" } --- ## 预期用途 1. **仅用于指令微调基座**(可搭配自定义的答案生成流水线使用)。 2. **检索增强生成(Retrieval-Augmented Generation,简称RAG)与模型评估**:作为领域专属查询库,用于检索任务与问答系统评估。 3. **问题生成研究**:用于分析提示词风格、问题难度与主题覆盖范围。 ### 适用范围限制与注意事项 1. **本数据集未附带答案**,下游使用者需自行生成或标注答案。 2. 问题源自临床医师撰写的素材,但可能存在偏差、信息缺口或过时内容,使用前请自行验证。 3. **本数据集不提供医疗建议**,不得用于临床决策。 --- ## 加载方式 python from datasets import load_dataset ds = load_dataset("openmed-community/med-synth-questions-qwen3-235b-a22b-2507", split="train") row = ds[0] print(row["input"]) print(row["generation_settings"]) print(row["timestamp"]) --- ## 授权协议与合规使用 1. **数据集授权:** **CC0-1.0**(公共领域贡献协议)。下游使用者可复制、修改并重新分发本数据集,若可行请注明原来源。 2. **来源说明:** 底层源文档归MedIT Solutions / Medcases所有,未随本数据集一同发布。 3. **模型与服务条款:** 本数据集的问题由通过OpenRouter部署的**Qwen3**模型生成,本数据集本身不赋予任何针对模型权重或托管端点的额外权利。 --- ## 来源与致谢 1. **来源平台:** [Medcases.io](https://medcases.io)(虚拟患者/医学教育平台),由[MedIT Solutions](https://meditsolutions.pl)开发。 2. **生成模型:** 通过OpenRouter调用的`Qwen/Qwen3-235B-A22B-Instruct-2507`。 3. **数据集整理:** openmed-community。 --- ## 更新日志 * **2025-08-17**:首次发布(仅包含`train`拆分,共104335条问题)。 --- ## 免责声明 本资源仅用于**研究与教育用途**,不构成医疗建议。请始终遵守相关法律法规、伦理准则、平台与模型服务条款以及机构审查要求,合规且负责任地使用本数据集。 --- ## 复现方法 若需复现或改造本数据集生成流程,请参考我们开源的[合成问题生成工具](https://github.com/mkurman/synthetic-questions-generation)
提供机构:
maas
创建时间:
2025-09-03
搜集汇总
数据集介绍
main_image_url
背景与挑战
背景概述
该数据集是一个包含104,335条英文医学问题的指令数据集,由Qwen3-235B模型通过OpenRouter生成,问题基于Medcases平台的医生撰写材料。数据集仅包含问题,不提供答案或源文本,采用CC0-1.0许可,适用于指令微调、检索评估和问题生成研究。
以上内容由遇见数据集搜集并总结生成
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作