m23k-tokenized

Name: m23k-tokenized
Creator: maas
Published: 2025-12-05 12:04:52
License: No description available

ModelScope community. Updated 2025-12-05. Indexed 2025-04-26.

Download link:

https://modelscope.cn/datasets/UCSC-VLAA/m23k-tokenized

Description:

<div align="center">
<h1> <b>m1</b>: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models </h1>
<p> A simple test-time scaling strategy, with minimal fine-tuning, can unlock strong medical reasoning within large language models. </p>
</div>

## ⚡ Introduction

Hi! Welcome to the Hugging Face repository for m1 (https://github.com/UCSC-VLAA/m1)!

**m1** is a medical LLM designed to enhance reasoning through efficient test-time scaling. It enables lightweight models to match or exceed the performance of much larger counterparts by extending inference-time “thinking.” Unlike methods that rely on complex RL or expert supervision, m1 achieves strong results through:

- **Fine-tuning on a small, high-quality set of verified medical reasoning examples**, showing that even with just 1K–23K examples, m1-7B *surpasses* models like HuatuoGPT-o1-7B and UltraMedical-8B, and m1-32B *rivals* 70B-scale models.
- **Scaling reasoning at inference using token budgets**, which consistently improves performance across medical QA tasks up to an optimal budget of roughly 4K tokens, beyond which performance may degrade due to overthinking.
- **Identifying medical knowledge as the key bottleneck**, revealing that additional reasoning alone cannot overcome knowledge gaps; instead, improvements require better data quality and increased model capacity.

Paper: https://huggingface.co/papers/2504.00869

Code: https://github.com/UCSC-VLAA/m1
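To make the token-budget idea concrete, here is a minimal sketch of one common way to cap inference-time "thinking": generate reasoning tokens until the model stops on its own or the budget is reached, then force the reasoning to close. This is an illustration under stated assumptions, not the m1 implementation. The `END_THINK` marker, the `think_with_budget` function, and the `toy_model` stand-in are all hypothetical names introduced here; a real setup would call an actual LLM decoding step instead.

```python
from typing import Callable, List

# Hypothetical end-of-reasoning marker; a real model defines its own delimiter.
END_THINK = "<end_think>"


def think_with_budget(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    budget: int,
) -> List[str]:
    """Collect reasoning tokens, stopping at the budget or at END_THINK.

    `next_token` is an assumed stand-in for one decoding step of a model:
    it receives the context so far and returns the next token.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")

    thinking: List[str] = []
    while len(thinking) < budget:
        token = next_token(prompt + thinking)
        if token == END_THINK:
            break  # the model finished reasoning within the budget
        thinking.append(token)

    # Close the reasoning explicitly, whether the model stopped by itself
    # or was cut off by the budget.
    return thinking + [END_THINK]


def toy_model(context: List[str]) -> str:
    """Toy stand-in: emits five "step" tokens, then ends its reasoning."""
    steps_so_far = sum(1 for t in context if t == "step")
    return "step" if steps_so_far < 5 else END_THINK


if __name__ == "__main__":
    # A budget of 3 cuts the toy model off early; a budget of 10 does not.
    print(think_with_budget(toy_model, ["question"], budget=3))
    print(think_with_budget(toy_model, ["question"], budget=10))
```

With a sweep over `budget` values, the same loop could be used to look for the point where extra reasoning stops helping, which the description above places at roughly 4K tokens for m1.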

Provider:

maas

Created:

2025-04-21

Summary

Dataset introduction