openeurollm/Dolci-Think-SFT-7B-decontaminated
收藏Hugging Face2026-03-22 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/openeurollm/Dolci-Think-SFT-7B-decontaminated
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: dataset_source
dtype: string
- name: id
dtype: string
splits:
- name: train
num_bytes: 77877465405
num_examples: 2267351
download_size: 77877465405
dataset_size: 77877465405
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
decontamination:
source_dataset: allenai/Dolci-Think-SFT-7B
benchmarks:
- path: HuggingFaceH4/MATH-500
subset: default
split: test
- path: HuggingFaceH4/aime_2024
subset: default
split: train
- path: math-ai/aime25
subset: default
split: test
- path: math-ai/amc23
subset: default
split: test
- path: daman1209arora/jeebench
subset: default
split: test
- path: Idavidrein/gpqa
subset: gpqa_diamond
split: train
- path: ali-elganzory/livecodebench-code_generation_lite
subset: release_v6
split: test
- path: openai/openai_humaneval
subset: openai_humaneval
split: test
- path: google-research-datasets/mbpp
subset: full
split: train+test+validation+prompt
- path: google/IFEval
subset: default
split: train
- path: tatsu-lab/alpaca_eval
subset: alpaca_eval
split: eval
- path: lmarena-ai/arena-hard-auto
subset: default
split: train
contamination_stats:
- subset: default
split: train
total: 2268178
removed: 827
---
## Decontamination
This dataset is a decontaminated version of [allenai/Dolci-Think-SFT-7B](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-7B).
### Benchmarks used
- **MATH500**: `HuggingFaceH4/MATH-500` (subset=default, split=test)
- **AIME24**: `HuggingFaceH4/aime_2024` (subset=default, split=train)
- **AIME25**: `math-ai/aime25` (subset=default, split=test)
- **AMC23**: `math-ai/amc23` (subset=default, split=test)
- **JEEBench**: `daman1209arora/jeebench` (subset=default, split=test)
- **GPQADiamond**: `Idavidrein/gpqa` (subset=gpqa_diamond, split=train)
- **LiveCodeBench**: `ali-elganzory/livecodebench-code_generation_lite` (subset=release_v6, split=test)
- **HumanEval**: `openai/openai_humaneval` (subset=openai_humaneval, split=test)
- **MBPP**: `google-research-datasets/mbpp` (subset=full, split=train+test+validation+prompt)
- **IFEval**: `google/IFEval` (subset=default, split=train)
- **AlpacaEval**: `tatsu-lab/alpaca_eval` (subset=alpaca_eval, split=eval)
- **Arena-Hard-v2.0**: `lmarena-ai/arena-hard-auto` (subset=default, split=train) (data_files=['data/arena-hard-v2.0/question.jsonl'])
### Decontamination settings
<table>
<thead>
<tr><th>Parameter</th><th>Value</th></tr>
</thead>
<tbody>
<tr><td>N-gram size</td><td>8</td></tr>
<tr><td>Match threshold</td><td>0.5</td></tr>
</tbody>
</table>
### Split and benchmark details
<table>
<thead>
<tr>
<th>Subset</th>
<th>Split</th>
<th>Docs in split (dataset)</th>
<th>Benchmark</th>
<th>Contaminated (dataset)</th>
<th>Contamination rate (dataset)</th>
<th>Docs (benchmark)</th>
<th>Contaminated (benchmark)</th>
<th>Contamination rate (benchmark)</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="12">default</td>
<td rowspan="12">train</td>
<td rowspan="12">2,368,984</td>
<td>MATH500</td>
<td>304</td>
<td>0.0128%</td>
<td>500</td>
<td>65</td>
<td>13.00%</td>
</tr>
<tr>
<td>AIME24</td>
<td>0</td>
<td>0.0000%</td>
<td>30</td>
<td>0</td>
<td>0.0000%</td>
</tr>
<tr>
<td>AIME25</td>
<td>0</td>
<td>0.0000%</td>
<td>30</td>
<td>0</td>
<td>0.0000%</td>
</tr>
<tr>
<td>AMC23</td>
<td>16</td>
<td>0.0007%</td>
<td>40</td>
<td>3</td>
<td>7.50%</td>
</tr>
<tr>
<td>JEEBench</td>
<td>0</td>
<td>0.0000%</td>
<td>515</td>
<td>0</td>
<td>0.0000%</td>
</tr>
<tr>
<td>GPQADiamond</td>
<td>0</td>
<td>0.0000%</td>
<td>198</td>
<td>0</td>
<td>0.0000%</td>
</tr>
<tr>
<td>LiveCodeBench</td>
<td>36</td>
<td>0.0015%</td>
<td>1055</td>
<td>10</td>
<td>0.9479%</td>
</tr>
<tr>
<td>HumanEval</td>
<td>22</td>
<td>0.0009%</td>
<td>164</td>
<td>4</td>
<td>2.44%</td>
</tr>
<tr>
<td>MBPP</td>
<td>311</td>
<td>0.0131%</td>
<td>974</td>
<td>121</td>
<td>12.42%</td>
</tr>
<tr>
<td>IFEval</td>
<td>32</td>
<td>0.0014%</td>
<td>541</td>
<td>15</td>
<td>2.77%</td>
</tr>
<tr>
<td>AlpacaEval</td>
<td>82</td>
<td>0.0035%</td>
<td>805</td>
<td>29</td>
<td>3.60%</td>
</tr>
<tr>
<td>Arena-Hard-v2.0</td>
<td>24</td>
<td>0.0010%</td>
<td>750</td>
<td>6</td>
<td>0.8000%</td>
</tr>
</tbody>
</table>
### Dataset summary
<table>
<thead>
<tr><th>Metric</th><th>Value</th></tr>
</thead>
<tbody>
<tr><td>Total documents in dataset</td><td>2,268,178</td></tr>
<tr><td>Contaminated documents (removed)</td><td>827</td></tr>
<tr><td>Documents after decontamination</td><td>2,267,351</td></tr>
<tr><td>Contamination rate (dataset)</td><td>0.0365%</td></tr>
</tbody>
</table>
---
# Dolci-Think-SFT
Sources include a mixture of existing reasoning traces:
* [OpenThoughts 3](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M) (Apache 2.0): Extended to 32K context length and downsampled code prompts to 16X multiple, to 941,166 total prompts. Access our version, Dolci OpenThoughts 3 here.
* [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified) (Apache 2.0) via the SFT-Verified split, 104,569 prompts.
* [Nemotron Post-training dataset](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1) (CC BY 4), code split only, 113,777 prompts.
New prompts and new reasoning traces from us (all ODC-BY-1.0):
* Dolci Think Persona IF: New precise instruction following prompts and traces created with [Nvidia's Nemotron Post-training Personas](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA). 223,123 prompts.
* Dolci Precise IF: New multi-constraint instruction following data building off Pyatkin, Valentina, et al. "[Generalizing Verifiable Instruction Following](https://arxiv.org/abs/2507.02833)." (2025). 135,792 prompts.
* [Dolci Think Python](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-Python): 466,677 prompts (subsampled from larger mix).
Existing prompts with new reasoning traces, largely repurposed from Tülu 3 / OLMo 2, with new traces generated by a mix of DeepSeek R1 and DeepSeek R1 0528:
* [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M) (ODC-BY-1.0), 83,054 prompts.
* [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1) (Apache 2.0), 6,800 prompts.
* [CoCoNot](https://huggingface.co/datasets/allenai/coconot) (ODC-BY-1.0), 10,227 prompts.
* [WildGuardMix ](https://huggingface.co/datasets/allenai/wildguardmix) (Apache 2.0), 38,315 prompts.
* [WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak) (ODC-BY-1.0) 41,100 prompts.
* [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset) (Apache 2.0), 98,597 prompts.
* [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT) (MIT), 4,981 prompts.
* Olmo Identity Prompts, 58 samples (we trained with 290, 5 repetitions per prompt, uploaded single repetition to HuggingFace)
The counts are smaller than the original prompt sources pulled from Tülu 3 / OLMo 2 due to more extensive filtering for data quality and by topics within the Azure API (blocked requests).
This dataset was used for 7B post-training, the [7B dataset](https://huggingface.co/datasets/allenai/Dolci-Think-SFT) is slightly different.
## Dataset Structure
Each example in the dataset contains the standard instruction-tuning data points as follow:
- `id` (str): a unique identifier
- `messages` (list): message format used for supervised fine-tuning (this contains user prompt and assistant responses)
- `source` (str): the source dataset for the given sample
Every datapoint contains the model's reasoning in `<think>...</think>` and NO `<answer>...</answer>` tags -- the answer follows directly after `</think>`.
## Model Family
| **Stage** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** |
|--------------------------|-----------------------|------------------------|---------------------------|
| **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
| **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) |
| **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) |
| **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) |
## License
This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
## Citation
```
@misc{olmo2025olmo3,
title={Olmo 3},
author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2025},
eprint={2512.13961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.13961},
}
```
数据集信息:
特征字段:
- 名称:messages,为列表类型,包含:
- 名称:content,数据类型为字符串
- 名称:role,数据类型为字符串
- 名称:dataset_source,数据类型为字符串
- 名称:id,数据类型为字符串
数据集划分:
- 名称:train,字节数:77877465405,样本数:2267351
下载大小:77877465405
数据集占用大小:77877465405
配置项:
- 配置名称:default,数据文件:
- 划分集:train,路径:data/train-*
数据去污染:
源数据集:allenai/Dolci-Think-SFT-7B
基准数据集列表:
- 路径:HuggingFaceH4/MATH-500,子集:default,划分集:test
- 路径:HuggingFaceH4/aime_2024,子集:default,划分集:train
- 路径:math-ai/aime25,子集:default,划分集:test
- 路径:math-ai/amc23,子集:default,划分集:test
- 路径:daman1209arora/jeebench,子集:default,划分集:test
- 路径:Idavidrein/gpqa,子集:gpqa_diamond,划分集:train
- 路径:ali-elganzory/livecodebench-code_generation_lite,子集:release_v6,划分集:test
- 路径:openai/openai_humaneval,子集:openai_humaneval,划分集:test
- 路径:google-research-datasets/mbpp,子集:full,划分集:train+test+validation+prompt
- 路径:google/IFEval,子集:default,划分集:train
- 路径:tatsu-lab/alpaca_eval,子集:alpaca_eval,划分集:eval
- 路径:lmarena-ai/arena-hard-auto,子集:default,划分集:train
污染统计:
- 子集:default,划分集:train,总样本数:2268178,移除样本数:827
---
## 数据去污染
本数据集是[allenai/Dolci-Think-SFT-7B](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-7B)的去污染版本。
### 所用基准数据集
- **MATH500**:`HuggingFaceH4/MATH-500`(子集=default,划分集=test)
- **AIME24**:`HuggingFaceH4/aime_2024`(子集=default,划分集=train)
- **AIME25**:`math-ai/aime25`(子集=default,划分集=test)
- **AMC23**:`math-ai/amc23`(子集=default,划分集=test)
- **JEEBench**:`daman1209arora/jeebench`(子集=default,划分集=test)
- **GPQADiamond**:`Idavidrein/gpqa`(子集=gpqa_diamond,划分集=train)
- **LiveCodeBench**:`ali-elganzory/livecodebench-code_generation_lite`(子集=release_v6,划分集=test)
- **HumanEval**:`openai/openai_humaneval`(子集=openai_humaneval,划分集=test)
- **MBPP**:`google-research-datasets/mbpp`(子集=full,划分集=train+test+validation+prompt)
- **IFEval**:`google/IFEval`(子集=default,划分集=train)
- **AlpacaEval**:`tatsu-lab/alpaca_eval`(子集=alpaca_eval,划分集=eval)
- **Arena-Hard-v2.0**:`lmarena-ai/arena-hard-auto`(子集=default,划分集=train)(数据文件=['data/arena-hard-v2.0/question.jsonl'])
### 去污染设置
| 参数 | 数值 |
|------|------|
| N-gram 大小 | 8 |
| 匹配阈值 | 0.5 |
### 划分与基准数据集详情
| 子集 | 划分集 | 划分集中的文档数(数据集侧) | 基准数据集 | 污染样本数(数据集侧) | 污染率(数据集侧) | 基准数据集中的文档数 | 污染样本数(基准数据集侧) | 污染率(基准数据集侧) |
|------|--------|----------------------------|------------|----------------------|--------------------|--------------------|--------------------------|------------------------|
| default | train | 2,368,984 | MATH500 | 304 | 0.0128% | 500 | 65 | 13.00% |
| default | train | 2,368,984 | AIME24 | 0 | 0.0000% | 30 | 0 | 0.0000% |
| default | train | 2,368,984 | AIME25 | 0 | 0.0000% | 30 | 0 | 0.0000% |
| default | train | 2,368,984 | AMC23 | 16 | 0.0007% | 40 | 3 | 7.50% |
| default | train | 2,368,984 | JEEBench | 0 | 0.0000% | 515 | 0 | 0.0000% |
| default | train | 2,368,984 | GPQADiamond | 0 | 0.0000% | 198 | 0 | 0.0000% |
| default | train | 2,368,984 | LiveCodeBench | 36 | 0.0015% | 1055 | 10 | 0.9479% |
| default | train | 2,368,984 | HumanEval | 22 | 0.0009% | 164 | 4 | 2.44% |
| default | train | 2,368,984 | MBPP | 311 | 0.0131% | 974 | 121 | 12.42% |
| default | train | 2,368,984 | IFEval | 32 | 0.0014% | 541 | 15 | 2.77% |
| default | train | 2,368,984 | AlpacaEval | 82 | 0.0035% | 805 | 29 | 3.60% |
| default | train | 2,368,984 | Arena-Hard-v2.0 | 24 | 0.0010% | 750 | 6 | 0.8000% |
### 数据集概览
| 指标 | 数值 |
|------|------|
| 数据集总文档数 | 2,268,178 |
| 已移除的污染文档数 | 827 |
| 去污染后剩余文档数 | 2,267,351 |
| 数据集整体污染率 | 0.0365% |
---
# Dolci-Think-SFT
数据源包含多种现有推理轨迹:
* [OpenThoughts 3](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)(遵循Apache 2.0协议):将上下文长度拓展至32K,并将代码提示按16倍比例下采样,最终得到941,166条提示样本。可在此获取我们的Dolci OpenThoughts 3版本。
* [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified)(遵循Apache 2.0协议):通过SFT-Verified划分集获取,共104,569条提示样本。
* [Nemotron Post-training dataset](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1)(遵循CC BY 4协议):仅包含代码划分集,共113,777条提示样本。
由我们原创的提示样本与推理轨迹(均遵循ODC-BY-1.0协议):
* Dolci Think Persona IF:基于[Nvidia的Nemotron Post-training Personas](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA)构建的高精度指令跟随提示样本与推理轨迹,共223,123条提示。
* Dolci Precise IF:基于Pyatkin, Valentina, 等人发表的《Generalizing Verifiable Instruction Following》(2025)构建的多约束指令跟随数据集,共135,792条提示样本。
* [Dolci Think Python](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-Python):共466,677条提示样本(从更大规模的混合样本集中下采样得到)。
基于现有提示样本新增推理轨迹,主要源自Tülu 3 / OLMo 2,其新增推理轨迹由DeepSeek R1与DeepSeek R1 0528联合生成:
* [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M)(遵循ODC-BY-1.0协议),共83,054条提示样本。
* [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1)(遵循Apache 2.0协议),共6,800条提示样本。
* [CoCoNot](https://huggingface.co/datasets/allenai/coconot)(遵循ODC-BY-1.0协议),共10,227条提示样本。
* [WildGuardMix](https://huggingface.co/datasets/allenai/wildguardmix)(遵循Apache 2.0协议),共38,315条提示样本。
* [WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak)(遵循ODC-BY-1.0协议),共41,100条提示样本。
* [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset)(遵循Apache 2.0协议),共98,597条提示样本。
* [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT)(遵循MIT协议),共4,981条提示样本。
* Olmo Identity Prompts,共58条样本(我们训练时使用了290条,每条提示重复5次,仅上传单次重复版本至HuggingFace)。
由于针对数据质量与Azure API(已拦截的请求)中的主题进行了更严格的过滤,本数据集的样本数量小于最初从Tülu 3 / OLMo 2中提取的原始提示源样本数量。
本数据集用于7B模型的后训练,[7B数据集](https://huggingface.co/datasets/allenai/Dolci-Think-SFT)略有差异。
## 数据集结构
数据集中的每条样本均包含标准的监督微调指令调优数据格式,具体如下:
- `id`(字符串类型):唯一标识符
- `messages`(列表类型):用于监督微调的对话消息格式(包含用户提示与助手回复)
- `source`(字符串类型):当前样本所属的源数据集
每条样本均包含模型的推理过程,封装于`<think>...</think>`标签内,且无`<answer>...</answer>`标签,答案紧随`</think>`标签之后。
## 模型家族
| **阶段** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** |
|--------------------------|-----------------------|------------------------|---------------------------|
| **基础模型** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
| **监督微调(SFT)** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) |
| **直接偏好优化(DPO)** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) |
| **最终模型(RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) |
## 许可证
本数据集遵循ODC-BY协议进行授权,仅可用于研究与教育用途,并需遵守Ai2的[负责任使用指南](https://allenai.org/responsible-use)。
## 引用格式
@misc{olmo2025olmo3,
title={Olmo 3},
author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2025},
eprint={2512.13961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.13961},
}
提供机构:
openeurollm



