OpenThoughts-114k-Code_decontaminated
Source: ModelScope community. Updated 2026-01-06. Indexed 2025-03-01.
Download link:
https://modelscope.cn/datasets/open-r1/OpenThoughts-114k-Code_decontaminated
Resource overview:
## Dataset description
This dataset contains the same samples as [open-r1/OpenThoughts-114k-Code](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-Code), minus those found to overlap with benchmark datasets (decontamination).
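
For readers unfamiliar with the term, the snippet below is a minimal sketch of one common decontamination technique, word-level n-gram overlap. This card does not say which method the open-r1 script uses, so the function names (`ngrams`, `decontaminate`), the whitespace tokenization, and the default `n=8` are all assumptions made for illustration only.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in `text` (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    # range() is empty when the text has fewer than n tokens, giving an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(samples, benchmark_prompts, n=8):
    """Drop every sample that shares at least one n-gram with any benchmark prompt.

    Hypothetical helper: illustrates the general idea, not the open-r1 script.
    Returns (kept_samples, number_removed).
    """
    benchmark_ngrams = set()
    for prompt in benchmark_prompts:
        benchmark_ngrams |= ngrams(prompt, n)
    kept = [s for s in samples if not (ngrams(s, n) & benchmark_ngrams)]
    return kept, len(samples) - len(kept)


# Toy data with n=3 so the overlap is easy to see by eye.
toy_samples = ["sum two numbers given as input", "reverse a linked list in place"]
toy_benchmark = ["please sum two numbers quickly"]
kept, removed = decontaminate(toy_samples, toy_benchmark, n=3)
print(kept, removed)
```

In the toy run, the first sample shares the 3-gram "sum two numbers" with the benchmark prompt and is dropped, while the second is kept.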
Decontamination was run with the script from [huggingface/open-r1](https://github.com/huggingface/open-r1/pull/416):
```shell
python scripts/decontaminate.py \
--dataset "open-r1/OpenThoughts-114k-Code" \
-c
...
Removed 2 samples from 'aime_2025'
Removed 28 samples from 'math_500'
Removed 3482 samples from 'lcb'
Initial size: 19890, Final size: 16378
```
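
The figures in the log can be cross-checked with a few lines of Python. The sketch below parses the log lines quoted above and confirms that the per-benchmark removals account for the difference between the initial and final sizes. The `check_log` helper is hypothetical and written only for this check; it assumes each removed sample is counted under exactly one benchmark.

```python
import re

# Log lines copied from the script output above.
LOG = """\
Removed 2 samples from 'aime_2025'
Removed 28 samples from 'math_500'
Removed 3482 samples from 'lcb'
Initial size: 19890, Final size: 16378
"""


def check_log(log):
    """Parse removal counts and sizes; report whether they are consistent."""
    removed = {
        name: int(count)
        for count, name in re.findall(r"Removed (\d+) samples from '([^']+)'", log)
    }
    sizes = re.search(r"Initial size: (\d+), Final size: (\d+)", log)
    initial, final = int(sizes.group(1)), int(sizes.group(2))
    # Consistent if the removals sum to exactly the drop in dataset size.
    consistent = initial - sum(removed.values()) == final
    return removed, initial, final, consistent


removed, initial, final, consistent = check_log(LOG)
print(sum(removed.values()), initial - final, consistent)
```

Almost all removals (3482 of 3512) come from `lcb`; the two math benchmarks together account for only 30 samples.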
Provider:
maas
Created:
2025-02-25



