ontocord/Dolci-Think-SFT-7B-decontaminated
Source: Hugging Face | Updated: 2026-03-06 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/ontocord/Dolci-Think-SFT-7B-decontaminated
Resource description:
---
dataset_info:
  config_name: default
  splits:
  - name: train
    num_examples: 2256891
license: apache-2.0
tags:
- decontaminated
---
# Dolci-Think-SFT-7B-decontaminated
Decontaminated version of [allenai/Dolci-Think-SFT-7B](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-7B).
## Decontamination Details
- **Method**: 13-gram overlap detection
- **Original samples**: 2,268,178
- **Cleaned samples**: 2,256,891
- **Removed samples**: 11,287 (0.50%)
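
The card names the method but not its implementation, so the following is only a minimal sketch of how 13-gram overlap filtering is commonly done. Everything here is an assumption for illustration: the lowercase whitespace tokenizer, the rule that a single shared 13-gram marks a sample as contaminated, and the helper names `tokenize`, `ngrams`, `build_benchmark_index`, `is_contaminated`, and `decontaminate`. The actual pipeline used for this dataset may differ in tokenization, normalization, and thresholds.

```python
from typing import Iterable, List, Set, Tuple

NGRAM_SIZE = 13  # the n-gram length named in the card

Ngram = Tuple[str, ...]


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split. The card does not say
    # which tokenizer or normalization was actually used.
    return text.lower().split()


def ngrams(tokens: List[str], n: int = NGRAM_SIZE) -> Set[Ngram]:
    # Texts shorter than n tokens yield an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_benchmark_index(benchmark_texts: Iterable[str],
                          n: int = NGRAM_SIZE) -> Set[Ngram]:
    # Collect every n-gram that occurs in any benchmark text.
    index: Set[Ngram] = set()
    for text in benchmark_texts:
        index |= ngrams(tokenize(text), n)
    return index


def is_contaminated(sample_text: str, index: Set[Ngram],
                    n: int = NGRAM_SIZE) -> bool:
    # Assumption: one shared n-gram is enough to flag the sample.
    return not ngrams(tokenize(sample_text), n).isdisjoint(index)


def decontaminate(samples: Iterable[str], benchmark_texts: Iterable[str],
                  n: int = NGRAM_SIZE) -> List[str]:
    # Keep only the samples that share no n-gram with the benchmarks.
    index = build_benchmark_index(benchmark_texts, n)
    return [s for s in samples if not is_contaminated(s, index, n)]
```

With this sketch, a training sample is dropped only when it reproduces at least 13 consecutive tokens from a benchmark item; shorter overlaps and samples under 13 tokens are kept.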
### Benchmarks Checked
MMLU, IFEval, ARC, COPA, LAMBADA, OpenBookQA, WinoGrande, BoolQ, HellaSwag, PIQA, GSM8K, ALERT, GPQA, MATH, MBPP, HumanEval, SimpleQA, CommonsenseQA, DoNotAnswer, AIME24, LiveCodeBench, MATH500
Provider:
ontocord



