leapeto/tokenskip-math-qwen3-4b-thinking-n8

Name: leapeto/tokenskip-math-qwen3-4b-thinking-n8
Creator: leapeto
Published: 2026-04-28 04:12:38
License: No description available

Source: Hugging Face. Updated: 2026-04-28. Indexed: 2026-04-26.

Download link:

https://hf-mirror.com/datasets/leapeto/tokenskip-math-qwen3-4b-thinking-n8

Description:

This dataset, TokenSkip-MATH CoTs from Qwen3-4B-Thinking (N=8) with LLMLingua-2 compressions, is an intermediate artifact for reproducing the TokenSkip method on the MATH benchmark. It contains 96,000 raw chain-of-thought (CoT) outputs generated by the Qwen/Qwen3-4B-Thinking-2507 model, together with 35,660 records produced by compressing CoTs with LLMLingua-2 at four compression ratios (0.1, 0.3, 0.5, 0.7). Original and compressed reasoning texts are provided as pairs so they can be compared directly, and the data is intended to support research on supervised fine-tuning (SFT) with compressed reasoning. The dataset card also documents the collection and compression settings, record schema, loading instructions, intended use, known limitations, and citation information.
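As a rough illustration of how the paired original and compressed texts might be compared, the sketch below measures how much of each CoT survives compression and averages that per target ratio. The field names `ratio`, `original_cot`, and `compressed_cot` are hypothetical placeholders (the actual record schema is documented in the dataset card, not here), and whitespace splitting is only a stand-in for the tokenizer that LLMLingua-2 actually counts with.

```python
from collections import defaultdict


def retention(original: str, compressed: str) -> float:
    """Fraction of whitespace-separated tokens kept after compression."""
    n_orig = len(original.split())
    if n_orig == 0:
        return 0.0
    return len(compressed.split()) / n_orig


def mean_retention_by_ratio(records):
    """Group records by target ratio and average the measured retention.

    Each record is assumed (hypothetically) to be a dict with the keys
    "ratio", "original_cot", and "compressed_cot".
    """
    buckets = defaultdict(list)
    for rec in records:
        kept = retention(rec["original_cot"], rec["compressed_cot"])
        buckets[rec["ratio"]].append(kept)
    return {ratio: sum(vals) / len(vals) for ratio, vals in sorted(buckets.items())}


# Toy rows standing in for real dataset records.
toy = [
    {
        "ratio": 0.5,
        "original_cot": "first add two and two to get four",
        "compressed_cot": "add two two four",
    },
    {"ratio": 0.5, "original_cot": "a b c d", "compressed_cot": "a c"},
]

summary = mean_retention_by_ratio(toy)
```

On real data, the measured retention would be expected to track the target ratio only approximately, so a per-ratio summary like this is a quick sanity check before building SFT pairs.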

Provider:

leapeto
