collinear-ai/valley-of-reasoning-data

Name: collinear-ai/valley-of-reasoning-data
Creator: collinear-ai
Published: 2025-10-08 13:26:48
License: No description available

Source: Hugging Face. Updated: 2025-10-08. Indexed: 2025-10-18.

Download link:

https://hf-mirror.com/datasets/collinear-ai/valley-of-reasoning-data


Description:

This dataset consists of 7 subsets used in the experiments for the paper "The Valley of Code Reasoning: Scaling Knowledge Distillation of Large Language Models". The subsets train_1k, train_10k, and train_30k are used to study the effect of dataset size on coding performance, while easy_medium_4k, hard_4k, correct_6k, and incorrect_6k are used to study the effect of data correctness or difficulty level on coding performance. Each subset includes columns such as input, output, text, and token_count. The input column provides the coding task prompt, and the output column contains the model's response along with its reasoning trace.
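The subset layout described above can be sketched in Python. This is a minimal illustration, not the authors' own loading code. The subset names and column names come from the description. The grouping labels ("size", "difficulty", "correctness") are inferred from the subset names, and the helper functions are hypothetical. It also assumes that each subset is exposed as a configuration name on the Hugging Face Hub with a "train" split, which should be checked against the dataset card.

```python
# Subset names are taken from the dataset description.
# The group labels are illustrative and inferred from the subset names.
EXPERIMENT_SUBSETS = {
    "size": ["train_1k", "train_10k", "train_30k"],
    "difficulty": ["easy_medium_4k", "hard_4k"],
    "correctness": ["correct_6k", "incorrect_6k"],
}

REPO_ID = "collinear-ai/valley-of-reasoning-data"


def all_subsets():
    """Return the names of all subsets as a flat list."""
    return [name for group in EXPERIMENT_SUBSETS.values() for name in group]


def load_subset(name, split="train"):
    """Load one subset from the Hugging Face Hub.

    Assumption: each subset is a dataset configuration with a "train" split.
    Consult the dataset card if this call fails.
    """
    if name not in all_subsets():
        raise ValueError(f"unknown subset: {name!r}")
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset  # requires `pip install datasets`
    return load_dataset(REPO_ID, name, split=split)
```

With network access and the `datasets` library installed, `load_subset("train_1k")` would return the smallest size-scaling subset, provided the configuration and split assumptions hold. Its rows could then be inspected through the `input`, `output`, `text`, and `token_count` columns.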

Provided by:

collinear-ai
