collinear-ai/valley-of-reasoning-data

Source: Hugging Face. Updated 2025-10-08. Indexed 2025-10-18.
Download link:
https://hf-mirror.com/datasets/collinear-ai/valley-of-reasoning-data
Overview:
This dataset consists of 7 subsets used in the experiments for the paper "The Valley of Code Reasoning: Scaling Knowledge Distillation of Large Language Models". The subsets train_1k, train_10k, and train_30k are used to study the effect of dataset size on coding performance. The subsets easy_medium_4k, hard_4k, correct_6k, and incorrect_6k are used to study the effect of the data's correctness or difficulty level on coding performance. Each subset includes the columns input, output, text, and token_count, among others: the input column provides the coding task prompt, and the output column contains the model's response together with its reasoning trace.
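As a rough illustration of how the subsets described above might be organized and loaded, the following Python sketch groups the seven subset names by the question they address. Several things here are assumptions rather than facts from the dataset card: the split of the four 4k/6k subsets into "difficulty" and "correctness" groups is inferred from their names, the helper names `all_subsets` and `load_subset` are hypothetical, and it is assumed that each subset is exposed as a configuration name of the Hugging Face repository, which may not match the actual layout.

```python
REPO_ID = "collinear-ai/valley-of-reasoning-data"

# Subset names as listed in the description. The grouping into
# "difficulty" and "correctness" is inferred from the names (assumption).
SUBSETS = {
    "size": ["train_1k", "train_10k", "train_30k"],
    "difficulty": ["easy_medium_4k", "hard_4k"],
    "correctness": ["correct_6k", "incorrect_6k"],
}

# Columns named in the description; the card says "such as", so a
# subset may contain additional columns.
EXPECTED_COLUMNS = ["input", "output", "text", "token_count"]


def all_subsets():
    """Return the seven subset names as a flat list."""
    return [name for group in SUBSETS.values() for name in group]


def load_subset(name):
    """Load one subset with the `datasets` library.

    Assumption: each subset is a configuration of the repository.
    If the repository is laid out differently (for example, as splits
    or data directories), this call needs to be adapted.
    """
    if name not in all_subsets():
        raise ValueError(f"unknown subset: {name!r}")
    # Imported lazily so the helpers above work without the library.
    from datasets import load_dataset
    return load_dataset(REPO_ID, name)
```

A call such as `load_subset("train_1k")` would download that subset, after which the column names can be checked against `EXPECTED_COLUMNS` before use.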
Provider:
collinear-ai