CodeIO-PyEdu-Reasoning-Raw

Name: CodeIO-PyEdu-Reasoning-Raw
Creator: maas
Published: 2025-12-05 16:23:52
License: No description provided

ModelScope community · Updated 2025-12-05 · Indexed 2025-02-22

Download link:

https://modelscope.cn/datasets/hkust-nlp/CodeIO-PyEdu-Reasoning-Raw

Resource description:

# CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

<p align="left">
    📑 <a href="https://huggingface.co/papers/2502.07316" target="_blank">Paper</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🌐 <a href="https://codei-o.github.io/" target="_blank">Project Page</a> &nbsp;&nbsp; | &nbsp;&nbsp; 💾 <a href="https://huggingface.co/collections/hkust-nlp/codei-o-67a978e28fd926b56a4f55a2" target="_blank">Released Resources</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📦 <a href="https://github.com/hkust-nlp/CodeIO" target="_blank">Repo</a>
</p>

We release the raw data for our processed PythonEdu-Reasoning dataset.

The data format for each line in the `0_368500_filtered_v2_ds25.sced.jsonl` is as follows:

```
{
  "problem_description": <the problem description of the function>,
  "io_requirements": <the input/output requirements and constraints>,
  "refcode": <the reference code, including imported packages (optional), auxiliary functions (optional) and main entrypoint function>,
  "funcname": <the function name for the entrypoint function>,
  "ios": [
    {
      "input": <the input arguments>,
      "output": <the returned value>
    },
    ...
  ],
  "source": <the source of the raw code files>,
  "category": <the reasoning type we assign to this sample>,
  "meta": <meta information about this sample>
}
```

Some of the `ios` are empty. The reason is that when executing the code, the input/output sizes are too large and exceed our required constraints. Thus, they are not stored or used later.

*Note: Due to imperfect LLM-based transformations, some problem descriptions do not contain enough information to describe the code. We leave this as future work to further enhance our data and update it to a better version.

## Citation

If you find these resources helpful, please kindly cite as:

```
@article{li2025codeio,
  title={CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction},
  author={Li, Junlong and Guo, Daya and Yang, Dejian and Xu, Runxin and Wu, Yu and He, Junxian},
  journal={arXiv preprint arXiv:2502.07316},
  year={2025}
}
```
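The per-line record format described above can be read with the Python standard library alone. The following is a minimal sketch, not part of the official release: the helper names `iter_records` and `summarize` are hypothetical, and the sample records are made up purely for illustration (in particular, the shapes of `input`, `output`, `source`, `category` and `meta` are assumptions, since the README does not specify them). It counts the records whose `ios` list is empty, which the README says can happen.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-blank JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(records: Iterable[dict]) -> dict:
    """Count all records, records with empty `ios`, and records per category."""
    total = 0
    empty_ios = 0
    per_category = Counter()
    for rec in records:
        total += 1
        # The README notes that some `ios` lists are empty.
        if not rec.get("ios"):
            empty_ios += 1
        per_category[rec.get("category")] += 1
    return {
        "total": total,
        "empty_ios": empty_ios,
        "per_category": dict(per_category),
    }


# Made-up sample lines that only mimic the documented keys; every value
# here is a placeholder, not real data from the dataset.
SAMPLE_LINES = [
    json.dumps({
        "problem_description": "Return the square of n.",
        "io_requirements": "Input: n (int). Output: int.",
        "refcode": "def main_solution(n):\n    return n * n\n",
        "funcname": "main_solution",
        "ios": [{"input": {"n": 3}, "output": 9}],
        "source": "placeholder-source",
        "category": "placeholder-category-a",
        "meta": {},
    }),
    json.dumps({
        "problem_description": "Placeholder record with no stored I/O pairs.",
        "io_requirements": "n/a",
        "refcode": "def main_solution():\n    return None\n",
        "funcname": "main_solution",
        "ios": [],
        "source": "placeholder-source",
        "category": "placeholder-category-b",
        "meta": {},
    }),
]

summary = summarize(iter_records(SAMPLE_LINES))
print(summary)
```

To run the same summary on the released file, pass an open file handle instead of the sample list, for example `summarize(iter_records(open("0_368500_filtered_v2_ds25.sced.jsonl", encoding="utf-8")))`, assuming the file has been downloaded to the working directory. Reading line by line keeps memory use low for a file of this size.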


Provider:

maas

Created:

2025-02-17
