
CodeIO-PyEdu-Reasoning-Raw

ModelScope community · Updated 2025-12-05 · Added 2025-02-22
Download link:
https://modelscope.cn/datasets/hkust-nlp/CodeIO-PyEdu-Reasoning-Raw
Description:
# CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

<p align="left">
📑 <a href="https://huggingface.co/papers/2502.07316" target="_blank">Paper</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🌐 <a href="https://codei-o.github.io/" target="_blank">Project Page</a> &nbsp;&nbsp; | &nbsp;&nbsp; 💾 <a href="https://huggingface.co/collections/hkust-nlp/codei-o-67a978e28fd926b56a4f55a2" target="_blank">Released Resources</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📦 <a href="https://github.com/hkust-nlp/CodeIO" target="_blank">Repo</a>
</p>

We release the raw data for our processed PythonEdu-Reasoning dataset.

Each line of `0_368500_filtered_v2_ds25.sced.jsonl` has the following format:

```
{
  "problem_description": <the problem description of the function>,
  "io_requirements": <the input/output requirements and constraints>,
  "refcode": <the reference code, including imported packages (optional), auxiliary functions (optional) and the main entrypoint function>,
  "funcname": <the name of the entrypoint function>,
  "ios": [
    {
      "input": <the input arguments>,
      "output": <the returned value>
    },
    ...
  ],
  "source": <the source of the raw code files>,
  "category": <the reasoning type we assign to this sample>,
  "meta": <meta information about this sample>
}
```

Some `ios` lists are empty. When the code was executed, the input/output sizes were too large and exceeded our constraints, so those pairs were neither stored nor used later.

*Note: Due to imperfect LLM-based transformations, some problem descriptions do not contain enough information to describe the code. We leave further enhancing the data and releasing an improved version as future work.*

## Citation

If you find these resources helpful, please kindly cite as:

```
@article{li2025codeio,
  title={CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction},
  author={Li, Junlong and Guo, Daya and Yang, Dejian and Xu, Runxin and Wu, Yu and He, Junxian},
  journal={arXiv preprint arXiv:2502.07316},
  year={2025}
}
```
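The record format described above can be consumed with a short Python sketch. It reads the JSON Lines file, skips records whose `ios` list is empty, tallies records per `category`, and re-executes `refcode` to check a stored input/output pair. The helper names (`iter_records`, `usable_records`, `category_counts`, `replay_io`, `matches`) are hypothetical. The sketch also assumes, since the card does not specify it, that each `input` is a dict of keyword arguments for the entrypoint named by `funcname`. Note that `exec()` runs arbitrary code from the dataset, so real use belongs in a sandbox.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterator


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-blank line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def usable_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield only records with a non-empty `ios` list.

    Empty `ios` lists mark samples whose I/O exceeded the size constraints.
    """
    for rec in iter_records(path):
        if rec.get("ios"):
            yield rec


def category_counts(path: str) -> Counter:
    """Count usable records per `category` (the assigned reasoning type)."""
    return Counter(rec["category"] for rec in usable_records(path))


def replay_io(rec: Dict[str, Any], io: Dict[str, Any]) -> Any:
    """Execute `refcode` and call the entrypoint on one stored input.

    ASSUMPTION: io["input"] is a dict of keyword arguments for `funcname`.
    WARNING: exec() runs arbitrary code; sandbox it for untrusted data.
    """
    namespace: Dict[str, Any] = {}
    exec(rec["refcode"], namespace)  # defines imports, helpers, entrypoint
    entrypoint = namespace[rec["funcname"]]
    return entrypoint(**io["input"])


def matches(rec: Dict[str, Any], io: Dict[str, Any]) -> bool:
    """Compare a replayed result with the stored output.

    ASSUMPTION: outputs were saved as JSON, so the result is round-tripped
    through JSON first (this turns tuples into lists before comparing).
    """
    result = replay_io(rec, io)
    return json.loads(json.dumps(result)) == io["output"]
```

For example, `category_counts("0_368500_filtered_v2_ds25.sced.jsonl")` would tally the usable samples per reasoning type. The exact-equality check in `matches` is deliberately strict. Outputs containing floats or unordered collections may need a tolerance-based or order-insensitive comparison instead.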

Provider: maas
Created: 2025-02-17