giulio98/xlcost-single-prompt
Source: Hugging Face. Updated 2022-11-02; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/giulio98/xlcost-single-prompt
Resource overview:
This is a subset of the [XLCoST benchmark](https://github.com/reddy-lab-code-research/XLCoST), tailored for program-level text-to-code generation, supporting **2** programming languages: `Python, C++`. This dataset is modified based on [codeparrot/xlcost-text-to-code](https://huggingface.co/datasets/codeparrot/xlcost-text-to-code) with the following improvements:
* NEWLINE, INDENT, and DEDENT tokens are replaced with their corresponding ASCII codes.
* Python code is reformatted using autopep8, while C++ code is formatted with clang-format.
* New columns are added to facilitate evaluation using the pass@k metric.
* Programs containing multiple function calls in their driver code are excluded.
The dataset comprises English natural language texts and their corresponding code implementations. The texts consist of a set of concatenated code comments used for program synthesis.
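The first improvement listed above can be illustrated with a small sketch. It assumes the upstream data encodes layout as the literal, space-separated tokens `NEWLINE`, `INDENT`, and `DEDENT`, and that one indentation level is four spaces. The function name `restore_layout` is hypothetical, and this is not the dataset author's actual preprocessing script.

```python
def restore_layout(tokenized: str, indent_unit: str = "    ") -> str:
    """Rebuild newlines and indentation from NEWLINE/INDENT/DEDENT tokens."""
    lines = []      # finished lines of code
    current = []    # tokens collected for the line being built
    depth = 0       # current indentation depth

    for token in tokenized.split():
        if token == "NEWLINE":
            lines.append(indent_unit * depth + " ".join(current))
            current = []
        elif token == "INDENT":
            depth += 1
        elif token == "DEDENT":
            depth = max(depth - 1, 0)
        else:
            current.append(token)

    # Flush a trailing line that was not terminated by NEWLINE.
    if current:
        lines.append(indent_unit * depth + " ".join(current))
    return "\n".join(lines)


# Invented sample in the assumed tokenized style.
sample = (
    "def add ( a , b ) : NEWLINE INDENT return a + b NEWLINE "
    "DEDENT print ( add ( 1 , 2 ) )"
)
restored = restore_layout(sample)
print(restored)
```

The spacing that remains between tokens (for example `add ( a , b )`) is the kind of thing a formatter such as autopep8 would normalize in a later step.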
Provider:
giulio98
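Summary of original information
Dataset overview
Basic information
- Name: xlcost-single-prompt
- Task category: text generation
- Task ID: language modeling
- License: cc-by-sa-4.0
- Multilinguality: multilingual
- Language: code
- Language creators: crowdsourced, expert-generated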
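Dataset description
- Source: derived from codeparrot/xlcost-text-to-code
- Improvements:
  - NEWLINE, INDENT, and DEDENT tokens replaced with their corresponding ASCII codes
  - Code reformatted with autopep8 (Python) and clang-format (C++)
  - New columns added to support pass@k evaluation
  - Programs whose driver code contains multiple function calls removed
- Languages included: Python, C++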
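The pass@k metric mentioned above is commonly computed with the unbiased estimator popularized by the Codex evaluation: with n samples generated per problem, of which c pass, pass@k = 1 - C(n - c, k) / C(n, k). The sketch below assumes that estimator is the one intended. The source does not say which implementation the dataset author uses, and the function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, so the
    # estimate is 1.0: every k-subset then contains a passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical numbers: 10 samples per problem, 3 of them pass.
print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=5))
```

The per-problem estimates would then be averaged over the evaluation split to give the reported score.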
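Dataset structure
- Loading: a language (Python or C++) must be specified
- Split sizes:
  - Train: 8306 rows
  - Test: 812 rows
  - Validation: 427 rows
- Data fields:
  - text: natural language description
  - context: imported libraries and global variables
  - code: program-level code
  - test: test function call
  - output: expected output of the function call
  - fn_call: name of the function being called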
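One plausible way the fields above fit together for execution-based evaluation is sketched below. It assumes that `context`, `code`, and `test` concatenate into a runnable Python program whose standard output should match `output`. The record is invented for illustration and is not taken from the dataset, and `run_record` is a hypothetical helper.

```python
import contextlib
import io


def run_record(record: dict) -> bool:
    """Run context + code + test and compare printed output with `output`."""
    program = "\n".join([record["context"], record["code"], record["test"]])
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # exec is for illustration only; sandbox untrusted model output.
            exec(program, {"__name__": "__main__"})
    except Exception:
        return False
    return buffer.getvalue().strip() == record["output"].strip()


# Invented example record (Python configuration assumed).
record = {
    "text": "Return the greatest common divisor of two numbers",
    "context": "import math",
    "code": "def gcd(a, b):\n    return math.gcd(a, b)",
    "test": "print(gcd(12, 18))",
    "output": "6",
    "fn_call": "gcd",
}

passed = run_record(record)
print(passed)
```

In an actual evaluation, the `code` value would presumably be replaced by a model-generated candidate before the program is assembled.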
Data splits
- Splits: train, test, validation
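A minimal loading sketch with the Hugging Face `datasets` library follows. The configuration name `"Python"` and the split keys are assumptions drawn from the summary above (a language must be specified; train, test, and validation splits exist), so check the dataset card for the exact names. The helpers `load_xlcost` and `split_shares` are hypothetical.

```python
# Row counts quoted in the summary above; which language configuration
# they refer to is not stated there.
EXPECTED_ROWS = {"train": 8306, "test": 812, "validation": 427}


def load_xlcost(language: str = "Python"):
    """Download one language configuration (needs `pip install datasets`)."""
    # Imported lazily so the rest of the sketch runs without the library.
    from datasets import load_dataset

    return load_dataset("giulio98/xlcost-single-prompt", language)


def split_shares(rows: dict) -> dict:
    """Fraction of all rows in each split, rounded to three decimals."""
    total = sum(rows.values())
    return {name: round(count / total, 3) for name, count in rows.items()}


print(split_shares(EXPECTED_ROWS))

# Uncomment to download and inspect the real data (requires network access):
# dataset = load_xlcost("Python")
# print({name: split.num_rows for name, split in dataset.items()})
```

Comparing the downloaded `num_rows` values against `EXPECTED_ROWS` is a quick way to confirm that the intended configuration was loaded.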
Citation
@misc{zhu2022xlcost,
  title = {XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence},
  url = {https://arxiv.org/abs/2206.08474},
  author = {Zhu, Ming and Jain, Aneesh and Suresh, Karthik and Ravindran, Roshan and Tipirneni, Sindhu and Reddy, Chandan K.},
  year = {2022},
  eprint = {2206.08474},
  archivePrefix = {arXiv}
}



