
neulab/mconala

Source: Hugging Face · Updated: 2023-02-10 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/neulab/mconala
Description:
---
license: cc-by-sa-4.0
task_categories:
- text-generation
- translation
language:
- es
- ja
- ru
tags:
- code generation
pretty_name: mconala
size_categories:
- n<1K
---

# Dataset Card for MCoNaLa

## Dataset Description

- **Homepage:** https://github.com/zorazrw/multilingual-conala
- **Repository:** https://github.com/zorazrw/multilingual-conala
- **Paper:** https://arxiv.org/pdf/2203.08388.pdf
- **Leaderboard:** https://explainaboard.inspiredco.ai/leaderboards?show_mine=false&sort_dir=desc&sort_field=created_at&dataset=mconala

### Dataset Summary

MCoNaLa is a Multilingual Code/Natural Language Challenge dataset with 896 NL-Code pairs in three languages: Spanish, Japanese, and Russian.

### Languages

Natural language: Spanish, Japanese, Russian. Code: Python.

## Dataset Structure

### How to Use

```python
from datasets import load_dataset

# Spanish subset
load_dataset("neulab/mconala", "es")
# DatasetDict({
#     test: Dataset({
#         features: ['question_id', 'intent', 'rewritten_intent', 'snippet'],
#         num_rows: 341
#     })
# })

# Japanese subset
load_dataset("neulab/mconala", "ja")
# DatasetDict({
#     test: Dataset({
#         features: ['question_id', 'intent', 'rewritten_intent', 'snippet'],
#         num_rows: 210
#     })
# })

# Russian subset
load_dataset("neulab/mconala", "ru")
# DatasetDict({
#     test: Dataset({
#         features: ['question_id', 'intent', 'rewritten_intent', 'snippet'],
#         num_rows: 345
#     })
# })
```

### Data Fields

|Field|Type|Description|
|---|---|---|
|question_id|int|StackOverflow post ID of the sample|
|intent|string|Title of the StackOverflow post, used as the initial NL intent|
|rewritten_intent|string|NL intent rewritten by human annotators|
|snippet|string|Python code solution to the NL intent|

### Data Splits

The dataset has a single `test` split containing 341, 210, and 345 samples in Spanish, Japanese, and Russian, respectively.

### Citation Information

```
@article{wang2022mconala,
  title={MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages},
  author={Zhiruo Wang and Grace Cuenca and Shuyan Zhou and Frank F. Xu and Graham Neubig},
  journal={arXiv preprint arXiv:2203.08388},
  year={2022}
}
```
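The loading snippet in the card fetches one language at a time. Below is a minimal sketch for pulling all three subsets into a single list of records tagged with their language. It assumes only what the card shows: the config names `es`, `ja`, `ru` and a single `test` split. The function name `load_all_subsets` and its `loader` parameter are hypothetical conveniences, not part of the `datasets` API; the parameter simply lets a stub stand in for `load_dataset` when testing offline.

```python
from typing import Any, Callable, Dict, List, Optional

# Config names and split name as shown in the dataset card.
LANGUAGES = ("es", "ja", "ru")
SPLIT = "test"


def load_all_subsets(loader: Optional[Callable[..., Any]] = None) -> List[Dict[str, Any]]:
    """Load every MCoNaLa language subset and tag each row with its language.

    `loader` is a hypothetical hook: by default it is `datasets.load_dataset`,
    but any callable accepting (path, name) and returning a mapping of
    split name -> iterable of row dicts will work.
    """
    if loader is None:
        # Imported lazily so the function can be exercised without `datasets` installed.
        from datasets import load_dataset
        loader = load_dataset

    records: List[Dict[str, Any]] = []
    for lang in LANGUAGES:
        split = loader("neulab/mconala", lang)[SPLIT]
        for row in split:  # iterating a `datasets.Dataset` yields one dict per row
            records.append({"language": lang, **row})
    return records
```

Calling `load_all_subsets()` with no arguments downloads the three configs; given the split sizes listed in the card, the combined list should hold 896 records.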
Provided by:
neulab
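Summary of Original Information

Dataset Overview

Dataset Description

  • Name: MCoNaLa
  • Type: multilingual code/natural language challenge dataset
  • Languages: Spanish, Japanese, Russian
  • Task categories: text generation, translation
  • Tags: code generation
  • License: cc-by-sa-4.0
  • Size: fewer than 1,000 records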

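Dataset Structure

Data Fields

|Field|Type|Description|
|---|---|---|
|question_id|int|StackOverflow post ID|
|intent|string|StackOverflow post title|
|rewritten_intent|string|Natural-language intent rewritten by human annotators|
|snippet|string|Python code that solves the natural-language intent|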
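The four fields above map naturally onto a typed record. The following is a minimal sketch, assuming rows arrive as plain dicts (as they do when iterating a `datasets.Dataset`). The class name `MCoNaLaRecord`, the `nl_intent` helper, and the policy of falling back from an empty `rewritten_intent` to `intent` are hypothetical choices made for illustration; the dataset card does not say whether `rewritten_intent` can be empty.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class MCoNaLaRecord:
    """One MCoNaLa sample, mirroring the four documented fields."""
    question_id: int       # StackOverflow post ID
    intent: str            # original StackOverflow post title
    rewritten_intent: str  # intent rewritten by human annotators
    snippet: str           # Python code that solves the intent

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "MCoNaLaRecord":
        """Build a record from a dict row, checking the documented int type of question_id."""
        qid = row["question_id"]
        if not isinstance(qid, int):
            raise TypeError(f"question_id must be int, got {type(qid).__name__}")
        # Assumption: a missing or None rewritten intent is normalized to "".
        rewritten = row.get("rewritten_intent") or ""
        return cls(qid, row["intent"], rewritten, row["snippet"])

    def nl_intent(self) -> str:
        """Prefer the human-rewritten intent; fall back to the post title if it is blank (hypothetical policy)."""
        return self.rewritten_intent.strip() or self.intent
```

Freezing the dataclass keeps records hashable and guards against accidental mutation when the same sample is reused across evaluation runs.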

Data Splits

  • Spanish: 341 records
  • Japanese: 210 records
  • Russian: 345 records
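As a quick consistency check, the three per-language counts add up to the 896 NL-Code pairs stated in the dataset summary, which is what places the dataset in the `n<1K` size category. A minimal sketch; the `SPLIT_SIZES` dict simply restates the counts listed above:

```python
# Per-language sample counts in the `test` split, as listed in the card.
SPLIT_SIZES = {"es": 341, "ja": 210, "ru": 345}

total = sum(SPLIT_SIZES.values())  # 341 + 210 + 345 → 896
shares = {lang: n / total for lang, n in SPLIT_SIZES.items()}

print(f"total: {total}")
for lang, share in shares.items():
    print(f"{lang}: {SPLIT_SIZES[lang]} ({share:.1%})")
```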

