neulab/mconala
Source: Hugging Face | Updated: 2023-02-10 | Indexed: 2024-03-04
Download link: https://hf-mirror.com/datasets/neulab/mconala

Resource description:
---
license: cc-by-sa-4.0
task_categories:
- text-generation
- translation
language:
- es
- ja
- ru
tags:
- code generation
pretty_name: mconala
size_categories:
- n<1K
---
# Dataset Card for MCoNaLa
## Dataset Description
- **Homepage:** https://github.com/zorazrw/multilingual-conala
- **Repository:** https://github.com/zorazrw/multilingual-conala
- **Paper:** https://arxiv.org/pdf/2203.08388.pdf
- **Leaderboard:** https://explainaboard.inspiredco.ai/leaderboards?show_mine=false&sort_dir=desc&sort_field=created_at&dataset=mconala
### Dataset Summary
MCoNaLa (Multilingual Code/Natural Language Challenge) is a dataset of 896 NL-code pairs. Each pair matches a natural-language intent, written in Spanish, Japanese, or Russian, with a Python code snippet.
### Languages
Natural languages: Spanish, Japanese, Russian. Programming language: Python.
## Dataset Structure
### How to Use
```python
from datasets import load_dataset

# Spanish subset
es = load_dataset("neulab/mconala", "es")
print(es)
# DatasetDict({
#     test: Dataset({
#         features: ['question_id', 'intent', 'rewritten_intent', 'snippet'],
#         num_rows: 341
#     })
# })

# Japanese subset
ja = load_dataset("neulab/mconala", "ja")
print(ja)
# DatasetDict({
#     test: Dataset({
#         features: ['question_id', 'intent', 'rewritten_intent', 'snippet'],
#         num_rows: 210
#     })
# })

# Russian subset
ru = load_dataset("neulab/mconala", "ru")
print(ru)
# DatasetDict({
#     test: Dataset({
#         features: ['question_id', 'intent', 'rewritten_intent', 'snippet'],
#         num_rows: 345
#     })
# })
```
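To work with all three languages at once, one option is to load the `test` split of each config and concatenate the results with a language tag. This is a minimal sketch that assumes the `es`, `ja`, and `ru` configs load exactly as in the examples above. The `load_all_mconala` helper and the added `language` column are hypothetical and are not part of the dataset. `datasets` is imported inside the function so the module can be imported even when the library is absent.

```python
# Config names taken from the loading examples above.
LANGS = ("es", "ja", "ru")


def load_all_mconala(langs=LANGS):
    """Hypothetical helper: load each language subset's test split and
    concatenate them, adding a `language` column (not in the original data)."""
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import concatenate_datasets, load_dataset

    parts = []
    for lang in langs:
        test = load_dataset("neulab/mconala", lang, split="test")
        # Tag every row with the config it came from.
        parts.append(test.add_column("language", [lang] * len(test)))
    # All parts share the same features, so they can be stacked row-wise.
    return concatenate_datasets(parts)
```

For example, `combined = load_all_mconala()` should return a single `Dataset` containing the four original fields plus `language`. Filtering on that column then recovers any individual subset.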
### Data Fields
|Field|Type|Description|
|---|---|---|
|question_id|int|Stack Overflow post ID of the sample|
|intent|string|Title of the Stack Overflow post, used as the initial NL intent|
|rewritten_intent|string|NL intent rewritten by human annotators|
|snippet|string|Python code solution to the NL intent|
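The field table maps directly to a typed record. The sketch below uses a dataclass with the four fields and types listed above. `MCoNaLaExample` and `from_row` are hypothetical names. The check treats the table's types as strict, which is an assumption: if some rows turn out to contain nulls, the check would need to be relaxed.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MCoNaLaExample:
    """One MCoNaLa record, typed according to the Data Fields table."""
    question_id: int        # Stack Overflow post ID
    intent: str             # original post title (initial NL intent)
    rewritten_intent: str   # NL intent rewritten by human annotators
    snippet: str            # Python code solution

    @classmethod
    def from_row(cls, row: dict) -> "MCoNaLaExample":
        """Build a record from a dict-like row, checking names and types.

        Keys not declared on the dataclass (e.g. an added tag column) are ignored.
        """
        kwargs = {}
        for f in fields(cls):
            if f.name not in row:
                raise KeyError(f"missing field: {f.name}")
            value = row[f.name]
            # f.type is the annotated class (int or str) in this module.
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name}: expected {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )
            kwargs[f.name] = value
        return cls(**kwargs)
```

Indexing a loaded split with an integer returns a plain dict. After loading the Spanish subset as in How to Use, `MCoNaLaExample.from_row(es["test"][0])` would therefore type-check its first record.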
### Data Splits
Each language subset has a single `test` split, containing 341 (Spanish), 210 (Japanese), and 345 (Russian) samples.
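The three split sizes sum to the 896 pairs given in the summary. The quick check below confirms that total and reports each language's share of the benchmark, using only the counts stated above.

```python
# Per-language sample counts from the Data Splits section.
SPLIT_SIZES = {"es": 341, "ja": 210, "ru": 345}

total = sum(SPLIT_SIZES.values())
shares = {lang: n / total for lang, n in SPLIT_SIZES.items()}

print(total)  # 896
for lang, share in shares.items():
    print(f"{lang}: {share:.1%}")
# es: 38.1%
# ja: 23.4%
# ru: 38.5%
```

Japanese has noticeably fewer samples than the other two languages. This imbalance may matter when per-language scores are averaged.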
### Citation Information
```
@article{wang2022mconala,
title={MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages},
author={Zhiruo Wang and Grace Cuenca and Shuyan Zhou and Frank F. Xu and Graham Neubig},
journal={arXiv preprint arXiv:2203.08388},
year={2022}
}
```
Provided by:
neulab