cdleong/piglatin-mt
Source: Hugging Face. Updated 2022-10-24; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/cdleong/piglatin-mt
Resource description:
---
language:
- en
license:
- mit
multilinguality:
- translation
size_categories:
- 10K<n<100K
source_datasets:
- original
task_categories:
- translation
task_ids: []
language_details: eng and engyay
---
## Dataset Description
- **Homepage:** cdleong.github.io
### Dataset Summary
Parallel English and Pig Latin machine translation corpus.

- Source text: [The Project Gutenberg EBook of "De Bello Gallico" and Other Commentaries](https://www.gutenberg.org/ebooks/10657)
- Converted to Pig Latin with https://github.com/bpabel/piglatin
- Blank lines removed.
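
The conversion rule visible in the sample data (leading consonants moved behind a hyphen, then `ay` appended) can be sketched in a few lines. This is a minimal sketch, not the code of the `bpabel/piglatin` package: the function names `word_to_pig_latin` and `sentence_to_pig_latin` are hypothetical, and the lowercasing and the handling of words without a vowel are assumptions.

```python
VOWELS = set("aeiou")


def word_to_pig_latin(word: str) -> str:
    """Move the leading consonant cluster behind a hyphen and append 'ay'."""
    lower = word.lower()  # assumption: output is lowercased
    # Index of the first vowel, or len(lower) if the word has none.
    first_vowel = next(
        (i for i, ch in enumerate(lower) if ch in VOWELS), len(lower)
    )
    if first_vowel == 0 or first_vowel == len(lower):
        # Vowel-initial words (and, by assumption, vowel-less words) only get '-ay'.
        return f"{lower}-ay"
    return f"{lower[first_vowel:]}-{lower[:first_vowel]}ay"


def sentence_to_pig_latin(sentence: str) -> str:
    """Convert a whitespace-separated sentence word by word."""
    return " ".join(word_to_pig_latin(w) for w in sentence.split())


print(sentence_to_pig_latin("thrown into disorder they returned"))
```

Punctuation is not treated specially here; a real converter would need to decide how to handle it.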
## Dataset Structure
```
DatasetDict({
    train: Dataset({
        features: ['translation'],
        num_rows: 14778
    })
    validation: Dataset({
        features: ['translation'],
        num_rows: 1000
    })
})
```
### Data Instances
```
{
    'translation': {
        'eng': 'thrown into disorder they returned with more precipitation than is usual',
        'engyay': 'own-thray into-ay isorder-day ey-thay eturned-ray ith-way ore-may ecipitation-pray an-thay is-ay usual-ay'
    }
}
```
### Data Fields
- `translation`: a dictionary with two string values keyed by language code: `eng` (English) and `engyay` (Pig Latin).
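
A record of this shape can be unpacked into a (source, target) pair for either translation direction. The sketch below uses only the field names shown in the data instance; the helper name `to_pair` and its default arguments are hypothetical.

```python
from typing import Dict, Tuple

Record = Dict[str, Dict[str, str]]


def to_pair(record: Record, source: str = "eng", target: str = "engyay") -> Tuple[str, str]:
    """Return (source_text, target_text) from one 'translation' record."""
    translation = record["translation"]
    return translation[source], translation[target]


example: Record = {
    "translation": {
        "eng": "thrown into disorder they returned with more precipitation than is usual",
        "engyay": "own-thray into-ay isorder-day ey-thay eturned-ray ith-way "
                  "ore-may ecipitation-pray an-thay is-ay usual-ay",
    }
}

# English -> Pig Latin
src, tgt = to_pair(example)
# Pig Latin -> English: swap the keys
rev_src, rev_tgt = to_pair(example, source="engyay", target="eng")
print(src)
print(tgt)
```

With the Hugging Face `datasets` library installed, records like this would presumably come from `load_dataset("cdleong/piglatin-mt")`; the card itself does not show a loading call.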
### Data Splits
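- `train`: most of the data. This card states 13,232 samples here, while the structure listing above reports 14,778 rows.
- `validation`: 1,000 held-out samples, created with the `Dataset.train_test_split()` method of the `datasets` library.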
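
The holdout can be illustrated with a plain-Python analogue of a fixed-size split. This is a sketch under stated assumptions, not the call used to build the dataset: the helper `holdout_split` is hypothetical, the seed and shuffling behavior are unknown, and the row counts are taken from the structure listing (14,778 training rows and 1,000 validation rows).

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(
    rows: Sequence[str], holdout: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle rows and hold out a fixed number of them for validation."""
    if not 0 <= holdout <= len(rows):
        raise ValueError("holdout must be between 0 and len(rows)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # assumption: seeded shuffle
    # The first `holdout` shuffled rows become validation, the rest train.
    return shuffled[holdout:], shuffled[:holdout]


# Placeholder rows standing in for the corpus lines.
rows = [f"line {i}" for i in range(14778 + 1000)]
train, validation = holdout_split(rows, holdout=1000)
print(len(train), len(validation))
```

Holding out a fixed count rather than a fraction keeps the validation set at exactly 1,000 rows regardless of corpus size.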
Provider:
cdleong