Helsinki-NLP/ted_iwlst2013
收藏Hugging Face2024-01-18 更新2024-05-25 收录
下载链接:
https://hf-mirror.com/datasets/Helsinki-NLP/ted_iwlst2013
下载链接
链接失效反馈官方服务:
资源简介:
---
annotations_creators:
- found
language_creators:
- found
language:
- ar
- de
- en
- es
- fa
- fr
- it
- nl
- pl
- pt
- ro
- ru
- sl
- tr
- zh
license:
- unknown
multilinguality:
- multilingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- translation
task_ids: []
paperswithcode_id: null
pretty_name: TedIwlst2013
dataset_info:
- config_name: ar-en
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- ar
- en
splits:
- name: train
num_bytes: 37413446
num_examples: 152838
download_size: 12065234
dataset_size: 37413446
- config_name: de-en
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- de
- en
splits:
- name: train
num_bytes: 30295518
num_examples: 143836
download_size: 10931406
dataset_size: 30295518
- config_name: en-es
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- es
splits:
- name: train
num_bytes: 32522545
num_examples: 157895
download_size: 11642092
dataset_size: 32522545
- config_name: en-fa
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- fa
splits:
- name: train
num_bytes: 22228781
num_examples: 80510
download_size: 6579696
dataset_size: 22228781
- config_name: en-fr
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- fr
splits:
- name: train
num_bytes: 34355481
num_examples: 160420
download_size: 12061420
dataset_size: 34355481
- config_name: en-it
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- it
splits:
- name: train
num_bytes: 32916537
num_examples: 159391
download_size: 11774644
dataset_size: 32916537
- config_name: en-nl
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- nl
splits:
- name: train
num_bytes: 29679822
num_examples: 145951
download_size: 10712032
dataset_size: 29679822
- config_name: en-pl
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- pl
splits:
- name: train
num_bytes: 29776339
num_examples: 149120
download_size: 10999482
dataset_size: 29776339
- config_name: en-pt
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- pt
splits:
- name: train
num_bytes: 32179607
num_examples: 155995
download_size: 11493053
dataset_size: 32179607
- config_name: en-ro
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- ro
splits:
- name: train
num_bytes: 32958421
num_examples: 158483
download_size: 11936172
dataset_size: 32958421
- config_name: en-ru
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- ru
splits:
- name: train
num_bytes: 36529465
num_examples: 133660
download_size: 11167700
dataset_size: 36529465
- config_name: en-sl
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- sl
splits:
- name: train
num_bytes: 2831344
num_examples: 14960
download_size: 1060712
dataset_size: 2831344
- config_name: en-tr
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- tr
splits:
- name: train
num_bytes: 28016103
num_examples: 137028
download_size: 10038531
dataset_size: 28016103
- config_name: en-zh
features:
- name: id
dtype: string
- name: translation
dtype:
translation:
languages:
- en
- zh
splits:
- name: train
num_bytes: 30205477
num_examples: 154579
download_size: 11714497
dataset_size: 30205477
config_names:
- ar-en
- de-en
- en-es
- en-fa
- en-fr
- en-it
- en-nl
- en-pl
- en-pt
- en-ro
- en-ru
- en-sl
- en-tr
- en-zh
---
# Dataset Card for TedIwlst2013
## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:** http://opus.nlpl.eu/TED2013.php
- **Repository:** None
- **Paper:** hhttp://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf
- **Leaderboard:** None
- **Point of Contact:** [More Information Needed]
### Dataset Summary
[More Information Needed]
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
[More Information Needed]
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
[More Information Needed]
### Data Splits
[More Information Needed]
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
[More Information Needed]
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
[More Information Needed]
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
Thanks to [@abhishekkrthakur](https://github.com/abhishekkrthakur) for adding this dataset.
提供机构:
Helsinki-NLP
原始信息汇总
数据集概述
基本信息
- 名称: TedIwlst2013
- 语言: 多语言,包括阿拉伯语、德语、英语、西班牙语、法语、意大利语、荷兰语、波兰语、葡萄牙语、罗马尼亚语、俄语、斯洛文尼亚语、土耳其语、中文等
- 许可证: 未知
- 多语言性: 多语言
- 大小: 100K<n<1M
- 源数据集: 原始数据
- 任务类别: 翻译
数据集结构
配置详情
- 配置名称: 多种语言对,如ar-en, de-en, en-es等
- 特征:
- id: 字符串类型
- translation: 包含多种语言对的翻译信息
- 分割:
- train: 每个语言对都有对应的训练集,包含数据大小和示例数量
示例配置
配置名称: ar-en
- 训练集:
- 数据大小: 37413446字节
- 示例数量: 152838
- 下载大小: 12065234字节
- 数据集大小: 37413446字节
配置名称: de-en
- 训练集:
- 数据大小: 30295518字节
- 示例数量: 143836
- 下载大小: 10931406字节
- 数据集大小: 30295518字节
配置名称: en-es
- 训练集:
- 数据大小: 32522545字节
- 示例数量: 157895
- 下载大小: 11642092字节
- 数据集大小: 32522545字节
配置名称: en-fa
- 训练集:
- 数据大小: 22228781字节
- 示例数量: 80510
- 下载大小: 6579696字节
- 数据集大小: 22228781字节
配置名称: en-fr
- 训练集:
- 数据大小: 34355481字节
- 示例数量: 160420
- 下载大小: 12061420字节
- 数据集大小: 34355481字节
配置名称: en-it
- 训练集:
- 数据大小: 32916537字节
- 示例数量: 159391
- 下载大小: 11774644字节
- 数据集大小: 32916537字节
配置名称: en-nl
- 训练集:
- 数据大小: 29679822字节
- 示例数量: 145951
- 下载大小: 10712032字节
- 数据集大小: 29679822字节
配置名称: en-pl
- 训练集:
- 数据大小: 29776339字节
- 示例数量: 149120
- 下载大小: 10999482字节
- 数据集大小: 29776339字节
配置名称: en-pt
- 训练集:
- 数据大小: 32179607字节
- 示例数量: 155995
- 下载大小: 11493053字节
- 数据集大小: 32179607字节
配置名称: en-ro
- 训练集:
- 数据大小: 32958421字节
- 示例数量: 158483
- 下载大小: 11936172字节
- 数据集大小: 32958421字节
配置名称: en-ru
- 训练集:
- 数据大小: 36529465字节
- 示例数量: 133660
- 下载大小: 11167700字节
- 数据集大小: 36529465字节
配置名称: en-sl
- 训练集:
- 数据大小: 2831344字节
- 示例数量: 14960
- 下载大小: 1060712字节
- 数据集大小: 2831344字节
配置名称: en-tr
- 训练集:
- 数据大小: 28016103字节
- 示例数量: 137028
- 下载大小: 10038531字节
- 数据集大小: 28016103字节
配置名称: en-zh
- 训练集:
- 数据大小: 30205477字节
- 示例数量: 154579
- 下载大小: 11714497字节
- 数据集大小: 30205477字节
数据集贡献者
- 贡献者: @abhishekkrthakur



