
Helsinki-NLP/ted_iwlst2013

Hugging Face | Updated 2024-01-18 | Indexed 2024-05-25
Download link:
https://hf-mirror.com/datasets/Helsinki-NLP/ted_iwlst2013
Resource description:
---
annotations_creators:
- found
language_creators:
- found
language:
- ar
- de
- en
- es
- fa
- fr
- it
- nl
- pl
- pt
- ro
- ru
- sl
- tr
- zh
license:
- unknown
multilinguality:
- multilingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- translation
task_ids: []
paperswithcode_id: null
pretty_name: TedIwlst2013
dataset_info:
- config_name: ar-en
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - ar
        - en
  splits:
  - name: train
    num_bytes: 37413446
    num_examples: 152838
  download_size: 12065234
  dataset_size: 37413446
- config_name: de-en
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - de
        - en
  splits:
  - name: train
    num_bytes: 30295518
    num_examples: 143836
  download_size: 10931406
  dataset_size: 30295518
- config_name: en-es
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - es
  splits:
  - name: train
    num_bytes: 32522545
    num_examples: 157895
  download_size: 11642092
  dataset_size: 32522545
- config_name: en-fa
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - fa
  splits:
  - name: train
    num_bytes: 22228781
    num_examples: 80510
  download_size: 6579696
  dataset_size: 22228781
- config_name: en-fr
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - fr
  splits:
  - name: train
    num_bytes: 34355481
    num_examples: 160420
  download_size: 12061420
  dataset_size: 34355481
- config_name: en-it
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - it
  splits:
  - name: train
    num_bytes: 32916537
    num_examples: 159391
  download_size: 11774644
  dataset_size: 32916537
- config_name: en-nl
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - nl
  splits:
  - name: train
    num_bytes: 29679822
    num_examples: 145951
  download_size: 10712032
  dataset_size: 29679822
- config_name: en-pl
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - pl
  splits:
  - name: train
    num_bytes: 29776339
    num_examples: 149120
  download_size: 10999482
  dataset_size: 29776339
- config_name: en-pt
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - pt
  splits:
  - name: train
    num_bytes: 32179607
    num_examples: 155995
  download_size: 11493053
  dataset_size: 32179607
- config_name: en-ro
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - ro
  splits:
  - name: train
    num_bytes: 32958421
    num_examples: 158483
  download_size: 11936172
  dataset_size: 32958421
- config_name: en-ru
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - ru
  splits:
  - name: train
    num_bytes: 36529465
    num_examples: 133660
  download_size: 11167700
  dataset_size: 36529465
- config_name: en-sl
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - sl
  splits:
  - name: train
    num_bytes: 2831344
    num_examples: 14960
  download_size: 1060712
  dataset_size: 2831344
- config_name: en-tr
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - tr
  splits:
  - name: train
    num_bytes: 28016103
    num_examples: 137028
  download_size: 10038531
  dataset_size: 28016103
- config_name: en-zh
  features:
  - name: id
    dtype: string
  - name: translation
    dtype:
      translation:
        languages:
        - en
        - zh
  splits:
  - name: train
    num_bytes: 30205477
    num_examples: 154579
  download_size: 11714497
  dataset_size: 30205477
config_names:
- ar-en
- de-en
- en-es
- en-fa
- en-fr
- en-it
- en-nl
- en-pl
- en-pt
- en-ro
- en-ru
- en-sl
- en-tr
- en-zh
---

# Dataset Card for TedIwlst2013

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** http://opus.nlpl.eu/TED2013.php
- **Repository:** None
- **Paper:** http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf
- **Leaderboard:** None
- **Point of Contact:** [More Information Needed]

### Dataset Summary

[More Information Needed]

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

[More Information Needed]

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

[More Information Needed]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

Thanks to [@abhishekkrthakur](https://github.com/abhishekkrthakur) for adding this dataset.
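The configurations named in the card metadata can be loaded one language pair at a time. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository is still loadable under the ID shown at the top of this page. The helper names `config_name` and `load_pair` are hypothetical and do not come from the dataset card.

```python
# Config names copied from the `config_names` list in the card metadata.
CONFIG_NAMES = [
    "ar-en", "de-en", "en-es", "en-fa", "en-fr", "en-it", "en-nl",
    "en-pl", "en-pt", "en-ro", "en-ru", "en-sl", "en-tr", "en-zh",
]


def config_name(lang_a: str, lang_b: str) -> str:
    """Return the config name for a language pair given in either order.

    The listed names put the two codes in alphabetical order
    (e.g. "de-en", "en-es"), so the pair is sorted before joining.
    """
    name = "-".join(sorted([lang_a, lang_b]))
    if name not in CONFIG_NAMES:
        raise ValueError(f"no config for language pair: {lang_a!r}, {lang_b!r}")
    return name


def load_pair(lang_a: str, lang_b: str):
    """Load the train split for one language pair (needs network access)."""
    # Imported lazily so that config_name() works without the library.
    from datasets import load_dataset

    return load_dataset(
        "Helsinki-NLP/ted_iwlst2013", config_name(lang_a, lang_b), split="train"
    )
```

Calling `load_pair("en", "de")` would request the `de-en` configuration. Every pair in the list includes English, so a pair such as German and French raises `ValueError` in this sketch.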
Provider:
Helsinki-NLP
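Summary of Original Information

Dataset Overview

Basic Information

  • Name: TedIwlst2013
  • Languages: multilingual; Arabic, German, English, Spanish, Persian, French, Italian, Dutch, Polish, Portuguese, Romanian, Russian, Slovenian, Turkish, and Chinese
  • License: unknown
  • Multilinguality: multilingual
  • Size category: 100K<n<1M
  • Source datasets: original
  • Task category: translation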

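Dataset Structure

Configuration Details

  • Configuration names: one per language pair, e.g. ar-en, de-en, en-es
  • Features:
    • id: string
    • translation: the translated text for the two languages of the pair
  • Splits:
    • train: each language pair has its own train split, listed with its data size and number of examples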
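Given the features listed above, each record is expected to carry a string `id` and a `translation` mapping keyed by language code. Below is a minimal sketch of turning such a record into a source/target pair. The sample record is a made-up placeholder with the same shape, not data taken from the dataset, and `to_pair` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Hypothetical record with the same shape as the listed features;
# the sentences are placeholders, not taken from the dataset.
sample = {
    "id": "0",
    "translation": {"de": "Guten Morgen.", "en": "Good morning."},
}


def to_pair(record: Dict, src: str, tgt: str) -> Tuple[str, str]:
    """Extract (source, target) text for the given language codes."""
    translation = record["translation"]
    return translation[src], translation[tgt]


print(to_pair(sample, "en", "de"))
```

Because both languages sit in the same mapping, one configuration can serve either translation direction by swapping `src` and `tgt`.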

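Example Configurations

Configuration name: ar-en

  • Training split:
    • Data size: 37413446 bytes
    • Number of examples: 152838
    • Download size: 12065234 bytes
    • Dataset size: 37413446 bytes

Configuration name: de-en

  • Training split:
    • Data size: 30295518 bytes
    • Number of examples: 143836
    • Download size: 10931406 bytes
    • Dataset size: 30295518 bytes

Configuration name: en-es

  • Training split:
    • Data size: 32522545 bytes
    • Number of examples: 157895
    • Download size: 11642092 bytes
    • Dataset size: 32522545 bytes

Configuration name: en-fa

  • Training split:
    • Data size: 22228781 bytes
    • Number of examples: 80510
    • Download size: 6579696 bytes
    • Dataset size: 22228781 bytes

Configuration name: en-fr

  • Training split:
    • Data size: 34355481 bytes
    • Number of examples: 160420
    • Download size: 12061420 bytes
    • Dataset size: 34355481 bytes

Configuration name: en-it

  • Training split:
    • Data size: 32916537 bytes
    • Number of examples: 159391
    • Download size: 11774644 bytes
    • Dataset size: 32916537 bytes

Configuration name: en-nl

  • Training split:
    • Data size: 29679822 bytes
    • Number of examples: 145951
    • Download size: 10712032 bytes
    • Dataset size: 29679822 bytes

Configuration name: en-pl

  • Training split:
    • Data size: 29776339 bytes
    • Number of examples: 149120
    • Download size: 10999482 bytes
    • Dataset size: 29776339 bytes

Configuration name: en-pt

  • Training split:
    • Data size: 32179607 bytes
    • Number of examples: 155995
    • Download size: 11493053 bytes
    • Dataset size: 32179607 bytes

Configuration name: en-ro

  • Training split:
    • Data size: 32958421 bytes
    • Number of examples: 158483
    • Download size: 11936172 bytes
    • Dataset size: 32958421 bytes

Configuration name: en-ru

  • Training split:
    • Data size: 36529465 bytes
    • Number of examples: 133660
    • Download size: 11167700 bytes
    • Dataset size: 36529465 bytes

Configuration name: en-sl

  • Training split:
    • Data size: 2831344 bytes
    • Number of examples: 14960
    • Download size: 1060712 bytes
    • Dataset size: 2831344 bytes

Configuration name: en-tr

  • Training split:
    • Data size: 28016103 bytes
    • Number of examples: 137028
    • Download size: 10038531 bytes
    • Dataset size: 28016103 bytes

Configuration name: en-zh

  • Training split:
    • Data size: 30205477 bytes
    • Number of examples: 154579
    • Download size: 11714497 bytes
    • Dataset size: 30205477 bytes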
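The per-configuration figures above can be compared programmatically. The following is a rough sketch that copies the listed numbers into a dictionary and derives a few summary values. The names `STATS`, `total_examples`, and `bytes_per_example` are illustrative only.

```python
# (num_examples, num_bytes, download_size) per config, copied from the listing above.
STATS = {
    "ar-en": (152838, 37413446, 12065234),
    "de-en": (143836, 30295518, 10931406),
    "en-es": (157895, 32522545, 11642092),
    "en-fa": (80510, 22228781, 6579696),
    "en-fr": (160420, 34355481, 12061420),
    "en-it": (159391, 32916537, 11774644),
    "en-nl": (145951, 29679822, 10712032),
    "en-pl": (149120, 29776339, 10999482),
    "en-pt": (155995, 32179607, 11493053),
    "en-ro": (158483, 32958421, 11936172),
    "en-ru": (133660, 36529465, 11167700),
    "en-sl": (14960, 2831344, 1060712),
    "en-tr": (137028, 28016103, 10038531),
    "en-zh": (154579, 30205477, 11714497),
}


def total_examples(stats: dict) -> int:
    """Sum the number of training examples over all configs."""
    return sum(num_examples for num_examples, _, _ in stats.values())


def bytes_per_example(stats: dict) -> dict:
    """Average dataset bytes per training example, per config."""
    return {
        name: num_bytes / num_examples
        for name, (num_examples, num_bytes, _) in stats.items()
    }


# Configs with the most and the fewest training examples.
largest = max(STATS, key=lambda name: STATS[name][0])
smallest = min(STATS, key=lambda name: STATS[name][0])

print(largest, smallest, total_examples(STATS))
```

A summary like this makes it easy to spot outliers such as en-sl, which is roughly an order of magnitude smaller than the other pairs.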
