liaad/machine_translation_dataset
Source: Hugging Face. Updated 2024-04-08; indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/liaad/machine_translation_dataset
Resource description:
---
dataset_info:
- config_name: journalistic
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': pt-PT
          '1': pt-BR
  splits:
  - name: train
    num_bytes: 1312620204
    num_examples: 1845205
  download_size: 869897684
  dataset_size: 1312620204
- config_name: legal
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': PT-PT
          '1': PT-BR
  splits:
  - name: train
    num_bytes: 149071750
    num_examples: 477903
  download_size: 80693729
  dataset_size: 149071750
- config_name: literature
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': pt-PT
          '1': pt-BR
  splits:
  - name: train
    num_bytes: 55905796
    num_examples: 225
  download_size: 34170187
  dataset_size: 55905796
- config_name: politics
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': pt-PT
          '1': pt-BR
  splits:
  - name: train
    num_bytes: 367519469
    num_examples: 14328
  download_size: 199770940
  dataset_size: 367519469
- config_name: social_media
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': pt-PT
          '1': pt-BR
  splits:
  - name: train
    num_bytes: 372374266
    num_examples: 3074774
  download_size: 267074829
  dataset_size: 372374266
- config_name: web
  features:
  - name: text
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': PT-PT
          '1': PT-BR
  splits:
  - name: train
    num_bytes: 1373778486
    num_examples: 279555
  download_size: 674977136
  dataset_size: 1373778486
configs:
- config_name: journalistic
  data_files:
  - split: train
    path: journalistic/train-*
- config_name: legal
  data_files:
  - split: train
    path: legal/train-*
- config_name: literature
  data_files:
  - split: train
    path: literature/train-*
- config_name: politics
  data_files:
  - split: train
    path: politics/train-*
- config_name: social_media
  data_files:
  - split: train
    path: social_media/train-*
- config_name: web
  data_files:
  - split: train
    path: web/train-*
---
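
The metadata above defines six configurations, each with a single `train` split, and spells the label names in two ways (`pt-PT`/`pt-BR` in most configurations, `PT-PT`/`PT-BR` in `legal` and `web`). The following is a minimal sketch of how one configuration might be loaded with the Hugging Face `datasets` library and how the two spellings could be normalized. The helper names `load_config` and `canonical_label` are hypothetical and introduced only for illustration; loading requires the `datasets` package and network access, so the import is deferred until the loader is actually called.

```python
# Config names and repository ID as listed in the dataset card metadata.
REPO_ID = "liaad/machine_translation_dataset"
CONFIGS = ("journalistic", "legal", "literature", "politics", "social_media", "web")


def canonical_label(name: str) -> str:
    """Normalize 'PT-PT' / 'pt-PT' style names to the 'pt-PT' / 'pt-BR' form.

    Hypothetical helper: the card uses both spellings across configs.
    """
    lang, _, region = name.partition("-")
    return f"{lang.lower()}-{region.upper()}"


def load_config(config: str, streaming: bool = True):
    """Load the train split of one config (hypothetical convenience wrapper).

    Streaming is the default here because some configs exceed 1 GB on disk.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Deferred import: only needed when data is actually fetched.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split="train", streaming=streaming)
```

With the library installed, `load_config("legal")` would return the `legal` train split; passing each label name through `canonical_label` gives a uniform spelling when examples from several configurations are combined.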
Provider:
liaad
Summary of original information
Dataset overview
1. Dataset configurations
- journalistic
  - Features:
    - text: string
    - label: class label, one of pt-PT or pt-BR
  - Training split:
    - Size: 1312620204 bytes
    - Examples: 1845205
    - Download size: 869897684 bytes
    - Dataset size: 1312620204 bytes
- legal
  - Features:
    - text: string
    - label: class label, one of PT-PT or PT-BR
  - Training split:
    - Size: 149071750 bytes
    - Examples: 477903
    - Download size: 80693729 bytes
    - Dataset size: 149071750 bytes
- literature
  - Features:
    - text: string
    - label: class label, one of pt-PT or pt-BR
  - Training split:
    - Size: 55905796 bytes
    - Examples: 225
    - Download size: 34170187 bytes
    - Dataset size: 55905796 bytes
- politics
  - Features:
    - text: string
    - label: class label, one of pt-PT or pt-BR
  - Training split:
    - Size: 367519469 bytes
    - Examples: 14328
    - Download size: 199770940 bytes
    - Dataset size: 367519469 bytes
- social_media
  - Features:
    - text: string
    - label: class label, one of pt-PT or pt-BR
  - Training split:
    - Size: 372374266 bytes
    - Examples: 3074774
    - Download size: 267074829 bytes
    - Dataset size: 372374266 bytes
- web
  - Features:
    - text: string
    - label: class label, one of PT-PT or PT-BR
  - Training split:
    - Size: 1373778486 bytes
    - Examples: 279555
    - Download size: 674977136 bytes
    - Dataset size: 1373778486 bytes
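
The per-configuration figures above differ widely in scale, which is easier to see as derived ratios. The sketch below copies the byte and example counts from the listing and computes the average bytes per example and the download-to-dataset size ratio for each configuration. The names `CONFIG_STATS` and `summarize` are hypothetical, chosen only for this illustration.

```python
# (num_bytes, num_examples, download_size) per config, copied from the listing above.
CONFIG_STATS = {
    "journalistic": (1312620204, 1845205, 869897684),
    "legal": (149071750, 477903, 80693729),
    "literature": (55905796, 225, 34170187),
    "politics": (367519469, 14328, 199770940),
    "social_media": (372374266, 3074774, 267074829),
    "web": (1373778486, 279555, 674977136),
}


def summarize(stats):
    """Return average bytes per example and download/dataset size ratio per config."""
    rows = {}
    for name, (num_bytes, num_examples, download_size) in stats.items():
        rows[name] = {
            "bytes_per_example": num_bytes / num_examples,
            "download_fraction": download_size / num_bytes,
        }
    return rows


summary = summarize(CONFIG_STATS)
total_examples = sum(examples for _, examples, _ in CONFIG_STATS.values())
total_bytes = sum(num_bytes for num_bytes, _, _ in CONFIG_STATS.values())

# Print configs from shortest to longest average example.
for name, row in sorted(summary.items(), key=lambda kv: kv[1]["bytes_per_example"]):
    print(f"{name:<13} {row['bytes_per_example']:>12.1f} bytes/example "
          f"(download/dataset = {row['download_fraction']:.2f})")
print(f"total examples: {total_examples}, total bytes: {total_bytes}")
```

On these figures, `social_media` averages roughly 121 bytes per example while `literature` averages roughly 248 KB, which is consistent with short posts in one case and document-length texts in the other, although the card itself does not say how examples were segmented.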
2. Data file paths
- journalistic
  - Training split path: journalistic/train-*
- legal
  - Training split path: legal/train-*
- literature
  - Training split path: literature/train-*
- politics
  - Training split path: politics/train-*
- social_media
  - Training split path: social_media/train-*
- web
  - Training split path: web/train-*
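
Each path above is a glob pattern, so a configuration's training split may span several files whose names begin with `train-`. The sketch below shows how such patterns map file paths back to configurations, using Python's standard `fnmatch` module. The shard file names are hypothetical examples, not taken from the repository, and the Hub's own pattern resolution may differ in detail from `fnmatch` semantics.

```python
from fnmatch import fnmatchcase

# Glob patterns per config, as listed above.
SPLIT_PATTERNS = {
    config: f"{config}/train-*"
    for config in ("journalistic", "legal", "literature",
                   "politics", "social_media", "web")
}


def config_for_file(path: str):
    """Return the config whose train pattern matches `path`, or None."""
    for config, pattern in SPLIT_PATTERNS.items():
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(path, pattern):
            return config
    return None


# Hypothetical shard names, for illustration only.
example_files = [
    "legal/train-00000-of-00001.parquet",
    "web/train-00002-of-00003.parquet",
    "README.md",
]
for path in example_files:
    print(path, "->", config_for_file(path))
```

Files that match none of the patterns, such as a top-level README, map to `None` and would not be part of any split.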



