Ali-C137/TARJAMAT-UNPC-EN-ZH
Hugging Face | Updated 2024-03-16 | Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/Ali-C137/TARJAMAT-UNPC-EN-ZH
Resource description:
---
dataset_info:
- config_name: un-pc_ar-en
  features:
  - name: arabic
    dtype: string
  - name: english
    dtype: string
  - name: source
    dtype: string
  - name: metadata
    dtype: string
  splits:
  - name: train
    num_bytes: 8500696893
    num_examples: 20044478
  download_size: 3648057038
  dataset_size: 8500696893
- config_name: un-pc_ar-zh
  features:
  - name: arabic
    dtype: string
  - name: chinese
    dtype: string
  - name: source
    dtype: string
  - name: metadata
    dtype: 'null'
  splits:
  - name: train
    num_bytes: 6707235000
    num_examples: 17306056
  download_size: 3058205349
  dataset_size: 6707235000
configs:
- config_name: un-pc_ar-en
  data_files:
  - split: train
    path: un-pc_ar-en/train-*
- config_name: un-pc_ar-zh
  data_files:
  - split: train
    path: un-pc_ar-zh/train-*
---
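The metadata above names two configurations, each with a single `train` split. A minimal sketch of how one might read a few sentence pairs with the Hugging Face `datasets` library follows. It assumes the repository is still reachable under the ID shown in the title and that the columns match the card. The helper `stream_pairs` and the `CONFIG_COLUMNS` mapping are illustrative names, not part of the dataset.

```python
# Repository ID, config names and column names are copied from the card above.
REPO_ID = "Ali-C137/TARJAMAT-UNPC-EN-ZH"
CONFIG_COLUMNS = {
    "un-pc_ar-en": ("arabic", "english"),
    "un-pc_ar-zh": ("arabic", "chinese"),
}


def stream_pairs(config_name, limit=3):
    """Yield up to `limit` (source_text, target_text) pairs from one config.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this module loads even without `datasets` installed.
    from datasets import load_dataset

    src_col, tgt_col = CONFIG_COLUMNS[config_name]
    # streaming=True iterates over the shards remotely instead of
    # downloading several gigabytes before the first row is available.
    rows = load_dataset(REPO_ID, config_name, split="train", streaming=True)
    for i, row in enumerate(rows):
        if i >= limit:
            break
        yield row[src_col], row[tgt_col]


# Example use (requires network access and the `datasets` package):
# for arabic, english in stream_pairs("un-pc_ar-en"):
#     print(arabic, "|", english)
```

Streaming is chosen here because each configuration is several gigabytes to download. Note that `metadata` is typed `'null'` in `un-pc_ar-zh`, so that column is presumably empty for every row of that configuration.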
Provider:
Ali-C137
Summary of original information
Dataset overview
Dataset configuration 1: un-pc_ar-en
- Features:
  - arabic: string
  - english: string
  - source: string
  - metadata: string
- Splits:
  - train: 20044478 examples, 8500696893 bytes
- Dataset size:
  - Download size: 3648057038 bytes
  - Total dataset size: 8500696893 bytes
Dataset configuration 2: un-pc_ar-zh
- Features:
  - arabic: string
  - chinese: string
  - source: string
  - metadata: null
- Splits:
  - train: 17306056 examples, 6707235000 bytes
- Dataset size:
  - Download size: 3058205349 bytes
  - Total dataset size: 6707235000 bytes
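The byte and example counts listed for the two configurations are easier to compare once reduced to per-example figures. The sketch below does that arithmetic using only the numbers reported in the card. The `SplitStats` class and its property names are illustrative, and the download-to-dataset ratio is read here as a rough indicator of on-disk compression, which is an assumption rather than something the card states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitStats:
    """Size figures for one train split, as reported in the card."""
    num_examples: int
    dataset_size: int   # bytes
    download_size: int  # bytes

    @property
    def bytes_per_example(self) -> float:
        # Average size of one row, over all columns.
        return self.dataset_size / self.num_examples

    @property
    def download_ratio(self) -> float:
        # Fraction of the dataset size that has to be downloaded.
        return self.download_size / self.dataset_size


STATS = {
    "un-pc_ar-en": SplitStats(20_044_478, 8_500_696_893, 3_648_057_038),
    "un-pc_ar-zh": SplitStats(17_306_056, 6_707_235_000, 3_058_205_349),
}

for name, stats in STATS.items():
    print(
        f"{name}: {stats.bytes_per_example:.1f} bytes/example, "
        f"download is {stats.download_ratio:.1%} of dataset size"
    )
```

Both configurations average roughly 400 bytes per row, and in each case the download is a little under half of the reported dataset size.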



