carlesoctav/en-id-parallel-sentences
Source: Hugging Face. Updated 2023-05-25; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/carlesoctav/en-id-parallel-sentences
Resource description:
---
dataset_info:
  features:
  - name: text_en
    dtype: string
  - name: text_id
    dtype: string
  splits:
  - name: msmarcoquery
    num_bytes: 41010003
    num_examples: 500000
  - name: combinedtech
    num_bytes: 44901963
    num_examples: 276659
  - name: msmarcocollection
    num_bytes: 351086941
    num_examples: 500000
  - name: TED2020
    num_bytes: 32590228
    num_examples: 163319
  - name: Tatoeba
    num_bytes: 797670
    num_examples: 10543
  - name: NeuLabTedTalks
    num_bytes: 19440416
    num_examples: 94224
  - name: QED
    num_bytes: 40115874
    num_examples: 274581
  - name: tico19
    num_bytes: 959990
    num_examples: 3071
  download_size: 282831590
  dataset_size: 530903085
---
# Dataset Card for "en-id-parallel-sentences"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
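
The card itself gives no usage instructions, so the following is only a minimal sketch of how one split might be loaded with the Hugging Face `datasets` library. It assumes the library is installed and that the repository is still published on the Hub under the ID in the download link. The helper name `load_split` is hypothetical; the split names and the `text_en` / `text_id` columns are taken from the metadata above.

```python
# Split names as listed under `splits` in the card metadata.
SPLIT_NAMES = [
    "msmarcoquery", "combinedtech", "msmarcocollection", "TED2020",
    "Tatoeba", "NeuLabTedTalks", "QED", "tico19",
]

# Repository ID taken from the download link on this page.
REPO_ID = "carlesoctav/en-id-parallel-sentences"


def load_split(name: str = "Tatoeba"):
    """Load one named split (hypothetical helper; downloads on first use)."""
    if name not in SPLIT_NAMES:
        raise ValueError(f"unknown split {name!r}; expected one of {SPLIT_NAMES}")
    # Imported lazily so that the module can be inspected without `datasets`.
    from datasets import load_dataset
    return load_dataset(REPO_ID, split=name)
```

Under these assumptions, `load_split("tico19")` would fetch the smallest split (3071 pairs), and each row would be a dict with an English sentence under `text_en` and its Indonesian counterpart under `text_id`.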
Provider:
carlesoctav
Summary of original information
Dataset overview
Dataset name
"en-id-parallel-sentences"
Dataset features
- text_en: string
- text_id: string
Dataset splits
- msmarcoquery:
  - Bytes: 41010003
  - Examples: 500000
- combinedtech:
  - Bytes: 44901963
  - Examples: 276659
- msmarcocollection:
  - Bytes: 351086941
  - Examples: 500000
- TED2020:
  - Bytes: 32590228
  - Examples: 163319
- Tatoeba:
  - Bytes: 797670
  - Examples: 10543
- NeuLabTedTalks:
  - Bytes: 19440416
  - Examples: 94224
- QED:
  - Bytes: 40115874
  - Examples: 274581
- tico19:
  - Bytes: 959990
  - Examples: 3071
Dataset size
- Download size: 282831590 bytes
- Total dataset size: 530903085 bytes
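
The per-split figures can be cross-checked against the reported totals. The sketch below copies the numbers from the listing above into a plain dictionary (the variable names are illustrative, not part of the dataset) and confirms that the split byte counts add up to the stated total dataset size.

```python
# (num_bytes, num_examples) per split, copied from the listing above.
SPLIT_STATS = {
    "msmarcoquery": (41010003, 500000),
    "combinedtech": (44901963, 276659),
    "msmarcocollection": (351086941, 500000),
    "TED2020": (32590228, 163319),
    "Tatoeba": (797670, 10543),
    "NeuLabTedTalks": (19440416, 94224),
    "QED": (40115874, 274581),
    "tico19": (959990, 3071),
}
DATASET_SIZE = 530903085   # reported total dataset size in bytes
DOWNLOAD_SIZE = 282831590  # reported download size in bytes

total_bytes = sum(num_bytes for num_bytes, _ in SPLIT_STATS.values())
total_examples = sum(num_examples for _, num_examples in SPLIT_STATS.values())

# The split sizes should account for the whole dataset.
assert total_bytes == DATASET_SIZE

for name, (num_bytes, num_examples) in SPLIT_STATS.items():
    avg = num_bytes / num_examples  # average bytes per sentence pair
    print(f"{name:<18} {num_examples:>8} pairs  {avg:8.1f} bytes/pair")

print(f"total: {total_examples} pairs, {total_bytes} bytes")
print(f"download/dataset size ratio: {DOWNLOAD_SIZE / DATASET_SIZE:.2f}")
```

The byte counts sum to exactly 530903085, matching the reported total, and the eight splits together hold 1822397 sentence pairs. The msmarcocollection split accounts for roughly two thirds of the bytes, which suggests its entries are much longer than the short query and sentence pairs in the other splits.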



