abir-hr196/clt_gpt2_nonenglish_tokenized2
Source: Hugging Face | Updated: 2025-11-23 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/abir-hr196/clt_gpt2_nonenglish_tokenized2
Resource description:
---
dataset_info:
  features:
  - name: input_ids
    sequence: int32
  splits:
  - name: deu_Latn
    num_bytes: 240894732
    num_examples: 223613
  - name: arb_Arab
    num_bytes: 241456408
    num_examples: 364006
  - name: cmn_Hani
    num_bytes: 241169928
    num_examples: 292415
  - name: fra_Latn
    num_bytes: 240332112
    num_examples: 83004
  download_size: 1515068815
  dataset_size: 963853180
configs:
- config_name: default
  data_files:
  - split: deu_Latn
    path: data/deu_Latn-*
  - split: arb_Arab
    path: data/arb_Arab-*
  - split: cmn_Hani
    path: data/cmn_Hani-*
  - split: fra_Latn
    path: data/fra_Latn-*
---
Split names (language and script):
- deu_Latn: German, Latin script
- arb_Arab: Arabic, Arabic script
- cmn_Hani: Chinese, Han characters
- fra_Latn: French, Latin script
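The split metadata in the card can be sanity-checked and turned into a rough per-example length estimate. The following is a minimal sketch, not part of the original card: it assumes each `int32` token id occupies 4 bytes and ignores any storage overhead, so the figures are approximate. The helper names `approx_tokens_per_example` and `load_split` are hypothetical, and `load_split` additionally assumes the `datasets` package is installed and the Hub is reachable.

```python
# Split metadata copied from the dataset card.
SPLITS = {
    "deu_Latn": {"num_bytes": 240894732, "num_examples": 223613},
    "arb_Arab": {"num_bytes": 241456408, "num_examples": 364006},
    "cmn_Hani": {"num_bytes": 241169928, "num_examples": 292415},
    "fra_Latn": {"num_bytes": 240332112, "num_examples": 83004},
}
DATASET_SIZE = 963853180  # dataset_size from the card

# Assumption: every token id is stored as a 4-byte int32, overhead ignored.
BYTES_PER_TOKEN = 4


def approx_tokens_per_example(split: str) -> float:
    """Estimate the mean number of token ids per example in a split."""
    meta = SPLITS[split]
    return meta["num_bytes"] / BYTES_PER_TOKEN / meta["num_examples"]


def load_split(split: str):
    """Load one split from the Hub (hypothetical helper, needs `datasets`)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset("abir-hr196/clt_gpt2_nonenglish_tokenized2", split=split)


if __name__ == "__main__":
    # The per-split byte counts should add up to the reported dataset_size.
    assert sum(m["num_bytes"] for m in SPLITS.values()) == DATASET_SIZE
    for name in SPLITS:
        print(f"{name}: ~{approx_tokens_per_example(name):.0f} token ids per example")
```

The four splits hold nearly the same number of bytes but very different example counts, so the estimate mainly shows how much longer the average `fra_Latn` example is than the others.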
Provider:
abir-hr196



