minhnguyent546/CulturaY-vi
Source: Hugging Face | Updated: 2026-04-20 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/minhnguyent546/CulturaY-vi
Resource description:
---
dataset_info:
- config_name: vi
  features:
  - name: id
    dtype: int64
  - name: document_lang
    dtype: string
  - name: scores
    sequence: float64
  - name: langs
    sequence: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  - name: collection
    dtype: string
  splits:
  - name: train
    num_bytes: 61806850824
    num_examples: 4493567
  download_size: 25796629112
  dataset_size: 61806850824
- config_name: vi-1m
  features:
  - name: id
    dtype: int64
  - name: document_lang
    dtype: string
  - name: scores
    sequence: float64
  - name: langs
    sequence: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  - name: collection
    dtype: string
  splits:
  - name: train
    num_bytes: 13754518587.126886
    num_examples: 1000000
  download_size: 6271817274
  dataset_size: 13754518587.126886
- config_name: vi-250k
  features:
  - name: id
    dtype: int64
  - name: document_lang
    dtype: string
  - name: scores
    list: float64
  - name: langs
    list: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  - name: collection
    dtype: string
  splits:
  - name: train
    num_bytes: 3441848789
    num_examples: 250000
  download_size: 1986625153
  dataset_size: 3441848789
- config_name: vi-2m
  features:
  - name: id
    dtype: int64
  - name: document_lang
    dtype: string
  - name: scores
    sequence: float64
  - name: langs
    sequence: string
  - name: text
    dtype: string
  - name: url
    dtype: string
  - name: collection
    dtype: string
  splits:
  - name: train
    num_bytes: 27509037174.253773
    num_examples: 2000000
  download_size: 12537225059
  dataset_size: 27509037174.253773
configs:
- config_name: vi
  data_files:
  - split: train
    path: vi/train-*
  default: true
- config_name: vi-1m
  data_files:
  - split: train
    path: vi-1m/train-*
- config_name: vi-250k
  data_files:
  - split: train
    path: vi-250k/train-*
- config_name: vi-2m
  data_files:
  - split: train
    path: vi-2m/train-*
---
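
The sizes in the metadata above are easier to compare once reduced to per-example and per-config figures. The following is a minimal sketch that only does arithmetic on the numbers copied from the card; the `CONFIGS` dictionary and the `summarize` helper are illustrative names, not part of the dataset or of any library.

```python
# Figures copied from the dataset_info metadata (train split, sizes in bytes).
# CONFIGS and summarize() are hypothetical helpers for illustration only.
CONFIGS = {
    "vi": {
        "num_examples": 4_493_567,
        "dataset_size": 61_806_850_824,
        "download_size": 25_796_629_112,
    },
    "vi-2m": {
        "num_examples": 2_000_000,
        "dataset_size": 27_509_037_174.253773,
        "download_size": 12_537_225_059,
    },
    "vi-1m": {
        "num_examples": 1_000_000,
        "dataset_size": 13_754_518_587.126886,
        "download_size": 6_271_817_274,
    },
    "vi-250k": {
        "num_examples": 250_000,
        "dataset_size": 3_441_848_789,
        "download_size": 1_986_625_153,
    },
}


def summarize(name):
    """Return derived size statistics for one configuration."""
    cfg = CONFIGS[name]
    return {
        # Mean in-memory size of one document, in bytes.
        "avg_bytes_per_example": cfg["dataset_size"] / cfg["num_examples"],
        # Fraction of the in-memory size that must actually be downloaded.
        "download_ratio": cfg["download_size"] / cfg["dataset_size"],
        # Sizes in decimal gigabytes (1 GB = 1e9 bytes).
        "dataset_gb": cfg["dataset_size"] / 1e9,
        "download_gb": cfg["download_size"] / 1e9,
    }


for name in CONFIGS:
    stats = summarize(name)
    print(
        f"{name:8s} {stats['download_gb']:6.1f} GB download, "
        f"{stats['dataset_gb']:6.1f} GB loaded, "
        f"{stats['avg_bytes_per_example']:9.1f} bytes/example"
    )
```

The fractional `num_bytes` values for `vi-1m` and `vi-2m` give the same average bytes per example as the full `vi` configuration, which suggests (but does not prove) that those two sizes are proportional estimates rather than measured values. The `vi-250k` figure is an integer and yields a slightly different average.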
Provided by:
minhnguyent546
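
Combining the provider name with the dataset name gives the repository id `minhnguyent546/CulturaY-vi`. A minimal loading sketch follows, assuming the third-party Hugging Face `datasets` package is installed and the repository is reachable; the `load_culturay_vi` wrapper and its defaults are illustrative choices, not something the card prescribes.

```python
REPO_ID = "minhnguyent546/CulturaY-vi"
# Configuration names as listed in the card's `configs` section.
CONFIG_NAMES = ("vi", "vi-250k", "vi-1m", "vi-2m")


def load_culturay_vi(config_name="vi-250k", streaming=True):
    """Load the train split of one configuration (hypothetical wrapper).

    With streaming=True the data is read lazily instead of downloading
    the whole configuration first, which matters for the larger configs.
    """
    if config_name not in CONFIG_NAMES:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {CONFIG_NAMES}"
        )
    # Imported here so the module can be inspected without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train", streaming=streaming)
```

For example, `next(iter(load_culturay_vi()))` would return a single record as a dictionary with the fields listed under `features` (`id`, `document_lang`, `scores`, `langs`, `text`, `url`, `collection`). The smallest configuration is used as the default here only to keep a first experiment cheap.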



