kimnt93/viet-laion-c2
Source: Hugging Face. Updated: 2024-04-20. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/kimnt93/viet-laion-c2
Resource description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 54943555
    num_examples: 38183
  download_size: 27470534
  dataset_size: 54943555
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# viet-laion-c2 Dataset
## Description
The viet-laion-c2 dataset is a Vietnamese-only subset of the Laion-400 dataset. Laion-400 contains text samples in various languages; viet-laion-c2 keeps only the Vietnamese samples.
## Usage
This dataset can be used for Vietnamese natural language processing tasks such as text classification, sentiment analysis, and language modeling. Researchers and developers working on Vietnamese language processing can use it to train and evaluate models.
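As a starting point, the sketch below shows one way to load the train split and turn a record into a prompt. The `load_dataset` call uses the repository id from this page and needs network access plus the `datasets` package. The helper names `build_prompt` and `load_train_split`, the prompt layout, and the sample record are illustrative assumptions; the card does not say how the `instruction`, `input`, and `output` fields should be combined.

```python
def build_prompt(record: dict) -> str:
    """Join the instruction and the optional input into one prompt string.

    The layout (instruction, blank line, input) is an assumption, not
    something the dataset card prescribes.
    """
    instruction = record["instruction"].strip()
    context = (record.get("input") or "").strip()
    if context:
        return f"{instruction}\n\n{context}"
    return instruction


def load_train_split():
    """Load the train split from the Hugging Face Hub (needs network access)."""
    # Imported lazily so build_prompt works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset("kimnt93/viet-laion-c2", split="train")


# Hypothetical record shaped like the three string columns in the metadata.
example = {
    "instruction": "Dịch câu sau sang tiếng Anh.",
    "input": "Xin chào",
    "output": "Hello",
}
prompt = build_prompt(example)
```

With the real data, `build_prompt(load_train_split()[0])` would apply the same formatting to the first row, and the row's `output` field would serve as the training target.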
## Source
The viet-laion-c2 dataset is derived from the Laion-400 dataset, which can be found at [Laion-400 Open Dataset](https://laion.ai/blog/laion-400-open-dataset/). The subset containing Vietnamese text samples was extracted to create the viet-laion-c2 dataset.
## License
The viet-laion-c2 dataset is provided under the terms specified by the original source of the Laion-400 dataset. Please refer to the license information provided by the original source.
Provider:
kimnt93
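Summary of original information
viet-laion-c2 dataset overview
Dataset features
- instruction: string.
- input: string.
- output: string.
Dataset splits
- train: 38,183 examples, 54,943,555 bytes in total.
Dataset size
- Download size: 27,470,534 bytes.
- Dataset size: 54,943,555 bytes.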
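
The figures above imply an average record of roughly 1.4 KB and a download about half the size of the prepared dataset. The short calculation below reproduces that arithmetic from the listed numbers; the constant and variable names are illustrative.

```python
# Figures copied from the dataset metadata.
NUM_EXAMPLES = 38183
DATASET_SIZE_BYTES = 54943555
DOWNLOAD_SIZE_BYTES = 27470534

# Average size of one instruction/input/output record.
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the prepared dataset size that must be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"average record size: {avg_bytes_per_example:.0f} bytes")
print(f"download / dataset size: {download_ratio:.2f}")
```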
Configuration
- config_name: default
- data_files:
  - split: train
    path: data/train-*
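
To illustrate what the `data/train-*` pattern selects, the sketch below filters a list of file names with Python's `fnmatch`. The file names are hypothetical, and `fnmatch` only approximates the pattern matching performed by the Hub; it is used here to show the idea, not to reproduce the exact resolution rules.

```python
from fnmatch import fnmatch

# Pattern from the default config's train split.
PATTERN = "data/train-*"

# Hypothetical repository file names, for illustration only.
candidates = [
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# Keep only the files the train split pattern would pick up.
train_files = [name for name in candidates if fnmatch(name, PATTERN)]
print(train_files)
```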



