kimnt93/viet-laion-c2

Name: kimnt93/viet-laion-c2
Creator: kimnt93
Published: 2024-04-20 08:53:36
License: Not specified

Platform: Hugging Face
Updated: 2024-04-20
Indexed: 2024-06-12

Download link:

https://hf-mirror.com/datasets/kimnt93/viet-laion-c2
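The link above points to the hf-mirror.com copy of the Hugging Face repository. Below is a minimal sketch of fetching the data programmatically. It assumes the `datasets` library is installed and that the mirror serves the same Hub API as huggingface.co. `dataset_page_url` and `load_train_split` are hypothetical helper names, not part of any library. The sketch relies on the `HF_ENDPOINT` environment variable read by `huggingface_hub`, which has to be set before that library is imported.

```python
import os

# Repository id and mirror host as they appear on this page.
REPO_ID = "kimnt93/viet-laion-c2"
MIRROR_ENDPOINT = "https://hf-mirror.com"


def dataset_page_url(repo_id: str, endpoint: str = "https://huggingface.co") -> str:
    """Build the dataset page URL on a given Hub host (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_train_split(endpoint: str = MIRROR_ENDPOINT):
    """Download the train split through `endpoint` (hypothetical helper).

    HF_ENDPOINT is only honored if it is set before huggingface_hub is first
    imported in this process; setdefault leaves an existing value untouched.
    """
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    from datasets import load_dataset  # imported late so the endpoint applies

    return load_dataset(REPO_ID, split="train")


# Usage (needs network access and `pip install datasets`):
#   train = load_train_split()
#   print(train.num_rows)  # the card lists 38,183 train examples
```

Routing requests through the endpoint variable, rather than hard-coding mirror URLs, lets the same `load_dataset` call work against either host.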

Description:

---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 54943555
    num_examples: 38183
  download_size: 27470534
  dataset_size: 54943555
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# viet-laion-c2 Dataset

## Description

The viet-laion-c2 dataset is a subset of the LAION-400M dataset, curated to include only Vietnamese text samples. LAION-400M is a large collection of image-text pairs whose captions span many languages, and viet-laion-c2 keeps only the Vietnamese samples from it.

## Usage

This dataset can be used for natural language processing tasks such as text classification, sentiment analysis, and language modeling. Researchers and developers working on Vietnamese language processing can use it to train and evaluate models for these tasks.

## Source

The viet-laion-c2 dataset is derived from the LAION-400M dataset, described at [LAION-400 Open Dataset](https://laion.ai/blog/laion-400-open-dataset/). The Vietnamese text samples were extracted from it to create viet-laion-c2.

## License

The viet-laion-c2 dataset is provided under the terms specified by the original source of the LAION-400M dataset. Please refer to the license information published by that source.
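The card lists only a `train` split. Evaluating a model, as the Usage section suggests, therefore requires setting aside a held-out slice. The sketch below does this in plain Python on a list of records. The 10% fraction and seed 42 are arbitrary assumptions, and `holdout_split` is a hypothetical helper. With the `datasets` library, `Dataset.train_test_split(test_size=0.1, seed=42)` does the same thing directly on the loaded split.

```python
import random
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]  # one row: instruction / input / output


def holdout_split(
    records: Sequence[Record], eval_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Record], List[Record]]:
    """Deterministically split records into (train, eval) lists."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be strictly between 0 and 1")
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    n_eval = int(len(records) * eval_fraction)  # 38,183 rows at 0.1 -> 3,818
    eval_rows = [records[i] for i in indices[:n_eval]]
    train_rows = [records[i] for i in indices[n_eval:]]
    return train_rows, eval_rows
```

Fixing the seed keeps the evaluation slice identical across runs, so scores from different models stay comparable.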

Provider:

kimnt93

Summary of the original metadata

viet-laion-c2 dataset overview

Dataset features

instruction: data type string.
input: data type string.
output: data type string.
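The card does not say how the three string fields are meant to be combined for training. A common convention for instruction/input/output data is an Alpaca-style prompt, with `output` as the target. The sketch below assumes a template loosely modeled on that convention. The `### ...:` headers are an assumption, not something the card specifies. The Input block is omitted when `input` is empty.

```python
from typing import Mapping, Tuple

FIELDS = ("instruction", "input", "output")  # column names listed in the card


def validate_record(record: Mapping[str, object]) -> None:
    """Raise TypeError unless all three columns are present and are strings."""
    for name in FIELDS:
        if not isinstance(record.get(name), str):
            raise TypeError(f"field {name!r} must be a string")


def to_prompt(record: Mapping[str, str]) -> Tuple[str, str]:
    """Return (prompt, target) using a hypothetical Alpaca-style template."""
    validate_record(record)
    parts = [f"### Instruction:\n{record['instruction'].strip()}"]
    if record["input"].strip():  # skip the Input block when input is blank
        parts.append(f"### Input:\n{record['input'].strip()}")
    parts.append("### Response:\n")
    return "\n\n".join(parts), record["output"]
```

Returning the target separately makes it straightforward to exclude prompt tokens from the loss, if that is the intended training setup.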

Dataset splits

train: 38,183 examples, 54,943,555 bytes in total.

Dataset size

Download size: 27,470,534 bytes.
Dataset size: 54,943,555 bytes.
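The figures above yield two quick derived numbers: the average size of a training example, and how the download size compares with the dataset size. The sketch below uses only values from the card. The roughly 2:1 gap is presumably because the hosted files are compressed, which the card does not state.

```python
NUM_BYTES = 54_943_555      # train split size in bytes (from the card)
NUM_EXAMPLES = 38_183       # train split example count (from the card)
DOWNLOAD_SIZE = 27_470_534  # bytes fetched on download (from the card)


def avg_bytes_per_example(num_bytes: int, num_examples: int) -> float:
    """Mean stored size of one example."""
    return num_bytes / num_examples


def download_ratio(download_size: int, dataset_size: int) -> float:
    """Download size as a fraction of the dataset size."""
    return download_size / dataset_size


avg = avg_bytes_per_example(NUM_BYTES, NUM_EXAMPLES)
ratio = download_ratio(DOWNLOAD_SIZE, NUM_BYTES)
print(f"{avg:.0f} bytes/example, download is {ratio:.1%} of dataset size")
# prints: 1439 bytes/example, download is 50.0% of dataset size
```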

Configuration

config_name: default
data_files:
- split: train
  path: data/train-*
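In the `default` configuration, the pattern `data/train-*` determines which repository files make up the `train` split. The sketch below approximates that resolution with Python's `fnmatch`. Note that `fnmatch`'s `*` also matches `/`, unlike the Hub's own glob handling. The file names in the usage comment are hypothetical, since the card does not list them, and `resolve_splits` is a hypothetical helper.

```python
from fnmatch import fnmatch
from typing import Dict, List, Sequence

# data_files mapping from the card's default config.
DATA_FILES = {"train": "data/train-*"}


def resolve_splits(paths: Sequence[str], patterns: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign repository paths to splits by matching each split's pattern."""
    return {
        split: sorted(p for p in paths if fnmatch(p, pattern))
        for split, pattern in patterns.items()
    }


# Hypothetical listing; the card does not name the actual files.
# resolve_splits(["README.md", "data/train-00000-of-00001.parquet"], DATA_FILES)
# -> {"train": ["data/train-00000-of-00001.parquet"]}
```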
