jp1924/Laion2BMultiKoreanSubset

Name: jp1924/Laion2BMultiKoreanSubset
Creator: jp1924
Published: 2024-07-15 14:08:08
License: no description available

Source: Hugging Face. Updated 2024-07-15; indexed 2024-06-29.

Download link:

https://hf-mirror.com/datasets/jp1924/Laion2BMultiKoreanSubset

Resource summary:

The dataset has eleven features: id, image, caption, caption_ls, category, language, height, width, nsfw, license, and similarity. It provides a single train split of 8,753,917 samples with a total size of 583,353,284,887.375 bytes (about 583 GB). The download size is 579,973,806,930 bytes (about 580 GB). The default configuration maps the train split to the files matching data/train-*.

Provided by:

jp1924

Original information summary

Dataset overview

Dataset information

Features

id: integer (int64)
image: image (image)
caption: string (string)
caption_ls: list of strings (list: string)
category: string (string)
language: string (string)
height: integer (int32)
width: integer (int32)
nsfw: string (string)
license: string (string)
similarity: floating point (float32)
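
The feature list above can be turned into a small sanity check for decoded rows. The following is a minimal sketch, not part of the dataset card: the names EXPECTED_TYPES and schema_problems are hypothetical, the mapping from the listed dtypes to Python types is an assumption, and the check assumes no column is null. The image column is only checked for presence, because its decoded type depends on the loader.

```python
# Column name -> Python type expected for a decoded row.
# Assumption: int64/int32 decode to int, float32 to float, string to str.
EXPECTED_TYPES = {
    "id": int,
    "caption": str,
    "caption_ls": list,
    "category": str,
    "language": str,
    "height": int,
    "width": int,
    "nsfw": str,
    "license": str,
    "similarity": float,
}


def schema_problems(row):
    """Return a list of human-readable problems found in one row dict."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], expected):
            got = type(row[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {got}")
    # The image column is only checked for presence.
    if "image" not in row:
        problems.append("missing column: image")
    # caption_ls should hold strings only.
    captions = row.get("caption_ls")
    if isinstance(captions, list) and not all(isinstance(c, str) for c in captions):
        problems.append("caption_ls: expected a list of strings")
    return problems
```

An empty list means the row matches the listed features. If the real data contains null values, the type checks would need to allow None.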

Data splits

train:
- Bytes: 583353284887.375
- Samples: 8753917

Dataset size

Download size: 579973806930
Dataset size: 583353284887.375
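
To put the raw byte counts in perspective, the sketch below converts them to decimal gigabytes and binary gibibytes and derives the average size per sample. The constants are copied from the figures above; the helper names to_gb and to_gib are illustrative only.

```python
# Figures copied from the listing above.
NUM_BYTES = 583_353_284_887.375   # dataset size in bytes
DOWNLOAD_BYTES = 579_973_806_930  # download size in bytes
NUM_SAMPLES = 8_753_917           # rows in the train split


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


def to_gib(n_bytes):
    """Convert bytes to binary gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


# Average bytes per sample across the whole train split.
avg_bytes_per_sample = NUM_BYTES / NUM_SAMPLES

print(f"dataset size : {to_gb(NUM_BYTES):.1f} GB ({to_gib(NUM_BYTES):.1f} GiB)")
print(f"download size: {to_gb(DOWNLOAD_BYTES):.1f} GB ({to_gib(DOWNLOAD_BYTES):.1f} GiB)")
print(f"avg per sample: {avg_bytes_per_sample / 1e3:.1f} kB")
```

The average works out to roughly 66.6 kB per sample, which is a useful figure when budgeting disk space for a partial download.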

Configuration

config_name: default
- data_files:
  - split: train
    path: data/train-*
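
Because the train split is several hundred gigabytes, streaming is one way to inspect it without a full download. The following is a minimal sketch that assumes the Hugging Face `datasets` package is installed and the repository is reachable. The thresholds MIN_SIMILARITY and MIN_SIDE and the helpers keep_row and stream_filtered are hypothetical, chosen only to illustrate filtering on the similarity, width, and height columns, and the filter assumes those columns are never null.

```python
from itertools import islice

REPO_ID = "jp1924/Laion2BMultiKoreanSubset"

# Hypothetical thresholds, chosen only for illustration.
MIN_SIMILARITY = 0.3
MIN_SIDE = 256


def keep_row(row, min_similarity=MIN_SIMILARITY, min_side=MIN_SIDE):
    """Return True when a row passes the illustrative quality filter."""
    return (
        row["similarity"] >= min_similarity
        and min(row["width"], row["height"]) >= min_side
    )


def stream_filtered(limit=5):
    """Return up to `limit` filtered rows without downloading the full split."""
    # Imported lazily so keep_row can be used without the `datasets` package.
    from datasets import load_dataset

    # streaming=True reads rows on demand from the data/train-* files.
    rows = load_dataset(REPO_ID, split="train", streaming=True)
    return list(islice(filter(keep_row, rows), limit))
```

Calling `stream_filtered(3)` and printing `row["id"]` and `row["caption"]` for each result is a quick way to check that the split loads as the configuration describes.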
