KBlueLeaf/danbooru2023-florence2-caption
Source: Hugging Face. Updated 2024-08-30; indexed 2024-07-22.
Download link:
https://hf-mirror.com/datasets/KBlueLeaf/danbooru2023-florence2-caption
Summary:
The Danbooru2023 - Florence2 Caption dataset contains captions for Danbooru2023 images generated by microsoft/Florence-2-large. It is suitable for text-to-image, image-to-text, and text-generation tasks. The data is stored as parquet; each row holds the Danbooru ID of an image and the parsed Florence 2 output for it. Captions were generated for two Florence 2 tasks: MORE_DETAILED_CAPTION (7,438,449 entries) and DETAILED_CAPTION (7,439,002 entries). For each task, the statistics section reports the entry count, output token statistics, output formats, and generation time cost. The dataset is licensed under Apache-2.0.
Provider:
KBlueLeaf
Original information summary
Danbooru2023 - Florence2 Caption dataset
Overview
- Task categories:
  - Text-to-image
  - Image-to-text
  - Text generation
- Language:
  - English
- Dataset size:
  - 1M<n<10M
Data format
- Format: Parquet
- key: the Danbooru ID of the image
- parsed: the parsed Florence 2 output for the image
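The two-column layout above can be read with pandas. This is a minimal sketch, not the author's tooling: the file name `captions.parquet` is hypothetical, the helper names are illustrative, and it assumes `parsed` holds the caption as plain text (if it is a nested structure, the conversion step would need adjusting).

```python
from pathlib import Path


def load_captions(parquet_path):
    """Read one caption parquet file, keeping only the `key` and `parsed` columns."""
    # Imported lazily; needs pandas plus a parquet engine such as pyarrow.
    import pandas as pd

    return pd.read_parquet(Path(parquet_path), columns=["key", "parsed"])


def caption_lookup(rows):
    """Build a {danbooru_id: caption} mapping from (key, parsed) pairs.

    Assumption: `parsed` is (or can be rendered as) a plain caption string.
    """
    return {int(key): str(parsed) for key, parsed in rows}
```

A typical call would be `frame = load_captions("captions.parquet")` followed by `caption_lookup(zip(frame["key"], frame["parsed"]))`, which allows captions to be joined to images by Danbooru ID.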
Statistics
MORE_DETAILED_CAPTION
- Entries: 7,438,449
- Output tokens (the four values appear to be min/max/mean/median):
  - Flan T5 Tokenizer: 19/736/120/114
  - DFN CLIP Tokenizer: 19/826/108.7/103
  - Qwen2 Tokenizer: 17/883/106.8/101
- Output formats:
  - "The image shows ...": 690,027
  - "The image is ... of ...": 6,665,897
  - Other: 82,525
- Time cost: approximately 7 to 10 days on 4x 3090 GPUs
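Assuming the four slash-separated token figures are min/max/mean/median (the listing does not label them), a summary of that shape could be computed as sketched below. The whitespace tokenizer is only a stand-in so the example runs with the standard library; the listed numbers come from the Flan T5, DFN CLIP, and Qwen2 tokenizers, which would be passed in through the hypothetical `tokenize` parameter.

```python
from statistics import mean, median


def token_length_summary(captions, tokenize=str.split):
    """Return (min, max, mean, median) of token counts over the captions.

    `tokenize` is any callable mapping a string to a list of tokens.
    The default splits on whitespace and is NOT one of the tokenizers
    named in the statistics; it is a placeholder for illustration.
    """
    lengths = [len(tokenize(caption)) for caption in captions]
    if not lengths:
        raise ValueError("need at least one caption")
    return (min(lengths), max(lengths), round(mean(lengths), 2), median(lengths))
```

With a real tokenizer, `tokenize` might be a small wrapper that returns the token ID list for a caption; the summary logic stays the same.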
DETAILED_CAPTION
- Entries: 7,439,002
- Output tokens (the four values appear to be min/max/mean/median):
  - Flan T5 Tokenizer: 10/649/56.67/55
  - DFN CLIP Tokenizer: 10/742/51.06/49
  - Qwen2 Tokenizer: 8/871/49.47/48
- Output formats:
  - "The image shows ...": 5,739,496
  - "This is an ...": 1,634,386
  - Other: 65,120
- Time cost: approximately 4 to 5 days on 4x 3090 GPUs
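The output-format counts for both tasks group captions by their opening phrase. The sketch below shows one way such a breakdown could be produced. The exact matching rules behind the published counts are not stated, so the regular expressions here are assumptions, and the counts they yield may differ from the figures listed.

```python
import re
from collections import Counter

# Assumed patterns, checked in order; the first match wins.
FORMAT_PATTERNS = [
    ("The image shows ...", re.compile(r"^The image shows\b")),
    ("The image is ... of ...", re.compile(r"^The image is\b.*\bof\b", re.DOTALL)),
    ("This is an ...", re.compile(r"^This is an\b")),
]


def classify_caption(caption):
    """Return the label of the first matching opening pattern, else 'Other'."""
    text = caption.strip()
    for label, pattern in FORMAT_PATTERNS:
        if pattern.search(text):
            return label
    return "Other"


def count_formats(captions):
    """Count how many captions fall under each output format label."""
    return Counter(classify_caption(caption) for caption in captions)
```

Running `count_formats` over the `parsed` column of each task would give a per-format tally comparable in shape to the lists above.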
License
- License: Apache License 2.0



