laion-pop-llama3.2-11b

Hugging Face | Updated 2024-10-13 | Indexed 2024-12-12

Download link:

https://huggingface.co/datasets/CaptionEmporium/laion-pop-llama3.2-11b

Summary:

The laion-pop-llama3.2-11b dataset contains 1,580,595 new synthetic captions for the images in the laion/laion-pop dataset. It is restricted to SFW images: any image with an nsfw_prediction value greater than or equal to 0.995 was filtered out. Long captions were generated with meta-llama/Llama-3.2-11B-Vision-Instruct, and medium and short captions were generated with meta-llama/Llama-3.1-8B-Instruct. The dataset is intended for training text-to-image models and for other machine learning tasks. The dataset card includes an example data instance that shows the structure, and it discusses biases and limitations: the captions have not been manually checked for correctness, and grounding them on alt-text may induce hallucinations. The dataset is released under the CC BY-SA 4.0 license.

Created:

2024-10-13

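Original information summary

Dataset card: laion-pop-llama3.2-11b

Dataset description

Dataset overview

Dataset name: laion-pop-llama3.2-11b
Data type: image-text dataset
Source: laion/laion-pop
Size: 1,580,595 synthetic captions
Filtering: SFW (safe for work) images only; images with nsfw_prediction greater than or equal to 0.995 were removed
Captioning models:
- Long captions: meta-llama/Llama-3.2-11B-Vision-Instruct
- Medium and short captions: meta-llama/Llama-3.1-8B-Instruct
Grounding: caption generation is grounded on the original image's alt-text (the alt_txt field)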
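
The filtering rule in the overview can be sketched as a simple predicate. This is a minimal illustration, not the curators' actual pipeline: `is_sfw` and `NSFW_THRESHOLD` are hypothetical names, and the last two rows are made up to show the boundary behaviour (a score of exactly 0.995 is removed).

```python
# Threshold stated in the overview: images at or above it were removed.
NSFW_THRESHOLD = 0.995


def is_sfw(row: dict, threshold: float = NSFW_THRESHOLD) -> bool:
    """Keep a row only if its nsfw_prediction is strictly below the threshold."""
    return row["nsfw_prediction"] < threshold


# The first score is the one shown in this page's example row;
# the other two rows are hypothetical and exist only to show the boundary.
rows = [
    {"key": "005026291", "nsfw_prediction": 7.049842679407448e-05},
    {"key": "hypothetical-a", "nsfw_prediction": 0.995},
    {"key": "hypothetical-b", "nsfw_prediction": 0.9949},
]

kept = [r["key"] for r in rows if is_sfw(r)]
print(kept)
```

Passing a lower `threshold` gives a stricter filter for users who want a wider safety margin than the published one.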

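Languages

Primary language: English
Other languages: captions occasionally include text in other languages that appears in the images

Intended use

Training text-to-image models and other machine learning tasks

Data splits

Dataset name	Train split size
laion-pop-llama3.2-11b	526865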
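
The row count here and the 1,580,595 captions quoted elsewhere on the page agree if every row carries three captions (long, medium, short). A quick arithmetic check, with the three-per-row figure taken from the caption fields in the example row:

```python
rows_in_train_split = 526_865
captions_per_row = 3  # long, medium, and short caption fields

total_captions = rows_in_train_split * captions_per_row
print(total_captions)  # 1580595
```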

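Dataset creation

Dataset generation

Generation prompt:

````py
prompt_gen = lambda txt: f"""
Please describe this image in detail, using as many paragraphs as necessary. If you see text or objects, describe them in detail, along with any other aspects of the foreground and background. As a hint, here is the alt-text of the image, which may or may not be related to the image:

Hint:
```
{txt}
```
Do not reference the alt-text in your description.

Begin description:
"""
````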
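Caption shortening prompt:

````py
prompt_shorten = lambda img_prompt: f"""
You have two tasks: prepare a long paragraph-length description and a short description from a long description.

Description for you to distill

The following is the description to distill.
```
{img_prompt}
```
Tasks
1. Distill it into a single long paragraph that sufficiently describes all details of the image. Remove any redundant lines or descriptions.
2. Distill it into a single sentence of at most 25 words. Remove any redundant lines or descriptions.
Please write only the two descriptions, one per line.
"""
````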
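
Because the shortening prompt asks for exactly two descriptions, one per line, a response can be split back into the medium and short captions with a few lines of code. The page does not say how the curators parsed responses, so this is only a sketch. `split_captions` and `within_word_limit` are hypothetical helper names.

```python
def split_captions(response: str) -> tuple[str, str]:
    """Split a model response into (medium, short) captions.

    Blank lines are ignored; anything other than exactly two
    non-empty lines is treated as a malformed response.
    """
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected 2 non-empty lines, got {len(lines)}")
    medium, short = lines
    return medium, short


def within_word_limit(caption: str, max_words: int = 25) -> bool:
    """Check the 25-word cap that the prompt sets for the short caption."""
    return len(caption.split()) <= max_words


# Hypothetical response, used only to exercise the helpers.
response = (
    "A white van is parked next to a propeller plane on an airport tarmac.\n"
    "\n"
    "A white van beside a propeller plane.\n"
)
medium, short = split_captions(response)
print(within_word_limit(short))
```

Raising on malformed output rather than guessing lets a caller retry the generation for that row.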

Data instances

Example row:

```py
{
    "alt_txt": "Aviões Caravela e Super, Portela (A. Cunha, 1965)",
    "alt_txt_similarity": 0.5682529211044312,
    "caption_long_llama32": (
        "A tranquil scene on the tarmac of an airport, showcasing aircraft and their surroundings.\n\n"
        'in the foreground, on the left side, a white van with a logo and the word "manutencao" in white letters is parked, facing right. to its left, a propeller plane is positioned, with a smaller plane in front of it, featuring the words "aviôes caravela" and "super" in red text, accompanied by a row of red dots along its body. two more planes, similar in design, are visible in the distance, with a white, low-slung structure between them, possibly a fueling station.\n\n'
        "the background of the image features a hazy, flat landscape, likely a grassy field, and the horizon, with buildings visible in the distance. the sky above is a gradient of blue and yellow, suggesting either dawn or dusk. the overall atmosphere is one of serenity, with the vehicles and aircraft arranged in a harmonious and organized manner."
    ),
    "caption_medium_llama32": 'A tranquil scene on an airport tarmac features a white van with the logo "manutencao" parked next to a propeller plane, with a smaller "Aviôes Caravela Super" plane in front of it. The image also includes two more planes in the distance, a low-slung structure, and a hazy landscape with buildings and a blue-yellow sky.',
    "caption_short_llama32": 'A tranquil airport scene features a white van, propeller plane, and smaller "Aviôes Caravela Super" plane, set against a hazy landscape and blue-yellow sky.',
    "cogvlm_caption": "an airport tarmac during what appears to be the early evening or dawn, with the sky painted in hues of pink and blue. Several airplanes are parked, with one prominently displaying the British Airways logo. In the foreground, there's a vehicle labeled WATERWAY and a few ground support equipment items scattered around. The overall atmosphere of the image is calm and serene, capturing a moment of stillness in the bustling world of aviation.",
    "exif": '{"Image ExifOffset": "26", "EXIF ColorSpace": "sRGB"}',
    "height": 873.0,
    "key": "005026291",
    "llava_caption": "a row of four airplanes parked on a runway. The airplanes are lined up next to each other, with two of them being larger and two of them being smaller. The airplanes are positioned in a way that they are all visible in the frame, with one of the larger airplanes partially covering the smaller one. The sky in the background is cloudy, adding to the overall atmosphere of the scene. The image is a black and white photograph, which gives it a classic and timeless appearance.",
    "nsfw_prediction": 7.049842679407448e-05,
    "url": "http://0.fotos.web.sapo.io/i/o71082df4/19518980_wr5yy.jpeg",
    "width": 1280.0,
}
```
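
A row like the one above needs a little normalization before use: the dimensions are floats, and the exif value appears to be JSON text rather than a nested object. The sketch below assumes exif is stored as a JSON-encoded string, which is an inference from the example and not something the page states. `parse_row` is a hypothetical helper.

```python
import json


def parse_row(row: dict) -> dict:
    """Normalize the metadata fields of one row.

    Assumes "exif" holds a JSON-encoded string (possibly empty or missing)
    and that "width" and "height" are floats with integral values.
    """
    exif = json.loads(row["exif"]) if row.get("exif") else {}
    width, height = int(row["width"]), int(row["height"])
    return {
        "key": row["key"],
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "exif": exif,
    }


# Metadata values copied from the example row on this page.
row = {
    "key": "005026291",
    "width": 1280.0,
    "height": 873.0,
    "exif": '{"Image ExifOffset": "26", "EXIF ColorSpace": "sRGB"}',
}
info = parse_row(row)
print(info["exif"]["EXIF ColorSpace"])  # sRGB
```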

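Discussion of biases

Sources of bias: the dataset content and the training data of LLaMA 3.2

Known limitations

Caption accuracy: the captions have not been manually verified and may contain errors
Grounding: using the alt-text as a hint may cause hallucinations driven by suggestion
Safety: images were filtered only by nsfw_prediction, so some unsafe images may remain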
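
Since the captions were never checked by hand, one option is to spot-check a reproducible random sample before training on them. The sketch below is a general technique, not something the dataset card prescribes. `sample_for_review` is a hypothetical helper, and the keys are made up.

```python
import random


def sample_for_review(keys: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw up to k distinct keys for manual review, reproducibly."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return rng.sample(keys, min(k, len(keys)))


# Hypothetical zero-padded keys in the same style as the example row's key.
keys = [f"{i:09d}" for i in range(100)]
picked = sample_for_review(keys, 5)
print(len(picked))  # 5
```

A fixed seed lets several reviewers work from the same sample and compare notes.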

Additional information

Dataset curators

Caption Emporium
laion

Licensing information

License: Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0)

Citation information

@misc{laion-pop-llama3.2-11b,
  author = { Caption Emporium },
  title = {laion-pop-llama3.2-11b},
  year = {2024},
  publisher = {Huggingface},
  journal = {Huggingface repository},
  howpublished = {\url{https://huggingface.co/datasets/CaptionEmporium/laion-pop-llama3.2-11b}},
}

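Collected summary

Dataset introduction

Construction

laion-pop-llama3.2-11b is built on the laion/laion-pop image dataset. Only images with an `nsfw_prediction` value below 0.995 are kept, which reduces but does not eliminate unsafe content. Long captions are generated with meta-llama/Llama-3.2-11B-Vision-Instruct, and meta-llama/Llama-3.1-8B-Instruct then distills each long caption into a medium and a short caption. The long-caption prompt passes each image's alt-text (the `alt_txt` field) to the model as a hint.

Characteristics

The dataset contains 1,580,595 synthetic captions at three lengths (long, medium, and short), which makes it suitable for a variety of text generation tasks. It is mostly English, with occasional transcriptions of text in other languages. Captions are produced in two model passes, a detailed caption from the vision model followed by distillation. The aim is detailed captions, but their accuracy has not been manually verified. Images are screened with the nsfw_prediction threshold to limit unsafe content.

Usage

laion-pop-llama3.2-11b is mainly used for training text-to-image generation models and for other machine learning tasks. Users can access the dataset directly on the Hugging Face platform and use the caption text for model training and evaluation. Each row carries image metadata together with captions at several lengths, so users can select whichever fields suit their needs.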
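
A minimal sketch of that access pattern, assuming the Hugging Face `datasets` library is installed. The repository ID comes from the download link on this page, and the caption field names come from the example row. `pick_caption` and `iter_caption_pairs` are hypothetical helpers, and the split name "train" is inferred from the data split table.

```python
CAPTION_LENGTHS = ("long", "medium", "short")


def pick_caption(row: dict, length: str = "short") -> str:
    """Return the caption of the requested length from one row."""
    if length not in CAPTION_LENGTHS:
        raise ValueError(f"length must be one of {CAPTION_LENGTHS}, got {length!r}")
    return row[f"caption_{length}_llama32"]


def iter_caption_pairs(split: str = "train", length: str = "short"):
    """Yield (image URL, caption) pairs without downloading the whole dataset."""
    # Imported lazily so pick_caption works even without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(
        "CaptionEmporium/laion-pop-llama3.2-11b",
        split=split,
        streaming=True,  # iterate over rows as they arrive
    )
    for row in ds:
        yield row["url"], pick_caption(row, length)


if __name__ == "__main__":
    # Needs network access; prints the first pair and stops.
    for url, caption in iter_caption_pairs():
        print(url, caption)
        break
```

The dataset stores image URLs, not image bytes, so a training pipeline still has to fetch each `url` itself and handle links that no longer resolve.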

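Background and challenges

Background overview

laion-pop-llama3.2-11b was created by Caption Emporium in 2024 to provide synthetic captions for the images in the laion/laion-pop dataset. Long captions are generated with meta-llama/Llama-3.2-11B-Vision-Instruct, and medium and short captions are then generated with meta-llama/Llama-3.1-8B-Instruct. The core question the dataset addresses is how to generate diverse and accurate image captions automatically in order to support the training of text-to-image models. It sits at the intersection of computer vision and natural language processing and gives researchers working on image captioning a large body of experimental data.

Current challenges

The construction of laion-pop-llama3.2-11b faces several challenges. First, the automatically generated captions may contain errors and have not been manually verified, so a caption may not fully match its image. Second, prompting with the alt-text can lead the model to hallucinate content that is not in the image, which reduces caption accuracy. Third, although nsfw_prediction filtering removes some unsafe content, unsuitable images may still remain. These issues affect the quality of the dataset and mean that researchers who train on it need to do their own validation and correction.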

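Common scenarios

Classic usage scenarios

At the intersection of computer vision and natural language processing, laion-pop-llama3.2-11b provides a large resource for image and text generation tasks. Its captions, generated with Llama models, are intended for training text-to-image generation models. Researchers can also use these synthetic captions in work on image understanding and caption generation, particularly for complex scenes where detailed text descriptions matter.

Derived related work

According to this listing, researchers have used laion-pop-llama3.2-11b to develop text-to-image generation models and to train Llama-based multimodal models for image captioning. No specific works are named. The dataset is also said to have prompted study of model bias and limitations in image captioning.

Recent research on the dataset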