LAION-SG
Source: ModelScope community (updated 2026-01-06, indexed 2024-12-21)
Download link: https://modelscope.cn/datasets/AI-ModelScope/LAION-SG
# Dataset Card for LAION-SG
LAION-SG is a large-scale dataset with high-quality structural annotations of scene graphs (SG), which precisely describe attributes and relationships of multiple objects, effectively representing the semantic structure in complex scenes.
## Dataset Details
- **Language(s):** All annotations use English as the primary language.
- **License:** MIT License.
- **Repository:** https://github.com/mengcye/LAION-SG?tab=readme-ov-file
- **Paper:** https://arxiv.org/abs/2412.08580
- LAION-SG averages 6.39 objects per sample, counting specific nouns that reflect true semantic relationships and excluding abstract proper nouns. LAION-SG contains 20% more object information than the original LAION-Aesthetics dataset, and this advantage rises to 216% when proper nouns are excluded.
- The average annotation lengths of our scene graphs and of the original captions are 32.2 and 19.0, respectively, which reflects that SGs contain richer information in a more compact form.
- The annotation accuracy of the scene graph is also higher than that of the original captions. For details, please refer to the paper.
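You can recompute per-sample counts like these from the annotation files with a few lines of Python. This is a minimal sketch, not the paper's evaluation script. It assumes each record follows the schema under Data Fields. It counts every entry in `items` without any noun filtering, so its averages may differ from the figures reported in the paper. The function name `annotation_stats` is hypothetical.

```python
from statistics import mean


def annotation_stats(records):
    """Average objects, attributes, and relations per LAION-SG-style record.

    Hypothetical helper: counts raw entries in "items" and "relations";
    no proper-noun or other filtering is applied.
    """
    if not records:
        raise ValueError("need at least one record")
    return {
        # number of scene-graph objects per image
        "objects_per_sample": mean(len(r["items"]) for r in records),
        # number of attributes attached to each object, across all images
        "attributes_per_object": mean(
            len(item["attributes"]) for r in records for item in r["items"]
        ),
        # number of <subject, relation, object> triplets per image
        "relations_per_sample": mean(len(r["relations"]) for r in records),
    }
```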
### Data Splits
- A total of 540,005 SG-image pairs annotated with objects, attributes, and relationships:
  - 480,005 samples for training
  - 10,000 samples for validation
  - 50,000 samples for test
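After downloading the split files, a quick sanity check is to compare their sizes with the figures above. It is also worth confirming that no `img_id` appears in more than one split. The sketch below assumes each file holds a JSON array of records. The file names in `SPLIT_FILES` are hypothetical placeholders; replace them with the actual names in the `dataset/` folder.

```python
import json

# Hypothetical file names: the card only says there are three JSON files
# under dataset/, so replace these with the real names from the repository.
SPLIT_FILES = {
    "train": "dataset/train.json",
    "val": "dataset/val.json",
    "test": "dataset/test.json",
}

# Split sizes reported in this card (they sum to 540,005).
EXPECTED_SIZES = {"train": 480_005, "val": 10_000, "test": 50_000}


def load_split(path):
    """Load one split, assuming the file holds a JSON array of records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_splits(splits, expected=EXPECTED_SIZES):
    """Return problems found: unexpected split sizes and img_ids shared across splits."""
    problems = []
    for name, records in splits.items():
        if name in expected and len(records) != expected[name]:
            problems.append(f"{name}: {len(records)} records, expected {expected[name]}")
    first_split = {}  # img_id -> name of the first split it appeared in
    for name, records in splits.items():
        for record in records:
            owner = first_split.setdefault(record["img_id"], name)
            if owner != name:
                problems.append(f"img_id {record['img_id']} in both {owner} and {name}")
    return problems
```

For example, `check_splits({name: load_split(path) for name, path in SPLIT_FILES.items()})` returns an empty list when the sizes match and the splits are disjoint.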
## Uses
1. **Download the annotation files and processing code.**
Download the `dataset` folder and the `code` folder to your local machine.
2. **Download images for LAION-SG dataset.**
Due to copyright issues, we are unable to provide the image files of the dataset. Please download the required images for the LAION-SG dataset using the URLs provided in the three JSON files under the `dataset/` directory. All images should be stored in the `data` folder, as specified by `args.image_dir`.
*A reference download method:*
For LAION-Aesthetics-V2-6.5plus, our images and labels were sourced from https://dagshub.com/DagsHub-Datasets/LAION-Aesthetics-V2-6.5plus.
The site also provides a TSV file containing the labels and download links for the image data; see its “Example usage” section to learn how to obtain this data.
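You can also fetch images directly from the URLs in the JSON files; a minimal downloader might look like the following. It is a sequential sketch that uses only the Python standard library, and the function names are hypothetical. It saves each image under `args.image_dir` using the record's `name` field as the file name. That naming is an assumption about how the loader looks images up, so check the `code` folder before relying on it. Some source URLs may no longer resolve, so the sketch collects failures instead of raising them.

```python
import http.client
import urllib.request
from pathlib import Path


def plan_downloads(records, image_dir):
    """Pair each record's URL with a target path, skipping files already on disk.

    Assumption: the loader expects images at <image_dir>/<name>.
    """
    image_dir = Path(image_dir)
    jobs = []
    for record in records:
        target = image_dir / record["name"]
        if not target.exists():
            jobs.append((record["url"], target))
    return jobs


def download_all(jobs, timeout=30):
    """Fetch each URL one at a time and return the URLs that failed."""
    failed = []
    for url, target in jobs:
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = resp.read()
        # URLError, HTTPError, and socket timeouts are OSError subclasses;
        # malformed URLs raise ValueError; truncated responses raise HTTPException.
        except (OSError, ValueError, http.client.HTTPException):
            failed.append(url)
            continue
        target.write_bytes(data)
    return failed
```

`download_all(plan_downloads(records, "data"))` returns the URLs that could not be fetched. Files already on disk are skipped, so you can rerun the call to resume an interrupted download.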
3. **Use the LAION-SG dataset in your project.**
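```python
# Both modules are provided in the `code` folder downloaded in step 1.
from laion_dataset import LAIONSceneGraphDataset, build_laion_loaders
from configs_laion import parse_args

...


def main():
    ...
    # Settings such as args.image_dir are read from the example config.
    args = parse_args()
    train_dataloader, val_dataloader = build_laion_loaders(args)
    ...


if __name__ == "__main__":
    main()
```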
The provided `configs_laion` is an example configuration file. Please modify it to match your own settings.
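As an illustration of the kind of settings such a configuration holds, here is a minimal `argparse`-based sketch. Only `image_dir` comes from this card. The other flags (`--train_json`, `--val_json`, `--batch_size`) are hypothetical and must be matched to whatever `build_laion_loaders` actually reads.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical stand-in for configs_laion.parse_args.

    Only --image_dir is named in this card; the other flags are illustrative
    assumptions and must match what build_laion_loaders actually reads.
    """
    parser = argparse.ArgumentParser(description="LAION-SG loader settings (sketch)")
    parser.add_argument("--image_dir", default="data",
                        help="folder containing the downloaded images")
    parser.add_argument("--train_json", default="dataset/train.json",
                        help="hypothetical: path to the training split")
    parser.add_argument("--val_json", default="dataset/val.json",
                        help="hypothetical: path to the validation split")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="hypothetical: samples per batch")
    return parser.parse_args(argv)
```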
<!-- Only the URLs in the JSON files are provided, not our own images. -->
## Dataset Structure
### Data Instances
An example is as follows.
```json
{
"img_id": "482027",
"name": "8040361340228797010.jpg",
"caption_ori": "Yosemite Falls reflection in Merced River, Yosemite, California Poster by Tom Norring / Danita Delimont for $102.50 CAD",
"score": "6.669650077819824",
"url": "http://images.artgalore.ca/artgalore_images/PDD/US05TNO0060.jpg",
"items": [
{
"item_id": 0,
"label": "mountains",
"attributes": [
"rocky",
"tall"
],
"global_item_id": 3201429
},
{
"item_id": 1,
"label": "trees",
"attributes": [
"leafless",
"slender"
],
"global_item_id": 3201430
},
{
"item_id": 2,
"label": "trees",
"attributes": [
"leafless",
"slender"
],
"global_item_id": 3201431
},
{
"item_id": 3,
"label": "snow",
"attributes": [
"white",
"cold"
],
"global_item_id": 3201432
},
{
"item_id": 4,
"label": "river",
"attributes": [
"reflective",
"calm"
],
"global_item_id": 3201433
}
],
"relations": [
{
"triple_id": 0,
"item1": 3,
"relation": "adjacent to",
"item2": 4,
"global_relation_id": 2118313
},
{
"triple_id": 1,
"item1": 1,
"relation": "growing near",
"item2": 4,
"global_relation_id": 2118314
},
{
"triple_id": 2,
"item1": 2,
"relation": "growing near",
"item2": 4,
"global_relation_id": 2118315
}
]
}
```
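`item1` and `item2` store local `item_id`s rather than labels. To turn a record into readable <subject, relation, object> triplets, look the IDs up in `items`. The helper below is a minimal sketch, and its name `to_triples` is hypothetical.

```python
def to_triples(record, with_ids=False):
    """Resolve each relation's item1/item2 (local item_id) to object labels.

    With with_ids=True, labels get a "#<item_id>" suffix so repeated labels
    (e.g. two "trees" objects) stay distinguishable.
    """
    names = {}
    for item in record["items"]:
        name = item["label"]
        if with_ids:
            name = f"{name}#{item['item_id']}"
        names[item["item_id"]] = name
    return [
        (names[rel["item1"]], rel["relation"], names[rel["item2"]])
        for rel in record["relations"]
    ]
```

Applied to the record above, `to_triples(record)` returns `[("snow", "adjacent to", "river"), ("trees", "growing near", "river"), ("trees", "growing near", "river")]`. Passing `with_ids=True` yields `"trees#1"` and `"trees#2"`, so the two tree objects stay distinct.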
### Data Fields
- ```"img_id"```: Unique numeric ID of the image (stored as a string, e.g. `"482027"`).
- ```"name"```: File name of the source image.
- ```"caption_ori"```: Original caption of the image in LAION-Aesthetics.
- ```"score"```: Predicted aesthetic score of the image (stored as a string).
- ```"url"```: URL of the source image.
- ```"items"```: List of objects recognized in the image. Each object has:
  - ```"item_id"```: Numeric ID of the object, unique within the current image.
  - ```"label"```: Label of the object.
  - ```"attributes"```: List of attributes of the object.
  - ```"global_item_id"```: Numeric ID of the object, unique across all images in LAION-SG.
- ```"relations"```: List of relations recognized in the image. Each relation has:
  - ```"triple_id"```: Numeric ID of the relation, unique within the current image.
  - ```"item1"```: The `item_id` of the subject in the scene graph triplet <subject, relation, object>.
  - ```"relation"```: The relation between the subject and the object in the triplet.
  - ```"item2"```: The `item_id` of the object in the triplet.
  - ```"global_relation_id"```: Numeric ID of the relation, unique across all images in LAION-SG.
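In the example record, `img_id` and `score` are JSON strings even though they hold numbers. A small typed wrapper makes those conversions explicit. The dataclasses below mirror the fields listed above, and the class and function names are hypothetical. Missing or unexpected keys in an item or relation raise `TypeError`, which doubles as a basic schema check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    item_id: int
    label: str
    attributes: List[str]
    global_item_id: int


@dataclass
class Relation:
    triple_id: int
    item1: int  # item_id of the subject
    relation: str
    item2: int  # item_id of the object
    global_relation_id: int


@dataclass
class SceneGraphRecord:
    img_id: int
    name: str
    caption_ori: str
    score: float
    url: str
    items: List[Item]
    relations: List[Relation]


def parse_record(raw):
    """Convert one raw JSON record into typed objects.

    img_id and score arrive as strings, so they are converted explicitly;
    Item(**...) and Relation(**...) raise TypeError on missing or extra keys.
    """
    return SceneGraphRecord(
        img_id=int(raw["img_id"]),
        name=raw["name"],
        caption_ori=raw["caption_ori"],
        score=float(raw["score"]),
        url=raw["url"],
        items=[Item(**it) for it in raw["items"]],
        relations=[Relation(**rel) for rel in raw["relations"]],
    )
```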
## Dataset Creation
### Source Data
All images are from the LAION-Aesthetics V2 (6.5+) dataset.
#### Data Collection and Processing

From the paper:
> Our LAION-SG dataset is built on high-quality images in LAION-Aesthetics V2 (6.5+) with automated annotation performed using GPT-4o. LAION-Aesthetics V2 (6.5+) is a subset of LAION-5B, comprising 625,000 image-text pairs with predicted aesthetic scores over 6.5, curated using the LAION-Aesthetics Predictor V2 model.
> During our construction, only 540,005 images are available in the dataset due to copyright or other issues.
>
> Through prompt engineering, we devised a set of specific requirements for scene graph annotations to ensure comprehensiveness, systematic structure, and precision in the annotation results. The above figure [in the paper] illustrates the detailed construction pipeline of LAION-SG.
> Each component plays a crucial role in achieving high-quality automated annotation.
>
> First, as scene graphs typically contain multiple objects and their relations, the prompt requires “identification of as many objects, attributes, and their relations within the image as possible”.
> This design encourages annotating all objects and interactions in a scene.
> Each object is assigned a unique ID, even for multiple objects of the same type, ensuring that the entirety of the scene's structure and hierarchy is accurately represented.
>
> Second, the attribute section mandates that each object must have at least one abstract adjective attribute, while avoiding the use of other objects as attributes. This design is especially important in complex scenes as it helps differentiate objects' appearance, state, and characteristics from the background and other elements, maintaining consistency and clarity in annotations.
> By avoiding the confusion between specific objects and abstract attributes, the annotations become more interpretable and generalizable.
>
> In the relation section, we specify the use of concrete verbs to describe relations between objects rather than relying solely on spatial orientation.
> This is because relations are often more critical in scene graphs than mere spatial information.
> By using precise verbs like “standing on” or “holding”, we capture dynamic interactions within the scene, which is essential for complex scene generation.
>
> Leveraging these prompts with the multimodal large language model GPT-4o, we generate annotations representing scene graphs.
> Our annotation is expected to achieve accuracy for every object, attribute, and relationship, thoroughly covering each detail in the scene and providing robust data support for subsequent compositional image generation tasks.
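Some of the prompt requirements described above can be spot-checked on the released annotations. The sketch below is a heuristic, not the authors' pipeline. It verifies four things:

- item IDs are unique;
- every object has at least one attribute;
- no attribute exactly matches (case-insensitively) an object label in the same image;
- every relation points to an existing `item_id`.

It leaves out two requirements that cannot be checked this simply: whether an attribute is an abstract adjective, and whether a relation uses a concrete verb. The name `check_annotation_rules` is hypothetical.

```python
def check_annotation_rules(record):
    """Return human-readable violations of a few LAION-SG annotation rules (heuristic)."""
    problems = []
    ids = [item["item_id"] for item in record["items"]]
    if len(ids) != len(set(ids)):
        problems.append("duplicate item_id")
    # Object labels in this image; an attribute equal to one of them is
    # treated as "using another object as an attribute".
    labels = {item["label"].lower() for item in record["items"]}
    for item in record["items"]:
        if not item["attributes"]:
            problems.append(f"item {item['item_id']} has no attribute")
        for attr in item["attributes"]:
            if attr.lower() in labels:
                problems.append(
                    f"item {item['item_id']} uses object label {attr!r} as an attribute"
                )
    # Relations must reference local item_ids that exist in this record.
    valid = set(ids)
    for rel in record["relations"]:
        for end in ("item1", "item2"):
            if rel[end] not in valid:
                problems.append(
                    f"triple {rel['triple_id']} {end} refers to unknown item {rel[end]}"
                )
    return problems
```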
## Citation
**BibTeX:**
```bibtex
@article{li2024laion,
title={LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations},
author={Li, Zejian and Meng, Chenye and Li, Yize and Yang, Ling and Zhang, Shengyuan and Ma, Jiarui and Li, Jiayi and Yang, Guang and Yang, Changyuan and Yang, Zhiyuan and others},
journal={arXiv preprint arXiv:2412.08580},
year={2024}
}
```
Provided by: maas
Created: 2024-12-14



