Download link:

https://modelscope.cn/datasets/AI-ModelScope/LAION-SG

Resource description:

# Dataset Card for LAION-SG

LAION-SG is a large-scale dataset with high-quality structural annotations of scene graphs (SG), which precisely describe attributes and relationships of multiple objects, effectively representing the semantic structure in complex scenes.

## Dataset Details

- **Language(s):** All annotations are primarily in English.
- **License:** MIT License.
- **Repository:** https://github.com/mengcye/LAION-SG?tab=readme-ov-file
- **Paper:** https://arxiv.org/abs/2412.08580

- LAION-SG has an average of 6.39 objects per sample, excluding abstract proper nouns and focusing on specific nouns that reflect true semantic relationships. LAION-SG contains 20% more object information than the original LAION-Aesthetics dataset, and this advantage increases to 216% when excluding proper nouns.
- The average annotation lengths of our scene graphs and the original captions are 32.2 and 19.0, respectively, reflecting that SGs carry richer information in a more compact form.
- The annotation accuracy of the scene graphs is also higher than that of the original captions. For details, please refer to the paper.

### Data Splits

- A total of 540,005 SG-image pairs annotated with objects, attributes, and relationships.
  - 480,005 samples for training
  - 10,000 samples for validation
  - 50,000 samples for testing

## Uses

1. **Download the annotation files and processing code.**

   Download the `dataset` folder and the `code` folder to your local machine.

2. **Download images for LAION-SG dataset.**

   Due to copyright issues, we are unable to provide the image files of the dataset. Please download the required images for the LAION-SG dataset using the URLs provided in the three JSON files under the `dataset/` directory. All images should be stored in the `data` folder, as specified by `args.image_dir`.

   *A reference download method:*

   For LAION-Aesthetics-V2-6.5plus: Our images and labels are sourced from https://dagshub.com/DagsHub-Datasets/LAION-Aesthetics-V2-6.5plus.
   Additionally, the website provides a TSV file containing the labels and download links for the image data. You can visit the website and refer to the “Example usage” section to learn how to obtain this data.

3. **Use the LAION-SG dataset in your project.**

   ```
   from laion_dataset import LAIONSceneGraphDataset, build_laion_loaders
   from configs_laion import parse_args

   ...

   def main():
       ...
       args = parse_args()
       train_dataloader, val_dataloader = build_laion_loaders(args)
       ...
   ```

   The provided `configs_laion` is an example configuration file. Please modify it to match your own settings.

## Dataset Structure

### Data Instances

An example is as follows.

```
{
  "img_id": "482027",
  "name": "8040361340228797010.jpg",
  "caption_ori": "Yosemite Falls reflection in Merced River, Yosemite, California Poster by Tom Norring / Danita Delimont for $102.50 CAD",
  "score": "6.669650077819824",
  "url": "http://images.artgalore.ca/artgalore_images/PDD/US05TNO0060.jpg",
  "items": [
    {
      "item_id": 0,
      "label": "mountains",
      "attributes": ["rocky", "tall"],
      "global_item_id": 3201429
    },
    {
      "item_id": 1,
      "label": "trees",
      "attributes": ["leafless", "slender"],
      "global_item_id": 3201430
    },
    {
      "item_id": 2,
      "label": "trees",
      "attributes": ["leafless", "slender"],
      "global_item_id": 3201431
    },
    {
      "item_id": 3,
      "label": "snow",
      "attributes": ["white", "cold"],
      "global_item_id": 3201432
    },
    {
      "item_id": 4,
      "label": "river",
      "attributes": ["reflective", "calm"],
      "global_item_id": 3201433
    }
  ],
  "relations": [
    {
      "triple_id": 0,
      "item1": 3,
      "relation": "adjacent to",
      "item2": 4,
      "global_relation_id": 2118313
    },
    {
      "triple_id": 1,
      "item1": 1,
      "relation": "growing near",
      "item2": 4,
      "global_relation_id": 2118314
    },
    {
      "triple_id": 2,
      "item1": 2,
      "relation": "growing near",
      "item2": 4,
      "global_relation_id": 2118315
    }
  ]
}
```

### Data Fields

- `"img_id"`: Unique numeric ID of the image.
- `"name"`: Name of source image.
- `"caption_ori"`: Original caption of the image in LAION-Aesthetics.
- `"score"`: Aesthetic score of the image.
- `"url"`: URL of source image.
- `"items"`: List of objects recognized in the image.
  - `"item_id"`: Unique numeric ID of the object in the current image.
  - `"label"`: Label of the object.
  - `"attributes"`: List of attributes of the object.
  - `"global_item_id"`: Unique numeric ID of the object across all images in LAION-SG.
- `"relations"`: List of relations recognized in the image.
  - `"triple_id"`: Unique numeric ID of the relation in the current image.
  - `"item1"`: The `item_id` of the subject in the scene graph triplet <subject, relation, object>.
  - `"relation"`: The relation between the subject and the object in the scene graph triplet <subject, relation, object>.
  - `"item2"`: The `item_id` of the object in the scene graph triplet <subject, relation, object>.
  - `"global_relation_id"`: Unique numeric ID of the relation across all images in LAION-SG.

## Dataset Creation

### Source Data

All images are from the LAION-Aesthetics V2 (6.5+) dataset.

#### Data Collection and Processing

![pipeline](https://huggingface.co/datasets/mengcy/LAION-SG/resolve/main/figure2-pipeline-2.png)

From the paper:

> Our LAION-SG dataset is built on high-quality images in LAION-Aesthetics V2 (6.5+) with automated annotation performed using GPT-4o. LAION-Aesthetics V2 (6.5+) is a subset of LAION-5B, comprising 625,000 image-text pairs with predicted aesthetic scores over 6.5, curated using the LAION-Aesthetics Predictor V2 model.
> During our construction, only 540,005 images are available in the dataset due to copyright or other issues.
>
> Through prompt engineering, we devised a set of specific requirements for scene graph annotations to ensure comprehensiveness, systematic structure, and precision in the annotation results. The above figure illustrates the detailed construction pipeline of LAION-SG.
> Each component plays a crucial role in achieving high-quality automated annotation.
>
> First, as scene graphs typically contain multiple objects and their relations, the prompt requires “identification of as many objects, attributes, and their relations within the image as possible”.
> This design encourages that all objects and interactions in a scene are annotated.
> Each object is assigned a unique ID, even for multiple objects of the same type, ensuring that the entirety of the scene's structure and hierarchy is accurately represented.
>
> Second, the attribute section mandates that each object must have at least one abstract adjective attribute, while avoiding the use of other objects as attributes. This design is especially important in complex scenes as it helps differentiate objects' appearance, state, and characteristics from the background and other elements, maintaining consistency and clarity in annotations.
> By avoiding the confusion between specific objects and abstract attributes, the annotations become more interpretable and generalizable.
>
> In the relation section, we specify the use of concrete verbs to describe relations between objects rather than relying solely on spatial orientation.
> This is because relations are often more critical in scene graphs than mere spatial information.
> By using precise verbs like “standing on” or “holding”, we capture dynamic interactions within the scene, which is essential for complex scene generation.
>
> Leveraging these prompts with the multimodal large language model GPT-4o, we generate annotations representing scene graphs.
> Our annotation is expected to achieve accuracy for every object, attribute, and relationship, thoroughly covering each detail in the scene and providing robust data support for subsequent compositional image generation tasks.
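
To make the annotation structure concrete, the following is a minimal sketch (not part of the official `code` folder) of how the `items` and `relations` fields combine into scene graph triplets: each relation's `item1` and `item2` are looked up by `item_id` within the same record. The helper names `resolve_triples` and `describe_items` are hypothetical, and the record is a trimmed copy of the example shown under Data Instances.

```python
# Trimmed copy of the example record (only the fields used here).
record = {
    "img_id": "482027",
    "items": [
        {"item_id": 0, "label": "mountains", "attributes": ["rocky", "tall"]},
        {"item_id": 1, "label": "trees", "attributes": ["leafless", "slender"]},
        {"item_id": 2, "label": "trees", "attributes": ["leafless", "slender"]},
        {"item_id": 3, "label": "snow", "attributes": ["white", "cold"]},
        {"item_id": 4, "label": "river", "attributes": ["reflective", "calm"]},
    ],
    "relations": [
        {"triple_id": 0, "item1": 3, "relation": "adjacent to", "item2": 4},
        {"triple_id": 1, "item1": 1, "relation": "growing near", "item2": 4},
        {"triple_id": 2, "item1": 2, "relation": "growing near", "item2": 4},
    ],
}


def resolve_triples(record):
    """Return (subject_label, relation, object_label) for every relation."""
    # item_id is only unique within one image, so build the lookup per record.
    labels = {item["item_id"]: item["label"] for item in record["items"]}
    return [
        (labels[rel["item1"]], rel["relation"], labels[rel["item2"]])
        for rel in record["relations"]
    ]


def describe_items(record):
    """Return each object as a phrase: its attributes followed by its label."""
    return [
        " ".join(item["attributes"] + [item["label"]])
        for item in record["items"]
    ]


triples = resolve_triples(record)
phrases = describe_items(record)
print(triples[0])   # ('snow', 'adjacent to', 'river')
print(phrases[0])   # rocky tall mountains
```

Because the two `trees` objects share a label but have different `item_id` values, the lookup is keyed on `item_id` rather than on `label`; keying on labels would merge objects that the annotation deliberately keeps separate.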
## Citation

**BibTeX:**

```
@article{li2024laion,
  title={LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations},
  author={Li, Zejian and Meng, Chenye and Li, Yize and Yang, Ling and Zhang, Shengyuan and Ma, Jiarui and Li, Jiayi and Yang, Guang and Yang, Changyuan and Yang, Zhiyuan and others},
  journal={arXiv preprint arXiv:2412.08580},
  year={2024}
}
```
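
As a supplement to step 2 under Uses, here is a rough sketch of fetching images from the `url` field and saving them under the `name` field in the image directory. It is an assumption-laden illustration, not the authors' download script: it assumes each JSON file under `dataset/` holds a list of records shaped like the example under Data Instances, uses only the Python standard library, and the helper names `fetch_url` and `download_images` are hypothetical. Since some source URLs may no longer resolve, failures are collected rather than raised.

```python
import urllib.request
from pathlib import Path


def fetch_url(url, timeout=10):
    """Download one URL and return its bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_images(records, image_dir, fetch=fetch_url):
    """Save each record's image as image_dir/<name>; return names that failed."""
    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for record in records:
        target = image_dir / record["name"]
        if target.exists():
            continue  # already downloaded on an earlier run
        try:
            # fetch() runs before the file is opened, so a failed request
            # does not leave an empty file behind.
            target.write_bytes(fetch(record["url"]))
        except (OSError, ValueError):
            # URLError, HTTPError and timeouts are OSError subclasses;
            # ValueError covers malformed URLs.
            failed.append(record["name"])
    return failed


# Example usage (the file name below is a placeholder, not a confirmed path):
#   import json
#   with open("dataset/<one of the three JSON files>") as f:
#       records = json.load(f)
#   failed = download_images(records, "data")
```

The `fetch` parameter exists so the network call can be swapped out, for instance with a retrying client or a stub during testing; the target directory should match whatever `args.image_dir` points to in your configuration.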

Application scenarios: