StoryStream

Hugging Face | Updated: 2024-07-12 | Indexed: 2024-12-12

Download link:

https://huggingface.co/datasets/TencentARC/StoryStream

Overview:

The StoryStream dataset is an innovative resource designed to advance multimodal story generation. Derived from popular cartoon series, it pairs detailed narratives with high-resolution images to support the creation of long story sequences. The dataset is divided into three subsets: Curious George, Rabbids Invasion, and The Land Before Time. Each subset contains an image package and a JSONL file package; each line of every JSONL file corresponds to one story containing 30 images and their corresponding text.

Created:

2024-07-10

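Original Information Summary

StoryStream Dataset

Introduction

The StoryStream dataset is an innovative resource aimed at advancing multimodal story generation. Derived from popular cartoon series, it comprises a comprehensive collection of detailed narratives and high-resolution images, and it is designed to support the creation of long story sequences.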

Data Format

The StoryStream dataset contains three subsets:

Curious George
Rabbids Invasion
The Land Before Time

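Each subset includes:

Image package: a tar.gz file containing all images extracted from the cartoon series.
JSONL file package: a zip file containing multiple JSONL files. Each line of each JSONL file corresponds to one story of 30 images and their associated text.
- The "images" field provides a list of the 30 image paths.
- The "captions" field lists the 30 corresponding narrative texts.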
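The two packages described above can be unpacked with the Python standard library. The following is a minimal sketch, not a script shipped with the dataset: the function name `unpack_subset`, the archive paths, and the `images/` and `jsonl/` output folders are all hypothetical choices made for illustration.

```python
import tarfile
import zipfile
from pathlib import Path


def unpack_subset(image_tar: Path, jsonl_zip: Path, out_dir: Path) -> list[Path]:
    """Extract one subset's image archive and JSONL archive into out_dir.

    Returns the sorted list of extracted .jsonl files.
    The output layout (images/ and jsonl/) is an assumption of this sketch.
    """
    image_dir = out_dir / "images"
    jsonl_dir = out_dir / "jsonl"
    image_dir.mkdir(parents=True, exist_ok=True)
    jsonl_dir.mkdir(parents=True, exist_ok=True)

    # Image package: a gzip-compressed tar archive of extracted frames.
    with tarfile.open(image_tar, mode="r:gz") as tar:
        tar.extractall(image_dir)

    # JSONL package: a zip archive holding several .jsonl files.
    with zipfile.ZipFile(jsonl_zip) as zf:
        zf.extractall(jsonl_dir)

    return sorted(jsonl_dir.rglob("*.jsonl"))
```

Only extract archives obtained from a source you trust, since archive members control the paths that get written.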

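Training and validation splits:

Curious George has two separate validation sets. val.jsonl is drawn from the same videos as the training set but from different clips. val2.jsonl comes entirely from videos that do not appear in the training set.
Rabbids Invasion and The Land Before Time each have a single validation set. val.jsonl contains clips from two sources: different clips of videos that also appear in the training set, and clips from videos that the training set never sees.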
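To see which validation stories come from videos also present in the training set, one can compare the source videos referenced by each split. The sketch below rests on an assumption that the dataset card does not state: that the first component of each image path (for example `000258` in `000258/000258_keyframe_...jpg`) identifies the source video. The function names are hypothetical.

```python
import json
from pathlib import PurePosixPath


def video_ids(jsonl_lines):
    """Collect the leading directory of every image path in a split."""
    ids = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        story = json.loads(line)
        for image_path in story["images"]:
            # ASSUMPTION: the first path component names the source video.
            ids.add(PurePosixPath(image_path).parts[0])
    return ids


def split_overlap(train_lines, val_lines):
    """Partition validation video ids into those seen and unseen in training."""
    train_ids = video_ids(train_lines)
    val_ids = video_ids(val_lines)
    return {
        "seen": sorted(val_ids & train_ids),
        "unseen": sorted(val_ids - train_ids),
    }
```

If that path assumption holds, Curious George's val2.jsonl should produce an empty "seen" list, while a val.jsonl for the other two subsets should populate both lists.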

Example

An example JSONL line (this sample lists 10 images and captions):

{"id": 102, "images": ["000258/000258_keyframe_0-19-49-688.jpg", "000258/000258_keyframe_0-19-52-608.jpg", "000258/000258_keyframe_0-19-54-443.jpg", "000258/000258_keyframe_0-19-56-945.jpg", "000258/000258_keyframe_0-20-0-866.jpg", "000258/000258_keyframe_0-20-2-242.jpg", "000258/000258_keyframe_0-20-4-328.jpg", "000258/000258_keyframe_0-20-10-250.jpg", "000258/000258_keyframe_0-20-16-673.jpg", "000258/000258_keyframe_0-20-19-676.jpg"], "captions": ["Once upon a time, in a town filled with colorful buildings, a young boy named Timmy was standing on a sidewalk. He was wearing a light green t-shirt with a building motif and matching gloves, looking excited about the day ahead.", "Soon, Timmy joined a group of people gathered in a park. Among them was a man in a yellow hat and green tie, a lady in a pink dress holding a bag and a spray bottle, and two other children in white shirts holding bags. They were all ready to start their days activity.", "Timmy stood next to the man in the yellow hat, who was also wearing yellow gloves and a shirt with a cityscape design. Timmy, sporting a green T-shirt with a recycling symbol, held a clear plastic bag filled with recyclables and a piece of paper. They were ready to start their city clean-up mission.", "Timmy, still smiling, began walking along a sidewalk with a silver railing, excited to help clean up his beloved city, and his enthusiasm was contagious.", "The group gathered in the park, preparing for their clean-up activity. The man in the yellow hat held a clipboard, while a child nearby wore gloves and carried a trash picker. Everyone was eager to start.", "Suddenly, George, the brown monkey, appeared. He stood between two individuals, happily holding a blue bowling pin with a castle design. George was always ready to join in on the fun and lend a helping hand.", "One of the group members held a trash bag and a clipboard while wearing gloves. They were all set to start the clean-up, with George eager to help.", "As they started cleaning, one of the children handed a drawing to an adult. The drawing was of flowers, a symbol of the beauty they were trying to preserve in their city.", "The group, holding hands and carrying bags, walked down the sidewalk. They were a team, working together to make their city cleaner and more beautiful.", "As they walked, they passed a toddler in white clothes and an adult pushing a stroller. The city was bustling with life, and everyone was doing their part to keep it clean."], "orders": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
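A record of this shape can be read with the standard `json` module. The sketch below is illustrative only: the `Story` class and the `parse_story` and `load_stories` helpers are hypothetical names, and the "orders" field is left unread because the dataset card does not document its meaning.

```python
import json
from dataclasses import dataclass


@dataclass
class Story:
    story_id: int
    images: list[str]    # image paths, relative to the extracted image package
    captions: list[str]  # one narrative text per image


def parse_story(line: str) -> Story:
    """Parse one JSONL line and check that images and captions line up."""
    record = json.loads(line)
    images, captions = record["images"], record["captions"]
    if len(images) != len(captions):
        raise ValueError(
            f"story {record['id']}: {len(images)} images "
            f"but {len(captions)} captions"
        )
    return Story(record["id"], images, captions)


def load_stories(path: str) -> list[Story]:
    """Read every non-empty line of a JSONL file as a Story."""
    with open(path, encoding="utf-8") as f:
        return [parse_story(line) for line in f if line.strip()]
```

Pairing each image with its caption is then `zip(story.images, story.captions)`.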

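Training

To improve training efficiency, we recommend splitting each story into segments of 10 images, as described in our research paper. A script for this step, StoryStream/chunk_data.py, is available in our GitHub repository.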
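The idea of splitting a story into 10-image segments can be sketched as follows. This is not the repository's chunk_data.py, whose exact behavior and output format are not described here; the function name `chunk_story` and the added "chunk" index field are hypothetical.

```python
def chunk_story(record: dict, chunk_size: int = 10) -> list[dict]:
    """Split one story record into consecutive segments of chunk_size images.

    The output layout (original id plus a "chunk" index) is an assumption
    made for this sketch, not the format produced by the official script.
    """
    images, captions = record["images"], record["captions"]
    chunks = []
    for start in range(0, len(images), chunk_size):
        chunks.append(
            {
                "id": record["id"],
                "chunk": start // chunk_size,
                "images": images[start:start + chunk_size],
                "captions": captions[start:start + chunk_size],
            }
        )
    return chunks
```

With the default chunk size, a 30-image story yields three segments of 10 images each; a story whose length is not a multiple of 10 ends with a shorter final segment.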

Citation

If you find this work helpful, please consider citing:

@article{yang2024seedstory,
  title={SEED-Story: Multimodal Long Story Generation with Large Language Model},
  author={Shuai Yang and Yuying Ge and Yang Li and Yukang Chen and Yixiao Ge and Ying Shan and Yingcong Chen},
  year={2024},
  journal={arXiv preprint arXiv:2407.08683},
  url={https://arxiv.org/abs/2407.08683},
}

License

The StoryStream dataset is licensed under the Apache License Version 2.0, except for third-party components, which are detailed in License.

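AI-Collected Summary

Dataset Introduction

Construction

The StoryStream dataset was built to meet the research needs of multimodal story generation and is sourced mainly from popular cartoon series. It contains three subsets: Curious George, Rabbids Invasion, and The Land Before Time. Each subset consists of an image package and a JSONL file package. The image package holds high-resolution images extracted from the cartoon series; the JSONL file package holds multiple JSONL files, in which each line corresponds to one story made up of 30 images and their associated text. The train/validation split differs by subset, and some validation data comes from videos unseen in the training set, which makes it possible to assess how well a model generalizes.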

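Features

StoryStream is characterized by its pairing of high-resolution images with detailed narrative text, which together approximate the richness of real-world storybooks. The narratives typically span long sequences, which adds depth and coherence to the stories. The dataset also supports splitting stories into short segments of 10 images to improve training efficiency. This makes it suitable for long story generation and a useful source of material for multimodal learning more broadly.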

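Usage

When using StoryStream, refer to the `StoryStream/chunk_data.py` script in the GitHub repository to split stories into segments of 10 images, which improves training efficiency. The data loader can be built with the `build_long_story_datapipe` function in `src/data/story_telling.py`. The dataset is suited to multimodal long story generation, particularly research based on large language models. Users should observe the dataset's license terms and follow the citation guidance.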

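Background and Challenges

Background Overview

StoryStream was released by the Tencent ARC team in 2024 to advance research on multimodal story generation. Built on classic cartoon series, it contains rich narrative text and high-resolution images and focuses on supporting long-sequence story generation. Its core research question is how to combine multimodal data (images and text) to generate coherent, substantive story sequences. The release gives the intersection of natural language processing and computer vision a new research tool, particularly for long story generation tasks.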

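Current Challenges

The challenges around StoryStream fall into two areas. First, at the task level, long-sequence story generation requires strong contextual understanding and memory from a model in order to keep a story coherent and logical. Second, on the data construction side, extracting high-quality, semantically rich image-text pairs from cartoon series while ensuring both diversity and consistency is a complex undertaking. In addition, the dataset splits and validation set design must support evaluation of model generalization so that overfitting can be detected. Together, these are the key difficulties StoryStream raises for multimodal story generation research.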

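Common Scenarios

Typical Use Cases

StoryStream has broad application potential in multimodal story generation. Its typical use is generating coherent, emotionally rich long story sequences by combining high-resolution images with detailed narrative text. Researchers can use the dataset to train models that understand and generate complex storylines, advancing research at the intersection of natural language processing and computer vision.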

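Academic Problems Addressed

StoryStream addresses key academic problems in multimodal story generation, such as coherent generation of long story sequences, cross-modal alignment of images and text, and logical consistency in complex narratives. By providing abundant image-text pairs, it gives researchers an experimental foundation, supports performance improvements of multimodal generative models on long-sequence tasks, and addresses the limited narrative depth and length of existing datasets.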

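Related and Derived Work

Research associated with StoryStream includes the SEED-Story model, which is built on the dataset and uses a large language model to perform multimodal long story generation. In addition, other research teams have proposed improved cross-modal alignment algorithms and long-sequence generation strategies on this basis, further advancing the field of multimodal generation. This work both supports the dataset's practical value and provides a useful reference for subsequent research.

The content above was collected and summarized by AI.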
