StoryStream
StoryStream Dataset
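Introduction
The StoryStream dataset is a resource for advancing multimodal story generation. It is derived from popular cartoon series and pairs detailed narratives with a comprehensive collection of high-resolution images. It is designed to support the creation of long story sequences.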
Data Format
The StoryStream dataset consists of three subsets:
- Curious George
- Rabbids Invasion
- The Land Before Time
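Each subset includes:
- An image package: a tar.gz file containing all images extracted from the cartoon series.
- A JSONL file package: a zip file containing several JSONL files. Each line of a JSONL file corresponds to one story of 30 images and their associated texts.
  - The "images" field provides a list of paths to the 30 images.
  - The "captions" field lists the 30 corresponding narrative texts.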
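The layout described above can be read with the standard library alone. The following is a minimal sketch, not code from the StoryStream repository: the function names `read_stories` and `resolve_image_paths`, and the idea of joining each relative image path onto a local directory where the tar.gz was extracted, are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List


def read_stories(jsonl_path: str) -> Iterator[Dict]:
    """Yield one story dict per non-empty line of a JSONL file."""
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            story = json.loads(line)
            # Each image is expected to have exactly one caption.
            if len(story["images"]) != len(story["captions"]):
                raise ValueError(
                    f"line {line_number}: images and captions differ in length"
                )
            yield story


def resolve_image_paths(story: Dict, image_root: str) -> List[Path]:
    """Join the relative image paths onto a local image directory.

    `image_root` is a hypothetical location where the image package
    was extracted; adjust it to your own setup.
    """
    return [Path(image_root) / rel_path for rel_path in story["images"]]
```

A typical call would be `for story in read_stories("train.jsonl"): ...`, where the file name is only a placeholder for one of the JSONL files in the zip package.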
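The training and validation splits are organized as follows:
- Curious George has two separate validation sets. val.jsonl is drawn from the same videos as the training set, but from different clips. val2.jsonl comes entirely from videos that do not appear in the training set.
- Rabbids Invasion and The Land Before Time each have a single validation set. val.jsonl contains clips from both sources: different clips of videos used in the training set, and clips of videos that do not appear in the training set at all.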
Example
An example JSONL line (this one holds 10 images and captions) is shown below:
```json
{"id": 102, "images": ["000258/000258_keyframe_0-19-49-688.jpg", "000258/000258_keyframe_0-19-52-608.jpg", "000258/000258_keyframe_0-19-54-443.jpg", "000258/000258_keyframe_0-19-56-945.jpg", "000258/000258_keyframe_0-20-0-866.jpg", "000258/000258_keyframe_0-20-2-242.jpg", "000258/000258_keyframe_0-20-4-328.jpg", "000258/000258_keyframe_0-20-10-250.jpg", "000258/000258_keyframe_0-20-16-673.jpg", "000258/000258_keyframe_0-20-19-676.jpg"], "captions": ["Once upon a time, in a town filled with colorful buildings, a young boy named Timmy was standing on a sidewalk. He was wearing a light green t-shirt with a building motif and matching gloves, looking excited about the day ahead.", "Soon, Timmy joined a group of people gathered in a park. Among them was a man in a yellow hat and green tie, a lady in a pink dress holding a bag and a spray bottle, and two other children in white shirts holding bags. They were all ready to start their days activity.", "Timmy stood next to the man in the yellow hat, who was also wearing yellow gloves and a shirt with a cityscape design. Timmy, sporting a green T-shirt with a recycling symbol, held a clear plastic bag filled with recyclables and a piece of paper. They were ready to start their city clean-up mission.", "Timmy, still smiling, began walking along a sidewalk with a silver railing, excited to help clean up his beloved city, and his enthusiasm was contagious.", "The group gathered in the park, preparing for their clean-up activity. The man in the yellow hat held a clipboard, while a child nearby wore gloves and carried a trash picker. Everyone was eager to start.", "Suddenly, George, the brown monkey, appeared. He stood between two individuals, happily holding a blue bowling pin with a castle design. George was always ready to join in on the fun and lend a helping hand.", "One of the group members held a trash bag and a clipboard while wearing gloves. They were all set to start the clean-up, with George eager to help.", "As they started cleaning, one of the children handed a drawing to an adult. The drawing was of flowers, a symbol of the beauty they were trying to preserve in their city.", "The group, holding hands and carrying bags, walked down the sidewalk. They were a team, working together to make their city cleaner and more beautiful.", "As they walked, they passed a toddler in white clothes and an adult pushing a stroller. The city was bustling with life, and everyone was doing their part to keep it clean."], "orders": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
```
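Training
For efficient training, we recommend splitting each story into chunks of 10 images, as described in our research paper. The script that handles this, StoryStream/chunk_data.py, is available in our GitHub repository.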
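To make the chunking idea concrete, here is a minimal sketch of splitting one story into fixed-size pieces. It is not the repository's chunk_data.py, whose exact behavior is not described here: the function name `chunk_story`, the choice to copy the story "id" into every chunk, and the omission of any "orders" field are all assumptions made for illustration.

```python
from typing import Dict, List


def chunk_story(story: Dict, chunk_size: int = 10) -> List[Dict]:
    """Split one story into consecutive chunks of `chunk_size` images.

    The output layout (same "id" on every chunk, no "orders" field)
    is an assumption, not the format produced by chunk_data.py.
    """
    images = story["images"]
    captions = story["captions"]
    if len(images) != len(captions):
        raise ValueError("images and captions must have the same length")

    chunks = []
    for start in range(0, len(images), chunk_size):
        end = start + chunk_size
        chunks.append(
            {
                "id": story.get("id"),
                "images": images[start:end],      # aligned slice of image paths
                "captions": captions[start:end],  # matching slice of captions
            }
        )
    return chunks


# Usage with a synthetic 30-image story (placeholder paths and captions).
story = {
    "id": 0,
    "images": [f"img_{i}.jpg" for i in range(30)],
    "captions": [f"caption {i}" for i in range(30)],
}
chunks = chunk_story(story)
print(len(chunks))  # 3
```

Slicing both lists with the same indices keeps each caption paired with its image. A story whose length is not a multiple of `chunk_size` would yield a shorter final chunk, and whether to keep or drop it is left to the caller.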
Citation
If you find this work helpful, please consider citing:
```bibtex
@article{yang2024seedstory,
  title={SEED-Story: Multimodal Long Story Generation with Large Language Model},
  author={Shuai Yang and Yuying Ge and Yang Li and Yukang Chen and Yixiao Ge and Ying Shan and Yingcong Chen},
  year={2024},
  journal={arXiv preprint arXiv:2407.08683},
  url={https://arxiv.org/abs/2407.08683},
}
```
License
The StoryStream dataset is licensed under the Apache License Version 2.0, except for third-party components, which are detailed in License.




