sora-video-generation-aligned-words


ModelScope community. Updated: 2025-12-05. Indexed: 2025-02-08.

Download link:

https://modelscope.cn/datasets/Rapidata/sora-video-generation-aligned-words


Description:

<style> .vertical-container { display: flex; flex-direction: column; gap: 60px; } .image-container img { height: 250px; /* Set the desired height */ margin:0; object-fit: contain; /* Ensures the aspect ratio is maintained */ width: auto; /* Adjust width automatically based on height */ } .image-container { display: flex; /* Aligns images side by side */ justify-content: space-around; /* Space them evenly */ align-items: center; /* Align them vertically */ } .container { width: 90%; margin: 0 auto; } .prompt { width: 100%; text-align: center; font-weight: bold; font-size: 16px; height: 60px; } .score-amount { margin: 0; margin-top: 10px; } .score-percentage { font-size: 12px; font-weight: semi-bold; text-align: right; } .main-container { display: flex; flex-direction: row; gap: 60px; } .good { color: #18c54f; } .bad { color: red; } </style>

# Rapidata Video Generation Word for Word Alignment Dataset

<a href="https://www.rapidata.ai"> <img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/jfxR79bOztqaC6_yNNnGU.jpeg" width="300" alt="Dataset visualization"> </a>

<a href="https://huggingface.co/datasets/Rapidata/text-2-image-Rich-Human-Feedback"> </a>

<p> If you get value from this dataset and would like to see more in the future, please consider liking it. </p>

This dataset was collected in ~1 hour using the [Rapidata Python API](https://docs.rapidata.ai), which is accessible to anyone and ideal for large-scale data annotation.

# Overview

In this dataset, ~1500 human evaluators were asked to mark which parts of the prompt did not align with the AI-generated video. The specific instruction was: "The video is based on the text below. Select mistakes, i.e., words that are not aligned with the video."

The dataset is based on the [Alignment Dataset](https://huggingface.co/datasets/Rapidata/sora-video-generation-alignment-likert-scoring). The videos that scored above 0.5 in "LikertScoreNormalized" (that is, the videos rated worse) were selected to be analyzed in detail.

# Videos

The videos in the dataset viewer are previewed as scaled-down GIFs. The original videos are stored under [Files and versions](https://huggingface.co/datasets/Rapidata/sora-video-generation-aligned-words/tree/main/Videos).

<h3> The video is based on the text below. Select mistakes, i.e., words that are not aligned with the video. </h3>

<div class="main-container"> <div class="container"> <div class="image-container"> <div> <img src="https://cdn-uploads.huggingface.co/production/uploads/672b7d79fd1e92e3c3567435/L5ncdW_-mKfT14Rn2-0X1.gif" width=500> </div> </div> </div> <div class="container"> <div class="image-container"> <div> <img src="https://cdn-uploads.huggingface.co/production/uploads/672b7d79fd1e92e3c3567435/WTkh6PSn84c9KOK9EnhbV.gif" width=500> </div> </div> </div> </div>
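The word-selection task above naturally yields, for each prompt, a set of flagged words per evaluator. As a rough illustration of how such responses might be summarized, the sketch below computes the fraction of evaluators who flagged each word. The function name `flagged_fraction`, the representation of a response as a set of word indices, and the example prompt are all assumptions made for illustration; they are not taken from the dataset's actual schema.

```python
from collections import Counter


def flagged_fraction(prompt: str, selections: list[set[int]]) -> list[tuple[str, float]]:
    """Return (word, share of evaluators who flagged it) for every prompt word.

    `selections` holds one set of word indices per evaluator (hypothetical format).
    """
    words = prompt.split()
    # Count how many evaluators flagged each word index, ignoring invalid indices.
    counts = Counter(
        i for selected in selections for i in selected if 0 <= i < len(words)
    )
    n = len(selections)
    return [(word, counts[i] / n if n else 0.0) for i, word in enumerate(words)]


# Hypothetical prompt and responses, not drawn from the dataset.
prompt = "A red car drives through snow"
selections = [{1}, {1, 5}, set(), {1}]
result = flagged_fraction(prompt, selections)
```

A per-word share like this keeps every word in the output, so words that nobody flagged remain visible with a value of 0.0 rather than disappearing from the summary.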



Created: 2025-02-05


Background overview

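Rapidata collected this dataset quickly through its API to evaluate where AI-generated videos fail to align with their text prompts, drawing on roughly 1500 human evaluators. It builds on the Alignment Dataset and focuses on videos with a LikertScoreNormalized score above 0.5, with the aim of identifying the mismatches between each video and its text.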
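The selection rule mentioned above (keep only videos whose LikertScoreNormalized is above 0.5) can be sketched in a few lines. This is a minimal illustration, assuming each record is a plain dictionary; only the `LikertScoreNormalized` field name comes from the source, while the `video` key, the file names, and the function name are hypothetical.

```python
def select_for_word_analysis(rows: list[dict], threshold: float = 0.5) -> list[dict]:
    """Keep rows whose normalized Likert score is strictly above the threshold.

    A higher LikertScoreNormalized means the video was rated worse.
    """
    return [row for row in rows if row["LikertScoreNormalized"] > threshold]


# Hypothetical records, not drawn from the dataset.
rows = [
    {"video": "a.mp4", "LikertScoreNormalized": 0.2},
    {"video": "b.mp4", "LikertScoreNormalized": 0.5},
    {"video": "c.mp4", "LikertScoreNormalized": 0.8},
]
selected = select_for_word_analysis(rows)
```

The comparison is strict, so a video scoring exactly 0.5 is not selected, which matches the "above 0.5" wording.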

The summary above was collected and generated by 遇见数据集.
