
text-2-video-human-preferences-veo2

ModelScope community · updated 2025-11-16 · indexed 2025-03-15
Download link:
https://modelscope.cn/datasets/Rapidata/text-2-video-human-preferences-veo2
Description:
<style>
.vertical-container {
  display: flex;
  flex-direction: column;
  gap: 60px;
}
.image-container img {
  height: 150px; /* Set the desired height */
  margin: 0;
  object-fit: contain; /* Ensures the aspect ratio is maintained */
  width: auto; /* Adjust width automatically based on height */
}
.image-container {
  display: flex; /* Aligns images side by side */
  justify-content: space-around; /* Space them evenly */
  align-items: center; /* Align them vertically */
}
.container {
  width: 90%;
  margin: 0 auto;
}
.text-center {
  text-align: center;
}
.score-amount {
  margin: 0;
  margin-top: 10px;
}
.score-percentage {
  font-size: 12px;
  font-weight: 600;
}
</style>

# Rapidata Video Generation Google DeepMind Veo2 Human Preference

<a href="https://www.rapidata.ai">
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/jfxR79bOztqaC6_yNNnGU.jpeg" width="300" alt="Dataset visualization">
</a>

<p>
If you get value from this dataset and would like to see more in the future, please consider liking it.
</p>

This dataset was collected in ~1 hour total using the [Rapidata Python API](https://docs.rapidata.ai), which is accessible to anyone and well suited to large-scale data annotation.

# Overview

In this dataset, ~45,000 human annotations were collected to evaluate the Google DeepMind Veo2 video generation model on our benchmark. The up-to-date benchmark can be viewed on our [website](https://www.rapidata.ai/leaderboard/video-models).
The benchmark data is accessible on [Hugging Face](https://huggingface.co/datasets/Rapidata/text-2-video-human-preferences) directly.

# Explanation of the columns

The dataset contains paired video comparisons. Each entry includes `video1` and `video2` fields, which contain links to downscaled GIFs for easy viewing. The full-resolution videos can be found [here](https://huggingface.co/datasets/Rapidata/text-2-video-human-preferences/tree/main/Videos).
The `weighted_results` column contains scores ranging from 0 to 1, representing aggregated user responses. Individual user responses can be found in the `detailedResults` column.

# Alignment

The alignment score quantifies how well a video matches its prompt. Users were asked: "Which video fits the description better?"

## Examples

<div class="vertical-container">
  <div class="container">
    <div class="text-center">
      <q>A lone kayaker paddles through calm, reflecting waters under a vibrant sunset, the sky painted with hues of orange and pink, creating a serene and mesmerizing evening scene.</q>
    </div>
    <div class="image-container">
      <div>
        <h3 class="score-amount">Veo 2 </h3>
        <div class="score-percentage">(Score: 92.83%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/wLMZ_ZpXGJQ2DNsGrKBt0.webp" width="500">
      </div>
      <div>
        <h3 class="score-amount">Hunyuan </h3>
        <div class="score-percentage">(Score: 7.17%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/yt5nrwg0_soHhA-ut0Duy.webp" width="500">
      </div>
    </div>
  </div>
  <div class="container">
    <div class="text-center">
      <q>An astronaut explores a newly discovered alien planet, scanning the terrain with a high-tech visor, as vibrant flora and towering structures emerge under a dual-star sky.</q>
    </div>
    <div class="image-container">
      <div>
        <h3 class="score-amount">Veo 2 </h3>
        <div class="score-percentage">(Score: 7.87%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/c3DPkvz5v6SddtYqwQeki.webp" width="500">
      </div>
      <div>
        <h3 class="score-amount">Pika </h3>
        <div class="score-percentage">(Score: 92.13%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/klJif2LwzkLeG33hqK4pI.webp" width="500">
      </div>
    </div>
  </div>
</div>

# Coherence

The coherence score measures whether the generated video is logically consistent and free from artifacts or visual glitches.
Without seeing the original prompt, users were asked: "Which video is logically more coherent? E.g. the video where physics are less violated and the composition makes more sense."

## Examples

<div class="vertical-container">
  <div class="container">
    <div class="image-container">
      <div>
        <h3>Veo 2 </h3>
        <div class="score-percentage">(Score: 94.99%)</div>
        <img src="https://assets.rapidata.ai/0020_veo2_0.gif" width="500" alt="Dataset visualization">
      </div>
      <div>
        <h3>Wan 2.1 </h3>
        <div class="score-percentage">(Score: 5.01%)</div>
        <img src="https://assets.rapidata.ai/0020_wan2.1_0.gif" width="500" alt="Dataset visualization">
      </div>
    </div>
  </div>
  <div class="container">
    <div class="image-container">
      <div>
        <h3>Veo 2 </h3>
        <div class="score-percentage">(Score: 13.00%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/OuLb0PRVq2yl64Gru3n8k.webp" width="500" alt="Dataset visualization">
      </div>
      <div>
        <h3>Hunyuan </h3>
        <div class="score-percentage">(Score: 87.00%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/SbpuvC5QIrJX1Q2b20s7d.webp" width="500" alt="Dataset visualization">
      </div>
    </div>
  </div>
</div>

# Preference

The preference score reflects how visually appealing participants found each video, independent of the prompt. Users were asked: "Which video do you prefer aesthetically?"
## Examples

<div class="vertical-container">
  <div class="container">
    <div class="image-container">
      <div>
        <h3>Veo 2 </h3>
        <div class="score-percentage">(Score: 90.31%)</div>
        <img src="https://assets.rapidata.ai/0001_veo2_0.gif" width="500" alt="Dataset visualization">
      </div>
      <div>
        <h3>Wan 2.1 </h3>
        <div class="score-percentage">(Score: 9.69%)</div>
        <img src="https://assets.rapidata.ai/0001_wan2.1_0.gif" width="500" alt="Dataset visualization">
      </div>
    </div>
  </div>
  <div class="container">
    <div class="image-container">
      <div>
        <h3>Veo 2 </h3>
        <div class="score-percentage">(Score: 3.28%)</div>
        <img src="https://assets.rapidata.ai/0085_veo2_0.gif" width="500" alt="Dataset visualization">
      </div>
      <div>
        <h3>Sora </h3>
        <div class="score-percentage">(Score: 96.72%)</div>
        <img src="https://assets.rapidata.ai/0085_sora_0.gif" width="500" alt="Dataset visualization">
      </div>
    </div>
  </div>
</div>

<br>

# About Rapidata

Rapidata's technology makes collecting human feedback at scale faster and more accessible than ever before. Visit [rapidata.ai](https://www.rapidata.ai/) to learn more about how we're revolutionizing human feedback collection for AI development.

# Other Datasets

We run a benchmark of the major image generation models; the results can be found on our [website](https://www.rapidata.ai/leaderboard/image-models). We rank the models according to their coherence/plausibility, their alignment with the given prompt, and style preference.
The underlying 2M+ annotations can be found here:

- Link to the [Rich Video Annotation dataset](https://huggingface.co/datasets/Rapidata/text-2-video-Rich-Human-Feedback)
- Link to the [Coherence dataset](https://huggingface.co/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Coherence_Dataset)
- Link to the [Text-2-Image Alignment dataset](https://huggingface.co/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Alignment_Dataset)
- Link to the [Preference dataset](https://huggingface.co/datasets/Rapidata/700k_Human_Preference_Dataset_FLUX_SD3_MJ_DALLE3)

We have also collected a [rich human feedback dataset](https://huggingface.co/datasets/Rapidata/text-2-image-Rich-Human-Feedback), where we annotated an alignment score for each word in a prompt, scored coherence, overall alignment, and style preference, and finally annotated heatmaps of areas of interest for those images with low scores.
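
# Reading a comparison row (sketch)

As a rough illustration of the columns described under "Explanation of the columns", the snippet below picks the preferred video from one comparison and prints its share as a percentage, as in the example scores above. It is a minimal sketch under stated assumptions: the card only says that `weighted_results` holds scores between 0 and 1, so the mapping layout used here (one score per video, keyed by `video1`/`video2`), the `preferred_video` helper, and the `example.com` URLs are hypothetical and may not match the actual files.

```python
from typing import Mapping, Tuple


def preferred_video(weighted_results: Mapping[str, float]) -> Tuple[str, float]:
    """Return the key with the highest aggregated score and its share in percent.

    Assumes (hypothetically) one score in [0, 1] per compared video.
    """
    if not weighted_results:
        raise ValueError("weighted_results is empty")
    total = sum(weighted_results.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    winner = max(weighted_results, key=lambda key: weighted_results[key])
    # Normalise by the total so the shares of both videos add up to 100%.
    return winner, 100.0 * weighted_results[winner] / total


# Hypothetical row; the URLs are placeholders, the scores mirror the
# first Alignment example (92.83% vs 7.17%).
row = {
    "video1": "https://example.com/video_a.gif",
    "video2": "https://example.com/video_b.gif",
    "weighted_results": {"video1": 0.9283, "video2": 0.0717},
}

winner, share = preferred_video(row["weighted_results"])
print(f"{winner} preferred with {share:.2f}% of the weighted votes")
```

If the real column is keyed differently (for example by file name), only the keys of the mapping passed to `preferred_video` would need to change.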

Provider:
maas
Created:
2025-03-12