text-2-video-human-preferences-wan2.1


ModelScope community, updated 2025-12-04, indexed 2025-03-15

Download link:

https://modelscope.cn/datasets/Rapidata/text-2-video-human-preferences-wan2.1

Description:

<style>
.vertical-container { display: flex; flex-direction: column; gap: 60px; }
.image-container img { height: 150px; /* Set the desired height */ margin:0; object-fit: contain; /* Ensures the aspect ratio is maintained */ width: auto; /* Adjust width automatically based on height */ }
.image-container { display: flex; /* Aligns images side by side */ justify-content: space-around; /* Space them evenly */ align-items: center; /* Align them vertically */ }
.container { width: 90%; margin: 0 auto; }
.text-center { text-align: center; }
.score-amount { margin: 0; margin-top: 10px; }
.score-percentage { font-size: 12px; font-weight: semi-bold; }
</style>

# Rapidata Video Generation Alibaba Wan2.1 Human Preference

<a href="https://www.rapidata.ai">
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f5624c42b853e73e0738eb/jfxR79bOztqaC6_yNNnGU.jpeg" width="300" alt="Dataset visualization">
</a>

<a href="https://huggingface.co/datasets/Rapidata/text-2-image-Rich-Human-Feedback"> </a>

<p>
If you get value from this dataset and would like to see more in the future, please consider liking it.
</p>

This dataset was collected in ~1 hour total using the [Rapidata Python API](https://docs.rapidata.ai), which is accessible to anyone and well suited to large-scale data annotation.

# Overview

In this dataset, ~45,000 human annotations were collected to evaluate the Alibaba Wan 2.1 video generation model on our benchmark. The up-to-date benchmark can be viewed on our [website](https://www.rapidata.ai/leaderboard/video-models).

The benchmark data is accessible on [huggingface](https://huggingface.co/datasets/Rapidata/text-2-video-human-preferences) directly.

# Explanation of the columns

The dataset contains paired video comparisons. Each entry includes `video1` and `video2` fields, which contain links to downscaled GIFs for easy viewing. The full-resolution videos can be found [here](https://huggingface.co/datasets/Rapidata/text-2-video-human-preferences/tree/main/Videos).

The `weighted_results` column contains scores ranging from 0 to 1, representing aggregated user responses. Individual user responses can be found in the `detailedResults` column.

# Alignment

The alignment score quantifies how well a video matches its prompt. Users were asked: "Which video fits the description better?".

## Examples

<div class="vertical-container">
  <div class="container">
    <div class="text-center">
      <q>A firefighter in action battles flames, the camera alternating between his determined face and the roaring blaze as he rescues those in danger.</q>
    </div>
    <div class="image-container">
      <div>
        <h3 class="score-amount">Wan 2.1 </h3>
        <div class="score-percentage">(Score: 90.08%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/QVcYT7l4EYvlfsQpIv7_b.webp" width=500>
      </div>
      <div>
        <h3 class="score-amount">Alpha </h3>
        <div class="score-percentage">(Score: 19.92%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/sv01eASaoFXMVFo_1lVqG.webp" width=500>
      </div>
    </div>
  </div>
  <div class="container">
    <div class="text-center">
      <q>An artist paints a vibrant mural under flickering streetlights, each brushstroke blending colors seamlessly, while passersby watch in awe as the masterpiece comes to life.</q>
    </div>
    <div class="image-container">
      <div>
        <h3 class="score-amount">Wan 2.1 </h3>
        <div class="score-percentage">(Score: 0.00%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/UOYqSEBFlPRade5qZUhr6.webp" width=500>
      </div>
      <div>
        <h3 class="score-amount">Pika </h3>
        <div class="score-percentage">(Score: 100.00%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/T-SeHTxzHPV_RsWDnvuyx.webp" width=500>
      </div>
    </div>
  </div>
</div>

# Coherence

The coherence score measures whether the generated video is logically consistent and free from artifacts or visual glitches. Without seeing the original prompt, users were asked: "Which video is logically more coherent? E.g. the video where physics are less violated and the composition makes more sense."

## Examples

<div class="vertical-container">
  <div class="container">
    <div class="image-container">
      <div>
        <h3>Wan 2.1 </h3>
        <div class="score-percentage">(Score: 89.15%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/XNBdO2xbRbzXRghJhw7AJ.webp" width="500" alt="Dataset visualization">
      </div>
      <div>
        <h3>Hunyuan </h3>
        <div class="score-percentage">(Score: 11.85%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/QRmD9y4bl_E1YcrdqdhhO.webp" width="500" alt="Dataset visualization">
      </div>
    </div>
  </div>
  <div class="container">
    <div class="image-container">
      <div>
        <h3>Wan 2.1 </h3>
        <div class="score-percentage">(Score: 12.28%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/2j-0AOR9SdYn3TIeJN7N6.webp" width="500" alt="Dataset visualization">
      </div>
      <div>
        <h3>Veo 2 </h3>
        <div class="score-percentage">(Score: 87.72%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/VDD8-DTZ7knLnv2WI60QY.webp" width="500" alt="Dataset visualization">
      </div>
    </div>
  </div>
</div>

# Preference

The preference score reflects how visually appealing participants found each video, independent of the prompt. Users were asked: "Which video do you prefer aesthetically?"

## Examples

<div class="vertical-container">
  <div class="container">
    <div class="image-container">
      <div>
        <h3>Wan 2.1 </h3>
        <div class="score-percentage">(Score: 91.57%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/Ew0fJrcpN3izQxQ3Bxv-z.webp" width="500" alt="Dataset visualization">
      </div>
      <div>
        <h3>Hunyuan </h3>
        <div class="score-percentage">(Score: 8.43%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/7urtjMmka0qEPFpwE5Jdz.webp" width="500" alt="Dataset visualization">
      </div>
    </div>
  </div>
  <div class="container">
    <div class="image-container">
      <div>
        <h3>Wan 2.1 </h3>
        <div class="score-percentage">(Score: 13.18%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/XhcAQoTLIjygsSBw5AbYw.webp" width="500" alt="Dataset visualization">
      </div>
      <div>
        <h3>Veo 2 </h3>
        <div class="score-percentage">(Score: 86.82%)</div>
        <img src="https://cdn-uploads.huggingface.co/production/uploads/664dcc6296d813a7e15e170e/geJYWGYJ2fM58wAxXkWZh.webp" width="500" alt="Dataset visualization">
      </div>
    </div>
  </div>
</div>

<br>

# About Rapidata

Rapidata's technology makes collecting human feedback at scale faster and more accessible than ever before. Visit [rapidata.ai](https://www.rapidata.ai/) to learn more about how we're revolutionizing human feedback collection for AI development.

# Other Datasets

We run a benchmark of the major image generation models; the results can be found on our [website](https://www.rapidata.ai/leaderboard/image-models). We rank the models according to their coherence/plausibility, their alignment with the given prompt, and style preference. The underlying 2M+ annotations can be found here:

- Link to the [Rich Video Annotation dataset](https://huggingface.co/datasets/Rapidata/text-2-video-Rich-Human-Feedback)
- Link to the [Coherence dataset](https://huggingface.co/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Coherence_Dataset)
- Link to the [Text-2-Image Alignment dataset](https://huggingface.co/datasets/Rapidata/Flux_SD3_MJ_Dalle_Human_Alignment_Dataset)
- Link to the [Preference dataset](https://huggingface.co/datasets/Rapidata/700k_Human_Preference_Dataset_FLUX_SD3_MJ_DALLE3)

We have also collected a [rich human feedback dataset](https://huggingface.co/datasets/Rapidata/text-2-image-Rich-Human-Feedback), where we annotated an alignment score for each word in a prompt, scored coherence, overall alignment, and style preference, and finally annotated heatmaps of areas of interest for those images with low scores.
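
To make the pairwise scores described in the column explanation above more concrete, here is a minimal Python sketch of how per-pair scores between 0 and 1 could be rolled up into per-model summaries. The field names `model1`, `model2`, `score1`, and `score2` are hypothetical and chosen only for illustration; they are not the dataset's actual schema, so check the real column layout before adapting this. The two sample rows reuse the Preference example scores shown above.

```python
from collections import defaultdict

# Hypothetical rows: the keys below are assumptions for illustration,
# not the dataset's real column names. Scores are on a 0-1 scale.
rows = [
    {"model1": "Wan 2.1", "model2": "Hunyuan", "score1": 0.9157, "score2": 0.0843},
    {"model1": "Wan 2.1", "model2": "Veo 2", "score1": 0.1318, "score2": 0.8682},
]


def mean_scores(rows):
    """Average the per-pair score each model received across all its comparisons."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        for model_key, score_key in (("model1", "score1"), ("model2", "score2")):
            totals[row[model_key]] += row[score_key]
            counts[row[model_key]] += 1
    return {model: totals[model] / counts[model] for model in totals}


def win_rate(rows, model):
    """Fraction of comparisons in which `model` scored higher than its opponent."""
    wins = 0
    games = 0
    for row in rows:
        if row["model1"] == model:
            own, other = row["score1"], row["score2"]
        elif row["model2"] == model:
            own, other = row["score2"], row["score1"]
        else:
            continue  # this pair does not involve the model
        games += 1
        wins += own > other
    return wins / games if games else float("nan")


if __name__ == "__main__":
    for model, score in sorted(mean_scores(rows).items()):
        print(f"{model}: mean score {score:.4f}, win rate {win_rate(rows, model):.2f}")
```

A mean score keeps the margin of each comparison, while a win rate only counts which side was preferred; which summary is more appropriate depends on the analysis, and neither is claimed to be the method behind the Rapidata leaderboard.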


Provider:

maas

Created:

2025-03-12
