nusdufv/text-2-video-human-preferences-motion

Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/nusdufv/text-2-video-human-preferences-motion
Description:
---
pretty_name: "Human Preference Data for AI Video Generation — Motion Quality (29K Labels, 4 Models)"
language:
- en
license: cc-by-4.0
size_categories:
- 10K<n<100K
task_categories:
- video-classification
- text-to-video
- reinforcement-learning
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*.parquet
tags:
- human-preferences
- video-generation
- preference-data
- human-motion
- rlhf
- reward-model
- text-to-video
- video-quality
- pairwise-comparison
- annotation
- video-evaluation
- video-benchmark
- dpo
- human-feedback
- ai-video
- generative-ai
- sora
- veo
- kling
- grok
- luma
- coherence
- aesthetics
- prompt-adherence
- motion-quality
- temporal-consistency
- video-reward-model
- preference-learning
- video-rlhf
---

# Human Preferences for AI-Generated Video: Motion Quality

<p align="left">
  <img src="https://huggingface.co/datasets/datapointai/text-2-video-human-preferences-motion/resolve/main/datapointlogo.png" alt="Datapoint AI" width="300">
</p>

**29,283 pairwise human preference labels** comparing **4 frontier video generation models** on human motion across **3 quality dimensions**, collected from **4,349 real annotators** via [Datapoint AI](https://trydatapoint.com).

This is the largest publicly available human preference dataset focused specifically on **human motion in AI-generated video**.

## Why This Dataset

Video generation models are improving fast, but **evaluating human motion remains unsolved**. Automated judges (VLMs like GPT-4V, Gemini) miss subtle errors in gait, facial expressions, and multi-body coordination that humans catch easily.
This dataset gives you **ground-truth human preferences** you can use to:

- **Train video reward models** for RLHF / DPO / preference optimization
- **Benchmark video generation models** on realistic human motion
- **Calibrate VLM judges**: measure where automated evaluators disagree with humans
- **Study annotation patterns**: inter-annotator agreement, position bias, response time distributions

## Models Compared

| Model | Type |
|---|---|
| **Grok Imagine** | xAI's video generation model |
| **Veo 3 Fast** | Google DeepMind |
| **Kling 1.5 Pro** | Kuaishou |
| **Luma Ray 2** | Luma Labs |

## Dataset Structure

354 aggregated comparison rows (from 29,283 individual annotations). Each row is one pairwise comparison between two model outputs for the same prompt.

| Field | Description |
|---|---|
| `prompt` | Text prompt used to generate both videos |
| `video1` / `video2` | GIF previews of the generated videos |
| `model1` / `model2` | Which model generated each video |
| `weighted_results1_Coherence` | Fraction of annotators preferring video 1 on coherence |
| `weighted_results2_Coherence` | Fraction preferring video 2 on coherence |
| `weighted_results1_Aesthetic` | Fraction preferring video 1 on aesthetics |
| `weighted_results2_Aesthetic` | Fraction preferring video 2 on aesthetics |
| `weighted_results1_Prompt_Adherence` | Fraction preferring video 1 on prompt faithfulness |
| `weighted_results2_Prompt_Adherence` | Fraction preferring video 2 on prompt faithfulness |
| `detailedResults_*` | Per-annotator votes with timestamps |
| `subcategory` | Motion type: walking, dancing, talking, sports, stationary, multi-person |
| `prompt_id` | Unique prompt identifier (1–60) |

## Evaluation Dimensions

| Dimension | What annotators judged |
|---|---|
| **Coherence** | Temporal consistency: no flickering, warping, deformation, or physically implausible motion |
| **Aesthetic** | Visual quality: composition, lighting, color, style, production value |
| **Prompt Adherence** | Accuracy: does the video depict what the prompt describes? |

## Motion Categories

| Category | Examples | Why it's hard for AI |
|---|---|---|
| **Walking / Running** | Gaits, jogging, sprinting | Weight shift, foot contact, natural rhythm |
| **Dancing** | Ballet, hip-hop, folk | Complex coordinated movement, full-body flow |
| **Talking / Expressions** | Speaking, singing, laughing | Lip sync, facial micro-movements |
| **Sports / Action** | Martial arts, skateboarding | Fast motion, physics, athletic poses |
| **Stationary** | Meditating, reading, posing | Subtle motion, identity preservation over time |
| **Multi-Person** | Handshakes, sparring, group performance | Two+ bodies, occlusion, interaction physics |

## Key Results

### Overall Win Rates

| Rank | Model | Win Rate | 95% CI |
|---|---|---|---|
| 1 | **Grok Imagine** | 54.7% | [54.0%, 55.5%] |
| 2 | **Veo 3 Fast** | 54.6% | [53.8%, 55.3%] |
| 3 | **Kling 1.5 Pro** | 47.9% | [47.1%, 48.7%] |
| 4 | **Luma Ray 2** | 42.8% | [42.0%, 43.6%] |

### By Dimension

| Model | Coherence | Aesthetic | Prompt Adherence |
|---|---|---|---|
| Grok Imagine | 53.6% | **55.7%** | 54.7% |
| Veo 3 Fast | 54.5% | 54.7% | 54.5% |
| Kling 1.5 Pro | 48.4% | 48.0% | 47.4% |
| Luma Ray 2 | 43.5% | 41.5% | 43.5% |

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("datapointai/text-2-video-human-preferences-motion")
print(ds["train"][0])
```

### Train a reward model

```python
import pandas as pd
from datasets import load_dataset

ds = load_dataset("datapointai/text-2-video-human-preferences-motion", split="train")
df = ds.to_pandas()

# Each row is one comparison; the weighted scores can serve as soft labels.
for _, row in df.iterrows():
    prompt = row["prompt"]
    score_a_coherence = row["weighted_results1_Coherence"]
    score_b_coherence = row["weighted_results2_Coherence"]
    # Use as preference pairs for DPO, reward modeling, etc.
```

## Data Quality

| Metric | Value |
|---|---|
| Total annotations | 29,283 |
| Unique annotators | 4,349 |
| Unique prompts | 60 |
| Pairwise comparisons | 354 |
| Annotations per comparison | ~28 (median) |
| Median response time | 14.9 seconds |
| Position bias | 52.8% left / 47.2% right (near 50/50) |

**Position bias control**: Videos were randomly shuffled between left/right for each comparison. Observed selection rate is near the 50/50 baseline.

**Engagement verification**: The median response time of 14.9 s suggests annotators watched both videos (each 4–5 seconds) before deciding.

**Annotator diversity**: 4,349 unique annotators with a median of 4 labels each, which gives broad perspectives and limits the influence of any single annotator.

## Methodology

- **60 prompts** generated with structured diversity across motion categories
- **4 models** evaluated via Fal.ai API (single inference, no cherry-picking)
- **All videos** are 4–5 seconds, 540p–720p, 16:9
- **Mobile-first annotation** through Datapoint AI's consumer app SDK
- **Forced-choice** pairwise comparison with dimension-specific questions
- **Dawid-Skene aggregation** available for consensus estimation

## Compared to Other Datasets

| Dataset | Labels | Focus | Models | Dimensions |
|---|---|---|---|---|
| **This dataset** | **29,283** | **Human motion** | **4 frontier (2025)** | **3** |
| Rapidata text-2-video | 2,570 | General video | 4 | 3 |
| VideoGen-Eval | ~5,000 | General video | 6 | 1 |

## Get Custom Human Preference Data

Need preference labels for **your** model, domain, or evaluation criteria?
Datapoint AI runs the same annotation pipeline used to create this dataset, customized to your specs:

- **Your models**: any video, image, or text generation model
- **Your prompts**: domain-specific evaluation sets
- **Your dimensions**: custom quality criteria beyond coherence/aesthetics/adherence
- **Scale**: from 1K to 1M+ labels, median 24-hour turnaround
- **No professional annotator bias**: real users in a consumer app, not Mechanical Turk

🎓 **First dataset free for university researchers and early-stage startups.**

👉 **[Get started at trydatapoint.com](https://trydatapoint.com)** or email **sales@trydatapoint.com**

## Citation

```bibtex
@dataset{datapointai_vidprefmotion_2026,
  title={Human Preference Data for AI Video Generation: Motion Quality},
  author={Datapoint AI},
  year={2026},
  url={https://huggingface.co/datasets/datapointai/text-2-video-human-preferences-motion},
  note={29,283 pairwise human preference labels for AI-generated human motion video}
}
```

## License

CC-BY-4.0: free for research and commercial use with attribution.

## About Datapoint AI

[Datapoint AI](https://trydatapoint.com) collects human preference data at scale through a mobile-first annotation pipeline embedded in consumer apps. We replace mobile ads with data labeling tasks: real users, real preferences, no professional annotator bias.

For custom evaluation studies, higher-scale labeling, or API access: **[trydatapoint.com](https://trydatapoint.com)**
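
The `weighted_results1_*` / `weighted_results2_*` fractions listed under Dataset Structure can be turned into chosen/rejected pairs for DPO-style training. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the helper `to_preference_pairs`, the `min_margin` tie threshold, and the sample rows (their prompts and scores) are hypothetical and only mimic the documented column names.

```python
# Hypothetical rows that mimic the documented schema. The prompts and scores
# are made up for illustration and are NOT taken from the dataset.
sample_rows = [
    {"prompt": "a dancer spins on stage",
     "model1": "Grok Imagine", "model2": "Luma Ray 2",
     "weighted_results1_Coherence": 0.70, "weighted_results2_Coherence": 0.30},
    {"prompt": "two people shake hands",
     "model1": "Veo 3 Fast", "model2": "Kling 1.5 Pro",
     "weighted_results1_Coherence": 0.40, "weighted_results2_Coherence": 0.60},
    {"prompt": "a man reads on a bench",
     "model1": "Veo 3 Fast", "model2": "Grok Imagine",
     "weighted_results1_Coherence": 0.52, "weighted_results2_Coherence": 0.48},
]


def to_preference_pairs(rows, dimension="Coherence", min_margin=0.1):
    """Convert soft labels into chosen/rejected pairs, skipping near-ties.

    `min_margin` is an assumed threshold: comparisons whose two fractions
    differ by less than this are treated as ties and dropped.
    """
    pairs = []
    for row in rows:
        s1 = row[f"weighted_results1_{dimension}"]
        s2 = row[f"weighted_results2_{dimension}"]
        margin = abs(s1 - s2)
        if margin < min_margin:
            continue  # too close to call; not a useful training pair
        first_wins = s1 > s2
        pairs.append({
            "prompt": row["prompt"],
            "chosen": row["model1"] if first_wins else row["model2"],
            "rejected": row["model2"] if first_wins else row["model1"],
            "margin": margin,  # could be used to weight the pair
        })
    return pairs


pairs = to_preference_pairs(sample_rows)
for pair in pairs:
    print(pair["prompt"], "->", pair["chosen"], "over", pair["rejected"])
```

Rows obtained by iterating over the loaded `train` split are plain dicts, so the same function should accept them directly. For actual training, `video1` / `video2` would take the place of the model names as the chosen and rejected items.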
Provider:
nusdufv