
tohoku-nlp/SLVMEval

Hugging Face · Updated 2026-03-08 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/tohoku-nlp/SLVMEval
Resource description:

---
pretty_name: SLVMEval
license: other
language:
- en
size_categories:
- 1K<n<10K
tags:
- video
- benchmark
- evaluation
- long-video
- preference-pairs
---

# Dataset Card for SLVMEval

## Dataset Summary

**SLVMEval** (Synthetic Long-Video Meta-Evaluation Benchmark) is a benchmark for **meta-evaluating automatic evaluation systems** for text-to-long video (T2LV) generation.

The benchmark uses a **pairwise comparison** setup. It constructs controlled **high-quality vs. low-quality** long-video pairs by applying aspect-specific synthetic degradations to source videos. The final benchmark keeps only **human-validated** pairs in which the degradation is clearly perceptible.

## What This Release Contains

This Hugging Face release contains the benchmark artifacts under:

```text
SLVMEval/
└── degraded/
    └── degrade_5clip/
        ├── aesthetics/
        │   ├── cleaned_sampled_test.jsonl
        │   ├── degraded_video_data.jsonl
        │   ├── videos.zip
        │   └── frames.zip
        ├── background_consistency/
        ├── color/
        ├── dynamics_degree/
        ├── move_scene/
        ├── object_removal/
        ├── scene/
        ├── spatial_relationship/
        ├── style/
        └── technical_quality/
```

## Download

```bash
hf auth login --token "$HF_TOKEN"
hf download tohoku-nlp/SLVMEval --repo-type dataset --local-dir /work/data/slvmeval
```

## Unzip (videos / frames)

```bash
ROOT=/work/data/slvmeval/degraded/degrade_5clip
ASPECTS=(aesthetics background_consistency color dynamics_degree move_scene object_removal scene spatial_relationship style technical_quality)

# Extract each aspect's videos.zip and frames.zip into sibling videos/ and frames/ directories.
for a in "${ASPECTS[@]}"; do
  d="$ROOT/$a"
  mkdir -p "$d/videos" "$d/frames"
  unzip -oq "$d/videos.zip" -d "$d/videos"
  unzip -oq "$d/frames.zip" -d "$d/frames"
done
```

After unzipping, the layout is:

```text
/work/data/slvmeval/
└── degraded/
    └── degrade_5clip/
        └── <aspect>/
            ├── cleaned_sampled_test.jsonl
            ├── degraded_video_data.jsonl
            ├── videos/
            │   └── <video_id>.mp4
            └── frames/
                └── <video_id>/
                    ├── 000001.jpg
                    └── ...
```

## Aspect Definitions

| Key in data | Aspect name in paper |
|---|---|
| `aesthetics` | Aesthetics |
| `technical_quality` | Technical Quality |
| `style` | Appearance Style |
| `background_consistency` | Background Consistency |
| `move_scene` | Temporal Flow |
| `scene` | Comprehensiveness |
| `object_removal` | Object Integrity |
| `spatial_relationship` | Spatial Relationship |
| `dynamics_degree` | Dynamics Degree |
| `color` | Color |

## Data Fields

### `cleaned_sampled_test.jsonl`

Each line is one pairwise evaluation sample. Main keys:

- `__index__`: sample index
- `prompt`: text prompt
- `first_model`, `second_model`: the two compared models
- `first_video_id`, `second_video_id`: IDs of the two compared videos
- `aspect`: evaluation aspect
- `reversed`: reversal flag
- `meta_data.preference`: preference label

### `degraded_video_data.jsonl`

Metadata records for the `video_id`s referenced by `cleaned_sampled_test.jsonl`. Main keys:

- `video_id`: video ID
- `path`: path to the video file
- `fps`: frame rate (frames per second)
- `frame_paths`: list of frame file paths
- `predicted_clips`: predicted clip information (e.g., `span`, `clip_id`, `path`)
- `meta_data`: additional metadata

## Statistics

| Aspect | `cleaned_sampled_test.jsonl` rows | `degraded_video_data.jsonl` rows | Files in `videos/` | Directories in `frames/` |
|---|---:|---:|---:|---:|
| aesthetics | 564 | 282 | 282 | 282 |
| background_consistency | 708 | 354 | 354 | 354 |
| color | 408 | 204 | 204 | 204 |
| dynamics_degree | 666 | 333 | 333 | 333 |
| move_scene | 570 | 285 | 285 | 285 |
| object_removal | 200 | 100 | 100 | 100 |
| scene | 470 | 235 | 235 | 235 |
| spatial_relationship | 472 | 236 | 236 | 236 |
| style | 624 | 312 | 312 | 312 |
| technical_quality | 260 | 130 | 130 | 130 |
| **Total** | **4942** | **2471** | **2471** | **2471** |

## Limitations and Notes

- Upstream source data (including Vript and the original video platforms) remain subject to their original terms.
- This release does not redistribute the full upstream source dataset.

## License

This dataset is intended for academic, non-commercial research use.

- Redistribution or re-upload is prohibited without permission.
- If upstream source terms are stricter, the upstream terms take precedence.
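
The following is not part of the official release. It is a minimal sketch of one way to read a single aspect directory in Python after unzipping, using the file names and keys listed under Data Fields. Several points are assumptions. The helper names `read_jsonl`, `get_preference`, `load_aspect` and `count_aspect` are hypothetical. `meta_data.preference` is assumed to be nested as `{"meta_data": {"preference": ...}}`, with a flat key accepted as a fallback. A referenced video ID that has no record in `degraded_video_data.jsonl` maps to `None` rather than raising. The card does not define what the `preference` values or the `reversed` flag mean, so the code returns them unchanged.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def get_preference(sample):
    """Return the label listed on the card as `meta_data.preference`.

    Assumption: it is nested as sample["meta_data"]["preference"];
    a flat "meta_data.preference" key is accepted as a fallback.
    """
    meta = sample.get("meta_data")
    if isinstance(meta, dict) and "preference" in meta:
        return meta["preference"]
    return sample.get("meta_data.preference")


def load_aspect(aspect_dir):
    """Attach video metadata records to each sample in cleaned_sampled_test.jsonl."""
    aspect_dir = Path(aspect_dir)
    samples = read_jsonl(aspect_dir / "cleaned_sampled_test.jsonl")
    # Index the video metadata records by video_id for constant-time lookup.
    videos = {
        rec["video_id"]: rec
        for rec in read_jsonl(aspect_dir / "degraded_video_data.jsonl")
    }
    pairs = []
    for s in samples:
        pairs.append({
            "index": s.get("__index__"),
            "prompt": s.get("prompt"),
            "aspect": s.get("aspect"),
            "reversed": s.get("reversed"),  # passed through; semantics not defined here
            "preference": get_preference(s),
            # Assumption: an ID without a metadata record maps to None.
            "first": videos.get(s["first_video_id"]),
            "second": videos.get(s["second_video_id"]),
        })
    return pairs


def count_aspect(aspect_dir):
    """Count rows and unzipped files, mirroring the Statistics table columns."""
    aspect_dir = Path(aspect_dir)
    videos_dir = aspect_dir / "videos"
    frames_dir = aspect_dir / "frames"
    # Assumes the videos/<video_id>.mp4 and frames/<video_id>/ layout after unzipping.
    n_videos = len(list(videos_dir.glob("*.mp4"))) if videos_dir.is_dir() else 0
    n_frames = (
        sum(1 for p in frames_dir.iterdir() if p.is_dir()) if frames_dir.is_dir() else 0
    )
    return {
        "cleaned_rows": len(read_jsonl(aspect_dir / "cleaned_sampled_test.jsonl")),
        "degraded_rows": len(read_jsonl(aspect_dir / "degraded_video_data.jsonl")),
        "videos_files": n_videos,
        "frame_dirs": n_frames,
    }
```

For example, the result of `count_aspect("/work/data/slvmeval/degraded/degrade_5clip/color")` can be compared with the `color` row of the Statistics table (408, 204, 204, 204). Any mismatch suggests an incomplete download or extraction.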

## Citation

```bibtex
@inproceedings{matsuda2026slvmeval,
  title     = {SLVMEval: Synthetic Meta Evaluation Benchmark for Text-to-Long Video Generation},
  author    = {Ryosuke Matsuda and Keito Kudo and Haruto Yoshida and Nobuyuki Shimizu and Jun Suzuki},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2026}
}
```

Provided by:
tohoku-nlp