tohoku-nlp/SLVMEval

Name: tohoku-nlp/SLVMEval
Creator: tohoku-nlp
Published: 2026-03-08 23:20:20
License: No description provided

Source: Hugging Face. Updated 2026-03-08; indexed 2026-04-05.

Download link:

https://hf-mirror.com/datasets/tohoku-nlp/SLVMEval


Description:

---
pretty_name: SLVMEval
license: other
language:
- en
size_categories:
- 1K<n<10K
tags:
- video
- benchmark
- evaluation
- long-video
- preference-pairs
---

# Dataset Card for SLVMEval

## Dataset Summary

**SLVMEval** (Synthetic Long-Video Meta-Evaluation Benchmark) is a benchmark for **meta-evaluating automatic evaluation systems** for text-to-long video (T2LV) generation.

The benchmark follows a **pairwise comparison-based** setup. It constructs controlled **high-quality vs. low-quality** long-video pairs by applying aspect-specific synthetic degradations to source videos. The final benchmark data is built by retaining **human-validated** pairs where the degradation is clearly perceptible.

## What This Release Contains

This Hugging Face release contains benchmark artifacts under:

```text
SLVMEval/
└── degraded/
    └── degrade_5clip/
        ├── aesthetics/
        │   ├── cleaned_sampled_test.jsonl
        │   ├── degraded_video_data.jsonl
        │   ├── videos.zip
        │   └── frames.zip
        ├── background_consistency/
        ├── color/
        ├── dynamics_degree/
        ├── move_scene/
        ├── object_removal/
        ├── scene/
        ├── spatial_relationship/
        ├── style/
        └── technical_quality/
```

## Download

```bash
hf auth login --token "$HF_TOKEN"
hf download tohoku-nlp/SLVMEval --repo-type dataset --local-dir /work/data/slvmeval
```

## Unzip (videos / frames)

```bash
ROOT=/work/data/slvmeval/degraded/degrade_5clip
ASPECTS=(aesthetics background_consistency color dynamics_degree move_scene object_removal scene spatial_relationship style technical_quality)

for a in "${ASPECTS[@]}"; do
  d="$ROOT/$a"
  mkdir -p "$d/videos" "$d/frames"
  unzip -oq "$d/videos.zip" -d "$d/videos"
  unzip -oq "$d/frames.zip" -d "$d/frames"
done
```

After unzip:

```text
/work/data/slvmeval/
└── degraded/
    └── degrade_5clip/
        └── <aspect>/
            ├── cleaned_sampled_test.jsonl
            ├── degraded_video_data.jsonl
            ├── videos/
            │   └── <video_id>.mp4
            └── frames/
                └── <video_id>/
                    ├── 000001.jpg
                    └── ...
```

## Aspect Definitions

| Key in data | Aspect name in paper |
|---|---|
| `aesthetics` | Aesthetics |
| `technical_quality` | Technical Quality |
| `style` | Appearance Style |
| `background_consistency` | Background Consistency |
| `move_scene` | Temporal Flow |
| `scene` | Comprehensiveness |
| `object_removal` | Object Integrity |
| `spatial_relationship` | Spatial Relationship |
| `dynamics_degree` | Dynamics Degree |
| `color` | Color |

## Data Fields

### `cleaned_sampled_test.jsonl`

One line corresponds to one pairwise evaluation sample. Main keys:

- `__index__`
- `prompt`
- `first_model`, `second_model`
- `first_video_id`, `second_video_id`
- `aspect`
- `reversed`
- `meta_data.preference`

### `degraded_video_data.jsonl`

Metadata records for `video_id`s referenced by `cleaned_sampled_test.jsonl`. Main keys:

- `video_id`
- `path`
- `fps`
- `frame_paths`
- `predicted_clips` (e.g., `span`, `clip_id`, `path`)
- `meta_data`

## Statistics

| aspect | cleaned_rows | degraded_rows | videos_files | frame_dirs |
|---|---:|---:|---:|---:|
| aesthetics | 564 | 282 | 282 | 282 |
| background_consistency | 708 | 354 | 354 | 354 |
| color | 408 | 204 | 204 | 204 |
| dynamics_degree | 666 | 333 | 333 | 333 |
| move_scene | 570 | 285 | 285 | 285 |
| object_removal | 200 | 100 | 100 | 100 |
| scene | 470 | 235 | 235 | 235 |
| spatial_relationship | 472 | 236 | 236 | 236 |
| style | 624 | 312 | 312 | 312 |
| technical_quality | 260 | 130 | 130 | 130 |
| **total** | **4942** | **2471** | **2471** | **2471** |

## Limitations and Notes

- Upstream source data (including Vript and original video platforms) remain subject to their original terms.
- This release does not redistribute the full upstream source dataset itself.

## License

This dataset is intended for academic, non-commercial research use.

- Redistribution or re-upload is prohibited without permission.
- If upstream source terms are stricter, upstream terms take precedence.

## Citation

```bibtex
@inproceedings{matsuda2026slvmeval,
  title = {SLVMEval: Synthetic Meta Evaluation Benchmark for Text-to-Long Video Generation},
  author = {Ryosuke Matsuda and Keito Kudo and Haruto Yoshida and Nobuyuki Shimizu and Jun Suzuki},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2026}
}
```
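
## Example: Reading One Aspect Directory

The Data Fields section lists the keys of the two JSONL files but not how to read them together. The sketch below is one possible way to load a single aspect directory and index the video metadata by `video_id`. The helper names (`read_jsonl`, `load_aspect`, `unresolved_video_ids`) are hypothetical and not part of the release. The code relies only on the keys `video_id`, `first_video_id`, and `second_video_id` named in the card, and it deliberately does not interpret `reversed` or `meta_data.preference`, whose exact semantics the card does not spell out.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_aspect(aspect_dir):
    """Load the pair list and a video_id -> metadata index for one aspect.

    Assumes the directory layout shown in the card: both JSONL files sit
    directly inside the aspect directory.
    """
    aspect_dir = Path(aspect_dir)
    pairs = read_jsonl(aspect_dir / "cleaned_sampled_test.jsonl")
    videos = {
        rec["video_id"]: rec
        for rec in read_jsonl(aspect_dir / "degraded_video_data.jsonl")
    }
    return pairs, videos


def unresolved_video_ids(pairs, videos):
    """Collect video ids referenced by pairs but absent from the metadata index."""
    missing = set()
    for pair in pairs:
        for key in ("first_video_id", "second_video_id"):
            if pair[key] not in videos:
                missing.add(pair[key])
    return missing
```

With the download path used above, `load_aspect("/work/data/slvmeval/degraded/degrade_5clip/aesthetics")` would return the pair list and the metadata index for the `aesthetics` aspect. `unresolved_video_ids` is meant as a sanity check only: the card does not say whether both sides of every pair have a metadata record, so a non-empty result is not necessarily an error.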


Provider:

tohoku-nlp
