tohoku-nlp/SLVMEval
Source: Hugging Face. Updated 2026-03-08. Indexed 2026-04-05.
Download link: https://hf-mirror.com/datasets/tohoku-nlp/SLVMEval
---
pretty_name: SLVMEval
license: other
language:
- en
size_categories:
- 1K<n<10K
tags:
- video
- benchmark
- evaluation
- long-video
- preference-pairs
---
# Dataset Card for SLVMEval
## Dataset Summary
**SLVMEval** (Synthetic Long-Video Meta-Evaluation Benchmark) is a benchmark for **meta-evaluating automatic evaluation systems** for text-to-long video (T2LV) generation.
The benchmark follows a **pairwise comparison-based** setup. It constructs controlled **high-quality vs. low-quality** long-video pairs by applying aspect-specific synthetic degradations to source videos.
The final benchmark retains only **human-validated** pairs in which the degradation is clearly perceptible.
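To make the pairwise setup concrete: an evaluator under test scores both videos of a pair, and it is counted as correct when it scores the high-quality video above the low-quality one. The sketch below computes that accuracy. The function name `pairwise_accuracy`, the tuple layout, and the choice to count ties as incorrect are assumptions made for illustration, not the paper's official protocol.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(scored_pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of pairs where the evaluator prefers the high-quality video.

    Each item is (score_for_high_quality_video, score_for_low_quality_video).
    Ties count as incorrect (an assumption of this sketch).
    """
    pairs = list(scored_pairs)
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(1 for high, low in pairs if high > low)
    return correct / len(pairs)


# Hypothetical scores produced by an evaluator under test.
example = [(0.9, 0.4), (0.7, 0.7), (0.2, 0.6), (0.8, 0.1)]
print(pairwise_accuracy(example))  # 0.5
```

An evaluator that cannot separate the two videos of a pair would land near chance level under this scheme, which is what makes the controlled degradations useful as a test.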
## What This Release Contains
This Hugging Face release contains benchmark artifacts under:
```text
SLVMEval/
└── degraded/
    └── degrade_5clip/
        ├── aesthetics/
        │   ├── cleaned_sampled_test.jsonl
        │   ├── degraded_video_data.jsonl
        │   ├── videos.zip
        │   └── frames.zip
        ├── background_consistency/
        ├── color/
        ├── dynamics_degree/
        ├── move_scene/
        ├── object_removal/
        ├── scene/
        ├── spatial_relationship/
        ├── style/
        └── technical_quality/
```
## Download
```bash
hf auth login --token "$HF_TOKEN"
hf download tohoku-nlp/SLVMEval --repo-type dataset --local-dir /work/data/slvmeval
```
## Unzip (videos / frames)
```bash
ROOT=/work/data/slvmeval/degraded/degrade_5clip
ASPECTS=(aesthetics background_consistency color dynamics_degree move_scene object_removal scene spatial_relationship style technical_quality)
# Extract each aspect's archives into sibling videos/ and frames/ directories.
for a in "${ASPECTS[@]}"; do
  d="$ROOT/$a"
  mkdir -p "$d/videos" "$d/frames"
  # -o overwrites existing files without prompting; -q suppresses output.
  unzip -oq "$d/videos.zip" -d "$d/videos"
  unzip -oq "$d/frames.zip" -d "$d/frames"
done
```
After unzipping, the layout is:
```text
/work/data/slvmeval/
└── degraded/
    └── degrade_5clip/
        └── <aspect>/
            ├── cleaned_sampled_test.jsonl
            ├── degraded_video_data.jsonl
            ├── videos/
            │   └── <video_id>.mp4
            └── frames/
                └── <video_id>/
                    ├── 000001.jpg
                    └── ...
```
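Before running anything on the data, it can help to confirm that the extraction produced the layout shown above. The following is a minimal sketch that only checks for the two JSONL files and the `videos/` and `frames/` directories per aspect. The function name `check_layout` is hypothetical, and the root path is the one used in the commands above.

```python
from pathlib import Path

ASPECTS = [
    "aesthetics", "background_consistency", "color", "dynamics_degree",
    "move_scene", "object_removal", "scene", "spatial_relationship",
    "style", "technical_quality",
]


def check_layout(root: Path) -> dict:
    """Return, per aspect, the expected files or directories that are missing."""
    problems = {}
    for aspect in ASPECTS:
        d = root / aspect
        missing = [
            name
            for name in ("cleaned_sampled_test.jsonl", "degraded_video_data.jsonl")
            if not (d / name).is_file()
        ]
        missing += [name for name in ("videos", "frames") if not (d / name).is_dir()]
        if missing:
            problems[aspect] = missing
    return problems


if __name__ == "__main__":
    root = Path("/work/data/slvmeval/degraded/degrade_5clip")
    for aspect, missing in check_layout(root).items():
        print(f"{aspect}: missing {missing}")
```

An empty result means every aspect directory has the four expected entries. The check does not look inside `videos/` or `frames/`.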
## Aspect Definitions
| Key in data | Aspect name in paper |
|---|---|
| `aesthetics` | Aesthetics |
| `technical_quality` | Technical Quality |
| `style` | Appearance Style |
| `background_consistency` | Background Consistency |
| `move_scene` | Temporal Flow |
| `scene` | Comprehensiveness |
| `object_removal` | Object Integrity |
| `spatial_relationship` | Spatial Relationship |
| `dynamics_degree` | Dynamics Degree |
| `color` | Color |
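Because the data keys and the paper's aspect names differ (for example `move_scene` versus Temporal Flow), reports are easier to read if results are relabeled before printing. A small lookup built from the table above could look like this. The names `ASPECT_NAMES` and `paper_name` are hypothetical.

```python
# Data key -> aspect name used in the paper, copied from the table above.
ASPECT_NAMES = {
    "aesthetics": "Aesthetics",
    "technical_quality": "Technical Quality",
    "style": "Appearance Style",
    "background_consistency": "Background Consistency",
    "move_scene": "Temporal Flow",
    "scene": "Comprehensiveness",
    "object_removal": "Object Integrity",
    "spatial_relationship": "Spatial Relationship",
    "dynamics_degree": "Dynamics Degree",
    "color": "Color",
}


def paper_name(key: str) -> str:
    """Translate a data key such as 'move_scene' into the paper's aspect name."""
    try:
        return ASPECT_NAMES[key]
    except KeyError:
        raise KeyError(f"unknown aspect key: {key!r}") from None


print(paper_name("move_scene"))  # Temporal Flow
```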
## Data Fields
### `cleaned_sampled_test.jsonl`
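Each line is one pairwise evaluation sample.
Main keys:
- `__index__`: sample index
- `prompt`: prompt text
- `first_model`, `second_model`: the two models being compared
- `first_video_id`, `second_video_id`: IDs of the two videos being compared
- `aspect`: evaluation aspect
- `reversed`: reversal flag
- `meta_data.preference`: preference label stored in the metadata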
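The keys listed above can be read with the standard library alone. The sketch below parses the file line by line and collects the documented keys into a flat dictionary. It assumes that `meta_data.preference` denotes a `preference` key nested under `meta_data`; the card does not specify value types or label encodings, so none are assumed here. The helper names `iter_jsonl` and `pair_summary` are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def pair_summary(record: dict) -> dict:
    """Collect the documented keys; keys absent from a record become None."""
    meta = record.get("meta_data")
    # Assumption: `meta_data.preference` is a nested key under `meta_data`.
    preference = meta.get("preference") if isinstance(meta, dict) else None
    return {
        "index": record.get("__index__"),
        "prompt": record.get("prompt"),
        "videos": (record.get("first_video_id"), record.get("second_video_id")),
        "models": (record.get("first_model"), record.get("second_model")),
        "aspect": record.get("aspect"),
        "reversed": record.get("reversed"),
        "preference": preference,
    }
```

A typical call would be `[pair_summary(r) for r in iter_jsonl(path)]`, with `path` pointing at one aspect's `cleaned_sampled_test.jsonl`.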
### `degraded_video_data.jsonl`
Metadata records for `video_id`s referenced by `cleaned_sampled_test.jsonl`.
Main keys:
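- `video_id`: video ID
- `path`: path to the video file
- `fps`: frame rate (frames per second)
- `frame_paths`: list of frame file paths
- `predicted_clips`: predicted clip information (e.g., `span`, `clip_id`, `path`)
- `meta_data`: metadata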
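Since the two files are linked through video IDs, a quick join shows which IDs referenced by the pairs have a metadata record. The sketch below assumes that `first_video_id` and `second_video_id` use the same identifier space as `video_id` and that the IDs are hashable values such as strings. The function names are hypothetical. A non-empty result is not necessarily an error, because the card does not say whether every video of a pair is listed in `degraded_video_data.jsonl`.

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list:
    """Parse every non-empty line of a JSONL file into a list of objects."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def unresolved_video_ids(aspect_dir: Path) -> set:
    """Video IDs used by pairs that have no record in degraded_video_data.jsonl."""
    pairs = read_jsonl(aspect_dir / "cleaned_sampled_test.jsonl")
    videos = read_jsonl(aspect_dir / "degraded_video_data.jsonl")
    known = {v["video_id"] for v in videos}
    referenced = set()
    for p in pairs:
        referenced.update((p["first_video_id"], p["second_video_id"]))
    return referenced - known
```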
## Statistics
| aspect | cleaned_rows | degraded_rows | videos_files | frame_dirs |
|---|---:|---:|---:|---:|
| aesthetics | 564 | 282 | 282 | 282 |
| background_consistency | 708 | 354 | 354 | 354 |
| color | 408 | 204 | 204 | 204 |
| dynamics_degree | 666 | 333 | 333 | 333 |
| move_scene | 570 | 285 | 285 | 285 |
| object_removal | 200 | 100 | 100 | 100 |
| scene | 470 | 235 | 235 | 235 |
| spatial_relationship | 472 | 236 | 236 | 236 |
| style | 624 | 312 | 312 | 312 |
| technical_quality | 260 | 130 | 130 | 130 |
| **total** | **4942** | **2471** | **2471** | **2471** |
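The row counts in the table above can serve as a download integrity check. The sketch below hard-codes the `cleaned_rows` and `degraded_rows` columns and compares them with the line counts of a local copy. It assumes that a "row" is a non-empty line; the names `EXPECTED_ROWS`, `count_rows`, and `find_mismatches` are hypothetical.

```python
from pathlib import Path

# aspect: (cleaned_rows, degraded_rows), copied from the table above.
EXPECTED_ROWS = {
    "aesthetics": (564, 282),
    "background_consistency": (708, 354),
    "color": (408, 204),
    "dynamics_degree": (666, 333),
    "move_scene": (570, 285),
    "object_removal": (200, 100),
    "scene": (470, 235),
    "spatial_relationship": (472, 236),
    "style": (624, 312),
    "technical_quality": (260, 130),
}


def totals(expected: dict) -> tuple:
    """Sum the cleaned and degraded row counts over all aspects."""
    cleaned = sum(c for c, _ in expected.values())
    degraded = sum(d for _, d in expected.values())
    return cleaned, degraded


def count_rows(path: Path) -> int:
    """Count non-empty lines (assumed to be one JSON record each)."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def find_mismatches(root: Path, expected: dict = EXPECTED_ROWS) -> dict:
    """Map aspect -> (found, expected) for every aspect whose counts differ."""
    mismatches = {}
    for aspect, want in expected.items():
        found = (
            count_rows(root / aspect / "cleaned_sampled_test.jsonl"),
            count_rows(root / aspect / "degraded_video_data.jsonl"),
        )
        if found != want:
            mismatches[aspect] = (found, want)
    return mismatches


print(totals(EXPECTED_ROWS))  # (4942, 2471)
```

In the table, every aspect has exactly twice as many `cleaned_rows` as `degraded_rows`. The card does not state the reason, so the sketch treats the two columns as independent expectations.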
## Limitations and Notes
- Upstream source data (including Vript and original video platforms) remain subject to their original terms.
- This release does not redistribute the full upstream source dataset itself.
## License
This dataset is intended for academic, non-commercial research use.
- Redistribution or re-upload is prohibited without permission.
- If upstream source terms are stricter, upstream terms take precedence.
## Citation
```bibtex
@inproceedings{matsuda2026slvmeval,
  title = {SLVMEval: Synthetic Meta Evaluation Benchmark for Text-to-Long Video Generation},
  author = {Ryosuke Matsuda and Keito Kudo and Haruto Yoshida and Nobuyuki Shimizu and Jun Suzuki},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2026}
}
```
Provided by: tohoku-nlp



