mlvbench-review/MLV-Bench
Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/mlvbench-review/MLV-Bench
Resource description:
---
license: other
pretty_name: MLV-Bench
language:
- en
tags:
- medical-video-understanding
- long-context-video
- multimodal-large-language-models
- benchmark
- visual-question-answering
- croissant
size_categories:
- 100<n<1K
---
# MLV-Bench
MLV-Bench is a benchmark for long-context medical video understanding in the wild. It contains 340 publicly available full-procedure medical videos drawn from 8 public sources, totaling 759 hours of decoded video, and 1,253 verified multiple-choice questions.
## Files
- `mlvbench.jsonl`: official benchmark metadata and QA records. Each line is one video record with nested QA items.
- `sample/`: a representative sample for reviewers, containing four videos and a JSONL file with the same schema as `mlvbench.jsonl`.
- `mlvbench_croissant.json`: Croissant metadata with Responsible AI fields for NeurIPS 2026 review.
## Schema
Each JSONL line contains `key`, `dataset`, `organ`, `scene_type`, `duration_tier`, `video_path`, `num_frames`, `fps`, `duration_seconds`, and `qa`. Each QA item contains `uid`, `question`, `options`, `answer`, `task_id`, `task_name`, `task_class`, `category`, `question_type`, and optional hop metadata.
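The schema can be read with the Python standard library alone. The sketch below assumes only the field names listed above and that `qa` is a JSON array. It flattens each video record into one row per question so QA items can be filtered by `scene_type`, `duration_tier`, or `task_class`. `iter_questions` is a hypothetical helper, not part of the release, and `options`, `answer`, and any hop metadata are passed through untouched because this card does not specify their internal structure.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any, Iterator

# Video-level fields from the schema; copied onto every question row.
VIDEO_FIELDS = (
    "key", "dataset", "organ", "scene_type", "duration_tier",
    "video_path", "num_frames", "fps", "duration_seconds",
)


def iter_questions(jsonl_path: str | Path) -> Iterator[dict[str, Any]]:
    """Yield one flat dict per QA item, merged with its parent video's fields.

    Hypothetical helper: assumes one JSON object per line with a `qa` list.
    """
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines, e.g. a trailing newline
            record = json.loads(line)
            # Missing video-level fields become None rather than raising.
            video = {k: record.get(k) for k in VIDEO_FIELDS}
            for qa in record.get("qa", []):
                # QA fields (uid, question, options, answer, task_*, hop metadata)
                # are passed through unchanged; on a key clash the QA value wins.
                yield {**video, **qa}
```

On the complete file, `sum(1 for _ in iter_questions("mlvbench.jsonl"))` would be expected to equal the 1,253 questions reported above.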
## Intended use
This dataset is intended for research evaluation of multimodal models on long-context medical video understanding, sparse evidence retrieval, and multi-hop reasoning. It is not intended for clinical diagnosis, patient management, or deployment.
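This card does not specify an official scoring script or the exact format of `answer`. As a hedged sketch only, assuming predictions are keyed by `uid` and use the same format as `answer` (for example, option letters), per-group multiple-choice accuracy could be computed as follows. The function name `accuracy_by_group`, the case-insensitive comparison, and the default grouping by `task_class` are illustrative choices, not part of MLV-Bench.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Iterable, Mapping


def _normalize(value: Any) -> str:
    # Case- and whitespace-insensitive comparison, e.g. " b" matches "B".
    return str(value).strip().casefold()


def accuracy_by_group(
    rows: Iterable[Mapping[str, Any]],
    predictions: Mapping[str, Any],
    group_field: str = "task_class",
) -> dict[Any, float]:
    """Multiple-choice accuracy per value of `group_field` (hypothetical helper).

    rows: flat question dicts carrying `uid`, `answer`, and `group_field`.
    predictions: uid -> predicted answer, in the same format as `answer`.
    Every row is scored; a missing prediction counts as incorrect.
    """
    correct: defaultdict[Any, int] = defaultdict(int)
    total: defaultdict[Any, int] = defaultdict(int)
    for row in rows:
        group = row.get(group_field)
        total[group] += 1
        pred = predictions.get(row["uid"])
        if pred is not None and _normalize(pred) == _normalize(row["answer"]):
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}
```

Switching `group_field` to `category` or `question_type` (or any other field present on the rows) gives a different breakdown; which breakdowns the authors report is not stated in this card.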
## Representative sample
Because the complete dataset is larger than 4 GB, the `sample/` folder provides a reviewer-accessible subset. The sample is stratified by clinical scene type and includes surgery, gastrointestinal endoscopy, colonoscopy, and ultrasound examples. It is for data-quality inspection only and is not the official evaluation split.
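Because the sample JSONL shares the benchmark schema, its stratification can be checked by counting records per `scene_type`. The sketch below is a minimal, hypothetical helper (`summarize_split` is not part of the release); the sample JSONL file name is not given in this card, so the path in the usage note is a placeholder.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path


def summarize_split(jsonl_path: str | Path) -> tuple[Counter, Counter]:
    """Count videos and QA items per scene_type in a same-schema JSONL file."""
    videos: Counter = Counter()
    questions: Counter = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            scene = record.get("scene_type", "<missing>")
            videos[scene] += 1  # one JSONL line = one video record
            questions[scene] += len(record.get("qa", []))
    return videos, questions
```

For example, `videos, questions = summarize_split("sample/<sample-jsonl-file>")`. With four videos covering the four listed scene types, one video per scene type would be expected, although the exact `scene_type` strings are not given here.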
## Licensing and source terms
MLV-Bench is derived from multiple public medical video datasets. Source-specific licenses and usage terms apply to the corresponding source data. Users must comply with all original dataset licenses and privacy terms.
## Responsible AI notes
The benchmark uses public medical procedure videos and does not intentionally include direct patient identifiers in the benchmark JSONL. However, clinical videos are human-subject medical data and may contain residual source metadata or overlays. Users must not attempt re-identification and must use the dataset only for approved research evaluation.
Provided by:
mlvbench-review



