mlvbench-review/MLV-Bench

Name: mlvbench-review/MLV-Bench
Creator: mlvbench-review
Published: 2026-04-27 08:52:14
License: No description provided

Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.

Download link:

https://hf-mirror.com/datasets/mlvbench-review/MLV-Bench

Description:

---
license: other
pretty_name: MLV-Bench
language:
- en
tags:
- medical-video-understanding
- long-context-video
- multimodal-large-language-models
- benchmark
- visual-question-answering
- croissant
size_categories:
- 100<n<1K
---

# MLV-Bench

MLV-Bench is a benchmark for long-context medical video understanding in the wild. It contains 340 public full-procedure medical videos from 8 public sources, totaling 759 decoded hours, and 1,253 verified multiple-choice questions.

## Files

- `mlvbench.jsonl`: official benchmark metadata and QA records. Each line is one video record with nested QA items.
- `sample/`: reviewer-facing representative sample with four videos and a same-schema JSONL file.
- `mlvbench_croissant.json`: Croissant metadata with Responsible AI fields for NeurIPS 2026 review.

## Schema

Each JSONL line contains `key`, `dataset`, `organ`, `scene_type`, `duration_tier`, `video_path`, `num_frames`, `fps`, `duration_seconds`, and `qa`. Each QA item contains `uid`, `question`, `options`, `answer`, `task_id`, `task_name`, `task_class`, `category`, `question_type`, and optional hop metadata.

## Intended use

This dataset is intended for research evaluation of multimodal models on long-context medical video understanding, sparse evidence retrieval, and multi-hop reasoning. It is not intended for clinical diagnosis, patient management, or deployment.

## Representative sample

Because the complete dataset is larger than 4 GB, the `sample/` folder provides a reviewer-accessible subset. The sample is stratified by clinical scene type and includes surgery, gastrointestinal endoscopy, colonoscopy, and ultrasound examples. It is for data-quality inspection only and is not the official evaluation split.

## Licensing and source terms

MLV-Bench is derived from multiple public medical video datasets. Source-specific licenses and usage terms apply to the corresponding source data. Users must comply with all original dataset licenses and privacy terms.

## Responsible AI notes

The benchmark uses public medical procedure videos and does not intentionally include direct patient identifiers in the benchmark JSONL. However, clinical videos are human-subject medical data and may contain residual source metadata or overlays. Users must not attempt re-identification and must use the dataset only for approved research evaluation.
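The Schema section describes one video record per JSONL line, with the QA items nested under `qa`. The Python sketch below is not an official loader. It flattens those records into one row per question and computes exact-match accuracy per task. It assumes that model predictions are keyed by each QA item's `uid` and use the same representation as the `answer` field. The card does not say whether `answer` holds an option letter or the option text, so check this against the data. The function names `iter_records`, `flatten_qa`, and `accuracy_by` are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one video record per non-empty JSONL line (a file object works as `lines`)."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_no}: invalid JSON") from err


def flatten_qa(records: Iterable[dict]) -> List[dict]:
    """Return one row per nested QA item, copying selected video-level fields onto it."""
    rows = []
    for rec in records:
        for qa in rec.get("qa", []):
            row = dict(qa)  # keeps uid, question, options, answer, task_name, ...
            for field in ("key", "dataset", "scene_type", "duration_tier"):
                row[field] = rec[field]
            rows.append(row)
    return rows


def accuracy_by(rows: List[dict], predictions: Dict[str, str],
                field: str = "task_name") -> Dict[str, float]:
    """Exact-match accuracy grouped by `field`.

    Assumption: predictions map uid -> answer in the same format as the `answer`
    field; questions without a prediction count as wrong.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for row in rows:
        group = row.get(field)
        total[group] += 1
        pred = predictions.get(row["uid"])
        if pred is not None and str(pred).strip() == str(row["answer"]).strip():
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}
```

For example, pass the object returned by `open("mlvbench.jsonl", encoding="utf-8")` to `iter_records`. Calling `accuracy_by` with `field="scene_type"` or `field="duration_tier"` groups results by those record-level fields instead of by task. Grouping by `uid` keeps the scoring independent of how questions are ordered inside each record.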

Provider:

mlvbench-review
