MWS-Vision-Bench

Name: MWS-Vision-Bench
Creator: maas
Published: 2025-12-05 16:54:38
License: no description provided

ModelScope community; updated 2025-12-05; indexed 2025-11-03

Download link:

https://modelscope.cn/datasets/MTSAIR/MWS-Vision-Bench

Resource description:

# MWS-Vision-Bench

**MWS Vision Bench** is the first **Russian-language business-OCR benchmark** designed for multimodal large language models (MLLMs). This is the validation split, publicly available for open evaluation and comparison.

🧩 **Paper is coming soon.**

🔗 **Official repository:** [github.com/mts-ai/MWS-Vision-Bench](https://github.com/mts-ai/MWS-Vision-Bench)

🏢 **Organization:** [MTSAIR on Hugging Face](https://huggingface.co/MTSAIR)

📰 **Article on Habr (in Russian):** ["MWS Vision Bench: the first Russian business-OCR benchmark"](https://habr.com/ru/companies/mts_ai/articles/953292/)

---

## 📊 Dataset Statistics

- **Total samples:** 1,302
- **Unique images:** 400
- **Task types:** 5

---

## 🖼️ Dataset Preview

![Dataset Examples](preview.jpg)

*Examples of diverse document types in the benchmark: business documents, handwritten notes, technical drawings, receipts, and more.*

---

## 📁 Repository Structure

```
MWS-Vision-Bench/
├── metadata.jsonl          # Dataset annotations
├── images/                 # Image files organized by category
│   ├── business/
│   │   ├── scans/
│   │   ├── sheets/
│   │   ├── plans/
│   │   └── diagramms/
│   └── personal/
│       ├── hand_documents/
│       ├── hand_notebooks/
│       └── hand_misc/
└── README.md               # This file
```

---

## 📋 Data Format

Each line in `metadata.jsonl` contains one JSON object:

```python
{
    "file_name": "images/image_0.jpg",  # Path to the image
    "id": "1",                          # Unique identifier
    "type": "text grounding ru",        # Task type
    "dataset_name": "business",         # Subdataset name
    "question": "...",                  # Question in Russian
    "answers": ["398", "65", ...]       # List of valid answers (as strings)
}
```

---

## 🎯 Task Types

| Task | Description | Count |
|------|-------------|-------|
| `document parsing ru` | Parsing structured documents | 243 |
| `full-page OCR ru` | End-to-end OCR on full pages | 144 |
| `key information extraction ru` | Extracting key fields | 119 |
| `reasoning VQA ru` | Visual reasoning in Russian | 400 |
| `text grounding ru` | Text-to-region alignment | 396 |

---

## 📊 Leaderboard (Validation Set)

Top models evaluated on this validation dataset:

| Model | Overall | img→text | img→markdown | Grounding | KIE (JSON) | VQA |
|-------|---------|----------|--------------|-----------|------------|-----|
| **Gemini-2.5-pro** | **0.682** | 0.836 | 0.745 | 0.084 | 0.891 | 0.853 |
| **Gemini-2.5-flash** | **0.644** | 0.796 | 0.683 | 0.067 | 0.841 | 0.833 |
| **gpt-4.1-mini** | **0.643** | 0.866 | 0.724 | 0.091 | 0.750 | 0.782 |
| **Claude-4.5-Sonnet** | **0.639** | 0.723 | 0.676 | 0.377 | 0.728 | 0.692 |
| **gpt-5-mini** | **0.632** | 0.797 | 0.678 | 0.126 | 0.784 | 0.776 |
| Qwen2.5-VL-72B | 0.631 | 0.848 | 0.712 | 0.220 | 0.644 | 0.732 |
| gpt-5-mini (responses) | 0.594 | 0.743 | 0.567 | 0.118 | 0.811 | 0.731 |
| Qwen3-VL-30B-A3B | 0.589 | 0.802 | 0.688 | 0.053 | 0.661 | 0.743 |
| gpt-4.1 | 0.587 | 0.709 | 0.693 | 0.086 | 0.662 | 0.784 |
| Qwen3-VL-32B | 0.585 | 0.732 | 0.646 | 0.054 | 0.724 | 0.770 |
| Qwen3-VL-30B-A3B-FP8 | 0.583 | 0.798 | 0.683 | 0.056 | 0.638 | 0.740 |
| Qwen2.5-VL-32B | 0.577 | 0.767 | 0.649 | 0.232 | 0.493 | 0.743 |
| gpt-5 (responses) | 0.573 | 0.746 | 0.650 | 0.080 | 0.687 | 0.704 |
| Qwen2.5-VL-7B | 0.549 | 0.779 | 0.704 | 0.185 | 0.426 | 0.651 |
| gpt-4.1-nano | 0.503 | 0.676 | 0.672 | 0.028 | 0.567 | 0.573 |
| gpt-5-nano | 0.503 | 0.487 | 0.583 | 0.091 | 0.661 | 0.693 |
| Qwen3-VL-2B | 0.439 | 0.592 | 0.613 | 0.029 | 0.356 | 0.605 |
| Qwen2.5-VL-3B | 0.402 | 0.613 | 0.654 | 0.045 | 0.203 | 0.494 |

*Scale: 0.0 to 1.0 (higher is better)*

**📝 Submit your model**: To evaluate on the private test set, contact [g.gaikov@mts.ai](mailto:g.gaikov@mts.ai)

---

## 💻 Usage Example

```python
from datasets import load_dataset

# Load the dataset (a token is needed only if the repository is private)
dataset = load_dataset("MTSAIR/MWS-Vision-Bench", token="hf_...")

# Without a `split` argument, load_dataset returns a DatasetDict keyed by
# split name, so select a split before iterating over samples
split_name = next(iter(dataset))

for item in dataset[split_name]:
    print(f"ID: {item['id']}")
    print(f"Type: {item['type']}")
    print(f"Question: {item['question']}")
    # Key as given in the original example; metadata.jsonl names this field `file_name`
    print(f"Image: {item['image_path']}")
    print(f"Answers: {item['answers']}")
```

---

## 📄 License

**MIT License** © 2024 MTS AI

See [LICENSE](https://github.com/MTSAIR/multimodalocr/blob/main/LICENSE.txt) for details.

---

## 📚 Citation

If you use this dataset in your research, please cite:

```bibtex
@misc{mwsvisionbench2024,
  title={MWS-Vision-Bench: Russian Multimodal OCR Benchmark},
  author={MTS AI Research},
  organization={MTSAIR},
  year={2025},
  url={https://huggingface.co/datasets/MTSAIR/MWS-Vision-Bench},
  note={Paper coming soon}
}
```

---

## 🤝 Contacts

- **Team:** [MTSAIR Research](https://huggingface.co/MTSAIR)
- **Email:** [g.gaikov@mts.ai](mailto:g.gaikov@mts.ai)

---

## 🇷🇺 Summary (translated from Russian)

**MWS Vision Bench** is the first Russian-language benchmark for business OCR in the era of multimodal models. It contains 1,302 examples and 5 task types that reflect real-world scenarios for processing business documents and handwritten data. The dataset was created to evaluate and advance multimodal LLMs in a Russian-language context.

📄 *A scientific paper is in preparation (paper coming soon).*

---

**Made with ❤️ by MTS AI Research Team**
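
For readers who want to inspect the annotations without the `datasets` library, the `metadata.jsonl` layout described under Data Format can be read with the standard library alone. The following is a minimal sketch, not part of the official tooling: the helper names `parse_metadata` and `count_by_type` are hypothetical, and the two sample records are invented to match the documented schema rather than copied from the dataset.

```python
import json
from collections import Counter
from typing import Iterable


def parse_metadata(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def count_by_type(records: list[dict]) -> Counter:
    """Tally records by their `type` field (the task type)."""
    return Counter(record["type"] for record in records)


# Invented records that follow the documented schema (not real dataset rows).
sample_lines = [
    '{"file_name": "images/image_0.jpg", "id": "1", "type": "text grounding ru", '
    '"dataset_name": "business", "question": "...", "answers": ["398", "65"]}',
    '{"file_name": "images/image_1.jpg", "id": "2", "type": "reasoning VQA ru", '
    '"dataset_name": "personal", "question": "...", "answers": ["42"]}',
]

records = parse_metadata(sample_lines)
counts = count_by_type(records)

for task_type, n in sorted(counts.items()):
    print(f"{task_type}: {n}")
```

With a local copy of the repository, the same helpers accept an open file handle, for example `parse_metadata(open("metadata.jsonl", encoding="utf-8"))`. On the full validation split, the per-type tallies should then match the Count column of the Task Types table, which sums to the stated 1,302 samples.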


Provider: maas

Created: 2025-10-13
