# MWS-Vision-Bench
> 🇷🇺 *Русскоязычное описание ниже / Russian summary below.*
**MWS Vision Bench** — the first **Russian-language business-OCR benchmark** designed for multimodal large language models (MLLMs).
This repository contains the validation split, which is publicly available for open evaluation and comparison.
🧩 **Paper is coming soon.**
🔗 **Official repository:** [github.com/mts-ai/MWS-Vision-Bench](https://github.com/mts-ai/MWS-Vision-Bench)
🏢 **Organization:** [MTSAIR on Hugging Face](https://huggingface.co/MTSAIR)
📰 **Article on Habr (in Russian):** [“MWS Vision Bench — the first Russian business-OCR benchmark”](https://habr.com/ru/companies/mts_ai/articles/953292/)
---
## 📊 Dataset Statistics
- **Total samples:** 1,302
- **Unique images:** 400
- **Task types:** 5
---
## 🖼️ Dataset Preview

*Examples of diverse document types in the benchmark: business documents, handwritten notes, technical drawings, receipts, and more.*
---
## 📁 Repository Structure
```
MWS-Vision-Bench/
├── metadata.jsonl # Dataset annotations
├── images/ # Image files organized by category
│ ├── business/
│ │ ├── scans/
│ │ ├── sheets/
│ │ ├── plans/
│ │ └── diagramms/
│ └── personal/
│ ├── hand_documents/
│ ├── hand_notebooks/
│ └── hand_misc/
└── README.md # This file
```
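The layout above suggests that an image's group (`business` or `personal`) and category (for example `scans`) can be read from its relative path. The following is a minimal sketch, assuming paths follow `images/<group>/<category>/<file>`. The helper `split_category` and the file name `example.jpg` are hypothetical and not part of the dataset tooling; paths that do not match this shape yield `(None, None)`.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def split_category(rel_path: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (group, category) for a path shaped like images/<group>/<category>/<file>."""
    parts = PurePosixPath(rel_path).parts
    # Expect exactly four components: "images", group, category, file name.
    if len(parts) == 4 and parts[0] == "images":
        return parts[1], parts[2]
    return None, None


# Hypothetical file name, used only for illustration.
print(split_category("images/business/scans/example.jpg"))  # ('business', 'scans')
print(split_category("images/image_0.jpg"))                 # (None, None)
```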
---
## 📋 Data Format
Each line in `metadata.jsonl` contains one JSON object:
```python
{
    "file_name": "images/image_0.jpg",  # Path to the image
    "id": "1",                          # Unique identifier
    "type": "text grounding ru",        # Task type
    "dataset_name": "business",         # Subdataset name
    "question": "...",                  # Question in Russian
    "answers": ["398", "65", ...]       # List of valid answers (as strings)
}
```
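The `#` annotations in the block above are explanatory only; JSON itself does not allow comments. As a minimal sketch of reading this format with the standard library, the snippet below parses one line into a typed record. The `Sample` dataclass and `parse_line` helper are hypothetical names introduced for illustration, and the example line is a placeholder rather than real data.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    file_name: str
    id: str
    type: str
    dataset_name: str
    question: str
    answers: List[str]


def parse_line(line: str) -> Sample:
    """Parse one metadata.jsonl line into a Sample; a missing key raises KeyError."""
    record = json.loads(line)
    return Sample(
        file_name=record["file_name"],
        id=record["id"],
        type=record["type"],
        dataset_name=record["dataset_name"],
        question=record["question"],
        # Answers are documented as strings; coerce defensively.
        answers=[str(a) for a in record["answers"]],
    )


# Illustrative line; the question text is a placeholder, not real data.
example = json.dumps({
    "file_name": "images/image_0.jpg",
    "id": "1",
    "type": "text grounding ru",
    "dataset_name": "business",
    "question": "...",
    "answers": ["398", "65"],
})
sample = parse_line(example)
print(sample.type, sample.answers)
```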
---
## 🎯 Task Types
| Task | Description | Count |
|------|--------------|-------|
| `document parsing ru` | Parsing structured documents | 243 |
| `full-page OCR ru` | End-to-end OCR on full pages | 144 |
| `key information extraction ru` | Extracting key fields | 119 |
| `reasoning VQA ru` | Visual reasoning in Russian | 400 |
| `text grounding ru` | Text–region alignment | 396 |
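
The counts in the table sum to the 1,302 samples reported in the statistics. A minimal sketch for checking a local copy against the table is shown below; the helper `count_task_types` and the local path in the commented-out lines are assumptions, not part of the official tooling.

```python
import json
from collections import Counter
from typing import Dict, Iterable

# Counts copied from the task table above.
EXPECTED_COUNTS: Dict[str, int] = {
    "document parsing ru": 243,
    "full-page OCR ru": 144,
    "key information extraction ru": 119,
    "reasoning VQA ru": 400,
    "text grounding ru": 396,
}


def count_task_types(lines: Iterable[str]) -> Counter:
    """Count records per "type" value, skipping blank lines."""
    return Counter(json.loads(line)["type"] for line in lines if line.strip())


# The table's counts add up to the stated total of 1,302 samples.
assert sum(EXPECTED_COUNTS.values()) == 1302

# With a local copy of the dataset (the path is an assumption):
# with open("MWS-Vision-Bench/metadata.jsonl", encoding="utf-8") as f:
#     assert count_task_types(f) == Counter(EXPECTED_COUNTS)
```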
---
## 📊 Leaderboard (Validation Set)
Top models evaluated on this validation dataset:
| Model | Overall | img→text | img→markdown | Grounding | KIE (JSON) | VQA |
|-------|---------|----------|--------------|-----------|------------|-----|
| **Gemini-2.5-pro** | **0.682** | 0.836 | 0.745 | 0.084 | 0.891 | 0.853 |
| **Gemini-2.5-flash** | **0.644** | 0.796 | 0.683 | 0.067 | 0.841 | 0.833 |
| **gpt-4.1-mini** | **0.643** | 0.866 | 0.724 | 0.091 | 0.750 | 0.782 |
| **Claude-4.5-Sonnet** | **0.639** | 0.723 | 0.676 | 0.377 | 0.728 | 0.692 |
| **gpt-5-mini** | **0.632** | 0.797 | 0.678 | 0.126 | 0.784 | 0.776 |
| Qwen2.5-VL-72B | 0.631 | 0.848 | 0.712 | 0.220 | 0.644 | 0.732 |
| gpt-5-mini (responses) | 0.594 | 0.743 | 0.567 | 0.118 | 0.811 | 0.731 |
| Qwen3-VL-30B-A3B | 0.589 | 0.802 | 0.688 | 0.053 | 0.661 | 0.743 |
| gpt-4.1 | 0.587 | 0.709 | 0.693 | 0.086 | 0.662 | 0.784 |
| Qwen3-VL-32B | 0.585 | 0.732 | 0.646 | 0.054 | 0.724 | 0.770 |
| Qwen3-VL-30B-A3B-FP8 | 0.583 | 0.798 | 0.683 | 0.056 | 0.638 | 0.740 |
| Qwen2.5-VL-32B | 0.577 | 0.767 | 0.649 | 0.232 | 0.493 | 0.743 |
| gpt-5 (responses) | 0.573 | 0.746 | 0.650 | 0.080 | 0.687 | 0.704 |
| Qwen2.5-VL-7B | 0.549 | 0.779 | 0.704 | 0.185 | 0.426 | 0.651 |
| gpt-4.1-nano | 0.503 | 0.676 | 0.672 | 0.028 | 0.567 | 0.573 |
| gpt-5-nano | 0.503 | 0.487 | 0.583 | 0.091 | 0.661 | 0.693 |
| Qwen3-VL-2B | 0.439 | 0.592 | 0.613 | 0.029 | 0.356 | 0.605 |
| Qwen2.5-VL-3B | 0.402 | 0.613 | 0.654 | 0.045 | 0.203 | 0.494 |
*Scale: 0.0 - 1.0 (higher is better)*
**📝 Submit your model**: To evaluate on the private test set, contact [g.gaikov@mts.ai](mailto:g.gaikov@mts.ai)
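
The README does not define how the Overall column is computed. For the rows checked below, it matches the unweighted mean of the five task scores to within rounding. This is an observation from the published numbers, not a documented formula, and the sketch only verifies that arithmetic.

```python
from statistics import mean

# Rows copied from the leaderboard:
# model -> (Overall, [img→text, img→markdown, Grounding, KIE (JSON), VQA])
ROWS = {
    "Gemini-2.5-pro": (0.682, [0.836, 0.745, 0.084, 0.891, 0.853]),
    "gpt-4.1-mini": (0.643, [0.866, 0.724, 0.091, 0.750, 0.782]),
    "Claude-4.5-Sonnet": (0.639, [0.723, 0.676, 0.377, 0.728, 0.692]),
    "Qwen2.5-VL-3B": (0.402, [0.613, 0.654, 0.045, 0.203, 0.494]),
}

for model, (overall, task_scores) in ROWS.items():
    # Allow a small tolerance because the published task scores are already rounded.
    assert abs(mean(task_scores) - overall) < 0.001, model
    print(f"{model}: mean={mean(task_scores):.4f}, reported={overall}")
```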
---
## 💻 Usage Example
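```python
from datasets import load_dataset

# Load the dataset; replace the placeholder with your own token,
# or drop the `token` argument if the repository is public.
dataset = load_dataset("MTSAIR/MWS-Vision-Bench", token="hf_...")

# load_dataset returns a DatasetDict keyed by split name, so iterate over each split's rows.
for split_name, split in dataset.items():
    for item in split:
        print(f"ID: {item['id']}")
        print(f"Type: {item['type']}")
        print(f"Question: {item['question']}")
        # metadata.jsonl stores the path under "file_name"; an image-folder loader
        # may expose a decoded "image" column instead, so fall back to it.
        print(f"Image: {item.get('file_name', item.get('image'))}")
        print(f"Answers: {item['answers']}")
```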
---
## 📄 License
**MIT License**
© 2024 MTS AI
See [LICENSE](https://github.com/MTSAIR/multimodalocr/blob/main/LICENSE.txt) for details.
---
## 📚 Citation
If you use this dataset in your research, please cite:
```bibtex
@misc{mwsvisionbench2024,
title={MWS-Vision-Bench: Russian Multimodal OCR Benchmark},
author={MTS AI Research},
organization={MTSAIR},
year={2025},
url={https://huggingface.co/datasets/MTSAIR/MWS-Vision-Bench},
note={Paper coming soon}
}
```
---
## 🤝 Contacts
- **Team:** [MTSAIR Research](https://huggingface.co/MTSAIR)
- **Email:** [g.gaikov@mts.ai](mailto:g.gaikov@mts.ai)
---
## 🇷🇺 Краткое описание
**MWS Vision Bench** — первый русскоязычный бенчмарк для бизнес-OCR в эпоху мультимодальных моделей.
Он включает 1302 примера и 5 типов задач, отражающих реальные сценарии обработки бизнес-документов и рукописных данных.
Датасет создан для оценки и развития мультимодальных LLM в русскоязычном контексте.
📄 *Научная статья в процессе подготовки (paper coming soon).*
---
**Made with ❤️ by MTS AI Research Team**