
MWS-Vision-Bench

ModelScope Community | Updated 2025-12-05 | Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/MTSAIR/MWS-Vision-Bench
Resource description:
# MWS-Vision-Bench

> 🇷🇺 *Russian summary below.*

**MWS Vision Bench** is the first **Russian-language business-OCR benchmark** designed for multimodal large language models (MLLMs). This is the validation split, publicly available for open evaluation and comparison.

🧩 **Paper is coming soon.**

🔗 **Official repository:** [github.com/mts-ai/MWS-Vision-Bench](https://github.com/mts-ai/MWS-Vision-Bench)

🏢 **Organization:** [MTSAIR on Hugging Face](https://huggingface.co/MTSAIR)

📰 **Article on Habr (in Russian):** [“MWS Vision Bench: the first Russian business-OCR benchmark”](https://habr.com/ru/companies/mts_ai/articles/953292/)

---

## 📊 Dataset Statistics

- **Total samples:** 1,302
- **Unique images:** 400
- **Task types:** 5

---

## 🖼️ Dataset Preview

![Dataset Examples](preview.jpg)

*Examples of the diverse document types in the benchmark: business documents, handwritten notes, technical drawings, receipts, and more.*

---

## 📁 Repository Structure

```
MWS-Vision-Bench/
├── metadata.jsonl          # Dataset annotations
├── images/                 # Image files organized by category
│   ├── business/
│   │   ├── scans/
│   │   ├── sheets/
│   │   ├── plans/
│   │   └── diagramms/
│   └── personal/
│       ├── hand_documents/
│       ├── hand_notebooks/
│       └── hand_misc/
└── README.md               # This file
```

---

## 📋 Data Format

Each line in `metadata.jsonl` contains one JSON object:

```python
{
    "file_name": "images/image_0.jpg",  # Path to the image
    "id": "1",                          # Unique identifier
    "type": "text grounding ru",        # Task type
    "dataset_name": "business",         # Subdataset name
    "question": "...",                  # Question in Russian
    "answers": ["398", "65", ...]       # List of valid answers (as strings)
}
```

---

## 🎯 Task Types

| Task | Description | Count |
|------|-------------|-------|
| `document parsing ru` | Parsing structured documents | 243 |
| `full-page OCR ru` | End-to-end OCR on full pages | 144 |
| `key information extraction ru` | Extracting key fields | 119 |
| `reasoning VQA ru` | Visual reasoning in Russian | 400 |
| `text grounding ru` | Text–region alignment | 396 |

---

## 📊 Leaderboard (Validation Set)

Top models evaluated on this validation set:

| Model | Overall | img→text | img→markdown | Grounding | KIE (JSON) | VQA |
|-------|---------|----------|--------------|-----------|------------|-----|
| **Gemini-2.5-pro** | **0.682** | 0.836 | 0.745 | 0.084 | 0.891 | 0.853 |
| **Gemini-2.5-flash** | **0.644** | 0.796 | 0.683 | 0.067 | 0.841 | 0.833 |
| **gpt-4.1-mini** | **0.643** | 0.866 | 0.724 | 0.091 | 0.750 | 0.782 |
| **Claude-4.5-Sonnet** | **0.639** | 0.723 | 0.676 | 0.377 | 0.728 | 0.692 |
| **gpt-5-mini** | **0.632** | 0.797 | 0.678 | 0.126 | 0.784 | 0.776 |
| Qwen2.5-VL-72B | 0.631 | 0.848 | 0.712 | 0.220 | 0.644 | 0.732 |
| gpt-5-mini (responses) | 0.594 | 0.743 | 0.567 | 0.118 | 0.811 | 0.731 |
| Qwen3-VL-30B-A3B | 0.589 | 0.802 | 0.688 | 0.053 | 0.661 | 0.743 |
| gpt-4.1 | 0.587 | 0.709 | 0.693 | 0.086 | 0.662 | 0.784 |
| Qwen3-VL-32B | 0.585 | 0.732 | 0.646 | 0.054 | 0.724 | 0.770 |
| Qwen3-VL-30B-A3B-FP8 | 0.583 | 0.798 | 0.683 | 0.056 | 0.638 | 0.740 |
| Qwen2.5-VL-32B | 0.577 | 0.767 | 0.649 | 0.232 | 0.493 | 0.743 |
| gpt-5 (responses) | 0.573 | 0.746 | 0.650 | 0.080 | 0.687 | 0.704 |
| Qwen2.5-VL-7B | 0.549 | 0.779 | 0.704 | 0.185 | 0.426 | 0.651 |
| gpt-4.1-nano | 0.503 | 0.676 | 0.672 | 0.028 | 0.567 | 0.573 |
| gpt-5-nano | 0.503 | 0.487 | 0.583 | 0.091 | 0.661 | 0.693 |
| Qwen3-VL-2B | 0.439 | 0.592 | 0.613 | 0.029 | 0.356 | 0.605 |
| Qwen2.5-VL-3B | 0.402 | 0.613 | 0.654 | 0.045 | 0.203 | 0.494 |

*Scale: 0.0 to 1.0 (higher is better).*

**📝 Submit your model:** To evaluate on the private test set, contact [g.gaikov@mts.ai](mailto:g.gaikov@mts.ai).

---

## 💻 Usage Example

```python
from datasets import load_dataset

# Load the dataset; pass token="hf_..." only if access to the repository is restricted
dataset = load_dataset("MTSAIR/MWS-Vision-Bench")

# load_dataset returns a DatasetDict keyed by split name, so iterate over each split's rows
for split_name, split in dataset.items():
    for item in split:
        print(f"ID: {item['id']}")
        print(f"Type: {item['type']}")
        print(f"Question: {item['question']}")
        # metadata.jsonl stores the path under `file_name`; depending on how the loader
        # resolves it, rows may expose the decoded image as `image` instead
        print(f"Image: {item.get('image', item.get('file_name'))}")
        print(f"Answers: {item['answers']}")
```

---

## 📄 License

**MIT License** © 2024 MTS AI

See [LICENSE](https://github.com/MTSAIR/multimodalocr/blob/main/LICENSE.txt) for details.

---

## 📚 Citation

If you use this dataset in your research, please cite:

```bibtex
@misc{mwsvisionbench2024,
  title={MWS-Vision-Bench: Russian Multimodal OCR Benchmark},
  author={MTS AI Research},
  organization={MTSAIR},
  year={2025},
  url={https://huggingface.co/datasets/MTSAIR/MWS-Vision-Bench},
  note={Paper coming soon}
}
```

---

## 🤝 Contacts

- **Team:** [MTSAIR Research](https://huggingface.co/MTSAIR)
- **Email:** [g.gaikov@mts.ai](mailto:g.gaikov@mts.ai)

---

## 🇷🇺 Summary

**MWS Vision Bench** is the first Russian-language business-OCR benchmark for the era of multimodal models. It contains 1,302 samples across 5 task types that reflect real-world scenarios for processing business documents and handwritten data. The dataset was built to evaluate and advance multimodal LLMs in a Russian-language context.

📄 *A research paper is in preparation (paper coming soon).*

---

**Made with ❤️ by MTS AI Research Team**
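For working with a local clone of the repository without the `datasets` library, the following is a minimal sketch that reads `metadata.jsonl` (the format described in the Data Format section above) with the Python standard library, resolves each `file_name` against the repository root, and tallies samples per `type` so they can be checked against the Task Types table. Since 1,302 samples share 400 images, the same `file_name` appears in several records. The helper names (`parse_metadata`, `load_metadata`, `image_path`, `count_by_type`) and the local directory name are hypothetical, chosen for illustration; they are not part of an official toolkit.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable

# Per-task sample counts as listed in the Task Types table (sum: 1,302)
EXPECTED_COUNTS = {
    "document parsing ru": 243,
    "full-page OCR ru": 144,
    "key information extraction ru": 119,
    "reasoning VQA ru": 400,
    "text grounding ru": 396,
}


def parse_metadata(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def load_metadata(repo_root: Path) -> list[dict]:
    """Read metadata.jsonl from a local checkout (hypothetical helper)."""
    with open(repo_root / "metadata.jsonl", encoding="utf-8") as f:
        return parse_metadata(f)


def image_path(repo_root: Path, record: dict) -> Path:
    """Join the record's `file_name` (e.g. images/image_0.jpg) onto the repo root."""
    return repo_root / record["file_name"]


def count_by_type(records: Iterable[dict]) -> Counter:
    """Count samples per task type."""
    return Counter(r["type"] for r in records)


if __name__ == "__main__":
    root = Path("MWS-Vision-Bench")  # assumed location of a local clone
    if (root / "metadata.jsonl").exists():
        records = load_metadata(root)
        counts = count_by_type(records)
        for task, expected in EXPECTED_COUNTS.items():
            print(f"{task}: {counts[task]} (table: {expected})")
        unique_images = len({r["file_name"] for r in records})
        print("total samples:", sum(counts.values()), "unique images:", unique_images)
```

Parsing is kept separate from file I/O so the same function works on an open file, a downloaded string split into lines, or a hand-written test fixture.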

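The Overall column of the leaderboard above is consistent with the unweighted mean of the five per-task scores: for Gemini-2.5-pro, (0.836 + 0.745 + 0.084 + 0.891 + 0.853) / 5 = 0.6818, which rounds to the reported 0.682, and the other rows agree in the same way to three decimals. The card does not state this formula, so treat it as an inference from the table. Because it averages over task types rather than samples, each task carries equal weight even though task sizes range from 119 to 400 samples. The sketch below recomputes it for a few rows copied from the table; `overall_score` is a hypothetical helper name.

```python
from statistics import fmean

# Per-task scores copied from the leaderboard, in column order:
# img→text, img→markdown, Grounding, KIE (JSON), VQA; plus the reported Overall
LEADERBOARD = {
    "Gemini-2.5-pro": ([0.836, 0.745, 0.084, 0.891, 0.853], 0.682),
    "Claude-4.5-Sonnet": ([0.723, 0.676, 0.377, 0.728, 0.692], 0.639),
    "Qwen2.5-VL-3B": ([0.613, 0.654, 0.045, 0.203, 0.494], 0.402),
}


def overall_score(task_scores: list[float]) -> float:
    """Unweighted (macro) mean over task types, rounded to 3 decimals like the table."""
    return round(fmean(task_scores), 3)


for model, (scores, reported) in LEADERBOARD.items():
    print(f"{model}: recomputed {overall_score(scores):.3f}, reported {reported:.3f}")
```

A sample-weighted average would instead favor the larger VQA and grounding tasks, which is why a model with a weak Grounding score can still rank near the top under the macro mean.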
Provider:
maas
Created:
2025-10-13