AI4Manufacturing/forge

Source: Hugging Face | Updated: 2026-04-10 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/AI4Manufacturing/forge
Description:
---
dataset_info:
- config_name: grounding_cross_coord_to_coord
  features:
  - name: ref_image
    dtype: image
  - name: test_image
    dtype: image
  - name: ref_hint
    dtype: string
  - name: ref_hint_coord
    dtype: string
  - name: test_choices
    dtype: string
  - name: test_mcq_options
    dtype: string
  - name: gt_answer
    dtype: string
  - name: task
    dtype: string
  - name: folder
    dtype: string
  splits:
  - name: train
    num_bytes: 829716650
    num_examples: 513
  download_size: 634842012
  dataset_size: 829716650
- config_name: grounding_cross_letter_to_letter
  features:
  - name: ref_image
    dtype: image
  - name: test_image
    dtype: image
  - name: ref_hint
    dtype: string
  - name: ref_hint_coord
    dtype: string
  - name: test_choices
    dtype: string
  - name: test_mcq_options
    dtype: string
  - name: gt_answer
    dtype: string
  - name: task
    dtype: string
  - name: folder
    dtype: string
  splits:
  - name: train
    num_bytes: 829529694
    num_examples: 513
  download_size: 634800583
  dataset_size: 829529694
- config_name: grounding_task_a_icl_outside
  features:
  - name: test_image
    dtype: image
  - name: target_coord
    dtype: string
  - name: target_letter
    dtype: string
  - name: choices
    dtype: string
  - name: gt_choice_letter
    dtype: string
  - name: task
    dtype: string
  - name: scenario
    dtype: string
  - name: n_icl_examples
    dtype: int32
  - name: icl_metadata
    dtype: string
  - name: icl_image_0
    dtype: image
  - name: icl_image_1
    dtype: image
  - name: icl_image_2
    dtype: image
  splits:
  - name: train
    num_bytes: 1241840521
    num_examples: 500
  download_size: 1182653732
  dataset_size: 1241840521
- config_name: grounding_task_a_icl_within
  features:
  - name: test_image
    dtype: image
  - name: target_coord
    dtype: string
  - name: target_letter
    dtype: string
  - name: choices
    dtype: string
  - name: gt_choice_letter
    dtype: string
  - name: task
    dtype: string
  - name: scenario
    dtype: string
  - name: n_icl_examples
    dtype: int32
  - name: icl_metadata
    dtype: string
  - name: icl_image_0
    dtype: image
  - name: icl_image_1
    dtype: image
  - name: icl_image_2
    dtype: image
  splits:
  - name: train
    num_bytes: 1242503348
    num_examples: 500
  download_size: 1178297566
  dataset_size: 1242503348
- config_name: grounding_task_a_zero_shot
  features:
  - name: test_image
    dtype: image
  - name: target_coord
    dtype: string
  - name: target_letter
    dtype: string
  - name: choices
    dtype: string
  - name: gt_choice_letter
    dtype: string
  - name: task
    dtype: string
  - name: scenario
    dtype: string
  - name: n_icl_examples
    dtype: int32
  - name: icl_metadata
    dtype: string
  - name: icl_image_0
    dtype: image
  - name: icl_image_1
    dtype: image
  - name: icl_image_2
    dtype: image
  splits:
  - name: train
    num_bytes: 414286389
    num_examples: 500
  download_size: 392568896
  dataset_size: 414286389
- config_name: grounding_task_b_icl_outside
  features:
  - name: test_image
    dtype: image
  - name: target_coord
    dtype: string
  - name: target_letter
    dtype: string
  - name: choices
    dtype: string
  - name: gt_choice_letter
    dtype: string
  - name: task
    dtype: string
  - name: scenario
    dtype: string
  - name: n_icl_examples
    dtype: int32
  - name: icl_metadata
    dtype: string
  - name: icl_image_0
    dtype: image
  - name: icl_image_1
    dtype: image
  - name: icl_image_2
    dtype: image
  splits:
  - name: train
    num_bytes: 1244099842
    num_examples: 500
  download_size: 1190169899
  dataset_size: 1244099842
- config_name: grounding_task_b_icl_within
  features:
  - name: test_image
    dtype: image
  - name: target_coord
    dtype: string
  - name: target_letter
    dtype: string
  - name: choices
    dtype: string
  - name: gt_choice_letter
    dtype: string
  - name: task
    dtype: string
  - name: scenario
    dtype: string
  - name: n_icl_examples
    dtype: int32
  - name: icl_metadata
    dtype: string
  - name: icl_image_0
    dtype: image
  - name: icl_image_1
    dtype: image
  - name: icl_image_2
    dtype: image
  splits:
  - name: train
    num_bytes: 1242524691
    num_examples: 500
  download_size: 1178307472
  dataset_size: 1242524691
- config_name: grounding_task_b_zero_shot
  features:
  - name: test_image
    dtype: image
  - name: target_coord
    dtype: string
  - name: target_letter
    dtype: string
  - name: choices
    dtype: string
  - name: gt_choice_letter
    dtype: string
  - name: task
    dtype: string
  - name: scenario
    dtype: string
  - name: n_icl_examples
    dtype: int32
  - name: icl_metadata
    dtype: string
  - name: icl_image_0
    dtype: image
  - name: icl_image_1
    dtype: image
  - name: icl_image_2
    dtype: image
  splits:
  - name: train
    num_bytes: 414307752
    num_examples: 500
  download_size: 392578566
  dataset_size: 414307752
- config_name: task1_image
  features:
  - name: test_image
    dtype: image
  - name: grounding_image
    dtype: image
  - name: gt_image
    dtype: image
  - name: assembly_name
    dtype: string
  - name: assembly_description
    dtype: string
  - name: error_case
    dtype: string
  - name: normal_case_base
    dtype: string
  - name: normal_ref_images
    sequence: image
  - name: icl_ori_images
    sequence: image
  - name: icl_grounding_images
    sequence: image
  splits:
  - name: train
    num_bytes: 2798691043
    num_examples: 451
  download_size: 503903932
  dataset_size: 2798691043
- config_name: task1_three_view
  features:
  - name: test_image
    dtype: image
  - name: gt_parts
    dtype: string
  - name: query_description
    dtype: string
  - name: scenario_name
    dtype: string
  - name: error_case
    dtype: string
  - name: normal_ref_images
    sequence: image
  - name: icl_images
    sequence: image
  - name: icl_gt_letters
    sequence: string
  splits:
  - name: train
    num_bytes: 284294342
    num_examples: 496
  download_size: 102854693
  dataset_size: 284294342
- config_name: task2_three_view
  features:
  - name: test_image
    dtype: image
  - name: defect_type
    dtype: string
  - name: is_normal
    dtype: bool
  - name: component_type
    dtype: string
  - name: component_description
    dtype: string
  - name: normal_ref_images
    sequence: image
  - name: icl_images
    sequence: image
  - name: icl_defect_types
    sequence: string
  - name: icl_is_normal
    dtype: string
  splits:
  - name: train
    num_bytes: 3892767363
    num_examples: 830
  download_size: 2404052070
  dataset_size: 3892767363
- config_name: task3_image
  features:
  - name: test_image
    dtype: image
  - name: grounding_image
    dtype: image
  - name: assembly_name
    dtype: string
  - name: assembly_description
    dtype: string
  - name: error_case
    dtype: string
  - name: normal_case_base
    dtype: string
  - name: n_normal_refs
    dtype: int32
  - name: n_icl_examples
    dtype: int32
  - name: ref_image_0
    dtype: image
  - name: ref_image_1
    dtype: image
  - name: ref_image_2
    dtype: image
  - name: ref_image_3
    dtype: image
  - name: ref_image_4
    dtype: image
  - name: icl_ori_image_0
    dtype: image
  - name: icl_grounding_image_0
    dtype: image
  - name: icl_ori_image_1
    dtype: image
  - name: icl_grounding_image_1
    dtype: image
  - name: icl_ori_image_2
    dtype: image
  - name: icl_grounding_image_2
    dtype: image
  splits:
  - name: train
    num_bytes: 5289841601
    num_examples: 857
  download_size: 900918270
  dataset_size: 5289841601
- config_name: task3_missing_part_image
  features:
  - name: test_image
    dtype: image
  - name: assembly_name
    dtype: string
  - name: assembly_description
    dtype: string
  - name: choices_text
    dtype: string
  - name: gt_letter
    dtype: string
  - name: gt_answer
    dtype: string
  - name: mcq_mapping
    dtype: string
  - name: error_case
    dtype: string
  - name: scenario
    dtype: string
  - name: n_normal_refs
    dtype: int32
  - name: n_icl_examples
    dtype: int32
  - name: icl_gt_letters
    dtype: string
  - name: ref_image_0
    dtype: image
  - name: ref_image_1
    dtype: image
  - name: ref_image_2
    dtype: image
  - name: ref_image_3
    dtype: image
  - name: ref_image_4
    dtype: image
  - name: icl_image_0
    dtype: image
  - name: icl_image_1
    dtype: image
  - name: icl_image_2
    dtype: image
  splits:
  - name: train
    num_bytes: 1145957667
    num_examples: 240
  download_size: 739016032
  dataset_size: 1145957667
- config_name: task3_missing_part_three_view
  features:
  - name: test_image
    dtype: image
  - name: assembly_name
    dtype: string
  - name: assembly_description
    dtype: string
  - name: choices_text
    dtype: string
  - name: gt_letter
    dtype: string
  - name: gt_answer
    dtype: string
  - name: mcq_mapping
    dtype: string
  - name: error_case
    dtype: string
  - name: scenario
    dtype: string
  - name: n_normal_refs
    dtype: int32
  - name: n_icl_examples
    dtype: int32
  - name: icl_gt_letters
    dtype: string
  - name: ref_image_0
    dtype: image
  - name: ref_image_1
    dtype: image
  - name: ref_image_2
    dtype: image
  - name: ref_image_3
    dtype: image
  - name: ref_image_4
    dtype: image
  - name: icl_image_0
    dtype: image
  - name: icl_image_1
    dtype: image
  - name: icl_image_2
    dtype: image
  splits:
  - name: train
    num_bytes: 74622484
    num_examples: 137
  download_size: 38017522
  dataset_size: 74622484
- config_name: task3_three_view
  features:
  - name: test_image
    dtype: image
  - name: gt_parts
    dtype: string
  - name: query_description
    dtype: string
  - name: scenario_name
    dtype: string
  - name: error_case
    dtype: string
  - name: n_normal_refs
    dtype: int32
  - name: n_icl_examples
    dtype: int32
  - name: icl_gt_letters
    dtype: string
  - name: ref_image_0
    dtype: image
  - name: ref_image_1
    dtype: image
  - name: ref_image_2
    dtype: image
  - name: ref_image_3
    dtype: image
  - name: ref_image_4
    dtype: image
  - name: icl_image_0
    dtype: image
  - name: icl_image_1
    dtype: image
  - name: icl_image_2
    dtype: image
  splits:
  - name: train
    num_bytes: 198269281
    num_examples: 309
  download_size: 76367968
  dataset_size: 198269281
configs:
- config_name: grounding_cross_coord_to_coord
  data_files:
  - split: train
    path: grounding_cross_coord_to_coord/train-*
- config_name: grounding_cross_letter_to_letter
  data_files:
  - split: train
    path: grounding_cross_letter_to_letter/train-*
- config_name: grounding_task_a_icl_outside
  data_files:
  - split: train
    path: grounding_task_a_icl_outside/train-*
- config_name: grounding_task_a_icl_within
  data_files:
  - split: train
    path: grounding_task_a_icl_within/train-*
- config_name: grounding_task_a_zero_shot
  data_files:
  - split: train
    path: grounding_task_a_zero_shot/train-*
- config_name: grounding_task_b_icl_outside
  data_files:
  - split: train
    path: grounding_task_b_icl_outside/train-*
- config_name: grounding_task_b_icl_within
  data_files:
  - split: train
    path: grounding_task_b_icl_within/train-*
- config_name: grounding_task_b_zero_shot
  data_files:
  - split: train
    path: grounding_task_b_zero_shot/train-*
- config_name: task1_image
  data_files:
  - split: train
    path: task1_image/train-*
- config_name: task1_three_view
  data_files:
  - split: train
    path: task1_three_view/train-*
- config_name: task2_three_view
  data_files:
  - split: train
    path: task2_three_view/train-*
- config_name: task3_image
  data_files:
  - split: train
    path: task3_image/train-*
- config_name: task3_missing_part_image
  data_files:
  - split: train
    path: task3_missing_part_image/train-*
- config_name: task3_missing_part_three_view
  data_files:
  - split: train
    path: task3_missing_part_three_view/train-*
- config_name: task3_three_view
  data_files:
  - split: train
    path: task3_three_view/train-*
license: mit
task_categories:
- question-answering
- image-text-to-text
- visual-question-answering
language:
- en
tags:
- Manufacturing
- 3D
- Industry
- Engineering
pretty_name: Forge
size_categories:
- 1K<n<10K
---

<div align="center">
<h1>
<!-- <img src="forge_icon.png" alt="FORGE Logo" height="20" style="vertical-align:middle; margin-right:10px;" /> -->
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
</h1>
</div>

<p align="center">
&nbsp;&nbsp;🌐&nbsp;<a href="https://ai4manufacturing.github.io/forge-web/">Website</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑&nbsp;<a href="https://arxiv.org/abs/2604.07413">Paper</a>&nbsp;&nbsp; | &nbsp;&nbsp;💻&nbsp;<a href="https://github.com/AI4Manufacturing/FORGE">Code</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗&nbsp;<a href="https://huggingface.co/datasets/AI4Manufacturing/forge">Dataset</a>&nbsp;&nbsp;
</p>

<div align="center">
<img src="pipeline.png" width="100%" alt="FORGE Pipeline Overview">
</div>

## Quick Start

```python
from datasets import load_dataset

# Load the train split of one config; every config has a single "train" split.
ds = load_dataset("AI4Manufacturing/forge", "task1_three_view", split="train")

print(ds[0].keys())
ds[0]["test_image"]  # PIL Image
```

## Configs

### Core Tasks

| Config | Cases | Task | Modality |
|--------|------:|------|----------|
| `task1_image` | 451 | Wrong model detection (MCQ) | Photo |
| `task1_three_view` | 496 | Wrong model detection (letter) | Three-View |
| `task2_three_view` | 830 | Anomaly classification (normal + defect type) | Three-View |
| `task3_image` | 857 | Extra/wrong part detection (MCQ) | Photo |
| `task3_three_view` | 309 | Extra/wrong part detection (letter) | Three-View |
| `task3_missing_part_image` | 240 | Missing part identification (MCQ) | Photo |
| `task3_missing_part_three_view` | 137 | Missing part identification (MCQ) | Three-View |

### Grounding Ablation (Single-Image)

| Config | Cases | Description |
|--------|------:|-------------|
| `grounding_task_a_zero_shot` | 500 | Coord → Letter, zero-shot |
| `grounding_task_a_icl_within` | 500 | Coord → Letter, ICL (same image) |
| `grounding_task_a_icl_outside` | 500 | Coord → Letter, ICL (cross image) |
| `grounding_task_b_zero_shot` | 500 | Letter → Coord, zero-shot |
| `grounding_task_b_icl_within` | 500 | Letter → Coord, ICL (same image) |
| `grounding_task_b_icl_outside` | 500 | Letter → Coord, ICL (cross image) |

### Grounding Ablation (Cross-Image)

| Config | Cases | Description |
|--------|------:|-------------|
| `grounding_cross_letter_to_letter` | 513 | Match parts by letter across images |
| `grounding_cross_coord_to_coord` | 513 | Match parts by coordinate across images |

**Total: 7,346 cases across 15 configs**

## Data Fields

Each row is self-contained, with all images embedded. Unused image slots hold a 1x1 placeholder. Use `n_normal_refs` and `n_icl_examples` to determine how many slots hold real images.

**Task 1/3 Image** -- `test_image`, `grounding_image`, `assembly_name`, `assembly_description`, `error_case`, `ref_image_0..4`, `icl_ori_image_0..2`, `icl_grounding_image_0..2`, `n_normal_refs`, `n_icl_examples`

**Task 1/3 Three-View** -- `test_image`, `gt_parts` (JSON), `query_description`, `scenario_name`, `error_case`, `ref_image_0..4`, `icl_image_0..2`, `icl_gt_letters` (JSON), `n_normal_refs`, `n_icl_examples`

**Task 2 Three-View** -- `test_image`, `defect_type`, `is_normal`, `component_type`, `component_description`, `ref_image_0..4`, `icl_image_0..2`, `icl_metadata` (JSON), `n_normal_refs`, `n_icl_examples`

**Missing Part** -- `test_image`, `assembly_name`, `assembly_description`, `choices_text`, `gt_letter`, `gt_answer`, `mcq_mapping` (JSON), `ref_image_0..4`, `icl_image_0..2`, `icl_gt_letters` (JSON), `n_normal_refs`, `n_icl_examples`

**Grounding (single)** -- `test_image`, `target_coord` (JSON), `target_letter`, `choices` (JSON), `gt_choice_letter`, `icl_image_0..2`, `icl_metadata` (JSON), `n_icl_examples`

**Grounding (cross)** -- `ref_image`, `test_image`, `ref_hint`, `ref_hint_coord` (JSON), `test_choices` (JSON), `test_mcq_options` (JSON), `gt_answer`

## Evaluation Code

See the [FORGE GitHub repo](https://github.com/AI4Manufacturing/FORGE) for the full evaluation toolkit, which supports OpenRouter, OpenAI, Anthropic, Google, and vLLM backends.

## Citation

```bibtex
@misc{jian2026forge,
  title={FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios},
  author={Xiangru Jian and Hao Xu and Wei Pang and Xinjian Zhao and Chengyu Tao and Qixin Zhang and Xikun Zhang and Chao Zhang and Guanzhi Deng and Alex Xue and Juan Du and Tianshu Yu and Garth Tarr and Linqi Song and Qiuzhuang Sun and Dacheng Tao},
  year={2026},
  eprint={2604.07413},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2604.07413},
}
```
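As a usage note for the Data Fields section: because unused image slots hold a 1x1 placeholder, it can help to trim each row to its real images before building a prompt. The sketch below is not part of the FORGE toolkit. The helper names `real_images` and `decode_json_fields` are hypothetical, and it assumes that real images occupy the leading slots (index 0 upward) and that fields marked "(JSON)" are JSON-encoded strings. The sample row is a stand-in with made-up values so the snippet runs without downloading the dataset.

```python
import json


def real_images(row, prefix, count_field, max_slots):
    """Return the images from slots <prefix>_0 .. <prefix>_{n-1}.

    n is read from row[count_field]. Assumption: real images fill the
    leading slots, and the remaining slots are 1x1 placeholders.
    """
    n = min(int(row[count_field]), max_slots)
    return [row[f"{prefix}_{i}"] for i in range(n)]


def decode_json_fields(row, fields):
    """Return a copy of row with the named string fields parsed as JSON."""
    out = dict(row)
    for name in fields:
        value = out.get(name)
        if isinstance(value, str) and value:
            out[name] = json.loads(value)
    return out


# Stand-in row (hypothetical values) shaped like a three-view config row;
# strings take the place of PIL images.
row = {
    "n_normal_refs": 2,
    "n_icl_examples": 1,
    "ref_image_0": "ref-a",
    "ref_image_1": "ref-b",
    "ref_image_2": "placeholder",
    "ref_image_3": "placeholder",
    "ref_image_4": "placeholder",
    "icl_image_0": "icl-a",
    "icl_image_1": "placeholder",
    "icl_image_2": "placeholder",
    "icl_gt_letters": '["B"]',
}

refs = real_images(row, "ref_image", "n_normal_refs", max_slots=5)
icl = real_images(row, "icl_image", "n_icl_examples", max_slots=3)
decoded = decode_json_fields(row, ["icl_gt_letters"])

print(len(refs), len(icl), decoded["icl_gt_letters"])  # 2 1 ['B']
```

With a real row from `load_dataset`, the same calls should apply unchanged, since each row is returned as a dictionary. If the leading-slot assumption turns out not to hold for some config, checking each image's size against 1x1 would be a more defensive filter.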

Provider:
AI4Manufacturing