
allenai/MolmoWeb-SyntheticQA

Hugging Face | Updated: 2026-03-24 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/allenai/MolmoWeb-SyntheticQA
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: messages
    list:
    - name: question
      dtype: string
    - name: answer
      dtype: string
    - name: question_type
      dtype: string
    - name: question_form
      dtype: string
  - name: metadata
    struct:
    - name: website
      dtype: string
    - name: url
      dtype: string
  splits:
  - name: train
configs:
- config_name: default
  data_files:
  - split: train
    path: train/data-*
license: odc-by
---

# MolmoWeb-SyntheticQA

A dataset of webpage screenshots paired with synthetic question-answer pairs, designed for visual question answering on web content.

## Dataset Usage

```python
from datasets import load_dataset

ds = load_dataset("allenai/MolmoWeb-SyntheticQA")
```

## Dataset Structure

### Splits

| Split | Description |
|-------|-------------|
| `train` | All QA examples, partitioned by website via `site_splits_2025-10-06.json` |

### Features

| Field | Type | Description |
|-------|------|-------------|
| `image` | `Image` | Screenshot of the webpage |
| `messages` | `list` | QA pairs associated with this screenshot (see below) |
| `metadata.website` | `string` | Website name (domain) |
| `metadata.url` | `string` | Full URL of the page |

Each entry in `messages` contains:

| Field | Type | Description |
|-------|------|-------------|
| `question` | `string` | The question about the screenshot |
| `answer` | `string` | The answer to the question |
| `question_type` | `string` | Category/type of the question (e.g. OCR, affordance, reasoning, etc.) |
| `question_form` | `string` | Form of the question (i.e. first_person or third_person) |

Multiple QA pairs sharing the same screenshot are grouped into a single row.

## License

This dataset is licensed under ODC-BY 1.0. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use). Synthetic QA data was generated using GPT-4o and GPT-5, which are subject to [OpenAI's Terms of Use](https://openai.com/policies/row-terms-of-use/).
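
Because several QA pairs share one screenshot row, downstream code often needs one record per question. The following is a minimal sketch of that flattening, assuming each row arrives as a dict whose `messages` field is a list of dicts with the four keys listed in the Features tables. The helper name `iter_qa_pairs` and the sample row are hypothetical and used only for illustration; the sample values are not taken from the dataset.

```python
from typing import Any, Dict, Iterator


def iter_qa_pairs(row: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per QA pair found in a single row."""
    meta = row.get("metadata") or {}
    for msg in row["messages"]:
        yield {
            "website": meta.get("website"),
            "url": meta.get("url"),
            "question": msg["question"],
            "answer": msg["answer"],
            "question_type": msg["question_type"],
            "question_form": msg["question_form"],
        }


# Hypothetical row for illustration only (image omitted, values invented).
sample_row = {
    "image": None,
    "messages": [
        {
            "question": "What is the page title?",
            "answer": "Example Domain",
            "question_type": "OCR",
            "question_form": "third_person",
        },
        {
            "question": "Where can I find more information?",
            "answer": "Follow the link at the bottom of the page.",
            "question_type": "affordance",
            "question_form": "first_person",
        },
    ],
    "metadata": {"website": "example.com", "url": "https://example.com/"},
}

flat = list(iter_qa_pairs(sample_row))
for record in flat:
    print(record["question_type"], "|", record["question"])
```

With the real dataset, the same helper could be applied to each row of `ds["train"]`; whether `messages` is returned exactly in this list-of-dicts shape may depend on the installed `datasets` version, so it is worth inspecting one row first.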
Provider:
allenai