allenai/MolmoWeb-SyntheticQA
Source: Hugging Face. Updated 2026-03-24; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/allenai/MolmoWeb-SyntheticQA
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: messages
    list:
    - name: question
      dtype: string
    - name: answer
      dtype: string
    - name: question_type
      dtype: string
    - name: question_form
      dtype: string
  - name: metadata
    struct:
    - name: website
      dtype: string
    - name: url
      dtype: string
  splits:
  - name: train
configs:
- config_name: default
  data_files:
  - split: train
    path: train/data-*
license: odc-by
---
# MolmoWeb-SyntheticQA
A dataset of webpage screenshots paired with synthetic question-answer pairs, designed for visual question answering on web content.
## Dataset Usage
```python
from datasets import load_dataset

# Loads the dataset from the Hugging Face Hub; "train" is the only split.
ds = load_dataset("allenai/MolmoWeb-SyntheticQA")
```
## Dataset Structure
### Splits
| Split | Description |
|-------|-------------|
| `train` | All QA examples, partitioned by website via `site_splits_2025-10-06.json` |
### Features
| Field | Type | Description |
|-------|------|-------------|
| `image` | `Image` | Screenshot of the webpage |
| `messages` | `list` | QA pairs associated with this screenshot (see below) |
| `metadata.website` | `string` | Website name (domain) |
| `metadata.url` | `string` | Full URL of the page |
Each entry in `messages` contains:
| Field | Type | Description |
|-------|------|-------------|
| `question` | `string` | The question about the screenshot |
| `answer` | `string` | The answer to the question |
| `question_type` | `string` | Category of the question (e.g. OCR, affordance, reasoning) |
| `question_form` | `string` | Form of the question: `first_person` or `third_person` |
Multiple QA pairs sharing the same screenshot are grouped into a single row.
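
Because each row groups several QA pairs under one screenshot, it can help to flatten rows into one record per question. The following is a minimal sketch, assuming `messages` comes back as a list of dicts (as the `list` feature type suggests) and `metadata` as a dict. The helper name `iter_qa_records` and the example row are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, Iterator


def iter_qa_records(row: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per QA pair in a grouped row."""
    meta = row.get("metadata") or {}
    for msg in row["messages"]:
        yield {
            "image": row["image"],  # the same screenshot is shared by every pair
            "website": meta.get("website"),
            "url": meta.get("url"),
            "question": msg["question"],
            "answer": msg["answer"],
            "question_type": msg["question_type"],
            "question_form": msg["question_form"],
        }


# Hypothetical row for illustration only. Real rows carry a decoded image
# in "image"; the question_type strings here are placeholders, not values
# confirmed by the dataset card.
example_row = {
    "image": None,
    "messages": [
        {
            "question": "What is the page title?",
            "answer": "Example Domain",
            "question_type": "ocr",
            "question_form": "third_person",
        },
        {
            "question": "Where do I click to learn more?",
            "answer": "The 'More information' link",
            "question_type": "affordance",
            "question_form": "first_person",
        },
    ],
    "metadata": {"website": "example.com", "url": "https://example.com/"},
}

records = list(iter_qa_records(example_row))
first_person_only = [r for r in records if r["question_form"] == "first_person"]
```

With the real dataset, the same helper could be applied to each row of `ds["train"]`. Filtering on `question_type` or `question_form` then becomes an ordinary list comprehension over the flat records.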
## License
This dataset is licensed under ODC-BY 1.0. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use). Synthetic QA data was generated using GPT-4o and GPT-5, which are subject to [OpenAI's Terms of Use](https://openai.com/policies/row-terms-of-use/).
Provider:
allenai



