WebGuard
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-08-02.
Download link:
https://modelscope.cn/datasets/osunlp/WebGuard
# WebGuard Annotation Dataset
This dataset contains web safety annotations for browser interactions. Each entry represents
an annotated action on a website with a risk level.
## Dataset Summary
This dataset contains 5,999 web safety annotations for browser interactions.
## Data Fields
- `url`: The URL where the action was performed
- `description`: Description of the action (may be null)
- `tagHead`: HTML tag type of the target element
- `Screenshot`: Google Drive link to screenshot view
- `Annotation`: Review classification (SAFE/UNSAFE/LOW/HIGH)
- `website`: Website name/category
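The field list above can be mirrored as a small typed record, which helps when checking rows before use. The following is a minimal sketch, not part of the dataset's own tooling: the `WebGuardRow` type, the `check_row` helper, and the sample row are hypothetical, and the set of allowed labels is assumed to be exactly the four values named in the `Annotation` field description.

```python
from typing import Optional, TypedDict

# Label values as listed in the Annotation field description (assumed exhaustive).
ALLOWED_ANNOTATIONS = {"SAFE", "UNSAFE", "LOW", "HIGH"}


class WebGuardRow(TypedDict):
    """Hypothetical typed view of one dataset row."""
    url: str
    description: Optional[str]  # may be null
    tagHead: str
    Screenshot: str
    Annotation: str
    website: str


def check_row(row: dict) -> list[str]:
    """Return a list of problems found in a row (empty if none)."""
    problems = []
    # Report every documented field that is absent from the row.
    for field in WebGuardRow.__annotations__:
        if field not in row:
            problems.append(f"missing field: {field}")
    # Report a label outside the documented set.
    label = row.get("Annotation")
    if label is not None and label not in ALLOWED_ANNOTATIONS:
        problems.append(f"unexpected Annotation: {label!r}")
    return problems


# Made-up row for illustration only; not taken from the dataset.
sample = {
    "url": "https://example.com/cart",
    "description": None,
    "tagHead": "button",
    "Screenshot": "https://example.com/screenshot-placeholder",
    "Annotation": "LOW",
    "website": "example",
}
print(check_row(sample))
```

A check like this treats a null `description` as valid, since the card states the field may be null.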
## Usage
```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("osunlp/WebGuard")

# Access the data
for example in dataset["train"]:
    print(f"URL: {example['url']}")
    print(f"Description: {example['description']}")
    print(f"Tag: {example['tagHead']}")
    print(f"Screenshot: {example['Screenshot']}")
    print(f"Annotation: {example['Annotation']}")
    print(f"Website: {example['website']}")
    print("---")
```
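Because iterating over a split yields plain dictionaries, rows can be summarized with ordinary Python. The following is a minimal sketch, assuming the field names listed under Data Fields; the helper names `count_annotations` and `rows_with_label` and the three sample rows are hypothetical, with the sample rows standing in for `dataset["train"]`.

```python
from collections import Counter
from typing import Iterable, Iterator


def count_annotations(rows: Iterable[dict]) -> Counter:
    """Count how many rows carry each Annotation label."""
    return Counter(row["Annotation"] for row in rows)


def rows_with_label(rows: Iterable[dict], label: str) -> Iterator[dict]:
    """Yield rows whose Annotation equals the given label."""
    for row in rows:
        if row["Annotation"] == label:
            yield row


# Made-up rows for illustration only; in practice pass dataset["train"].
rows = [
    {"url": "https://example.com/a", "description": None, "Annotation": "SAFE"},
    {"url": "https://example.com/b", "description": "Submit order", "Annotation": "HIGH"},
    {"url": "https://example.com/c", "description": "Open menu", "Annotation": "SAFE"},
]

print(count_annotations(rows))
for row in rows_with_label(rows, "HIGH"):
    # description may be null, so fall back to a placeholder
    print(row["url"], row["description"] or "(no description)")
```

Both helpers accept any iterable of dictionaries, so they should work on a loaded split without converting it first.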
## Citation
```bibtex
@article{zheng2025webguard,
title={WebGuard: Building a Generalizable Guardrail for Web Agents},
author={Zheng, Boyuan and Liao, Zeyi and Salisbury, Scott and Liu, Zeyuan and Lin, Michael and Zheng, Qinyuan and Wang, Zifan and Deng, Xiang and Song, Dawn and Sun, Huan and others},
journal={arXiv preprint arXiv:2507.14293},
year={2025}
}
@inproceedings{zheng-etal-2024-webolympus,
title = "{W}eb{O}lympus: An Open Platform for Web Agents on Live Websites",
author = "Zheng, Boyuan and Gou, Boyu and Salisbury, Scott and Du, Zheng and Sun, Huan and Su, Yu",
editor = "Hernandez Farias, Delia Irazu and Hope, Tom and Li, Manling",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-demo.20",
pages = "187--197",
}
```
## License
Creative Commons Attribution-NonCommercial 4.0 International
Provider:
maas
Created:
2025-07-25



