firecrawl/scrape-content-dataset-v1
Source: Hugging Face | Updated: 2025-10-23 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/firecrawl/scrape-content-dataset-v1
Description:
Scrape Content Dataset v1 is a human-curated benchmark dataset for evaluating how well web scraping engines capture a page's core content while excluding noise such as navigation, ads, and footers. It contains 1,000 web pages with human-annotated ground truth. The dataset was created on 2025-10-21 and may become outdated over time.
Provider:
firecrawl



