adameubanks/filtered_articles_by_year
Source: Hugging Face. Updated 2025-08-28. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/adameubanks/filtered_articles_by_year
Resource description:
Filtered Articles by Year is a collection of English web articles drawn from the FineWeb dataset and partitioned by year, covering 2005 through 2025. It is intended as base data for temporal language analysis, Word2Vec model training, and semantic change research. Each article entry contains a URL, the text content, and a content hash. Each year is exposed as a separate configuration, and all articles in a configuration sit in the training split.
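As a rough illustration of the per-year configuration layout described above, the following sketch streams one year and turns each article into a token list suitable for Word2Vec-style training. Several details are assumptions not confirmed by the listing: that configurations are named by the bare year string (for example `"2005"`), that the text column is called `text`, and that the split is named `train`. The helpers `year_configs`, `tokenize`, `iter_sentences`, and `load_year` are hypothetical names introduced only for this example, and the regex tokenizer is a simple stand-in for real preprocessing.

```python
import re
from typing import Iterable, Iterator

REPO_ID = "adameubanks/filtered_articles_by_year"
FIRST_YEAR, LAST_YEAR = 2005, 2025  # year range stated in the description


def year_configs(first: int = FIRST_YEAR, last: int = LAST_YEAR) -> list[str]:
    """Return one config name per year. ASSUMPTION: configs are named "2005", "2006", ..."""
    return [str(year) for year in range(first, last + 1)]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def iter_sentences(rows: Iterable[dict], text_field: str = "text") -> Iterator[list[str]]:
    """Yield one token list per article, skipping empty ones.

    ASSUMPTION: the article body lives in a column named `text`.
    """
    for row in rows:
        tokens = tokenize(row.get(text_field) or "")
        if tokens:
            yield tokens


def load_year(year: int):
    """Stream one year's articles (requires `pip install datasets` and network access).

    ASSUMPTION: the split is named "train".
    """
    from datasets import load_dataset  # imported lazily so the helpers above work offline

    return load_dataset(REPO_ID, str(year), split="train", streaming=True)
```

With those assumptions, `iter_sentences(load_year(2010))` would yield token lists for the 2010 articles, which could be passed to a Word2Vec trainer one year at a time to compare embeddings across years. Streaming avoids downloading a whole year before iterating.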
Provider:
adameubanks



